PHP: Find all images in an HTML string.

This is a guide on how to find and extract all image elements from a string containing HTML. In this tutorial, we will fetch the HTML content of an external web page before extracting all of the images from it. Essentially, we will be scraping the web page for images.

Take a look at the following example:

In the example above, we scraped Wikipedia’s homepage using the file_get_contents function. This function returns the HTML of the page in a string format, which we can then load into the DOMDocument object.

The DOMDocument object allows us to find all img tags without having to resort to regular expressions. By calling its getElementsByTagName method, we can ask it to return a DOMNodeList of the elements that we want.

In the case above, we told the DOMDocument object to return all the img tags in the HTML source that we fetched with file_get_contents. We then looped through those img tags while fetching their src, title and alt attributes.
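The steps described above can be sketched roughly as follows. This is a minimal sketch rather than a definitive implementation: the function name findImages is hypothetical, and a short inline HTML string stands in for the page that would normally be fetched with file_get_contents. It also assumes that PHP's DOM extension is installed.

```php
<?php

/**
 * Hypothetical helper: returns the src, title and alt attributes
 * of every img element found in an HTML string.
 */
function findImages(string $html): array
{
    $images = [];

    // loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return $images;
    }

    $doc = new DOMDocument();

    // Real-world HTML is rarely valid. Collect libxml warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // getElementsByTagName() returns a DOMNodeList that we can loop over.
    foreach ($doc->getElementsByTagName('img') as $img) {
        $images[] = [
            // getAttribute() returns an empty string if the attribute is missing.
            'src'   => $img->getAttribute('src'),
            'title' => $img->getAttribute('title'),
            'alt'   => $img->getAttribute('alt'),
        ];
    }

    return $images;
}

// Stand-in for a fetched page. For a live page, this string would
// instead come from file_get_contents() with the URL of the page.
$html = '<html><body>'
      . '<img src="/logo.png" alt="Logo">'
      . '<p>Some text</p>'
      . '<img src="https://example.com/a.jpg" title="Photo A">'
      . '</body></html>';

foreach (findImages($html) as $image) {
    echo $image['src'], ' | ', $image['title'], ' | ', $image['alt'], PHP_EOL;
}
```

Wrapping the logic in a function that accepts a string keeps the parsing separate from the fetching, so the same code works on HTML from a file, a database or an HTTP request.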

If you run the code above, you should get an output that is similar to this:

Note that in some cases:

  • The title attribute will not exist. In that case, getAttribute will return a blank string.
  • The alt attribute may also be missing or blank.
  • The URL found in the src attribute of the img tag may be relative, i.e. it might not include the domain name or the protocol. In those cases, you will have to “fix” the links yourself.
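As a rough illustration of how relative src values might be “fixed”, here is a minimal sketch. The function name resolveImageUrl and the example.com URLs are hypothetical, and this is not a full URL resolver: it handles absolute, protocol-relative, root-relative and simple path-relative values, but it does not normalise “../” segments.

```php
<?php

/**
 * Hypothetical helper: turns a src value into an absolute URL,
 * using the URL of the page that the HTML came from as the base.
 * Not a complete resolver: "../" segments are left as they are.
 */
function resolveImageUrl(string $src, string $pageUrl): string
{
    if ($src === '') {
        return '';
    }

    // Already absolute (http:, https:, data: and so on): leave untouched.
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $src) === 1) {
        return $src;
    }

    $parts = parse_url($pageUrl);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        // Without a usable base URL we cannot resolve anything.
        return $src;
    }

    // Protocol-relative, e.g. //upload.example.com/a.png
    if (strpos($src, '//') === 0) {
        return $parts['scheme'] . ':' . $src;
    }

    $origin = $parts['scheme'] . '://' . $parts['host']
            . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Root-relative, e.g. /static/logo.png
    if (strpos($src, '/') === 0) {
        return $origin . $src;
    }

    // Path-relative, e.g. img/a.png: resolve against the page's directory.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);

    return $origin . $dir . $src;
}

echo resolveImageUrl('/static/logo.png', 'https://www.example.com/wiki/Main_Page'), PHP_EOL;
// https://www.example.com/static/logo.png
echo resolveImageUrl('//upload.example.com/a.png', 'https://www.example.com/'), PHP_EOL;
// https://upload.example.com/a.png
```

If the page declares a base element, or if the links contain “../” segments, a more complete resolver would be needed.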

Related: Scrape links with PHP.
