Download images from any website using PHP

Web scraping or web harvesting is a technique to extract large amounts of data from websites whereby the data is extracted and saved to your computer or server.

Data displayed by any websites can be viewed using a web browser. They may not offer the functionality to download a file for personal use. You have to copy and paste the data which is time consuming and boring stuff. By using web Scraping technique of automation, this process instead of manually copying the data from websites, the Web Scraping software will perform the same task within a fraction of the time.

In this tutorial, we are going to use PHP to download image from any website and store into our local server. Web Scraping is using different languages like Python, PHP, Javascript or Ruby.

So, let's start tutorial by creating file ImageDownload.php and create a class in it.

<?php

class ImageDownload
{
    /**
     * file directory path
     *
     * @return void
     */
    public $folder = 'images';

    /**
     * website link
     *
     * @return void
     */
    public $websitelink;

    /**
     * Create a new class instance.
     *
     * @return void
     */
    function __construct($websitelink)
    {        
        if (!file_exists($this->folder)) {
            mkdir($this->folder, 0777, true);
        }
        $this->websitelink = $websitelink;
    }

    /**
     * save file.
     *
     * @return void
     */
    public function getLinks()
    {
        $html = file_get_contents($this->websitelink);

        preg_match_all("{<img\\s*(.*?)src=('.*?'|\".*?\"|[^\\s]+)(.*?)\\s*/?>}ims", $html, $image_urls, PREG_SET_ORDER);

        return $image_urls;
    }

    /**
     * save file.
     *
     * @return void
     */
    public function saveImage($images)
    {
        foreach ($images as $val) {
            $pos = strpos($val[2],"/");
            
            $link = substr($val[2],1,-1);
            
            if($pos == 1) {
                $site = parse_url($this->$websitelink);

                $image_url = $site['scheme'].'://'.$site['host'].$link;


            } else {
                $image_url = $link;
            }
            $image_name = pathinfo($image_url)['basename'];
            
            copy($image_url, $this->folder.'/'.$image_name);
        }
    }
}

In the above class, getLinks() function returns all image links from any webpage and saveImage() method will save all links one by one with copy() function.

Now we need to create second file which will create object of this class and call methods. Create a file index.php file and include the above class file.

<?php

include "ImageDownload.php";

$website_link = 'https://hackthestuff.com/article/';

$downloader = new ImageDownload($website_link);

$images = $downloader->getLinks();

$downloader->saveImage($images);

Now run PHP server with command php -S 0.0.0.0:8000 and run http://localhost:8000 in your browser.

I hope you liked this article and will help you.

Tags:

Was this article helpful?

0 out of 0 person found this article helpful.

Leave a comment

Or

No Comment