Using Python for downloading images

Downloading images with Python has plenty of applications, including data acquisition for scientific analysis and building training datasets for machine learning tasks such as image recognition and object detection.

This guide covers several ways to download and save images from URLs with Python, using requests, shutil, Pillow (PIL), urllib, wget, OpenCV, and matplotlib. You’ll also learn how asynchronous downloads save time when you need to fetch thousands of images.

What you'll need to start saving images with Python

Before you start, make sure you have the following:

Python installed: Download the latest version from the official Python website. For this blog post, we’re using Python 3.12.2.
Code editor: Choose a code editor or IDE, such as Visual Studio Code or PyCharm, or an interactive environment like Jupyter Notebook.


Save an image with write()

Using the requests.get() function, you can send an HTTP GET request to a URL and retrieve the response.
Next, open a file in binary mode ('wb') using the open() function. This mode is crucial for binary files such as images, as it ensures the data is written exactly as received, preserving its integrity.
Finally, use the write() method of the open file object to write the retrieved content to the file.

Install the requests library in your Python environment by running the following command in your terminal:

pip install requests==2.31.0

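Keep in mind that the file extension you choose doesn't convert anything: write() stores the bytes exactly as the server sent them. In the example below, the URL points to a PNG image, so download.jpg still contains PNG data even though its name ends in .jpg. Many image viewers detect the real format and open the file anyway, but tools that rely on the extension may not. To actually convert between formats, you need to re-encode the image with a library such as Pillow.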

import os
import requests

def download_image(image_url, file_dir):
    response = requests.get(image_url)

    if response.status_code == 200:
        directory = os.path.dirname(file_dir)
        if not os.path.exists(directory):
            os.makedirs(directory)

        with open(file_dir, "wb") as fp:
            fp.write(response.content)
        print("Image downloaded successfully.")
    else:
        print(f"Failed to download the image. Status code: {response.status_code}")

if __name__ == "__main__":
    image_url = "https://assets.stickpng.com/images/630f85bed5353300f4bd328c.png"
    file_dir = "myimage/download.jpg"
    download_image(image_url, file_dir)

Here’s the output:

Apify logo
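Because write() stores the response bytes unchanged, a saved file's real format is determined by its content, not by its name. The helper below is a minimal sketch (the function name detect_image_format is hypothetical) that identifies common image formats from their well-known leading signature bytes, so you can check what was actually saved or pick a matching extension.

```python
from typing import Optional


def detect_image_format(data: bytes) -> Optional[str]:
    """Return a format name based on well-known file signatures, or None if unrecognized."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    # WebP files are RIFF containers with "WEBP" at byte offset 8
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


if __name__ == "__main__":
    # Inspect the first bytes of the file saved by the example above
    with open("myimage/download.jpg", "rb") as fp:
        print(detect_image_format(fp.read(16)))
```

Reading only the first 16 bytes is enough, since every signature checked here fits within the first 12.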

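What if the URL doesn't end in .png or .jpg? You can still download the image and save it locally. Just choose the file extension yourself (PNG, JPG, or JPEG), keeping in mind that the extension doesn't change the underlying format. Some image hosts also let you request a resized version through the URL: Unsplash, used in this example, accepts w (width) and h (height) query parameters, as shown in the commented-out URLs below.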

import requests
import os

def download_image(image_url, file_dir):
    response = requests.get(image_url)

    if response.status_code == 200:
        directory = os.path.dirname(file_dir)
        if not os.path.exists(directory):
            os.makedirs(directory)

        with open(file_dir, "wb") as fp:
            fp.write(response.content)
        print("Image downloaded successfully.")
    else:
        print(f"Failed to download the image. Status code: {response.status_code}")

if __name__ == "__main__":
    image_url = "https://images.unsplash.com/photo-1689182339141-ce4448e34bd7"
    # image_url = "https://images.unsplash.com/photo-1689182339141-ce4448e34bd7?w=2000"
    # image_url = "https://images.unsplash.com/photo-1689182339141-ce4448e34bd7?h=300"

    file_dir = "myimage/download.jpeg"
    download_image(image_url, file_dir)

Here’s the output:

Unsplash image of a racing car downloaded with Python
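When a URL has no extension, two pieces of information help you name the file: the query string (Unsplash accepts w and h, as in the commented-out URLs above) and the Content-Type header the server sends back. The following is a minimal sketch using only the standard library for the helpers; the function names sized_url and extension_for, and the small MIME-to-extension table, are hypothetical and deliberately not exhaustive.

```python
import os
from typing import Optional
from urllib.parse import urlencode, urlparse, urlunparse

# Hypothetical, deliberately small mapping; extend it for other formats you expect
CONTENT_TYPE_EXTENSIONS = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/gif": ".gif",
    "image/webp": ".webp",
    "image/avif": ".avif",
}


def sized_url(base_url: str, width: Optional[int] = None, height: Optional[int] = None) -> str:
    """Add w/h query parameters to a URL (assumes the URL has no query string yet)."""
    params = {}
    if width is not None:
        params["w"] = width
    if height is not None:
        params["h"] = height
    parts = urlparse(base_url)
    return urlunparse(parts._replace(query=urlencode(params)))


def extension_for(content_type: str) -> Optional[str]:
    """Map a Content-Type value such as 'image/jpeg; charset=binary' to a file extension."""
    mime = content_type.split(";", 1)[0].strip().lower()
    return CONTENT_TYPE_EXTENSIONS.get(mime)


if __name__ == "__main__":
    import requests

    url = sized_url("https://images.unsplash.com/photo-1689182339141-ce4448e34bd7", width=2000)
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    # Fall back to a neutral extension when the Content-Type isn't in the table
    extension = extension_for(response.headers.get("Content-Type", "")) or ".bin"
    os.makedirs("myimage", exist_ok=True)
    with open(f"myimage/download{extension}", "wb") as fp:
        fp.write(response.content)
```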

Save an image with shutil

The shutil module offers several high-level operations on files and collections of files. Here, we'll use the requests library to download an image and shutil to copy the response stream into a local file opened in binary mode. shutil is part of the Python standard library, so there's nothing to install.

Passing stream=True tells Requests not to download the response body right away. shutil.copyfileobj() then reads the raw stream in chunks and writes each chunk to the file, so the whole image never has to be held in memory at once, which matters for large files.

Setting response.raw.decode_content to True makes the raw stream decode any Content-Encoding the server applied (such as gzip or deflate) as it is read. Without it, response.raw returns the bytes exactly as they were sent over the wire, so a compressed response would be saved in compressed form and the file wouldn't open as an image.

import requests
import shutil
import os

def download_image(image_url, file_dir):

    response = requests.get(image_url, stream=True)

    if response.status_code == 200:
        directory = os.path.dirname(file_dir)
        if not os.path.exists(directory):
            os.makedirs(directory)

        with open(file_dir, "wb") as fp:
            response.raw.decode_content = True
            shutil.copyfileobj(response.raw, fp)
        print(f"Image downloaded and saved at {file_dir}")
    else:
        raise Exception(f"Failed to download image. Status code: {response.status_code}")


if __name__ == "__main__":
    image_url = "https://images.unsplash.com/photo-1689182339141-ce4448e34bd7?w=2000"
    file_dir = "myimage/download.jpg"

    download_image(image_url, file_dir)

Here’s the output:

Image of a racing car downloaded with shutil
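The effect of decode_content is easiest to see offline. In the sketch below, which uses only the standard library and does not involve Requests, a gzip-compressed payload stands in for what response.raw would yield if a server applied gzip Content-Encoding. Copying it directly with shutil.copyfileobj() reproduces the compressed bytes, while copying through a gzip reader reproduces the original data, which is roughly what urllib3 does for you when decode_content is True.

```python
import gzip
import io
import shutil

# Stand-in for image bytes (a PNG signature followed by filler data)
original = b"\x89PNG\r\n\x1a\n" + bytes(range(256)) * 4

# What a server might send over the wire with "Content-Encoding: gzip"
compressed = gzip.compress(original)

# Copy the stream as-is, i.e. without decoding the content encoding
raw_copy = io.BytesIO()
shutil.copyfileobj(io.BytesIO(compressed), raw_copy)

# Copy through a gzip decoder, i.e. with decoding, in 64 KiB chunks
decoded_copy = io.BytesIO()
with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as stream:
    shutil.copyfileobj(stream, decoded_copy, length=64 * 1024)

print(raw_copy.getvalue() == original)      # False: still gzip-compressed
print(decoded_copy.getvalue() == original)  # True: identical to the original bytes
```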

Note that when you set stream=True, Requests can't release the connection back to the pool until you consume all the data or call Response.close(), which can leave connections tied up unnecessarily. To make sure the connection is always released, make the request in a with statement. Here's the updated code:

import requests
import shutil
import os

def download_image(image_url, file_dir):
    with requests.get(image_url, stream=True) as response:
        if response.status_code == 200:
            directory = os.path.dirname(file_dir)
            if not os.path.exists(directory):
                os.makedirs(directory)

            with open(file_dir, "wb") as fp:
                response.raw.decode_content = True
                shutil.copyfileobj(response.raw, fp)
            print(f"Image downloaded and saved at {file_dir}")
        else:
            print(f"Failed to download image. Status code: {response.status_code}")

if __name__ == "__main__":
    image_url = "https://images.unsplash.com/photo-1689182339141-ce4448e34bd7?w=2000"
    file_dir = "myimage/download.jpg"
    
    download_image(image_url, file_dir)
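Requests also offers Response.iter_content(), which yields the body in chunks of a size you choose and decodes gzip/deflate content encoding for you, so you don't need to touch response.raw. A minimal sketch, assuming an 8 KiB chunk size (an arbitrary choice); the save_chunks helper name is hypothetical and is kept separate from the network code so it works with any iterable of bytes.

```python
import os
from typing import Iterable


def save_chunks(chunks: Iterable[bytes], file_dir: str) -> int:
    """Write byte chunks to file_dir, creating parent folders; return the number of bytes written."""
    directory = os.path.dirname(file_dir)
    if directory:
        os.makedirs(directory, exist_ok=True)
    written = 0
    with open(file_dir, "wb") as fp:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                fp.write(chunk)
                written += len(chunk)
    return written


if __name__ == "__main__":
    import requests

    image_url = "https://images.unsplash.com/photo-1689182339141-ce4448e34bd7?w=2000"
    with requests.get(image_url, stream=True, timeout=30) as response:
        response.raise_for_status()
        size = save_chunks(response.iter_content(chunk_size=8192), "myimage/download.jpg")
    print(f"Saved {size} bytes")
```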

Save an image with PIL

The Python Imaging Library (PIL) is a library for opening, manipulating, and saving images. Pillow is its actively maintained, free and open-source fork, and you still import it as PIL. It supports many image file formats, including PNG, JPEG, PPM, GIF, TIFF, and BMP, and lets you crop, resize, rotate, add text to, and convert images to grayscale, among other operations.

Let's install the Pillow library using the following command:

pip install pillow==10.1.0

We'll use the requests library to download an image and Pillow to open and display it. Pillow's show() method opens the image in an external image viewer, which is handy for a quick visual check.

from PIL import Image
import requests
from io import BytesIO

def download_image(image_url):
    response = requests.get(image_url)
    if response.status_code == 200:
        # Use BytesIO to convert raw content to a bytes-like object
        image_data = BytesIO(response.content)

        image = Image.open(image_data)
        image.show()
    else:
        print(f"Failed to download image. Status code: {response.status_code}")

if __name__ == "__main__":
    image_url = "https://images.unsplash.com/photo-1689182339141-ce4448e34bd7?w=2000"
    download_image(image_url)

io.BytesIO acts as an in-memory buffer: it wraps the downloaded bytes in a file-like object, so Image.open() can read them as if they came from a file, without the data ever being written to disk.

To save the image to a file, use the save() method. Unlike writing the raw response bytes, save() re-encodes the image in the format implied by the file extension (JPEG for .jpg).

from PIL import Image
import requests
from io import BytesIO
import os

def download_image(url, file_dir):
    response = requests.get(url)

    if response.status_code == 200:
        directory = os.path.dirname(file_dir)
        if not os.path.exists(directory):
            os.makedirs(directory)

        image_data = BytesIO(response.content)
        image = Image.open(image_data)

        image.save(file_dir)
        image.show()
        print(f"Image saved successfully at {file_dir}")
    else:
        print(f"Failed to download image. Status code: {response.status_code}")

if __name__ == "__main__":
    image_url = "https://images.unsplash.com/photo-1689182339141-ce4448e34bd7?w=2000"
    file_dir = "myimage/download.jpg"

    download_image(image_url, file_dir)
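JPEG can't store transparency or a color palette, so saving an image in a mode such as RGBA or P (common for PNG logos like the one in the first example) as JPEG raises an error. A minimal sketch that normalizes the image to RGB before encoding; the helper name to_jpeg_bytes and the quality value of 90 are assumptions. Note that converting RGBA to RGB simply drops the alpha channel rather than blending it onto a background.

```python
import os
from io import BytesIO

from PIL import Image


def to_jpeg_bytes(image_bytes: bytes, quality: int = 90) -> bytes:
    """Decode any Pillow-supported image and re-encode it as JPEG."""
    with Image.open(BytesIO(image_bytes)) as image:
        # JPEG has no alpha channel or palette, so convert to RGB first
        rgb = image.convert("RGB") if image.mode != "RGB" else image
        output = BytesIO()
        rgb.save(output, format="JPEG", quality=quality)
    return output.getvalue()


if __name__ == "__main__":
    import requests

    response = requests.get("https://assets.stickpng.com/images/630f85bed5353300f4bd328c.png", timeout=30)
    response.raise_for_status()
    os.makedirs("myimage", exist_ok=True)
    with open("myimage/download-converted.jpg", "wb") as fp:
        fp.write(to_jpeg_bytes(response.content))
```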

Save an image with urllib

urllib in Python provides a simple and powerful way to interact with URLs and access resources over the internet. It is a package that contains several modules for working with URLs:

urllib.request for opening and reading URLs.
urllib.error containing the exceptions raised by urllib.request.
urllib.parse for parsing URLs.
urllib.robotparser for parsing robots.txt files.
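As a small example of urllib.parse, the sketch below derives a local filename from an image URL by taking the last segment of the URL path, so the query string is ignored. The helper name filename_from_url and the fallback name "image" are hypothetical. URLs without an extension, like the Unsplash ones in this guide, come back without one, so you still need to choose an extension yourself.

```python
import posixpath
from urllib.parse import unquote, urlparse


def filename_from_url(url: str, default: str = "image") -> str:
    """Return the last path segment of a URL (percent-decoded), or a default name."""
    # .path excludes the query string, e.g. "?h=500"
    path = unquote(urlparse(url).path)
    name = posixpath.basename(path)
    return name or default


if __name__ == "__main__":
    print(filename_from_url("https://assets.stickpng.com/images/630f85bed5353300f4bd328c.png"))
```

Decoding before taking the basename means an encoded slash (%2F) can't smuggle a path separator into the returned name.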

To download files from URLs and save them locally, use the urllib.request.urlretrieve function. Pass the URL of the file you want to download and the local filename where you want to save the downloaded file.

urllib is part of the Python standard library, so there's nothing to install.

import os
import urllib.error
import urllib.request

def download_image(image_url, file_dir):
    try:
        directory = os.path.dirname(file_dir)
        if not os.path.exists(directory):
            os.makedirs(directory)
        urllib.request.urlretrieve(image_url, file_dir)
        print("Image saved successfully.")
    # URLError also covers HTTPError (e.g. a 404), which is a subclass of it
    except urllib.error.URLError as e:
        print(f"Error: {e}")


if __name__ == "__main__":
    image_url = "https://images.unsplash.com/photo-1561037404-61cd46aa615b?h=500"
    file_dir = "myimage/download.jpg"
    download_image(image_url, file_dir)
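The Python documentation lists urlretrieve() under urllib.request's legacy interface and notes that it might become deprecated at some point. If you'd rather avoid it, a close equivalent opens the URL with urllib.request.urlopen() and streams the response into a file with shutil.copyfileobj(). A minimal sketch; the function name and the 30-second timeout are assumptions. urlopen() raises urllib.error.HTTPError for error status codes such as 404, so the caller decides how to handle failures.

```python
import os
import shutil
import urllib.request


def download_with_urlopen(image_url: str, file_dir: str, timeout: float = 30) -> None:
    """Stream image_url into file_dir, creating parent folders as needed."""
    directory = os.path.dirname(file_dir)
    if directory:
        os.makedirs(directory, exist_ok=True)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(image_url, timeout=timeout) as response, open(file_dir, "wb") as fp:
        shutil.copyfileobj(response, fp)


if __name__ == "__main__":
    download_with_urlopen(
        "https://images.unsplash.com/photo-1561037404-61cd46aa615b?h=500",
        "myimage/download-urlopen.jpg",
    )
```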


Save an image with wget


GNU Wget, the command-line tool this package is named after, has been succeeded by Wget2, which downloads considerably faster thanks to HTTP/2, HTTP compression, and parallel connections.

The wget package used below is a separate, small pure-Python module, and it hasn't been updated since 2015. It can still download files over HTTP, HTTPS, or FTP, including images, web pages, and PDFs.

While it may still work for basic downloads, you're probably better off using one of the other methods in this guide. If you want to try it anyway, install the package with:

pip install wget

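import os

import wget

def download_image(url, save_path):
    try:
        # Create the target folder first in case it doesn't exist yet
        directory = os.path.dirname(save_path)
        if directory:
            os.makedirs(directory, exist_ok=True)
        wget.download(url, out=save_path)
        # wget.download() prints a progress bar without a trailing newline
        print("\nImage downloaded successfully!")
    except Exception as e:
        print(f"Error downloading image: {e}")

if __name__ == "__main__":
    image_url = "https://images.unsplash.com/photo-1561037404-61cd46aa615b?h=200"
    filepath = "myimage/download.jpg"

    download_image(image_url, filepath)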


Save an image with OpenCV

OpenCV is a powerful library primarily used for computer vision tasks such as image processing. Here, we'll use OpenCV's Python bindings to decode, save, and display the image. cv2.imdecode() expects a NumPy array rather than raw bytes, so we'll also use NumPy to turn the downloaded data into one.

Install OpenCV and NumPy with the following commands:

pip install numpy==1.25.2
pip install opencv-python==4.9.0.80

Let's take a look at the process step by step:

np.frombuffer(response.content, np.uint8): Converts the downloaded bytes to a one-dimensional array of 8-bit unsigned integers (suitable for encoded image data).
cv2.imdecode(image_array, cv2.IMREAD_COLOR): Decodes the NumPy array into a color image, itself a NumPy array with channels in BGR order. It returns None if the data can't be decoded.
cv2.imwrite(save_path, image): Saves the decoded image to the specified path.
cv2.imshow('Cat Image', image): Displays the image in a window.
cv2.waitKey(0): Waits for a key press before closing the window.
cv2.destroyAllWindows(): Closes all OpenCV windows for proper resource management.

Here’s the code:

import os

import cv2
import numpy as np
import requests

def download_save_image(url, save_path):
    # Download the image from the URL
    response = requests.get(url, timeout=30)
    response.raise_for_status()

    # Convert the downloaded bytes to a 1-D array of unsigned 8-bit integers
    image_array = np.frombuffer(response.content, np.uint8)

    # Decode the array into an image (a NumPy array in BGR order); None means decoding failed
    image = cv2.imdecode(image_array, cv2.IMREAD_COLOR)
    if image is None:
        raise ValueError("The downloaded data could not be decoded as an image")

    directory = os.path.dirname(save_path)
    if directory:
        os.makedirs(directory, exist_ok=True)

    # Save the image; OpenCV picks the output format from the file extension
    cv2.imwrite(save_path, image)

    # Display the image until a key is pressed
    cv2.imshow("Cat Image", image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()


if __name__ == "__main__":
    url = "https://images.unsplash.com/photo-1596854407944-bf87f6fdd49e?h=400"
    save_path = "myimage/opencv-image.png"
    download_save_image(url, save_path)


If you’ve already downloaded an image to your computer, you can load it with OpenCV’s imread() function. Its optional second argument controls how the image is loaded:

cv2.IMREAD_COLOR (or 1): Loads the image in color (default)
cv2.IMREAD_GRAYSCALE (or 0): Loads the image in grayscale

import cv2

image_cv2 = cv2.imread(r"myimage/opencv-image.png")
cv2.imshow("Cat Image", image_cv2)
cv2.waitKey(0)
cv2.destroyAllWindows()

Save an image with matplotlib

Matplotlib is a powerful library for creating static, animated, and interactive visualizations in Python. For downloading and saving images, it covers three steps:

Image loading: mpimg.imread() reads the image data from the downloaded response.
Image saving: plt.imsave() saves the loaded image data to a specified file path.
Image display (optional): For visual confirmation, plt.imshow() and plt.show() display the image in a Matplotlib window.

Let's install the Matplotlib library with the following command:

pip install matplotlib==3.7.3

First, we’ll send a GET request to the URL to retrieve the image. mpimg.imread() then reads the image data from the raw response stream (response.raw). Because a stream has no file extension to infer the format from, the format argument tells Matplotlib to decode it as JPEG. Finally, plt.imsave() saves the image data to save_path.

To display the image data temporarily within the Matplotlib window, use plt.imshow(). Finally, use plt.show() to make the Matplotlib window visible.

import os

import requests
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

def download_save_image(url, save_path):
    response = requests.get(url, stream=True, timeout=30)
    if response.status_code == 200:
        # Decode any gzip/deflate content encoding while reading the raw stream
        response.raw.decode_content = True

        # Read the image from the raw response stream; format tells Matplotlib how to decode it
        img = mpimg.imread(response.raw, format="jpeg")

        directory = os.path.dirname(save_path)
        if directory:
            os.makedirs(directory, exist_ok=True)

        # Save the image
        plt.imsave(save_path, img)

        # Display the image
        plt.imshow(img)
        plt.show()
        print("Image downloaded successfully.")
    else:
        print(f"Failed to download the image. Status code: {response.status_code}")

if __name__ == "__main__":
    url = "https://images.unsplash.com/photo-1596854407944-bf87f6fdd49e?h=400"
    save_path = "myimage/cat.png"
    download_save_image(url, save_path)


If the image is already on your computer, you can read it with imread() and display it with plt.imshow() and plt.show():

import matplotlib.pyplot as plt
import matplotlib.image as mpimg

img = mpimg.imread("myimage/cat.png")
plt.imshow(img)
plt.show()

Downloading images asynchronously

Asynchronous downloading lets you run many downloads concurrently: while one request is waiting on the network, others can make progress, instead of each download having to finish before the next one starts. The trade-off is a more complex implementation than the synchronous approaches above.

Install the aiohttp and aiofiles libraries with the following commands:

pip install aiohttp==3.9.3
pip install aiofiles==23.2.1

In the following code, the download_all asynchronous function schedules multiple download coroutines concurrently using asyncio.gather.

import aiohttp
import asyncio
import aiofiles
import time

async def download(session, url, filename):
    print(f"Downloading {url}")
    async with session.get(url) as response:
        response.raise_for_status()
        content = await response.read()
    # aiofiles runs file I/O in a thread pool so it doesn't block the event loop
    async with aiofiles.open(filename, "wb") as f:
        await f.write(content)
    print(f"Download completed for {url}.")

async def download_all(urls):
    # Reuse one ClientSession (and its connection pool) for all downloads
    async with aiohttp.ClientSession() as session:
        tasks = [
            download(session, url, f"image_async_{i}.jpg")
            for i, url in enumerate(urls)
        ]
        # Run all download coroutines concurrently
        await asyncio.gather(*tasks)

urls = [
    "https://images.unsplash.com/photo-1683009427513-28e163402d16?h=200",
    "https://images.unsplash.com/photo-1682687220305-ce8a9ab237b1?h=205",
    "https://images.unsplash.com/photo-1706612527282-1e7004678ab1?h=210",
    "https://images.unsplash.com/photo-1706271625297-cb3c47286e37?h=215",
    "https://images.unsplash.com/photo-1701405790155-e0591d01590a?h=220",
]

if __name__ == "__main__":
    start_time = time.time()
    asyncio.run(download_all(urls))
    end_time = time.time()
    print(f"\nAsynchronous download took {end_time - start_time:.2f} seconds.")

Time taken by asynchronous download:

Time taken by synchronous download:

Next steps: using async/await in Python

We've discussed various methods of downloading and saving images from URLs using Python. Each method has its pros and cons, so choose the one that best suits your project's needs. The Requests package has become a popular and reliable choice for downloading content in Python; the urllib.request documentation itself recommends it as a higher-level HTTP client interface.

Next steps? Explore asynchronous methods for downloading and saving images, as they can be more efficient. This tutorial on Python asyncio provides a good starting point for learning how to write concurrent code in Python using the async/await syntax.


Interested in web scraping with Python? Read our comprehensive guide.
