How to handle timeouts in Python Requests

Learn to set connect and read timeouts, handle exceptions, and use timeouts with sessions and in a multithreaded environment.

Hi, we're Apify, a full-stack web scraping and browser automation platform. If you're interested in using Python for web scraping, this tutorial shows how to use timeouts effectively in the Python Requests library for any HTTP request.

You need to use timeouts when making requests in Python. They prevent your program from freezing indefinitely while waiting for a response from external servers or APIs.

In this article, you'll learn how to:

  • set connect and read timeouts
  • handle timeout exceptions
  • use timeouts with sessions
  • use timeouts in a multithreaded environment

We'll also provide some practical code examples and explore alternatives to the Python Requests library.

Python Requests

The Python requests library makes sending HTTP requests incredibly easy. As one of the most popular Python packages with over 30 million weekly downloads, Requests is relied upon by over 2.5 million repositories on GitHub. It supports various HTTP methods (GET, POST, PUT, DELETE) to interact with APIs, enabling CRUD operations on API resources. Requests is also a popular tool for web scraping tasks, simplifying the process of fetching website content and allowing you to extract valuable data.

You can install Requests using the following command:

pip install requests
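
If you haven't used the library before, here's a minimal sketch of a GET and a POST request against httpbin.org, the public testing service used throughout this article (the endpoints and response fields shown are httpbin's, not part of Requests itself):

import requests

# Fetch a resource with GET; httpbin echoes request details back as JSON
resp = requests.get("https://httpbin.org/get")
print(resp.status_code, resp.json()["url"])

# Create a resource with POST; httpbin echoes the JSON payload back
resp = requests.post("https://httpbin.org/post", json={"name": "timeout-demo"})
print(resp.json()["json"])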

Why implement timeouts in Python Requests?

When making requests to external servers, you have to set timeouts. If a server doesn't respond promptly, your application can freeze or run indefinitely, which is not only frustrating for users but also wastes resources such as CPU cycles, memory, and threads.

The requests library doesn't apply any timeout by default, so you must configure one explicitly using the timeout parameter to avoid unnecessary delays and resource wastage.
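
To see the difference, here's a small sketch that hits httpbin's /delay endpoint, which deliberately waits before responding. Without a timeout the call simply blocks for the full delay; with the timeout parameter, the wait is capped:

import requests

# httpbin.org/delay/10 waits about 10 seconds before responding.
# Without a timeout, this line would block for the full delay:
# resp = requests.get("https://httpbin.org/delay/10")

# With a timeout, the call gives up after 3 seconds instead.
try:
    resp = requests.get("https://httpbin.org/delay/10", timeout=3)
except requests.exceptions.Timeout:
    print("Gave up after 3 seconds instead of waiting for the server.")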


Setting timeouts in Python Requests

To set a timeout for a request, use the timeout parameter. This parameter accepts either an integer or floating-point number, specifying the maximum number of seconds to wait for a response from the server.

Setting timeout for entire Python Requests

Let's make a simple GET request with the timeout parameter set to 3 seconds. This specifies a maximum wait time of 3 seconds for a server response; the same limit applies both to establishing the connection and to waiting for the response. If no response is received within 3 seconds, a Timeout exception will be raised.

import requests

resp = requests.get("https://httpbin.org", timeout=3)
print(resp)

Setting connect and read timeouts separately

Connect timeout refers to the maximum duration allowed to establish a connection to the server. Read timeout, on the other hand, specifies the maximum waiting time for a response after the connection has been established.

If you provide a single value to the timeout parameter, it applies to both the connection and the response. To set them separately, pass a tuple of the connect and read timeout values, (connect_timeout, read_timeout), to the timeout parameter.

import requests

resp = requests.get("https://httpbin.org", timeout=(3, 5))
print(resp)

This request will time out if establishing the connection takes more than 3 seconds or if the server takes more than 5 seconds to send a response.

📌
Note: Timeout is not a time limit on the entire response download. Instead, an exception is raised if the server has not issued a response within the specified period. If no timeout is explicitly specified, requests do not time out.
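
To make that concrete, here's a sketch using httpbin's /drip endpoint, which (assuming it behaves as documented) streams a few bytes at a time over several seconds. Even though the whole download takes roughly 10 seconds, a 3-second read timeout doesn't fire, because no single wait between chunks exceeds 3 seconds:

import requests

# /drip sends 10 bytes spread over ~10 seconds (about one byte per second),
# after an initial 1-second delay.
url = "https://httpbin.org/drip?duration=10&numbytes=10&delay=1"

# The read timeout (3 s) limits each wait for data, not the total download,
# so this completes successfully after roughly 10 seconds.
resp = requests.get(url, timeout=(3, 3))
print(resp.status_code, len(resp.content))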

Handling timeout exceptions

If a request times out (no response within the limit), a Timeout exception is raised. To prevent your program from crashing, handle this exception using a try-except block.

import requests

resp = requests.get("https://httpbin.org", timeout=(0.001, 0.002))
print(resp)

The result is a timeout error.

The code raises a timeout error because of the extremely low connect and read timeouts we've set. Real HTTP requests take far longer than a few milliseconds to complete, so such low values aren't recommended in practice; we used them here only to demonstrate what a timeout exception looks like.

This error can be easily handled using try-except blocks. The following code will catch the timeout error and print a message instead of letting the program crash.

import requests

try:
    resp = requests.get("https://httpbin.org", timeout=(0.001, 0.002))
except requests.exceptions.Timeout:
    print("The request timed out!")

Implementing timeouts with sessions

To set a global default timeout for all requests made using the session, you can create a custom class inheriting from requests.Session and override the request method.

import requests

class TimeoutSession(requests.Session):
    def request(self, method, url, **kwargs):
        kwargs.setdefault("timeout", 4)
        return super().request(method, url, **kwargs)

session = TimeoutSession()

try:
    response = session.get("https://httpbin.org")
    if response.status_code == 200:
        print("Request successful!")
    else:
        print("Request failed with status code:", response.status_code)
except requests.exceptions.Timeout:
    print("Request timed out!")
except requests.exceptions.RequestException as e:
    print("Request failed:", e)

Here's what the code does:

  • The TimeoutSession class inherits from requests.Session, creating a custom session object for making HTTP requests.
  • It overrides the request method to intercept all HTTP requests made through the session.
  • Within the overridden method, kwargs.setdefault('timeout', 4) sets a default timeout of 4 seconds for any request using this session object.
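
Because setdefault only fills in the timeout when the caller hasn't supplied one, individual requests can still override the session-wide default. Continuing with the session object from the snippet above:

# Uses the 4-second session default
session.get("https://httpbin.org")

# Overrides the default for this deliberately slow endpoint only
session.get("https://httpbin.org/delay/9", timeout=15)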

Timeouts in multi-threaded / asynchronous setups

In multi-threaded programming, handling timeouts properly is crucial to avoid blocking and ensure efficient resource utilization. Imagine you have a program with multiple threads making requests to different websites. If one of these websites is slow or unresponsive and there's no timeout set, the thread making the request will wait indefinitely for a response, effectively blocking that thread from doing anything else.

Here's an example illustrating how to handle timeouts in a multi-threaded setup:

import requests
import concurrent.futures

def make_request(url):
    try:
        response = requests.get(url, timeout=3)
        return response.status_code, response.content
    except requests.exceptions.Timeout:
        return "Timeout occurred for URL:", url
    except requests.exceptions.RequestException as e:
        return "Request failed for URL:", url, e

urls = [
    "https://httpbin.org/",
    "https://scrapeme.live/shop/",
    "https://quotes.toscrape.com/",
]

with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(make_request, url) for url in urls]
    for future in concurrent.futures.as_completed(futures):
        result = future.result()
        print(result)

The above code uses ThreadPoolExecutor from the concurrent.futures module to run the requests concurrently across multiple threads. Each request is submitted to the pool with executor.submit(). Once the requests complete, we iterate over the futures returned by as_completed() to obtain and print the results.
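
In addition to the per-request timeout, concurrent.futures lets you put an overall deadline on collecting the results: as_completed() accepts a timeout argument and raises a TimeoutError if some futures are still pending when it expires. A brief sketch, reusing make_request and urls from the example above:

# Cap how long we wait for all results combined (here, 10 seconds).
with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(make_request, url) for url in urls]
    try:
        for future in concurrent.futures.as_completed(futures, timeout=10):
            print(future.result())
    except concurrent.futures.TimeoutError:
        print("Some requests did not finish within 10 seconds.")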

Timeout best practices

To prevent applications from hanging indefinitely, set timeouts for external server requests. Here are some best practices:

  1. Always set timeouts so that requests can never hang indefinitely.
  2. If you're making requests to a service that you know is quick, a shorter timeout could be appropriate. On the other hand, if the requests involve large data or complex operations, longer timeouts (up to 30 seconds) might be necessary.
  3. It's a good practice to set connect timeouts to slightly larger than a multiple of 3, which is the default TCP packet retransmission window.
  4. Implement proper error handling for Timeout exceptions. This allows your code to react appropriately to timeouts.
  5. Consider retrying requests upon timeout errors or temporary network issues.
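
For the last point, one common approach is to mount urllib3's Retry helper on a Session via HTTPAdapter. Here's a minimal sketch; the retry counts and backoff factor are arbitrary values for illustration:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()

# Retry up to 3 times on connection and read errors, with exponential backoff.
retries = Retry(total=3, connect=3, read=3, backoff_factor=0.5)
adapter = HTTPAdapter(max_retries=retries)
session.mount("https://", adapter)
session.mount("http://", adapter)

try:
    # Connect timeout slightly above a multiple of 3 seconds (see point 3),
    # with a more generous read timeout.
    resp = session.get("https://httpbin.org", timeout=(3.05, 27))
    print(resp.status_code)
except requests.exceptions.RequestException as e:
    print("Request failed after retries:", e)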


Examples of using timeouts in Python Requests

Let's explore some practical scenarios where timeout can be effective.

Example 1: Scraping data

Let's scrape some data from Hacker News (news.ycombinator.com), such as URLs, titles, and rankings. We'll use a 3-second timeout and handle any exceptions that arise due to timeouts.

import requests
from bs4 import BeautifulSoup

try:
    response = requests.get("https://news.ycombinator.com/", timeout=3)
    html = response.text

    soup = BeautifulSoup(html, "html.parser")
    articles = soup.find_all(class_="athing")

    output = []

    for article in articles:
        data = {
            "URL": article.find(class_="titleline").find("a").get("href"),
            "title": article.find(class_="titleline").getText(),
            "rank": article.find(class_="rank").getText().replace(".", ""),
        }
        output.append(data)
    print(output)
except requests.exceptions.Timeout:
    print("The request timed out. Please try again later.")
except requests.exceptions.RequestException as e:
    print("Request failed:", e)

The result is a list of dictionaries, each containing the URL, title, and rank of a Hacker News article.

Example 2: Creating a GitHub repository

This example demonstrates creating a GitHub repository using a POST request. We'll specify a timeout and handle any exceptions that might occur during repository creation.

To interact with the GitHub API, you'll need to create a personal access token. For more information about interacting with the GitHub API using HTTP methods like GET, POST, PATCH, and DELETE, you can refer to How to use the GitHub API in Python.

import requests

base_url = "https://api.github.com"

def create_repo(access_token, repo_name, repo_descr=None):
    url = f"{base_url}/user/repos"

    headers = {
        "Authorization": f"token {access_token}",
    }

    data = {
        "name": repo_name,
        "description": repo_descr,
    }

    try:
        response = requests.post(url, headers=headers, json=data, timeout=2)
        response.raise_for_status()

        if response.status_code == 201:
            repo_data = response.json()
            return repo_data
        else:
            return None
    except requests.exceptions.Timeout:
        print("Request timed out. Please try again later.")
        return None
    except requests.exceptions.RequestException as e:
        print("Request failed:", e)
        return None

access_token = "YOUR_ACCESS_TOKEN"
repo_name = "timeout"
repo_descr = "New repo created using timeout!"

new_repo = create_repo(access_token, repo_name, repo_descr)

if new_repo:
    print(f"New public repo created successfully!")
else:
    print("Failed to create a new repo.")

The result: a new public repository named timeout is created on your GitHub account, with the description we provided.

Example 3: Simple local application

Let's create a simple Flask app. We'll intentionally slow down the server by adding a sleep function. Afterward, we'll initiate a request from the client side and observe if any timeouts occur.

server.py

from flask import Flask
import time

app = Flask(__name__)

@app.route("/order")
def order():
    time.sleep(30)
    return "What do you want to order today?"

if __name__ == "__main__":
    app.run(debug=False, port=8080)

client.py

import requests

server_url = "http://127.0.0.1:8080/order"

try:
    response = requests.get(server_url, timeout=4)
    response.raise_for_status()

    print(response.text)

except requests.exceptions.Timeout:
    print("Request timed out. Please try again later.")

except requests.exceptions.RequestException as e:
    print("Request failed:", e)

Because the server sleeps for 30 seconds while the client only waits 4 seconds, the request times out, and the client prints the message from our except block: "Request timed out. Please try again later." The timeout occurs, but we've handled it gracefully.

Third-party alternatives to Requests

There are several alternatives to the Python Requests library, such as aiohttp, httpx, urllib3, and uplink. But here, we’ll just focus on the httpx and aiohttp libraries.

aiohttp

aiohttp is a widely used Python library that offers both server and client frameworks for asynchronous HTTP communication. It supports capabilities like custom headers, authentication, cookies, and redirect handling, and it's built on the asyncio library, which enables async/await syntax.

You can install it using this command:

pip install aiohttp

The following code handles timeouts by setting them during the HTTP request using aiohttp.ClientTimeout and catching asyncio.TimeoutError exceptions.

import aiohttp
import asyncio

async def fetch_data(url):
    try:
        async with aiohttp.ClientSession() as session:
            async with session.get(
                url, timeout=aiohttp.ClientTimeout(connect=3, sock_read=4)
            ) as response:
                if response.status == 200:
                    return await response.text()
                else:
                    print("Failed to fetch URL. Status code:", response.status)
                    return None
    except asyncio.TimeoutError:
        print("Timeout occurred while fetching the URL.")
        return None
    except aiohttp.ClientError as e:
        print("An error occurred:", e)
        return None

async def main():
    url = "http://httpbin.org"
    content = await fetch_data(url)
    if content:
        print(content)

asyncio.run(main())
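
aiohttp also lets you set a timeout once for the whole session, including a total deadline covering the entire request, so you don't have to repeat it on every call. A brief sketch:

import aiohttp
import asyncio

async def main():
    # total= caps the entire operation: connecting, sending, and reading.
    timeout = aiohttp.ClientTimeout(total=5)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        try:
            async with session.get("http://httpbin.org") as response:
                print(response.status)
        except asyncio.TimeoutError:
            print("The request exceeded the 5-second total timeout.")

asyncio.run(main())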

httpx

httpx is a Python HTTP client library that supports synchronous and asynchronous operations.

Install it using the following command:

pip install httpx

This code uses httpx.AsyncClient for asynchronous requests and sets a timeout using httpx.Timeout.

import httpx
import asyncio

async def fetch_data(url):
    try:
        async with httpx.AsyncClient() as client:
            timeout = httpx.Timeout(timeout=3)
            response = await client.get(url, timeout=timeout)
            if response.status_code == 200:
                return response.text
            else:
                print("Failed to fetch URL. Status code:", response.status_code)
                return None
    except httpx.TimeoutException:
        print("Timeout occurred while fetching the URL.")
        return None
    except httpx.HTTPError as e:
        print("An HTTP error occurred:", e)
        return None

async def main():
    url = "http://httpbin.org"
    content = await fetch_data(url)
    if content:
        print(content)

asyncio.run(main())
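
Since httpx also works synchronously, the same timeout configuration applies to httpx.Client. A minimal sketch:

import httpx

# 3-second default for all phases, with a tighter 2-second connect timeout.
timeout = httpx.Timeout(3.0, connect=2.0)

try:
    with httpx.Client(timeout=timeout) as client:
        response = client.get("http://httpbin.org")
        print(response.status_code)
except httpx.TimeoutException:
    print("Timeout occurred while fetching the URL.")
except httpx.HTTPError as e:
    print("An HTTP error occurred:", e)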

Conclusion

We've seen that timeouts are critical configurations when making requests to unreliable servers and APIs. We've also learned how the requests library provides flexible options for setting timeouts. Now you know how to use Python Requests to handle timeout exceptions, use timeouts with sessions, and use them in a multithreaded environment.
