How to follow redirects using cURL

What are redirects?

Redirects refer to HTTP responses that indicate that a resource has been moved to another location. When a client (such as cURL) tries to access a resource via a URL that has been redirected, it receives a redirect response with an HTTP status code in the 3xx range, along with a Location header that specifies the new URL where the resource can be found.

Using the -L or --location command line option makes cURL follow redirects.

curl -L [URL]

This tells cURL to follow any redirects it encounters. cURL doesn't do this by default: without the -L option, it simply shows the content of the original 3xx response.
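To see the difference concretely, the sketch below prints the status code cURL ends on, first without -L and then with it. The function name compare_redirect is an illustrative assumption, and any URL passed to it is up to the caller.

```shell
# compare_redirect is a hypothetical helper name, not a cURL feature.
compare_redirect() {
    local url=$1
    # -s hides the progress meter, -o /dev/null discards the body,
    # and -w '%{http_code}' prints the status code of the last response.
    local without with
    without=$(curl -s -o /dev/null -w '%{http_code}' "$url")
    with=$(curl -s -L -o /dev/null -w '%{http_code}' "$url")
    echo "without -L: $without"
    echo "with -L: $with"
}
```

For a URL that redirects, the first line would typically show a 3xx code and the second the status of the final destination.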

Redirects are of various types referenced by different status codes:

301 Moved Permanently: This status code shows that the requested resource has been permanently moved to a new URL. It is important to use this new URL for all future requests.
302 Found: This status code indicates that the requested resource exists but is temporarily located at a different URL. This means you should continue to use the original URL for future requests.
303 See Other: This status code tells the client to retrieve the result from a different URL using a GET request. It is typically sent in response to a POST request, for example after a form submission.
307 Temporary Redirect: This is quite similar to how 302 works, as it also indicates that the requested resource is temporarily located at a different URL. The client should continue to use the original URL for future requests, maintaining the request method (POST or GET).
308 Permanent Redirect: Similar to 301, this status code indicates that the requested resource has been permanently moved to a new URL. This means you should use the new URL in the response header for future requests while maintaining the request method (POST or GET).
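The list above can be condensed into a small lookup. This is only a sketch: redirect_semantics is a hypothetical helper name, and the wording of each summary is my own shorthand for the descriptions above.

```shell
# redirect_semantics is a hypothetical helper that summarizes each code.
redirect_semantics() {
    case "$1" in
        301) echo "permanent, method may change to GET" ;;
        302) echo "temporary, method may change to GET" ;;
        303) echo "see other, follow-up uses GET" ;;
        307) echo "temporary, method preserved" ;;
        308) echo "permanent, method preserved" ;;
        *)   echo "not one of the redirect codes listed" ;;
    esac
}
```

Calling redirect_semantics 307, for instance, prints the summary for a temporary redirect that keeps the request method.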

How to follow HTTP redirects with cURL

HTTP redirects will occur when a server responds with a status code between 300 and 399, indicating that the requested resource has moved to another location. Without proper handling, redirects can lead to errors or missed data.

curl -i -L

In the above example, the -L option tells cURL to keep following redirects until it reaches the final destination. cURL follows the different types of 3xx redirects listed earlier when the -L option is used. We have also introduced the -i option, which includes the HTTP response headers in the output, so you can see the headers of every redirect response along the way.
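When the full headers are too noisy, the status line and Location header of each hop are usually enough. The sketch below uses -D - to dump the headers of every response to standard output; the function name list_hops is an assumption made for illustration.

```shell
# list_hops is a hypothetical helper name.
# It prints the status line and Location header of every response in the chain.
list_hops() {
    # -D - writes the response headers of each hop to stdout,
    # while -o /dev/null discards the response bodies.
    curl -s -L -o /dev/null -D - "$1" |
        tr -d '\r' |
        grep -i -E '^(HTTP/|location:)'
}
```

The tr step strips the carriage returns that HTTP header lines end with, so the output is easier to compare or process further.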

Following multiple redirects with cURL

cURL will follow 3xx redirects up to the maximum number allowed, which is 50 by default. This means that specifying the -L option once is enough to follow a whole chain of up to 50 redirects. If the requests need extra settings, options such as -H (header), -b (cookie), or -X (request method) can be added alongside -L, and cURL applies them to the requests it makes while following the chain. Here is an example:

curl -L -H "Accept: application/json" -b "cookie=set_value"

In the example, cURL sends the Accept header set to application/json and the cookie set to set_value with the initial request, and it keeps applying both to the requests it makes while following redirects.

How to change the maximum allowed cURL redirects

cURL follows redirects up to a maximum of 50 times by default. This means that if a redirect is encountered, cURL will automatically send a new request to the redirected URL. If another redirect is encountered, cURL will follow it again, and so on, up to a maximum of 50 redirects.

Introducing the --max-redirs option

The --max-redirs option allows you to control the maximum number of redirects curl will follow. This option is essential when working with servers that issue multiple redirects or when you need to limit the number of redirects for security or performance reasons.

Here’s how you would use the --max-redirs option with our previous example to limit the redirects to 5.

curl -L --max-redirs 5 -H "Accept: application/json" -b "cookie=set_value"

With a lower limit, cURL gives up sooner on long redirect chains and redirect loops, which bounds the number of extra requests a single call can make and the time it can take.
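When the limit is hit, cURL stops and exits with code 47 (too many redirects), which a script can check. The following is a minimal sketch; fetch_with_limit is a hypothetical helper name and the limit of 5 simply mirrors the example above.

```shell
# fetch_with_limit is a hypothetical helper; the limit of 5 is an example value.
fetch_with_limit() {
    local status=0
    curl -s -L --max-redirs 5 -o /dev/null "$1" || status=$?
    # cURL uses exit code 47 to report that the redirect limit was reached.
    if [ "$status" -eq 47 ]; then
        echo "gave up: more than 5 redirects"
    elif [ "$status" -eq 0 ]; then
        echo "ok"
    else
        echo "curl failed with exit code $status"
    fi
}
```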

How to show redirects with cURL

cURL can show you what happens at each redirect, which is useful for debugging and important for security checks. This section covers the options cURL provides for that, along with some commands (if you’re on Linux) that make the redirect information easier to read and save.

Using the `-v` (verbose) option

The -v option in curl stands for verbose and displays comprehensive information about the entire HTTP request and response exchange, including redirects and the response headers.

curl -v -L

This command will show verbose information about the request and response to the console, including the sequence of redirects.
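Verbose output is written to standard error, and response header lines in it start with "< ". Assuming that format, a sketch like the following pulls out just the redirect targets; redirect_targets is a hypothetical helper name.

```shell
# redirect_targets is a hypothetical helper name.
redirect_targets() {
    # -v writes its trace to stderr, so 2>&1 merges it into the pipe;
    # the response body itself is discarded with -o /dev/null.
    curl -s -v -L -o /dev/null "$1" 2>&1 |
        tr -d '\r' |
        grep -i '^< location:'
}
```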

Using `tee` to display and save redirects

The tee command is a Linux command that allows you to display and save output simultaneously. We can use tee with cURL to identify the redirected URL.

curl -L  | tee redirects.txt

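This command pipes cURL's output to tee, which displays it and saves the same output to redirects.txt. Note that curl -L on its own writes only the final response body to standard output, so add the -i (or -I) option if you want the headers of each redirect step to be included in what tee receives.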

Recursively follow redirects using bash scripting

If we want to display the entire redirect chain, we can wrap our cURL command in a bash script that uses a recursive function to follow one redirect at a time.

#!/bin/bash

show_redirects() {
    local url=$1
    local hops=${2:-0}

    # Print the current URL and append it to redirects.txt.
    echo "$url" | tee -a redirects.txt

    # Without -L, cURL makes a single request; %{redirect_url} is the URL
    # it would have followed next (empty if the response is not a redirect).
    local next_url
    next_url=$(curl -s -o /dev/null -w "%{redirect_url}" "$url")

    # If there is another hop, recursively follow it (capped at 50 hops).
    if [ -n "$next_url" ] && [ "$hops" -lt 50 ]; then
      show_redirects "$next_url" $((hops + 1))
    fi
}

show_redirects "https://google.com"

The bash script uses cURL with the -s, -o, and -w options to request each URL without following redirects and to read the next hop from the %{redirect_url} write-out variable, while the tee command appends each step to the redirects.txt file for further analysis.
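When only the end of the chain matters, cURL's write-out variables can report it in a single call. The sketch below combines %{num_redirects} and %{url_effective}; the function name chain_summary and the wording of the output line are assumptions.

```shell
# chain_summary is a hypothetical helper name.
chain_summary() {
    # %{num_redirects} is the number of redirects cURL followed,
    # %{url_effective} is the URL of the last request it made.
    curl -s -L -o /dev/null \
        -w '%{num_redirects} redirect(s), final URL: %{url_effective}\n' "$1"
}
```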

GET or POST when following redirects with cURL

cURL redirects work with both GET and POST request methods, but there might be cases when you want it to stick with the original request method while carrying out redirects.

The key difference between GET and POST redirects lies in the request method used for the follow-up request:

GET redirect: This is when a GET request is used for the follow-up request, which is the default behavior for most HTTP clients, including web browsers. This is because GET requests are considered safe and idempotent, meaning they don't modify the server in any way.
POST redirect: This is when a POST request is used for the follow-up request, preserving the original request method. This is essential when the original request contains data that needs to be processed by the server, such as form data or file uploads.

When following a 301, 302, or 303 redirect, cURL changes the request method to GET by default. This means that if you initially sent a POST request, the follow-up request will be a GET request. For 307 and 308 redirects, cURL keeps the original method. This mirrors how web browsers behave and what the HTTP specification allows for these status codes.

How to make cURL not change POST to GET when following 3xx redirects

When submitting a form or uploading a file, changing the request method from POST to GET could result in lost data, incorrect processing, or failed uploads. In such cases, you want to preserve the original POST request method.

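To do that, combine the -L or --location option with the --post301, --post302, and --post303 options. These tell cURL to keep the POST method, along with its data, when it follows a redirect with the corresponding status code. Adding -X POST is not a reliable substitute, because it only changes the method word in the request and does not change how cURL otherwise behaves.

curl -L --post301 --post302 --post303 -d "username=myusername&password=mypassword"

In the above example, the -d option ensures myusername and mypassword are sent as a part of the request to the target URL, which also makes it a POST request. We have also used the -L option to follow redirects and the --post301, --post302, and --post303 options to ensure we preserve the original POST method when redirects are followed in the request.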

Redirecting to other hostnames with cURL

It's also common to encounter scenarios where a website redirects users to a different hostname or domain. This may happen for various reasons, and here are some common scenarios for hostname redirection.

Load balancing: Many websites use load-balancing techniques to distribute incoming traffic across multiple servers. In such cases, the initial request may be redirected to a different hostname or IP address to balance the load.
Content Delivery Networks (CDNs): CDNs are used to serve static content (e.g., images, CSS, JavaScript files) from servers geographically closer to the user, improving performance. Requests for these resources may be redirected to a CDN hostname.
Domain migrations: When a website migrates to a new domain, the old domain may redirect users to the new one.
Subdomains: Some websites use different subdomains for various purposes (e.g., www.apify.com, blog.apify.com), and requests may be redirected between them.

Handling authentication with cURL

In some cases, when a website redirects to a different hostname, you may need to authenticate the user again. This is because the authentication credentials (e.g., cookies, session tokens) are often tied to a specific hostname or domain. When the redirection occurs, the new hostname may not have access to the existing authentication credentials, requiring you to re-authenticate.

If you need to authenticate the user after a redirect, you can use cURL’s built-in authentication options or provide the necessary credentials (e.g., cookies, headers) in the request.

Using basic authentication

curl -L -u username:password

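The -u option provides the username and password for basic authentication. By default, cURL sends these credentials only to the original host. If the website redirects to another hostname that requires the same credentials, use the --location-trusted option instead of -L, and only when you trust every host in the redirect chain.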
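cURL does not pass -u credentials on to a different host when it follows a redirect with -L alone; the --location-trusted option opts in to that. The following is a sketch only: fetch_trusted is a hypothetical helper, and the username, password, and URL are whatever the caller supplies.

```shell
# fetch_trusted is a hypothetical helper; all three arguments come from the caller.
fetch_trusted() {
    local user=$1 password=$2 url=$3
    # --location-trusted follows redirects like -L, but also sends the
    # -u credentials to hosts the request is redirected to.
    curl -s --location-trusted -u "$user:$password" "$url"
}
```

Because the credentials can reach any host in the chain, this is best reserved for redirect chains you control or trust.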

Using cookie credentials

We'll start by sending a request to the example website, retrieve the cookies, and save the cookies received in response to a file named cookies.txt.

curl -c cookies.txt

The -c option tells cURL to save the cookies to the specified file.

Next, we can make subsequent requests to the website using the saved cookies:

curl -L -b cookies.txt https://example.com/login

This command sends a request to https://example.com/login and includes the cookies from the cookies.txt file using the -b option.

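Since the file contains the session cookie, the server will recognize the request as coming from an authenticated user. cURL sends each cookie only to hosts that match the cookie's domain, so a redirect to an unrelated hostname will not receive it.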
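The two steps can also be combined so that cookies set by one response in a redirect chain are available to later hops and saved for the next run. This is a sketch under the assumption that the same cookies.txt file is used for reading and writing; fetch_with_jar is a hypothetical helper name.

```shell
# fetch_with_jar is a hypothetical helper; cookies.txt is the file name used above.
fetch_with_jar() {
    # -b reads cookies from the file and enables cURL's cookie engine,
    # -c writes the updated cookies back to the file when cURL finishes.
    curl -s -L -b cookies.txt -c cookies.txt "$1"
}
```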

Following redirects with cURL in PHP

When working with web services and APIs, redirects are a common occurrence. However, like the cURL command-line client, the cURL library in PHP does not follow redirects by default, which can lead to unexpected behavior and errors. To follow them, set the CURLOPT_FOLLOWLOCATION option to true. This tells cURL to follow redirects and return the final response.

You can also limit the number of redirects cURL will follow by setting the CURLOPT_MAXREDIRS option, for example to 5.

Following redirects with POST request

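<?php
// The URL and the field name below are placeholders.
$ch = curl_init('https://example.com/submit');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, array('message' => 'hello'));
$response = curl_exec($ch);
curl_close($ch);
echo $response;
?>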

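In this example, cURL will send a POST request with data and follow any redirects. As with the command-line client, a 301, 302, or 303 redirect switches the follow-up request to GET unless you also set the CURLOPT_POSTREDIR option.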

How to verify a broken redirect chain

A broken redirect chain occurs when a website or server responds with a series of redirects (HTTP status codes 3xx) that ultimately lead to a dead end, resulting in an error or a loop. This can happen when there are incorrect or outdated redirects, causing a chain reaction of redirects that don't resolve to a final destination.

Here's an example with steps on how to verify a broken redirect chain:

1. Start with the initial URL

curl -I

The -I option shows the HTTP headers only.

2. Follow the redirects

curl -v -L --max-redirs 5

3. Analyze the output

If the redirects lead to a final destination (200 OK status code), the chain is valid. If the redirects loop or result in an error (e.g., 404, 500), the chain is broken.
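The three steps can be rolled into one check. The sketch below follows the rule of thumb above (200 means valid, anything else means broken); check_chain is a hypothetical helper name and the limit of 5 matches step 2.

```shell
# check_chain is a hypothetical helper; the limit of 5 matches step 2 above.
check_chain() {
    local exit_code=0 final
    # Follow the chain quietly and capture only the final status code.
    final=$(curl -s -L --max-redirs 5 -o /dev/null -w '%{http_code}' "$1") || exit_code=$?
    if [ "$exit_code" -ne 0 ]; then
        # A non-zero exit covers redirect loops, DNS failures, timeouts, and so on.
        echo "broken: curl exit code $exit_code"
    elif [ "$final" = "200" ]; then
        echo "valid: chain ends in 200"
    else
        echo "broken: chain ends in $final"
    fi
}
```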

What to do when cURL doesn’t follow redirects

When cURL doesn't follow redirects even with the usage of the -L option, it is usually because the redirects are implemented using HTML or JavaScript. Here's why cURL can't follow these two types of redirects.

HTML redirects

HTML redirects, such as a meta refresh tag in the page's head, are instructions inside the HTML document rather than HTTP responses. cURL can’t follow these redirects because they are client-side redirects: cURL downloads the HTML but does not parse it or act on it the way a browser does.

JavaScript redirects

JavaScript redirects, which use JavaScript methods and properties like window.open() and window.location, are also client-side redirects. cURL can't follow these redirects because it doesn't execute JavaScript and cannot interact with client-side code.

To handle these types of redirects, you'll need to use a tool that can parse HTML and execute JavaScript, such as a web browser. Alternatively, you can use a tool like the Crawlee library, which can handle redirects and other complex web scraping tasks.
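Although cURL will not act on an HTML redirect, it can at least reveal one. The sketch below searches the downloaded page for a meta refresh tag; find_meta_refresh is a hypothetical helper name, and the pattern is a simple assumption that will not cover every way such a tag can be written.

```shell
# find_meta_refresh is a hypothetical helper name.
find_meta_refresh() {
    # Matches tags such as <meta http-equiv="refresh" content="0; url=/new">.
    local pattern="<meta[^>]+http-equiv=[\"']?refresh[^>]*>"
    curl -s "$1" | grep -i -o -E "$pattern"
}
```

If the function prints a tag, the URL in its content attribute is the one to request next.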

Parting thoughts

While cURL can't follow HTML redirects, HTML meta refreshes, or JavaScript redirects, it can verify and avoid getting stuck in redirect loops with the --max-redirs option. If cURL is not following redirects as expected, you can double-check that you have included the -L flag, which instructs cURL to follow the redirected URL.

Redirects to different hostnames may require re-authentication, while languages like PHP let us follow redirects by setting the CURLOPT_FOLLOWLOCATION option to true.

Ultimately, by properly configuring cURL, you ensure your scripts handle redirects effectively whenever resources move.