How to use a proxy in Puppeteer (full guide for 2024)

A detailed guide to using proxies with Puppeteer. Learn how to authenticate proxies and avoid common pitfalls.

Content

Hey, we're Apify. You can build, deploy, share, and monitor your scrapers on the Apify platform. Check us out.

Puppeteer is a Node.js library that can interact with headless and headful Chrome browsers. You can use Puppeteer to perform automation tasks such as automating file uploads and downloads, taking screenshots of websites, and generating PDFs.

There's even more you can do with Puppeteer, including adding proxies to your automation scripts.

In this article, you'll learn how to use proxies in Puppeteer, authenticate proxies, and avoid common pitfalls.

What is a proxy in Puppeteer?

proxy acts as an intermediary between your Puppeteer scripts and the target website, hiding your IP address and displaying the proxy address to the website (the target website can still monitor your traffic through cookies).

The proxy allows you to make multiple requests to a website if there’s a need for it without being detected or blocked.

You can also be able to access websites that are limited to a particular region for the purpose of testing how websites are designed or behave in those regions.

In the next section, you will learn how to set up a proxy in your puppeteer script.

How to set up a proxy in Puppeteer

📌
Prerequisite
You need to have Node.js and NPM installed on your computer.

To set up a proxy in your project, create a folder and change your directory into the folder (the folder can be created on your desktop, in documents, or at your desired file location).

mkdir pptProxy && cd pptProxy
npm init -y
npm install puppeteer

The command above will create a folder called pptProxy; it will change your working directory into the folder, and the second command, npm init -y, will create a Node instance for your project.

This Node instance allows you to run any command related to Node using the npm package. The third command will install Puppeteer for your project.

Before you go further, you need to have a proxy.

You can get a proxy from any of the listed free proxy providers here, but we recommend using paid proxy services as a general rule.

Some proxies require no authentication, while others require authentication.

Within your puppeteer.launch function, introduce a flag --proxy-server, which tells your script to use a proxy.

const browser = await puppeteer.launch({
Replace the proxy with the IP address and port
    args: ['--proxy-server=your-proxy-IP:port']
  });

When your script is complete and set to run, all the requests on the target website will be made through the proxy server provided.

To verify your configuration, visit https://httpbin.org/ip before running the script.

Take note of your own IP address, run the script, and take a screenshot of the page.

Compare the output IP to see if it matches the proxy IP.

Example

import puppeteer from "puppeteer";

const browser = await puppeteer.launch({
    args: ["--proxy-server=your-proxy-IP:port"]
});
 const page = await browser.newPage()

 await page.goto('https://httpbin.org/ip');
 
screenshot code, full-page option
await page.screenshot({ path: 'apify.jpeg', fullPage: true });

// close the browser
await browser.close();
When using a proxy server, understand the protocol your proxy is using (either HTTP, HTTPS, or socks5). If you are not using the correct one, you might encounter failure such like the image below.
Proxy error message

Premium proxy providers require you to authenticate your use of their servers; hence, a need for credentials arises. When you use a proxy requiring authentication in a non-headless browser (specifically Chrome), you'll be required to add credentials into a popup dialog that looks like this:

Proxy authentication required pop up message

The problem with running in headless mode is that this dialog never exists, as there is no visible UI. This means that you must take another approach to authenticate your proxy.

Example

const browser = await puppeteer.launch({
    args: [`--proxy-server=https://myUsername:myPassword@my.proxy.com:8080`],
});

The above proxy configuration won't work because Chrome doesn't support passing the proxy credentials directly; they'll be ignored. You can solve that problem using a package we'll go into later.

The next section will guide you on how to authenticate a proxy in Puppeteer.

How to authenticate a proxy in Puppeteer

If you're using a proxy server that requires authentication, as stated earlier, you have to pass the login credentials separately, and Puppeteer has provided different ways to perform this operation. Let's go through them.

1. Using the authenticate() method on the puppet page object

The authenticate() method accepts the pair of credentials which will be used for basic HTTP authentication

import puppeteer from "puppeteer"

const proxy = 'https://your.proxy.com:7001';
const username = 'jimmy49';
const password = 'password123';

// Pass proxy URL into the --proxy-server arg
 const browser = await puppeteer.launch({
        args: [`--proxy-server=${proxy}`],
    });

 const page = await browser.newPage()

 Authenticate your proxy before visiting the target website
 await page.authenticate({ username, password });

 await page.goto('https://httpbin.org/ip');
 
//screenshot fullpage option
await page.screenshot({ path: 'apify.jpeg', fullPage: true });

Close the browser
await browser.close();

If you set your proxy URL correctly and ensure your login credentials are accurate, then your script will work accurately.

2. Using the proxy-chain NPM package

The proxy-chain package is an open-source package developed by and maintained by Apify, which provides a different approach with a feature that allows you to easily "anonymize" an authenticated proxy.

This can be done by passing your proxy URL with authentication details into the proxyChain.anonymizeProxy method, then using its return value within the --proxy-server argument when launching Puppeteer.

To use proxy-chain in your project, you need to install the library using npm install proxy-chain

import puppeteer from 'puppeteer';
import proxyChain from 'proxy-chain'

const proxy = 'http://my.proxy.com:3001';
const username = 'jimmy49';
const password = 'password123';

    const originalUrl = `http://${username}:${password}@${proxy}`;

    // Return anonymized version of original URL; it looks like <http://127.0.0.1:45678>
    const newUrl = await proxyChain.anonymizeProxy(originalUrl);

    const browser = await puppeteer.launch({
        args: [`--proxy-server=${newProxyUrl}`],
    });

    const page = await browser.newPage();

    await page.goto('https://httpbin.org/ip');
 
		//screenshot fullpage option
		await page.screenshot({ path: 'apify.jpeg', fullPage: true });

    await browser.close();

    // Close any pending connections
    await proxyChain.closeAnonymizedProxy(newProxyUrl, true);

An important thing to note when using this method is that after closing the browser, it is a good idea to use the closeAnonymizedProxy() method to forcibly close any pending connections that there may be.

This package performs both basic HTTP proxy forwarding, and HTTP CONNECT tunneling to support protocols such as HTTPS and FTP. It also supports many other features, so it is worth looking into it for other use cases.

3. Within ProxyConfigurationOptions in the Apify SDK

The Apify SDK is the most efficient way to write scalable automation and scraping software in Node.js using Puppeteer, Playwright, and Cheerio. If you aren't familiar with it, check out the documentation for either Javascript or Python.

Within the ProxyConfigurationOptions object in which you provide the Apify.createProxyConfiguration() method, there's an option named proxyUrls. This is simply an array of custom proxy URLs which will be rotated. Though it's an array, you can still pass only one proxy URL.

Pass your proxy URL with authentication details into the array, then pass the proxyConfiguration into the options of PuppeteerCrawler, and your proxy will be used by the crawler.

import { Actor } from 'apify';
import { PuppeteerCrawler } from 'crawlee';

await Actor.init();

const proxy = 'http://my.proxy.com:3001';
const username = 'jimmy49';
const password = 'password123';

    const requestList = await RequestList.open([{ url: '<https://google.com>' }]);

    // Pass authenticated proxy URL into proxyUrls
    const proxyConfiguration = await Actor.createProxyConfiguration({ proxyUrls: [`http://${username}:${password}@${proxy}`] });

    const crawler = new PuppeteerCrawler({
        requestList,
        requestQueue,
        // Pass proxyConfiguration into the crawler
        proxyConfiguration,
        async requestHandler({ page }){
            const title = await page.title();
            console.log(title);
        },
    });

await crawler.run();

await Actor.exit();

The advantage of using the Apify SDK for proxies over the first method is that multiple different custom proxies can be inputted, and their rotation will be automatically handled.

4. Setting the Proxy-Authorization header

If all else fails, setting the Proxy-Authorization header for each of your crawler's requests is an option. However, it does have its setbacks. This method only works with HTTP websites and not HTTPS websites.

Similarly to the first method, the proxy URL needs to be passed into the --proxy-server flag within args. The second step is to set an extra authentication header on the page object using the setExtraHTTPHeaders() method.

import puppeteer from "puppeteer"

const proxy = 'http://my.proxy.com:3001';
const username = 'jimmy49';
const password = 'password123';

    // Pass proxy URL into the --proxy-server arg
    const browser = await puppeteer.launch({
        args: [`--proxy-server=${proxy}`],
    });

    const page = await browser.newPage()

    // Pass in our base64 encoded username and password
    await page.setExtraHTTPHeaders({
    'Proxy-Authorization': 'Basic ' + Buffer.from(`${username}:${password}`).toString('base64'),
});

    await page.goto('https://www.google.com');

    await browser.close();

It's important to note that your authorization details must be base64-encoded. This can be done with the Buffer class in Node.js.

Once again, this method only works for HTTP websites, not HTTPS websites.

Using a rotating proxy in Puppeteer

Rotating proxies are the type of proxy server that changes IP address dynamically at a set interval for each website visit.

Rotating proxies provide dynamic flexibility. Their IP address changes over time, making it difficult for a website to detect the exact location from which visitors are coming.

It also allows you to access a website for a longer duration in case there's a restriction on the number of visits per day, as you have multiple IPs getting rotated.

You need to have access to a list of proxies that are reliable.

Create your Puppeteer script following the sample code.

import puppeteer from "puppeteer";

const proxyUrls = [
    'http://my.proxy.com:3001',
    'http://my.proxy.com:85',
    'http://my.proxy.com:8080'
];

    const randomProxyUrl = proxyUrls[Math.floor(Math.random() * proxyUrls.length)];

    const browser = await puppeteer.launch({
        args: [`--proxy-server=${randomProxyUrl}`]
    });

    const page = await browser.newPage();

    await page.goto('<https://httpbin.org/ip>');

		//screenshot code, fullpage option
		await page.screenshot({ path: 'apify.jpeg', fullPage: true });

    await browser.close();

The code above is an illustration of how to use rotating proxies with Puppeteer.

const randomProxyUrl = proxyUrls[Math.floor(Math.random() * proxyUrls.length)];

The line of code above selects a random proxy URL from the proxyUrls array. The Math.random() function generates a random number between 0 and 1, which is then multiplied by the length of the proxyUrls array.

After the calculation, when you run the code, a random proxy URL is used to perform the request.

If the proxies require authentication, use the authenticate() method to provide the credentials for the login.

Common pitfalls to avoid with Puppeteer proxy

When setting up a proxy for use with Puppeteer, you may encounter several issues:

  • Invalid proxy URL: you might be using http to access a proxy that is meant to be accessed through socks5 which will make your request not to go through.
  • Authentication issues: Some proxy providers require you to authenticate or validate your own IP address where you will be making these requests; otherwise, all your requests will fail. Also, be sure the proxy you use requires no authentication, and if it does, provide the appropriate credentials.
  • Proxy server restrictions: You might be using a free proxy that has already been banned by the target website. The best solution is Apify Proxy, an HTTP proxy server endpoint that provides access to Apify’s large pool of proxies.
  • Your hard-coded proxy rotation didn't work: If a single IP is being repeated, you can write code to limit an IP to a particular number of requests or use a loop to rotate the proxies available.

Best proxy for Puppeteer

The following are the best proxy providers that are used with Puppeteer:

1. Apify: Provides web scraping and automation solutions that include proxy services. Apify Proxy smartly rotates between datacenter and residential proxies to get the best performance while saving you money.

2. Oxylabs: Offers residential and data center proxies, and it can be used with Puppeteer.

3. Webshare: Provides a proxy service that allows users to test a limited number of proxies before purchase, and it can be used for Puppeteer tasks.

4. Storm Proxies: Offers a variety of proxy types, including residential and data center proxies that can be used for Puppeteer.

Conclusion

Puppeteer is used for performing various tasks such as automating file uploads/downloads, taking screenshots, and generating PDFs. Integrating proxies with Puppeteer increases its capabilities, helping you bypass geographical restrictions and preventing detection or getting blocked from multiple visits to targeted websites.

This article has shown you several ways to set up and authenticate proxies in Puppeteer, including Apify Proxy. Check it out.

Ayodele Aransiola
Ayodele Aransiola
Ayodele is a Developer Relations engineer with experience in few other tech skills such as frontend, technical writing, early stage startup advisory, product management and consulting.

Get started now

Step up your web scraping and automation