Introducing proxies
Using proxies can come in very handy when you're browsing the internet. Whether you're trying to access content that's not available in your country, stay as anonymous as possible, or scrape some data without getting blocked by the website you're scraping, proxies are here to help you.
But what exactly is a proxy (also known as a proxy server)? In simple terms, it’s a server that's sitting between your computer and the internet. It’s like a middleman that handles requests and responses between you and the websites you visit.
Since it's essentially a server like any other, a proxy server has its own IP address, different from your computer's. Routing your traffic through a proxy therefore gives you a degree of anonymity: the websites you visit see the proxy's IP address instead of yours. This makes your web scraping activity harder to trace and block.
In the following sections, you’ll learn more about how to use proxy servers with Axios, one of the most popular HTTP clients for Node.js, so you can scrape the web more effectively and efficiently.
How to make a simple request with Axios?
Now, let’s talk about Axios for a minute. It’s a promise-based HTTP client for Node.js (it also runs in the browser), used for making HTTP requests, for example to REST endpoints.
Prerequisites: Before you install Axios, you should already have your Node.js project initialized (for example, with npm init).
To add the Axios package to your project, use npm:
npm install axios
Axios is fairly simple to use and supports all the common HTTP methods out of the box, including GET, POST, PUT, and DELETE, which makes it a popular choice in many web development projects.
In web scraping, though, GET requests are the most commonly used. A GET request retrieves data from a server. It’s like saying, “Hey, can I get that information?” to a website.
To illustrate this, let's assume you want to find out a random fact about cats. Conveniently, there's an API endpoint just for that, https://catfact.ninja/fact, which returns a random fact each time you send it a GET request.
The first thing you'll want to do is to import the Axios module into your project:
const axios = require("axios");
After this, all you have to do is make the actual GET request to the API endpoint using the axios.get() method:
axios.get("https://catfact.ninja/fact")
.then((result) => console.log(result.data))
.catch((error) => console.log(error));
Assuming you're writing the code in a file called, say, index.js, you can run it using:
node index.js
This runs the script you've just created. If the request succeeds, it logs a single fact to the console:
{ fact: 'Cats can jump up to 7 times their tail length.', length: 46 }
If there’s an error (for example, the endpoint doesn’t exist or you lose your internet connection), the .catch() handler catches it and logs it to the console.
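The .catch() handler above logs the whole error object. If you'd rather tell the failure cases apart, you can inspect the properties Axios attaches to the error. error.response is set when the server replied with a non-2xx status. error.request without error.response means the request went out but no reply came back. The sketch below relies only on those documented properties. describeRequestError is a hypothetical helper name, not part of Axios.

```javascript
// describeRequestError is a hypothetical helper, not part of Axios.
// It only reads the error properties Axios documents:
//   error.response -> the server answered with a non-2xx status
//   error.request  -> the request was sent but no response arrived
function describeRequestError(error) {
  if (error.response) {
    // The server replied, but with an error status (e.g. 404 for a missing endpoint)
    return `Server responded with status ${error.response.status}`;
  }
  if (error.request) {
    // No reply at all, e.g. a DNS failure, a timeout, or a lost connection
    return `No response received: ${error.message}`;
  }
  // Something failed while the request was being set up
  return `Request setup failed: ${error.message}`;
}

// Usage with the cat fact request (requires axios and network access):
// axios.get("https://catfact.ninja/fact")
//   .then((result) => console.log(result.data))
//   .catch((error) => console.log(describeRequestError(error)));
```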
Why use proxies with Axios?
Now, here’s where things get interesting - you can route your Axios HTTP requests through a proxy server. This adds an extra layer of anonymity to your web scraping activities.
First of all, let's address the obvious - why should you use proxies with Axios in the first place, especially for web scraping? Well, when you’re scraping a website, you’re making numerous requests to the same server in a short amount of time. This can lead to your IP address being blocked by the website’s server because it might consider these requests as a potential threat.
This is where a proxy server comes into play. By routing your requests through different proxy servers, you can mask your original IP address and make it appear as if the requests are coming from different locations. This can significantly reduce the chances of your IP address being blocked.
Moreover, some websites have geo-restrictions in place, meaning they restrict access to their content based on the user’s geographical location. Proxy servers can help bypass these restrictions by making it appear as if the requests are coming from a permitted location.
How to use a proxy with Axios?
Using a proxy with Axios is straightforward - you just need to add a proxy configuration to your usual request.
To illustrate this, let's quickly switch to another API, http://ip-api.com/json/. When you send a GET request to this endpoint, it returns information about your IP address and geolocation, which lets you see the difference between requests sent with and without a proxy.
Now, send a simple GET request to the http://ip-api.com/json/ API:
const axios = require('axios');
axios.get("http://ip-api.com/json/")
.then(response => {
console.log(response.data);
})
.catch(error => {
console.error('Error fetching data: ', error);
});
Sending this request will result in a response containing a somewhat detailed report about your IP, location, ISP, and more:
{
status: 'success',
country: 'Your-Country',
countryCode: 'Your-Country-Code',
region: 'Your-Region-Code',
regionName: 'Your-Region-Name',
city: 'Your-City',
zip: 'Your-ZIP-Code',
lat: Your-Latitude,
lon: Your-Longitude,
timezone: 'Your-Timezone',
isp: 'Your-ISP-Name',
org: '',
as: 'Your-ISP-As',
query: 'Your-IP-Address'
}
Now, let's introduce a proxy configuration. The next request you send will go through a free proxy with the IP address 13.37.235.66 on port 4000. Free public proxies often stop working after a while, so you may need to substitute one you have access to. To achieve this, you just need to pass the proxy server's details (protocol, host, and port) to the axios.get() function you've already used in the previous example:
const axios = require('axios');
axios.get("http://ip-api.com/json/", {
proxy: {
protocol: 'http',
// proxy IP address
host: '13.37.235.66',
// proxy port
port: 4000
}
})
.then(response => {
console.log(response.data);
})
.catch(error => {
console.error('Error fetching data: ', error);
});
Sending the GET request via this proxy will result in a response that contains information about the location of the proxy server you used (not about your location):
{
status: 'success',
continent: 'Europe',
continentCode: 'EU',
country: 'France',
countryCode: 'FR',
region: 'NY',
regionName: 'Île-de-France',
city: 'Paris',
district: '',
zip: '75000',
lat: 40.674,
lon: -73.9701,
timezone: 'Europe/Paris',
offset: -14400,
currency: 'USD',
isp: 'Amazon.com',
org: 'AS16509 Amazon.com, Inc.',
as: 'AS16509 Amazon.com, Inc.',
asname: 'Amazon.com',
reverse: 'ec2-13-37-235-66.eu-west-3.compute.amazonaws.com',
mobile: false,
proxy: true,
hosting: true,
query: '13.37.235.66'
}
Here, you can see that the proxy server you used is Amazon's server located in Paris, France, and it's essentially hiding your actual location, as we discussed earlier in this article.
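ip-api.com reports the caller's IP address in the query field. You can therefore confirm that the proxy is in effect by comparing that field across the direct and proxied responses. Here's a minimal sketch. compareIps is a hypothetical helper, and the sample objects are trimmed-down stand-ins for real responses. 203.0.113.7 is a reserved documentation address standing in for your own IP.

```javascript
// compareIps is a hypothetical helper. ip-api.com puts the caller's
// IP address in the `query` field of its response.
function compareIps(directData, proxiedData) {
  if (directData.status !== "success" || proxiedData.status !== "success") {
    throw new Error("ip-api.com did not report success for both requests");
  }
  return {
    directIp: directData.query,
    proxiedIp: proxiedData.query,
    // Different IPs mean the API saw the proxy's address, not yours
    ipHidden: directData.query !== proxiedData.query,
  };
}

// Trimmed-down stand-ins for the two responses shown earlier
const direct = { status: "success", query: "203.0.113.7" };
const proxied = { status: "success", query: "13.37.235.66" };
console.log(compareIps(direct, proxied));
```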
In this example, you've sent the request through an HTTP proxy. Axios's built-in proxy option supports HTTP and HTTPS proxies out of the box, but other types of proxies, such as SOCKS or PAC, are not supported directly. But worry not! We’ll cover how to use these types of proxies with Axios later in the article.
How to set up a proxy with environment variables?
Setting up a proxy directly within the axios.get() function is a viable solution for small projects with a few developers. But when multiple developers work on the same project, you'll probably want to manage proxy configurations more systematically.
Each developer should only have to think about the logic they're working on, not about configuring a proxy every time they send an Axios request.
Instead, the proxy should be configured once and reused whenever it's needed. That way, you remove unnecessary overhead for other developers by giving them only the information they actually need.
That's where environment variables come into play - they're a convenient way to manage your proxy settings.
First, you need to set the environment variable. On macOS or Linux, you can do this in your terminal. In the Windows Command Prompt, use set instead of export:
export HTTP_PROXY=http://13.37.235.66:4000
Here, 13.37.235.66 is the IP address of your proxy server, and 4000 is the port number.
Then, in the same terminal session, run your Node.js script as usual, without any proxy configuration:
const axios = require('axios');
axios.get("http://ip-api.com/json/")
.then(response => {
console.log(response.data);
})
.catch(error => {
console.error('Error fetching data: ', error);
});
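Now, the GET request is routed through the proxy server specified by the HTTP_PROXY environment variable, even though there's no explicit proxy configuration in the axios.get() function. Running this script returns the same information about the proxy server as in the previous example, since the same proxy server is used. Keep in mind that HTTP_PROXY applies to http:// URLs. For https:// URLs, Axios reads HTTPS_PROXY instead. You can also list hosts that should bypass the proxy in a NO_PROXY variable. Setting proxy: false in a request's configuration makes Axios ignore these variables altogether.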
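To make the variable lookup less magical, here's a simplified sketch of how a proxy URL from the environment maps onto Axios's proxy option. It's an approximation, not Axios's own code. It picks HTTP_PROXY or HTTPS_PROXY based on the target URL's protocol, but it skips details such as NO_PROXY handling. proxyConfigFromEnv is a hypothetical helper name.

```javascript
// proxyConfigFromEnv is a hypothetical helper and a simplified approximation:
// Axios itself handles this lookup internally (and also honors NO_PROXY).
function proxyConfigFromEnv(targetUrl, env = process.env) {
  const isHttps = new URL(targetUrl).protocol === "https:";
  // Pick the variable that matches the target's protocol,
  // checking the lowercase name before the uppercase one
  const proxyUrl = isHttps
    ? env.https_proxy || env.HTTPS_PROXY
    : env.http_proxy || env.HTTP_PROXY;
  if (!proxyUrl) {
    return undefined; // no proxy configured: connect directly
  }

  const parsed = new URL(proxyUrl);
  const config = {
    protocol: parsed.protocol.replace(":", ""), // "http:" -> "http"
    host: parsed.hostname,
    // Fall back to the scheme's default port when the URL has none
    port: Number(parsed.port) || (parsed.protocol === "https:" ? 443 : 80),
  };
  if (parsed.username) {
    // Credentials in a URL are percent-encoded, so decode them
    config.auth = {
      username: decodeURIComponent(parsed.username),
      password: decodeURIComponent(parsed.password),
    };
  }
  return config;
}

// Prints { protocol: 'http', host: '13.37.235.66', port: 4000 }
console.log(
  proxyConfigFromEnv("http://ip-api.com/json/", {
    HTTP_PROXY: "http://13.37.235.66:4000",
  })
);
```

Passing an explicit env object, as in the last call, keeps the function easy to test. In a script, you'd call it with just the target URL so that it reads process.env.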
How to use a proxy with authentication in Axios?
If your proxy server requires authentication, you can provide your username and password in the proxy configuration of your Axios request:
const axios = require('axios');
axios.get("http://ip-api.com/json/", {
proxy: {
protocol: 'http',
// proxy IP address
host: '13.37.235.66',
// proxy port
port: 4000,
auth: {
username: 'your-username',
password: 'your-password'
}
}
})
.then(response => {
console.log(response.data);
})
.catch(error => {
console.error('Error fetching data: ', error);
});
Here, your-username and your-password are your username and password for the proxy server. The GET request is routed through the proxy server at IP address 13.37.235.66 on port 4000, and the server authenticates the request using the provided credentials.
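Also, be careful not to expose your username and password in your code or version control system. To avoid that, you could consider storing these sensitive details in environment variables instead. Axios also reads credentials embedded in the proxy URL:
export HTTP_PROXY=http://username:password@13.37.235.66:4000
If the username or password contains special characters such as @ or :, percent-encode them in the URL (for example, @ becomes %40).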
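You might prefer to keep each credential in its own environment variable, which also spares you from percent-encoding special characters. In that case, a small helper can assemble the proxy configuration for Axios. This is a sketch under assumptions. The variable names PROXY_HOST, PROXY_PORT, PROXY_USERNAME and PROXY_PASSWORD are hypothetical, because Axios doesn't read them itself. The helper also fails early if any of them is missing.

```javascript
// Hypothetical variable names: PROXY_HOST, PROXY_PORT, PROXY_USERNAME and
// PROXY_PASSWORD are not read by Axios itself; this helper reads them.
function authenticatedProxyFromEnv(env = process.env) {
  const required = ["PROXY_HOST", "PROXY_PORT", "PROXY_USERNAME", "PROXY_PASSWORD"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    // Fail early instead of sending requests with a half-configured proxy
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  // Same shape as the proxy option used in the authenticated example above
  return {
    protocol: "http",
    host: env.PROXY_HOST,
    port: Number(env.PROXY_PORT),
    auth: {
      username: env.PROXY_USERNAME,
      password: env.PROXY_PASSWORD,
    },
  };
}

// Usage (requires axios and the variables to be exported):
// axios.get("http://ip-api.com/json/", { proxy: authenticatedProxyFromEnv() })
```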
How to implement rotating proxies in Axios?
Using a proxy helps you avoid being blocked by the web server you're trying to scrape, but this technique only goes so far, since proxies can get blocked too (in the same way as your own IP address).
To help you get around that, you can implement rotating proxies, which distribute your requests over multiple IP addresses. That way, you reduce the risk of being blocked by a server due to too many requests from a single IP address.
The first thing you'd want to do to implement rotating proxies is to create an array of proxies you'll want to use:
// An array of proxies
const proxies = [
{ host: '123.45.67.89', port: 8080 },
{ host: '98.76.54.32', port: 8080 },
// Add more proxies as needed
];
Then, you'd implement a function that chooses a random proxy configuration you'll use to make a request:
// A function to get a random proxy
function getRandomProxy() {
return proxies[Math.floor(Math.random() * proxies.length)];
}
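Finally, instead of manually writing a proxy configuration in the axios.get() function, all you have to do is call the getRandomProxy() function. In the complete example, the function takes the list of proxies as a parameter, and each proxy entry also specifies its protocol: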
const axios = require('axios');
// An array of proxies
const proxies = [
{ protocol: 'http', host: '123.45.67.89', port: 8080 },
{ protocol: 'http', host: '98.76.54.32', port: 8080 },
// Add more proxies as needed
];
// A function to get a random proxy
function getRandomProxy(listOfProxies) {
return listOfProxies[Math.floor(Math.random() * listOfProxies.length)];
}
// Make a request with a random proxy
axios.get("http://ip-api.com/json/", {
// Instead of manually writing proxy configuration
proxy: getRandomProxy(proxies)
})
.then(response => {
console.log(response.data);
})
.catch(error => {
console.error('Error fetching data: ', error);
});
Remember, this is a basic example. In a real-world scenario, you might want to add error handling to switch proxies if a request fails or use a more sophisticated method to select proxies.
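As one example of such a refinement, the sketch below swaps random selection for round-robin rotation and retries a failed request through the next proxy in the list. createProxyRotator and requestWithRotation are hypothetical helpers. The function that actually sends the request is passed in, so you can plug in axios.get (as in the commented usage) or a stub for testing. Treat it as a starting point rather than a complete solution, because it retries on any error, including ones that a different proxy won't fix.

```javascript
// Hypothetical helpers: round-robin proxy rotation with failover.
// createProxyRotator returns a function that hands out proxies in order,
// wrapping around at the end of the list.
function createProxyRotator(proxies) {
  if (proxies.length === 0) {
    throw new Error("At least one proxy is required");
  }
  let next = 0;
  return () => {
    const proxy = proxies[next];
    next = (next + 1) % proxies.length;
    return proxy;
  };
}

// Tries up to maxAttempts proxies; `sendRequest(proxy)` must return a promise.
async function requestWithRotation(sendRequest, nextProxy, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const proxy = nextProxy();
    try {
      return await sendRequest(proxy);
    } catch (error) {
      // Remember the failure and move on to the next proxy in the rotation
      lastError = error;
      console.warn(`Attempt ${attempt} via ${proxy.host} failed: ${error.message}`);
    }
  }
  throw lastError;
}

// Usage with the proxies array from above (requires axios and network access):
// const nextProxy = createProxyRotator(proxies);
// requestWithRotation((proxy) => axios.get("http://ip-api.com/json/", { proxy }), nextProxy)
//   .then((response) => console.log(response.data))
//   .catch((error) => console.error("All attempts failed: ", error.message));
```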
How to use a SOCKS proxy with Axios?
Axios doesn’t support SOCKS proxies out of the box. However, you can use the socks-proxy-agent package to create an agent that supports SOCKS proxies and then use that agent in your Axios requests.
To use the socks-proxy-agent package in your project, simply install it using npm:
npm install socks-proxy-agent
Now, before you can create a SOCKS agent, you'll need to import the SocksProxyAgent class from the socks-proxy-agent package. Recent versions of the package expose it as a named export:
const { SocksProxyAgent } = require('socks-proxy-agent');
Then, you can actually create a SOCKS proxy agent:
// Create a SOCKS proxy agent
const proxyAgent = new SocksProxyAgent('socks://123.45.67.89:8080');
Here, 123.45.67.89 is the IP address of your SOCKS proxy server, and 8080 is the port number.
Now, you can put it all together and use the agent you've created in the axios.get() function to make an Axios request routed through a SOCKS proxy:
const axios = require('axios');
const { SocksProxyAgent } = require('socks-proxy-agent');
// Create a SOCKS proxy agent
const proxyAgent = new SocksProxyAgent('socks://123.45.67.89:8080');
// Make a request with the SOCKS proxy agent
axios.get("https://api.example.com/data", {
httpsAgent: proxyAgent,
// Disable Axios's own proxy handling (including HTTP_PROXY/HTTPS_PROXY)
// so the request goes through the SOCKS agent only
proxy: false
})
.then(response => {
console.log(response.data);
})
.catch(error => {
console.error('Error fetching data: ', error);
});
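The GET request is routed through the SOCKS proxy server using the httpsAgent option in the Axios request configuration. Because the target URL uses HTTPS, the agent goes in httpsAgent. For plain http:// URLs, pass it as httpAgent instead.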
How to use a PAC proxy with Axios?
A PAC (Proxy Auto-Configuration) file is a JavaScript file that defines a function, FindProxyForURL(url, host). The client calls this function for each request. The function's return value determines whether the request (HTTP or HTTPS) is passed directly to the destination server or routed through a proxy server.
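For illustration, here's what a small PAC file might contain. The host names, the proxy address, and the routing rules are placeholder assumptions. Each return value is a string such as "DIRECT" or "PROXY host:port", and several entries separated by semicolons are tried in order. Real PAC files often use helpers such as dnsDomainIs() or shExpMatch(), which the PAC runtime provides. This sketch sticks to plain string comparisons so it also runs as ordinary JavaScript.

```javascript
// proxy.pac: a minimal example PAC file (placeholder hosts and rules).
// PAC-aware clients call FindProxyForURL for every request.
function FindProxyForURL(url, host) {
  // Local addresses bypass the proxy entirely
  if (host === "localhost" || host === "127.0.0.1") {
    return "DIRECT";
  }
  // Send requests for example.com and its subdomains through the proxy,
  // falling back to a direct connection if the proxy is unreachable
  if (host === "example.com" || host.endsWith(".example.com")) {
    return "PROXY 13.37.235.66:4000; DIRECT";
  }
  // Everything else goes straight to the destination server
  return "DIRECT";
}
```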
Axios doesn’t support PAC proxies out of the box. But you can use the pac-proxy-agent package to create an agent that supports PAC proxies, similar to the SOCKS agent you used in the previous section.
Therefore, you should install the pac-proxy-agent package via npm before you use it in your project:
npm install pac-proxy-agent
Now, you can import the PacProxyAgent class from the package. Recent versions expose it as a named export:
const { PacProxyAgent } = require('pac-proxy-agent');
This enables you to create a new PAC proxy agent:
// Create a PAC proxy agent
const pacFile = "http://example.com/proxy.pac";
const proxyAgent = new PacProxyAgent(pacFile);
Here, http://example.com/proxy.pac is the placeholder URL of your PAC file. Remember to replace it with your actual PAC file URL.
Now, you can route the Axios GET request according to the rules defined in your PAC file using the httpsAgent option in the Axios request configuration:
const axios = require('axios');
const { PacProxyAgent } = require('pac-proxy-agent');
// Create a PAC proxy agent
const pacFile = "http://example.com/proxy.pac";
const proxyAgent = new PacProxyAgent(pacFile);
// Make a request with the PAC proxy agent
axios.get("https://api.example.com/data", {
httpsAgent: proxyAgent,
// Let the PAC rules decide, without Axios's own proxy handling on top
proxy: false
})
.then(response => {
console.log(response.data);
})
.catch(error => {
console.error('Error fetching data: ', error);
});
What you've learned about Axios and proxies
After reading this article, you should have a solid overview of the use cases and techniques for routing your Axios requests through the most common types of proxy servers.
You now understand what proxies are, what types you can use, and - most importantly - how to implement each of those proxy types in Axios.