cURL is a powerful tool used for transferring data with URLs. This article will guide you through cURL in Python using three different approaches: the PycURL library, Python's subprocess module, and the Requests library. Each method has its advantages and applications, which we'll explore in detail.
What is cURL?
cURL stands for Client URL, a lightweight command line tool for making network requests from the client side. It's suitable for many use cases, like making HTTP requests and testing APIs. It provides support for using different network protocols such as HTTP, HTTPS, FTP, and many more.
Let's say we have different endpoints of an API, and we want to test those endpoints. Rather than using web-based API calls, we can just use cURL to hit those endpoints and test our APIs.
The three approaches covered in this article let you make the same kinds of network requests from within Python.
Comparing different approaches to using cURL in Python
| Method | Description | Use Case |
| --- | --- | --- |
| PycURL | A Python interface to libcurl, the library cURL is based on. | When you need low-level control over your requests. |
| subprocess | Runs cURL commands as subprocesses. | When you prefer using cURL directly within Python. |
| Requests | A high-level library for making HTTP requests in Python. | When you need a simple and intuitive interface. |
Each method has its own strengths: PycURL offers detailed control, subprocess allows direct use of cURL, and Requests provides an easy-to-use API.
How to use cURL in Python with PycURL
1. Setting up the environment for cURL requests in Python
Before making cURL requests in Python, make sure you have Python and PycURL installed.
You can easily download Python from the official Python website. Once you've downloaded the Python installer, you can install Python and check the version by entering the following command on the command line.
python3 --version
You'll see the version of Python that's installed on your system.
📔 Note: We're using python3 here because on many systems the plain python command either doesn't exist or points to an older Python 2 installation.
2. Installing the PycURL library
Now that you have Python installed on your system, you can install pycurl using Python's package manager pip3.
pip3 install pycurl
Once the command is executed, confirm the installation by simply running the following command:
pip3 show pycurl
This should print the name of the library with other information. Once you're done with the installation, you're ready to make your requests using PycURL.
📔 Note: Be careful when installing packages through pip. If you run your scripts with python3, install packages with pip3 so that they end up in the same Python installation. Otherwise, the package may be installed for a different Python version and the import will fail.
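The installation can also be checked from inside Python. The snippet below is a minimal sketch that uses only the standard library; the helper name is_installed is our own and not part of PycURL.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the named module can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("pycurl"):
    try:
        import pycurl
        # pycurl.version reports the PycURL and libcurl versions in one string
        print("PycURL is available:", pycurl.version)
    except ImportError as error:
        # The package is present but could not be loaded
        print("PycURL is installed but failed to import:", error)
else:
    print("PycURL is not installed for this interpreter")
```

Running this with the same interpreter that runs your scripts tells you whether pip3 installed the package where you expected.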
3. How to make a GET request with PycURL
Before making GET requests, let's first see how a basic GET request works. When we make a GET request, we basically ask a server to give us a specific resource. The resource could be a file, an HTML page, a JSON object, or other data. It's called a GET request because it gets a resource from the server.
You can make GET requests with the pycurl library by following a series of steps. Don't worry. We'll go through everything in detail.
Let's see the code first.
import pycurl
from io import BytesIO
# Create a Curl object
curl = pycurl.Curl()
# Create a BytesIO object to store the response
responseBuffer = BytesIO()
# Set the URL
curl.setopt(curl.URL, 'https://www.google.com')
# Set the option to write the response to the buffer
curl.setopt(curl.WRITEDATA, responseBuffer)
# Make the request
curl.perform()
# Fetch the response body
response = responseBuffer.getvalue()
# Print the response
print(response.decode('utf-8'))
# Close the Curl object
curl.close()
# Close the buffer
responseBuffer.close()
In the code above, we first create a curl object using pycurl.Curl().
Then we set the URL we want to fetch using curl.setopt(curl.URL, 'https://www.google.com').
We can also set other options using curl.setopt(), such as storing the response data in the responseBuffer object.
We then perform the request using curl.perform(). This sends the HTTP request to the URL and retrieves the response.
We can get the response body using responseBuffer.getvalue(), which returns a bytes object.
We decode this response using response.decode('utf-8') to convert it to a string.
Finally, we close the curl and buffer objects using curl.close() and responseBuffer.close() respectively.
While executing this code on macOS, you may encounter an error caused by mismatched versions of curl or SSL. A common remedy is to reinstall PycURL so that it is built against the same SSL library as the system's libcurl.
You may be thinking: why is the syntax so confusing, and why are we using so many things to make just one request?
The PycURL library provides a low-level interface to cURL, giving us more control and flexibility. As we get closer to any computer system, we attain more control, but the syntax gets harder and less readable for humans.
The steps we saw above, like importing BytesIO, creating objects, and decoding responses are necessary because PycURL operates at a lower level and provides us with direct access to the raw HTTP response.
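To make the role of the buffer concrete, here is a small sketch that needs neither PycURL nor a network connection. The chunks list is a made-up stand-in for the pieces of the response body that would be written into the buffer during a request.

```python
from io import BytesIO

response_buffer = BytesIO()

# Hypothetical stand-in for the body arriving in pieces during a request.
# Note that the two bytes of the character "ü" are split across chunks.
chunks = [b'{"city": "Z', b'\xc3', b'\xbcrich"}']
for chunk in chunks:
    response_buffer.write(chunk)

# getvalue() returns everything written so far as one bytes object
raw_body = response_buffer.getvalue()

# Decoding the complete body at the end handles characters that were split
text_body = raw_body.decode("utf-8")

print(type(raw_body).__name__, len(raw_body))
print(text_body)

response_buffer.close()
```

Collecting raw bytes first and decoding once at the end is why the examples call decode('utf-8') only after the request has finished.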
4. How to make a POST request with PycURL
A POST request is similar to a GET request. The only difference is that we add the data we want to send with the request, and that data must be encoded first. Let's see how that works.
import pycurl
# Import the urllib for encoding
import urllib.parse
from io import BytesIO
# Create a Curl object
curl = pycurl.Curl()
responseBuffer = BytesIO()
# Set the URL
curl.setopt(curl.URL, 'http://httpbin.org/post')
# Set the method to POST
curl.setopt(curl.POST, True)
# Data
data = {'name': 'John', 'age': '30'}
# Encode the data
dataString = urllib.parse.urlencode(data)
# Add data as POSTFIELDS
curl.setopt(curl.POSTFIELDS, dataString)
# Set the variable that will store data
curl.setopt(curl.WRITEDATA, responseBuffer)
# Make the request
curl.perform()
# Get the responseCode
responseCode = curl.getinfo(curl.RESPONSE_CODE)
print('Response Code:', responseCode)
# Get the responseBody
responseBody = responseBuffer.getvalue()
print('Response Body:', responseBody.decode('utf-8'))
# Close the object
curl.close()
# Close the buffer
responseBuffer.close()
We first create a curl object, and then we set the URL.
We set the request method to POST using curl.setopt(curl.POST, True).
After that, we define the data as a dictionary and encode it using the urllib.parse.urlencode() method.
We set the encoded data as the request body using curl.setopt(curl.POSTFIELDS, dataString).
After performing the request, we print the response code and the response body.
Finally, we close the curl and buffer objects.
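To see what the encoding step produces, the following sketch runs urlencode() on its own. The second dictionary is a made-up example, included only to show how spaces and special characters are escaped.

```python
from urllib.parse import urlencode

# The same data as in the POST example
form_data = {'name': 'John', 'age': '30'}
encoded_form = urlencode(form_data)
print(encoded_form)

# Hypothetical data containing a space and an ampersand
tricky_data = {'query': 'fish & chips', 'lang': 'en'}
encoded_tricky = urlencode(tricky_data)
print(encoded_tricky)
```

The result is a single string of key=value pairs joined by &, which is the form-encoded body that POSTFIELDS sends.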
We've covered some basic concepts of the PycURL library and how it works. Now, let's cover some more advanced topics, including adding custom headers, handling redirects, authenticating requests, and handling errors.
How to add headers in the cURL request using Python
Headers let us send additional information with a request, such as the user agent, content type, or authorization credentials. As mentioned earlier, PycURL's setopt() method lets us set additional options on a request, so we'll use it to add headers as well. Let's see how it's done.
import pycurl
from io import BytesIO
# Create a new Curl object
curl = pycurl.Curl()
# Create a BytesIO object
responseBuffer = BytesIO()
# Set the URL to fetch
curl.setopt(curl.URL, 'https://httpbin.org/headers')
# Set custom headers
customHeaders = ['Authorization: Bearer mytoken', 'User-Agent: MyCustomAgent']
curl.setopt(curl.HTTPHEADER, customHeaders)
# Set the WRITEFUNCTION option to redirect the response body to the responseBuffer
curl.setopt(curl.WRITEFUNCTION, responseBuffer.write)
# Make the request
curl.perform()
# Get the HTTP response code
responseCode = curl.getinfo(curl.RESPONSE_CODE)
print('Response Code:', responseCode)
# Get the response body from the responseBuffer
responseBody = responseBuffer.getvalue()
# Print the body of the response
print('Response Body:', responseBody.decode('utf-8'))
# Close the Curl object and the responseBuffer
curl.close()
responseBuffer.close()
We set two custom headers (Authorization and User-Agent) by setting the curl.HTTPHEADER option to a list of strings, each containing a header name and its value. We then perform the request and retrieve the response body as before.
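Headers are often kept in a dictionary, while HTTPHEADER expects a list of 'Name: value' strings. The helper below is one possible way to convert between the two; to_header_list is a hypothetical name of our own, not a PycURL function.

```python
def to_header_list(headers):
    """Convert a dict of headers into the list of strings HTTPHEADER expects."""
    header_list = []
    for name, value in headers.items():
        # Line breaks inside a header would let extra headers be injected
        if "\r" in f"{name}{value}" or "\n" in f"{name}{value}":
            raise ValueError(f"Header {name!r} contains a line break")
        header_list.append(f"{name}: {value}")
    return header_list


custom_headers = to_header_list({
    "Authorization": "Bearer mytoken",
    "User-Agent": "MyCustomAgent",
})
print(custom_headers)

# With a Curl object at hand, the list would be passed on like this:
# curl.setopt(curl.HTTPHEADER, custom_headers)
```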
How to handle redirects in cURL requests with Python
An HTTP redirect is a way to tell a browser to request a different URL instead of the one originally requested. In simple words, when a user tries to access a specific URL, the server automatically takes the user to an alternative URL.
This happens when the owner of the website changes the URL or wants the user to see a better version. In the case of alternative URLs, the website is accessible through both the new and old URLs.
By default, PycURL does not follow redirects. We can configure it to follow them using the curl.setopt() method.
import pycurl
from io import BytesIO
# Create a new Curl object
curl = pycurl.Curl()
# Create a BytesIO object
responseBuffer = BytesIO()
# Set the URL to fetch
curl.setopt(curl.URL, 'http://httpbin.org/redirect-to?url=https%3A%2F%2Fwww.google.com')
# Follow redirects, but no more than two in a row
curl.setopt(curl.FOLLOWLOCATION, True)
curl.setopt(curl.MAXREDIRS, 2)
# Write the data in the responseBuffer
curl.setopt(curl.WRITEDATA, responseBuffer)
# Perform the request
curl.perform()
# Get the response body
response = responseBuffer.getvalue()
# Print the response
print(response.decode('utf-8'))
# Close the Curl object
curl.close()
# Close the buffer
responseBuffer.close()
We set the curl.FOLLOWLOCATION option to True to instruct cURL to follow redirects. We also set the curl.MAXREDIRS option to 2, which limits the number of redirects that it will follow.
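The test URL in this example carries its redirect target in the url query parameter, written in percent-encoded form. As a small aside, this sketch shows how such a URL can be built with the standard library instead of encoding it by hand.

```python
from urllib.parse import quote, urlencode

target = "https://www.google.com"

# safe="" makes quote() escape ":" and "/" as well
encoded_target = quote(target, safe="")
redirect_url = "http://httpbin.org/redirect-to?url=" + encoded_target
print(redirect_url)

# urlencode() builds the whole query string and escapes the value the same way
same_url = "http://httpbin.org/redirect-to?" + urlencode({"url": target})
print(same_url)
```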
How to handle cookies in cURL requests with Python
Cookies are small pieces of text that are used to save a user's state or information about the user. PycURL provides us with an easy way to manage them. We just need to specify a text file using the setopt() method with two options, COOKIEJAR and COOKIEFILE.
COOKIEFILE tells PycURL to read cookies from a file before making a request.
COOKIEJAR tells PycURL to save the cookies received in the response to a file. The file is written when the Curl object is closed.
import pycurl
from io import BytesIO
# Create a Curl object
curl = pycurl.Curl()
# Create a buffer object
responseBuffer = BytesIO()
# Set the URL
curl.setopt(curl.URL, 'https://stackoverflow.com/')
# Set the buffer to receive data
curl.setopt(curl.WRITEDATA, responseBuffer)
# Save cookies to a file
curl.setopt(curl.COOKIEJAR, 'cookies.txt')
# Load cookies from a file
curl.setopt(curl.COOKIEFILE, 'cookies.txt')
# Perform the request
curl.perform()
# Read cookies
cookies = curl.getinfo(pycurl.INFO_COOKIELIST)
# Print the cookies
print("Cookies are:")
for cookie in cookies:
    print(cookie)
# Close the Curl object
curl.close()
# Close the buffer
responseBuffer.close()
In this example, we specify a cookie file through setopt(). If the file doesn't exist, PycURL creates it and saves the cookies into it. We read the cookies with curl.getinfo(pycurl.INFO_COOKIELIST) and print them.
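Each entry in the cookie list is a single line in the Netscape cookie file format: seven tab-separated fields, optionally prefixed with #HttpOnly_. The sketch below splits such a line into named fields. The Cookie type, the parse_cookie_line helper, and the sample lines are all made up for illustration.

```python
from typing import NamedTuple


class Cookie(NamedTuple):
    domain: str
    include_subdomains: bool
    path: str
    secure: bool
    expires: int
    name: str
    value: str
    http_only: bool


HTTP_ONLY_PREFIX = "#HttpOnly_"


def parse_cookie_line(line):
    """Split one Netscape-format cookie line into named fields."""
    http_only = line.startswith(HTTP_ONLY_PREFIX)
    if http_only:
        line = line[len(HTTP_ONLY_PREFIX):]
    domain, subdomains, path, secure, expires, name, value = line.split("\t")
    return Cookie(
        domain=domain,
        include_subdomains=(subdomains == "TRUE"),
        path=path,
        secure=(secure == "TRUE"),
        expires=int(expires),
        name=name,
        value=value,
        http_only=http_only,
    )


# Hypothetical sample lines in the same shape as the cookie list entries
sample = ".example.com\tTRUE\t/\tFALSE\t1735689600\tsession_id\tabc123"
parsed = parse_cookie_line(sample)
print(parsed.name, "=", parsed.value)

protected = parse_cookie_line("#HttpOnly_.example.com\tTRUE\t/\tTRUE\t0\ttoken\txyz")
print(protected.name, "http_only:", protected.http_only)
```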
How to authenticate cURL requests with Python
While working with websites and requests, we may encounter websites that require authentication before we can access their content. PycURL handles this situation through the setopt() method as well: we encode the credentials and send them in a POST request.
Let's see the code in detail.
import pycurl
from io import BytesIO
from urllib.parse import urlencode
# Create a new Curl object
curl = pycurl.Curl()
# Create a buffer object
responseBuffer = BytesIO()
# Enable cookie handling
curl.setopt(curl.COOKIEJAR, 'cookies.txt')
curl.setopt(curl.COOKIEFILE, 'cookies.txt')
# Set the login URL
curl.setopt(curl.URL, 'https://newsapi.org/login')
# Set the request method to POST
curl.setopt(curl.POST, 1)
# Add the data
postData = {'email': 'yourEmail', 'password': 'yourPassword'}
# Encode the data
postfields = urlencode(postData)
# Add the post fields
curl.setopt(curl.POSTFIELDS, postfields)
# Add the buffer variable
curl.setopt(curl.WRITEDATA, responseBuffer)
# Perform the login request
curl.perform()
# Get the HTTP response code
responseCode = curl.getinfo(curl.RESPONSE_CODE)
print('Response Code:', responseCode)
# Clear the response buffer
responseBuffer.truncate(0)
responseBuffer.seek(0)
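# Set the URL to the account page
curl.setopt(curl.URL, 'https://newsapi.org/account')
# Switch back to GET; otherwise the handle would repeat the POST
curl.setopt(curl.HTTPGET, True)
# Make the request to the account page
curl.perform()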
# Fetch the response body
responseBody = responseBuffer.getvalue()
# Print the body of the response
print('Response Body:', responseBody.decode('utf-8'))
# Close the Curl object, and the responseBuffer
curl.close()
responseBuffer.close()
In the code above, we import the urlencode function from urllib.parse to encode the credentials. We also enable cookies and save them to a file so that the session persists between requests.
The line curl.setopt(curl.POSTFIELDS, postfields) adds the encoded credentials to the request body.
After that, we make the POST request and print the response code, which should be 200 if the login succeeds.
After clearing the responseBuffer, we make a GET request to fetch the account page.
How to handle errors in cURL requests with Python
While writing code, you might not always be sure whether a piece of code will work. For example, when making a request or reading a file, it's possible that the server or the file isn't available. At that point, we get an error.
To handle such situations with PycURL, we use try-except blocks.
Here's an example of using a try-except block to perform a curl request.
import pycurl
# Create a new Curl object
curl = pycurl.Curl()
# Set the URL
curl.setopt(curl.URL, 'https://www.googlecom')
try:
    # Perform the request
    curl.perform()
except pycurl.error as error:
    # Handle the error
    errorNumber, errorString = error.args
    print('Error: %s %s' % (errorNumber, errorString))
# Close the Curl object
curl.close()
In this example, we have intentionally set an invalid URL to demonstrate how to handle errors.
We've used a try-except block to catch any pycurl.error exceptions that may be raised during the request.
We then extract the error number and error string from the exception using error.args and print an error message.
We've covered some pretty advanced stuff, but let's go a step further. In this next section, we'll cover some even more advanced topics in cURL requests with Python, including performing file uploads and working with SSL/TLS certificates.
How to perform file uploads in cURL requests with Python
We may need to upload a file along with our HTTP request, such as when working with file storage or API endpoints that accept file uploads. PycURL provides us with an easy way to upload files using the same method setopt().
We just need to set our request to POST and specify the path of the file with its type, and we're good to go.
Let's see how easy this process is.
import pycurl
from io import BytesIO
# Create a new Curl object
curl = pycurl.Curl()
# Create a buffer
responseBuffer = BytesIO()
# Set the URL for the file upload
curl.setopt(curl.URL, 'https://httpbin.org/post')
# Set the file to be uploaded and other options
curl.setopt(curl.HTTPPOST, [('file', (curl.FORM_FILE, '/content/cookies.txt'))])
# Specify the buffer to receive response
curl.setopt(curl.WRITEDATA, responseBuffer)
# Apply the try block
try:
    # Perform the request
    curl.perform()
except pycurl.error as error:
    # Handle the error
    errorNumber, errorString = error.args
    print('Error: %s %s' % (errorNumber, errorString))
# Close the Curl object
curl.close()
# Decode and print the response
response = responseBuffer.getvalue().decode('utf-8')
print(response)
# Close the buffer
responseBuffer.close()
Our code is mostly the same, and we've already explained the code pretty extensively in the previous examples.
The only line that needs explaining is the curl.setopt(curl.HTTPPOST, ...) call, where we set the method to POST and specify the file.
curl.HTTPPOST sets the HTTP method to POST and sends the data as a multipart form. Its argument is a list that includes information about the file.
file is the name of the form field on the server side that will receive the uploaded file.
(curl.FORM_FILE, '/content/cookies.txt') is a tuple that specifies the type of the form field and the file to be uploaded.
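Before handing a path to FORM_FILE, it can help to confirm that the file exists and to work out its content type. The following is a minimal standard-library sketch; describe_upload is a hypothetical helper of our own, and the temporary cookies.txt file exists only so the example can run anywhere.

```python
import mimetypes
import os
import tempfile


def describe_upload(path):
    """Return the absolute path and a guessed content type for an upload."""
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    content_type, _ = mimetypes.guess_type(path)
    # Fall back to a generic binary type when the extension is unknown
    return os.path.abspath(path), content_type or "application/octet-stream"


# Create a throwaway file so the helper has something to inspect
with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, "cookies.txt")
    with open(sample_path, "w") as handle:
        handle.write("demo")
    upload_path, content_type = describe_upload(sample_path)
    print(upload_path, content_type)

# With a Curl object at hand, the values could be passed on like this:
# curl.setopt(curl.HTTPPOST, [
#     ('file', (curl.FORM_FILE, upload_path,
#               curl.FORM_CONTENTTYPE, content_type)),
# ])
```

Checking the path first gives a clear Python exception instead of a failure in the middle of the request.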
How to work with SSL/TLS certificates in cURL requests with Python
SSL and TLS secure the connection between a client, such as a browser, and a server by encrypting the data transferred between them. SSL/TLS certificates are issued to websites by an organization called a Certificate Authority (CA), which confirms that a website is owned by a particular organization and is trustworthy.
When we visit a website that uses SSL/TLS, our web browser checks the site's certificate, which acts as a digital stamp of approval that verifies the authenticity and trustworthiness of the website.
Because CA certificates change over time, PycURL doesn't keep an up-to-date set of them itself.
We can point PycURL to certificates in a local directory, but if we don't have any, we can use the certifi Python package.
Before using this package, let's install it with the following command.
pip3 install certifi
Now, we can use this package in our code.
import pycurl
import certifi
# Create a Curl object
curl = pycurl.Curl()
# Set the URL
curl.setopt(curl.URL, 'https://blog.apify.com/')
# Check the CA certificates through certifi
curl.setopt(curl.CAINFO, certifi.where())
# Perform the request
curl.perform()
# Retrieve the response code
responseCode = curl.getinfo(curl.RESPONSE_CODE)
print(f'Response code: {responseCode}')
# Close the Curl object
curl.close()
In the code above, we're using an additional package that supplies the CA certificates used to verify a website's certificate.
The curl.CAINFO option tells cURL which CA certificate bundle to verify the website's certificate against, and certifi.where() returns the path to the bundle provided by the package.
We've covered the basics and some advanced concepts of cURL combined with the simplicity and ease of use of Python. This combination allows you to perform complex web operations with fewer lines of code and provides better control over HTTP requests.
Let’s put everything we’ve learned so far together and look at a script that covers most aspects.
The script uses Beautiful Soup to parse HTML. Install it by entering the command pip3 install beautifulsoup4, then run the script below.
import pycurl
import certifi
from io import BytesIO
# Import BeautifulSoup
from bs4 import BeautifulSoup
# Make a new curl object
curl = pycurl.Curl()
# Make a new Buffer
responseBuffer = BytesIO()
# Set the URL
curl.setopt(curl.URL, "https://blog.apify.com/")
# Check the certificates
curl.setopt(curl.CAINFO, certifi.where())
# Follow redirects, up to three in a row
curl.setopt(curl.FOLLOWLOCATION, True)
curl.setopt(curl.MAXREDIRS, 3)
# Save the response data
curl.setopt(curl.WRITEDATA, responseBuffer)
# Make the request
curl.perform()
# Decode the response
htmlResponse = responseBuffer.getvalue().decode('utf-8')
# Add the information to the parser
soup = BeautifulSoup(htmlResponse, 'html.parser')
# Extract articles (Beautiful Soup uses class_ because class is a reserved word in Python)
articles = soup.find_all('div', class_='post-info-wrap')
# Loop through all the articles
for article in articles:
    title = article.find('h2', class_='post-title').text.strip()
    author = article.find('a', class_='post-author').text.strip()
    print("Title:", title)
    print("Author:", author)
    print("-" * 25)
curl.close()
responseBuffer.close()
In this script, we're making a GET request to the Apify Blog and using setopt() to verify the SSL/TLS certificate and follow redirects.
In the end, we're using Beautiful Soup to parse the HTML response.
First, we retrieve posts on the website, and then we extract the Title and Author of each post.
How to use cURL in Python with subprocess
We've covered PycURL. Now for a much more concise explanation of the second method of using cURL in Python: Python's subprocess module.
Using the subprocess module allows you to run cURL commands directly within your Python code. This is quick and straightforward for those familiar with the cURL command line.
This approach is useful when you want to use existing cURL commands in a Python script without translating them into another library's syntax.
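As a rough sketch of this approach, the code below builds a cURL command as a list of arguments and runs it with subprocess.run(). The helper names build_curl_command and run_curl are our own, and the example assumes the curl executable is installed and on the PATH.

```python
import shutil
import subprocess


def build_curl_command(url, headers=None, data=None):
    """Build the argument list for a cURL call without running it."""
    command = ["curl", "--silent", "--show-error", "--location"]
    for name, value in (headers or {}).items():
        command += ["--header", f"{name}: {value}"]
    if data is not None:
        # --data makes cURL send a POST request with this body
        command += ["--data", data]
    command.append(url)
    return command


def run_curl(url, headers=None, data=None, timeout=30):
    """Run cURL and return the response body as text."""
    if shutil.which("curl") is None:
        raise RuntimeError("curl executable not found on PATH")
    completed = subprocess.run(
        build_curl_command(url, headers, data),
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode the output to str
        timeout=timeout,
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return completed.stdout


command = build_curl_command(
    "https://httpbin.org/get",
    headers={"Accept": "application/json"},
)
print(command)
```

Calling run_curl("https://httpbin.org/get") would execute the command and return the body. Passing the arguments as a list rather than one shell string avoids quoting problems when URLs or header values contain special characters.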
How to use cURL in Python with Requests
Finally, the third way to use cURL in Python: Requests.
The Requests library is a popular, high-level library for making HTTP requests in Python. Its advantage over the previous methods is that it provides a simple and intuitive interface.
Here are a few examples of using cURL with the Requests library.
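The sketch below shows one way to express the earlier POST example with Requests. It prepares the request without sending it so the result can be inspected offline; build_post is a hypothetical helper name, and the URL and data are the ones used in the PycURL POST example.

```python
import requests


def build_post(url, form_data, headers=None):
    """Prepare a form-encoded POST request without sending it."""
    request = requests.Request("POST", url, data=form_data, headers=headers)
    return request.prepare()


prepared = build_post(
    "https://httpbin.org/post",
    {"name": "John", "age": "30"},
    headers={"User-Agent": "MyCustomAgent"},
)

# Requests encodes the dictionary and sets the content type for us
print(prepared.method, prepared.url)
print(prepared.headers["Content-Type"])
print(prepared.body)

# To actually send it:
# with requests.Session() as session:
#     response = session.send(prepared, timeout=10)
#     print(response.status_code, response.text)
```

For everyday use, requests.get(url, timeout=10) and requests.post(url, data=form_data, timeout=10) do the preparing and sending in a single call. There is no buffer to manage and no manual encoding step, which is the main difference from the PycURL versions.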
What is cURL in Python?
cURL in Python refers to using Python libraries or modules to make network requests similar to the command-line tool cURL. This can be achieved using PycURL, the subprocess module, or Requests, which provide various levels of control and simplicity.
Can you use cURL in Python?
Yes, you can use cURL in Python through several methods. PycURL, subprocess, and the Requests library are popular ways to execute cURL commands or perform equivalent network requests in Python, each offering different features and levels of abstraction.
What is the Python equivalent of cURL?
Python equivalents of cURL are PycURL and the Requests library. Requests offers a simple and user-friendly API for making HTTP requests, similar to cURL. For more complex or specific needs, PycURL provides a direct interface to the cURL library.
What is the difference between Python Requests and cURL?
Python Requests is a high-level library for making HTTP requests, known for its simplicity and ease of use. cURL is a command-line tool and library for transferring data with URLs, offering more low-level control and features. Requests abstracts many complexities of cURL.
Conclusion
We've covered using cURL in Python in three different ways:
PycURL
Python's subprocess module
Requests
Use the PycURL library if you want detailed control over requests.
Use the subprocess module for direct use of cURL commands.
Use the Requests library if you want an intuitive, high-level interface.