At Apify, our mission is to empower people to create great web scrapers using the best technologies possible and to run them in the cloud effortlessly. That's why we're thrilled to introduce our new Apify SDK for Python, allowing you to write Apify Actors in Python and tap into the wide range of libraries and tools in the Python ecosystem that make web scraping simple and efficient.
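from apify import Actor
from bs4 import BeautifulSoup
import requests


async def main():
    async with Actor:
        # Read the Actor input, which is expected to contain a 'url' key
        actor_input = await Actor.get_input()

        # Download the page and parse its HTML
        response = requests.get(actor_input['url'])
        soup = BeautifulSoup(response.content, 'html.parser')

        # Store the page URL and title in the Actor's default dataset
        await Actor.push_data({
            'url': actor_input['url'],
            'title': soup.title.string,
        })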
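The example above assumes that the Actor input always contains a usable url key. In practice, Actor.get_input() can return None when the Actor is started without input, so it can be worth validating the input before making the request. Below is a minimal sketch of such a check using only the standard library; resolve_start_url is a hypothetical helper name chosen for illustration and is not part of the Apify SDK.

```python
from urllib.parse import urlparse


def resolve_start_url(actor_input):
    """Return a validated URL from the Actor input, or raise ValueError.

    Hypothetical helper for illustration; not part of the Apify SDK.
    """
    # Treat missing or non-dict input as empty so the lookup below is safe.
    if not isinstance(actor_input, dict):
        actor_input = {}

    url = actor_input.get('url')
    if not isinstance(url, str) or not url.strip():
        raise ValueError('Actor input must contain a non-empty "url" string')

    url = url.strip()
    parsed = urlparse(url)
    # Accept only absolute HTTP(S) URLs, which is what the request needs.
    if parsed.scheme not in ('http', 'https') or not parsed.netloc:
        raise ValueError(f'Not an absolute HTTP(S) URL: {url!r}')
    return url
```

Inside main(), the result of await Actor.get_input() could be passed to this helper, and the returned URL used for both the request and the pushed record. Whether to raise an error or fall back to a default URL is a design choice that depends on the Actor.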
When combined with the Apify platform, Actors have access to a wide variety of features designed specifically to meet developers’ web scraping and automation needs. These include on-demand scaling of computing resources, run scheduling and monitoring, data center and residential proxies, as well as the ability to publish Actors in Apify Store and even monetize your code.
Whether you have a simple scraper using BeautifulSoup, a powerful web spider written with Scrapy, or you use Selenium or Playwright to automate browser interaction, the Apify SDK for Python will help you run your projects in the cloud at any scale.
Getting started
Actors are designed to be used together with the Apify platform, so to unlock their full potential, let’s create one in Apify Console. The process is straightforward, and you only need to sign up for a free Apify account to follow along.
Once you’re in Apify Console, go to Actors → Create New, where you’re presented with a choice of Actor templates.
We have predefined Actor templates for all the major web scraping libraries like Scrapy, BeautifulSoup, Playwright, and Selenium.
Once you create an Actor from your selected Actor template, you can edit its code to perform the scraping tasks you need, run the Actor, and, if you’re happy with it, integrate it with your existing data pipelines and schedule it to scrape data at regular intervals.
Creating Actors locally
If you want to create and run Apify Actors directly on your local computer, for example, to track the source code in a version control system, you can do so with the Apify CLI by running the command apify create my-python-actor.
When you execute that command, you’ll be presented with the same choice of templates as in Apify Console. Once you choose a template, an Actor will be created for you in the my-python-actor directory, and all its requirements will be installed in a virtual environment in my-python-actor/.venv. To run the Actor, run cd my-python-actor && apify run.
When you run an Actor locally, its output is stored in the storage folder. There, you can find the contents of the Actor’s default dataset, key-value store, and request queue.
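To inspect the results of a local run programmatically, you can read the dataset files directly. The sketch below assumes that each dataset item is stored as a separate JSON file under storage/datasets/default inside the Actor directory, and that files whose names start with a double underscore hold metadata rather than items. Both the layout and the load_local_dataset helper name are assumptions made for illustration, so check the contents of your own storage folder before relying on them.

```python
import json
from pathlib import Path


def load_local_dataset(actor_dir, dataset_name='default'):
    """Load the items of a locally stored dataset as a list.

    Hypothetical helper; assumes the layout
    <actor_dir>/storage/datasets/<dataset_name>/*.json
    """
    dataset_dir = Path(actor_dir) / 'storage' / 'datasets' / dataset_name
    items = []
    # Sort by file name so items are returned in file-name order.
    for path in sorted(dataset_dir.glob('*.json')):
        # Assumption: files starting with '__' are metadata, not items.
        if path.name.startswith('__'):
            continue
        with path.open(encoding='utf-8') as f:
            items.append(json.load(f))
    return items
```

For the Actor created above, calling load_local_dataset('my-python-actor') after a local run would return the records that the Actor pushed with Actor.push_data(), provided the assumed layout matches.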
To push the Actor to Apify Console and run it there, you can use the apify push command, which will upload the Actor’s source code to the Apify platform and build the Actor there.
Get in touch
We’re excited to see what you will create with the Apify SDK for Python. If you find any issues, please report them in the SDK’s GitHub repository.
And don’t forget to join our developer community on Discord. We will be waiting for you there to hear your feedback and help you with any questions that might arise.