Apify is thrilled to introduce Crawlee for Python, our latest addition to the Crawlee family, designed specifically for the Python community.
Crawlee has been popular with the Node.js community since we launched it in 2022, and we’ve regularly received requests for a Python version. So here it is: Crawlee for Python offers a unified interface for HTTP and headless browser crawling, automatic retries, proxy rotation, session management, and more, all wrapped in a type-hinted package.
Why Crawlee for Python?
- Unified interface for HTTP and headless browser crawling:
  - HTTP: HTTPX with Beautiful Soup.
  - Headless browser: Playwright.
- Automatic parallel crawling based on available system resources.
- Written in Python with type hints, which means a better developer experience (IDE autocompletion) and fewer bugs (static type checking).
- Automatic retries on errors or when you’re getting blocked.
- Integrated proxy rotation and session management.
- Configurable request routing - direct URLs to appropriate handlers.
- Persistent queue for URLs to crawl.
- Pluggable storage of both tabular data and files.
- Built on asyncio, so it’s fully asynchronous.
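To make a few of these ideas concrete, here is a minimal sketch in plain asyncio of what request routing, automatic retries, and bounded parallel crawling look like in principle. It does not use Crawlee and is not Crawlee's API: `Router`, `crawl`, `fake_fetch`, and the example.com URLs are hypothetical names invented for illustration, and the fetch function is a stub that never touches the network. The sketch also uses a fixed number of workers, which is a simplification of scaling based on available system resources.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical types: a handler receives the URL and the fetched body.
Handler = Callable[[str, str], Awaitable[None]]
Fetch = Callable[[str], Awaitable[str]]


class Router:
    """Illustrative prefix-based router (an assumption, not Crawlee's router)."""

    def __init__(self) -> None:
        self._routes: list[tuple[str, Handler]] = []
        self._default: Optional[Handler] = None

    def add(self, prefix: str, handler: Handler) -> None:
        self._routes.append((prefix, handler))

    def set_default(self, handler: Handler) -> None:
        self._default = handler

    def resolve(self, url: str) -> Handler:
        # The first matching prefix wins; otherwise use the default handler.
        for prefix, handler in self._routes:
            if url.startswith(prefix):
                return handler
        if self._default is None:
            raise LookupError(f"no handler for {url}")
        return self._default


async def crawl(start_urls: list[str], router: Router, fetch: Fetch,
                max_concurrency: int = 4, max_retries: int = 2) -> list[str]:
    """Process URLs with a fixed worker pool; return URLs that kept failing."""
    queue: asyncio.Queue = asyncio.Queue()
    for url in start_urls:
        queue.put_nowait((url, 0))
    failed: list[str] = []

    async def worker() -> None:
        while True:
            url, attempt = await queue.get()
            try:
                body = await fetch(url)
                await router.resolve(url)(url, body)
            except Exception:
                if attempt < max_retries:
                    # Re-enqueue before task_done() so join() keeps waiting.
                    queue.put_nowait((url, attempt + 1))
                else:
                    failed.append(url)
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(max_concurrency)]
    await queue.join()  # Returns once every queued item has been processed.
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return failed


async def demo():
    attempts: dict[str, int] = {}
    results: list[tuple[str, str]] = []

    async def fake_fetch(url: str) -> str:
        # Stub standing in for an HTTP client; no network access happens.
        attempts[url] = attempts.get(url, 0) + 1
        if "dead" in url or ("flaky" in url and attempts[url] == 1):
            raise ConnectionError("simulated failure")
        return f"<html>{url}</html>"

    async def handle_product(url: str, body: str) -> None:
        results.append(("product", url))

    async def handle_default(url: str, body: str) -> None:
        results.append(("default", url))

    router = Router()
    router.add("https://example.com/products/", handle_product)
    router.set_default(handle_default)

    failed = await crawl(
        ["https://example.com/",
         "https://example.com/products/flaky",
         "https://example.com/dead"],
        router, fake_fetch,
    )
    return results, failed, attempts


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Running `demo()` returns the handled pages, the URLs that exhausted their retries, and a per-URL attempt count. The queue here lives only in memory, so the sketch does not attempt persistence, proxy rotation, or session management.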
Join the Crawlee community
Head over to the Crawlee blog for more information about the launch of Crawlee for Python, and don’t forget to star the repo on GitHub!
Crawlee for Python is open source, and we actively want developers to contribute, report issues, and help us improve. The best way to do this is by joining our Discord community.
You’ll be in good company with nearly 8,000 web scraping developers, and our team will be happy to help you get started with Crawlee for Python.
We’ve also launched Crawlee for Python on Product Hunt, so you can help us get the word out by upvoting it!