There are lots of web scraping services out there, but which is the right choice for you? We look at ScrapingBee to see what it offers the dev looking to get data.
Whether you're building an application, conducting market research, or analyzing trends, accessing timely and accurate data is essential. However, identifying the most efficient and reliable methods for obtaining this data can be a daunting task. Should you build your own web scrapers? Use an existing web scraping API? Or go for something in between?
If you've spent some time googling around for an answer to those questions, then you've probably come across ScrapingBee. But now a different question emerges. How do I know if this service is right for my use case? Well, that’s precisely what we will try to answer in this article. We will review ScrapingBee’s service and analyze the different kinds of tools that they provide, and the pros and cons of using the service.
So, let’s get started and see if ScrapingBee is worth using for your web scraping project.
ScrapingBee: what are the pros and cons?
Benefits: user-friendly web scraping API
ScrapingBee provides a user-friendly web scraping API that offers various features required for large-scale web scraping and to prevent getting blocked, including proxies and JavaScript rendering. It is recommended for developers seeking a simple solution for extracting data, which can be seamlessly integrated with their existing code for data processing.
Limitations: limited control and no integrated cloud solution
ScrapingBee's straightforward approach may be limiting for developers with advanced web scraping knowledge, as they are required to follow the rules set by ScrapingBee's API and have restricted control over the entire data extraction process.
Additionally, ScrapingBee lacks an integrated solution for managing data extraction flows in the cloud. This can be inconvenient since you would need to find a separate cloud provider or set up your own infrastructure.
ScrapingBee Proxy and API credit consumption
When it comes to large-scale data extraction, proxies are essential for circumventing anti-bot systems used by modern websites. However, utilizing proxies can significantly increase the cost of your web scraping activities. ScrapingBee's API provides several proxy options: Rotating Proxy (default), Premium Proxy, Stealth Proxy, or the ability to use your own proxy. Here is an overview of how the usage of these proxies impacts your API Credit consumption within their system:
| Feature used | API credit cost/request |
| --- | --- |
| Rotating Proxy without JavaScript rendering | 1 |
| Rotating Proxy with JavaScript rendering (default) | 5 |
| Premium Proxy without JavaScript rendering | 10 |
| Premium Proxy with JavaScript rendering | 25 |
| Stealth Proxy with JavaScript rendering (only option available) | 75 |
ScrapingBee pricing
The pricing of a service often plays a crucial role in our decision-making process. Fortunately, ScrapingBee provides a freemium model that allows users to try their service for free with 1,000 API credits. Their paid plans range from $49/month to $599+/month for the business plan. The key distinction between these plans is the allocation of API credits, with the base plan offering 150,000 credits and the business plans providing 8,000,000+ credits, depending on your needs. Additionally, the more expensive plans offer higher limits for concurrent requests and improved support.
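To make the relationship between credit costs and plan sizes concrete, here is a small sketch that estimates how many requests a given credit allowance covers. The per-request costs and the allowances are the figures quoted in this article; the dictionary and function names are illustrative choices for this sketch and are not part of ScrapingBee's API.

```python
# Credits charged per request, as quoted earlier in this article.
# Keys are (proxy_type, javascript_rendering); the naming is this sketch's own.
CREDIT_COST = {
    ("rotating", False): 1,
    ("rotating", True): 5,
    ("premium", False): 10,
    ("premium", True): 25,
    ("stealth", True): 75,
}


def requests_affordable(credits: int, proxy: str = "rotating", render_js: bool = True) -> int:
    """Return how many requests a credit allowance covers for one configuration."""
    try:
        cost = CREDIT_COST[(proxy, render_js)]
    except KeyError:
        raise ValueError(f"unsupported combination: {proxy!r}, render_js={render_js}") from None
    return credits // cost


if __name__ == "__main__":
    for label, credits in (("free trial", 1_000), ("base plan", 150_000)):
        for (proxy, render_js) in CREDIT_COST:
            n = requests_affordable(credits, proxy, render_js)
            print(f"{label}: {proxy} proxy, JS rendering={render_js}: {n:,} requests")
```

With the default settings (Rotating Proxy with JavaScript rendering), the 1,000 free credits cover 200 requests and the 150,000-credit plan covers 30,000. With the Stealth Proxy, the same 150,000 credits cover only 2,000 requests.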
ScrapingBee offers a versatile data extraction API as one of its primary services, allowing users to extract data from a wide range of web pages. To evaluate its capabilities, I decided to scrape Amazon, a well-known website notorious for implementing sophisticated anti-bot systems.
Navigating through ScrapingBee's API was straightforward, and the ScrapingBee documentation provided clear and updated information. With just a few lines of code, as shown in the example below, I successfully extracted the titles, prices, and links of the iPhones listed on the first page of Amazon.com:
For this specific request, made with the API's default configuration (Rotating Proxy and JavaScript rendering), I was charged 5 API credits. Despite making multiple requests to Amazon.com, I did not encounter any blocking with the default settings, which is a good sign of the service's reliability.
However, as our operation scales up, it is reasonable to assume that we would require more reliable and costly proxies to sustain this level of performance. So, let's see how we can enable different proxy options using ScrapingBee's API.
Enabling proxies in ScrapingBee is straightforward. To use a specific proxy type, you just need to include the corresponding parameter and set it to "True". For instance, to use the Premium Proxy, you would add "premium_proxy=True" to your request parameters, as shown below:
from scrapingbee import ScrapingBeeClient  # Import ScrapingBee's Python client

client = ScrapingBeeClient(api_key='YOUR_API_KEY')

response = client.get(
    "https://www.amazon.com/s?k=iphone&crid=1BIGRK4NGFLDS&sprefix=ipho%2Caps%2C278&ref=nb_sb_noss_2",
    params={
        # Choose the proxy type you want by adding the premium_proxy, stealth_proxy or own_proxy parameters
        'premium_proxy': 'True',
        # CSS selectors describing the data to extract from the results page
        'extract_rules': {
            "product-titles": {
                "selector": "div.a-section.a-spacing-none.puis-padding-right-small.s-title-instructions-style > h2 > a > span",
                "type": "list",
            },
            "product-prices": {
                "selector": "div.a-section.a-spacing-none.a-spacing-top-micro.puis-price-instructions-style > div > a > span > span.a-offscreen",
                "type": "list",
            },
            "product-links": {
                "selector": "div.a-section.a-spacing-none.puis-padding-right-small.s-title-instructions-style > h2 > a",
                "type": "list",
                "output": "@href",
            },
        },
    },
)

# Print the extracted data only if the request succeeded
if response.ok:
    print(response.json())
It's worth mentioning that enabling this option can enhance the reliability of our data extraction process by reducing the risk of our bot being blocked. However, it's important to note that this improvement comes at a higher cost per request.
For instance, in my case, using the Premium Proxy with JavaScript rendering for this request consumed 25 credits, a fivefold increase over the 5 credits spent with the default Rotating Proxy configuration.
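One way to balance reliability against cost is to try the cheapest configuration first and move to a pricier proxy only when a request fails. The sketch below illustrates that idea under stated assumptions: `fetch` is a hypothetical callable standing in for whatever performs the request (for example, a thin wrapper around `client.get`), and the tier list mirrors the credit costs quoted in this article. This escalation logic is not a feature of ScrapingBee's API.

```python
from typing import Callable, Optional

# Proxy tiers ordered from cheapest to most expensive, all with JavaScript rendering.
# Each entry pairs the extra request parameters with the credit cost quoted in this article.
TIERS = [
    ({}, 5),                          # default Rotating Proxy
    ({"premium_proxy": "True"}, 25),  # Premium Proxy
    ({"stealth_proxy": "True"}, 75),  # Stealth Proxy
]


def fetch_with_escalation(
    fetch: Callable[[str, dict], Optional[dict]],
    url: str,
    tiers=TIERS,
):
    """Try each tier in order until `fetch` returns data.

    `fetch(url, params)` is a hypothetical helper that returns parsed data on
    success and None on failure. Returns (data, credits_spent), where
    credits_spent sums the cost of every attempted tier. Treat it as an upper
    bound: it assumes failed attempts are billed, which this article does not
    confirm.
    """
    spent = 0
    for params, cost in tiers:
        spent += cost
        data = fetch(url, params)
        if data is not None:
            return data, spent
    return None, spent
```

If the default tier fails and the Premium Proxy succeeds, the function reports at most 30 credits for that page (5 + 25), which is still cheaper than sending every request through the Stealth Proxy at 75 credits.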
Although I was pleasantly surprised by the ease of extracting the desired data and the low incidence of blocked requests, I found it frustrating that the API had limitations when it came to more complex operations. For instance, if I were building my own scraper, I could easily handle Amazon's pagination and extract data from all the search results while maintaining complete control over the scraper's behavior. However, it was not immediately apparent how to achieve a similar outcome with ScrapingBee's API, and the documentation lacked information on the matter.
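One possible workaround is to handle pagination on the client side by requesting each results page as a separate URL. The sketch below only builds the page URLs. It assumes the target site paginates through a `page` query parameter, which this article has not verified for Amazon, and `build_page_urls` is an illustrative helper name. Each URL would then be sent through the API as its own request, at the per-request credit cost discussed earlier.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_page_urls(search_url: str, pages: int) -> list[str]:
    """Return one URL per results page by setting a `page` query parameter.

    Assumption: the target site paginates via `?page=N`. Any existing
    `page` parameter in `search_url` is replaced.
    """
    if pages < 1:
        raise ValueError("pages must be at least 1")
    parts = urlsplit(search_url)
    # Keep the original query parameters except for any existing page number.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "page"
    ]
    urls = []
    for page in range(1, pages + 1):
        page_query = urlencode(query + [("page", str(page))])
        urls.append(
            urlunsplit((parts.scheme, parts.netloc, parts.path, page_query, parts.fragment))
        )
    return urls


if __name__ == "__main__":
    for url in build_page_urls("https://www.amazon.com/s?k=iphone", pages=3):
        print(url)
```

Because every page is a separate request, scraping ten result pages with the default settings would cost ten times the single-page price, so pagination depth feeds directly into the credit budget.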
Furthermore, the simplicity of ScrapingBee's pricing system has both positive and negative aspects. It is reassuring to know the exact number of credits each request will cost based on the chosen parameters. However, I would have appreciated a more detailed breakdown of my usage and charges within ScrapingBee's dashboard for better transparency.
Giving credit where credit is due, ScrapingBee began 2024 with an update that notably included the launch of a brand new Log and Analytics dashboard. This development suggests we can anticipate further enhancements aimed at improving access to data on API runs' performance and costs in the near future.
Lastly, I missed having convenient access to an integrated cloud infrastructure like those offered by Apify or Zyte. While I understand that this is not ScrapingBee's primary focus, an all-in-one solution for my web scraping needs would save considerable time and effort compared with searching for, and paying for, separate services to host my data extraction workflows.
SuperScraper API is a versatile, open-source REST API designed for web scraping. It offers compatibility with services like ScrapingBee, Scraper API, and ScrapingAnt. That means this Apify Actor can be used as a potentially cheaper drop-in replacement for ScrapingBee.
In conclusion, the ScrapingBee Data Extraction API offers a reliable solution for developers seeking a straightforward method to extract data from websites without the complexities of building a scraper from scratch. However, if you require a more comprehensive solution with a wider range of pre-built features and greater control over your applications and data extraction process, relying solely on ScrapingBee may not fully meet your needs.
Finally, I want to emphasize that this post serves as an introductory analysis and guide to ScrapingBee's service, assisting developers in determining if it is the right choice for them. It is important to note that not all features provided by their API have been explored in this article.
If you find yourself intrigued by ScrapingBee, I encourage you to further explore the ScrapingBee documentation for a more in-depth understanding of the platform's capabilities.
This is the first in a series of articles we commissioned from an external developer (although Percival is a former Apifier). We want to create unbiased reviews of other web scraping platforms and companies as part of our continued evaluation of the web scraping industry. Want more like this? Check out ScraperAPI vs. Apify.