Whether you're building an application, conducting market research, or analyzing trends, accessing timely and accurate data is essential. However, identifying the most efficient and reliable methods for obtaining this data can be a daunting task. Should you build your own web scrapers? Use an existing web scraping API? Or go for something in between?
If you've spent some time googling around for an answer to those questions, then you've probably come across ScrapingBee. But now a different question emerges. How do I know if this service is right for my use case? Well, that’s precisely what we will try to answer in this article. We will review ScrapingBee’s service and analyze the different kinds of tools that they provide, and the pros and cons of using the service.
So, let’s get started and see if ScrapingBee is worth using for your web scraping project.
ScrapingBee: what are the pros and cons?
Benefits: user-friendly web scraping API
ScrapingBee provides a user-friendly web scraping API with the features needed to scrape at scale and avoid getting blocked, including proxies and JavaScript rendering. It is a good fit for developers seeking a simple way to extract data that can be integrated with their existing data-processing code.
Limitations: limited control and no integrated cloud solution
ScrapingBee's straightforward approach may be limiting for developers with advanced web scraping knowledge, as they are required to follow the rules set by ScrapingBee's API and have restricted control over the entire data extraction process.
Additionally, ScrapingBee lacks an integrated solution for managing data extraction flows in the cloud. This can be inconvenient since you would need to find a separate cloud provider or set up your own infrastructure.
ScrapingBee Proxy and API credit consumption
When it comes to large-scale data extraction, proxies are essential for circumventing the anti-bot systems used by modern websites. However, using proxies generally increases the cost of web scraping significantly. ScrapingBee's API provides several proxy options: Rotating Proxy (default), Premium Proxy, Stealth Proxy, or the ability to use your own proxy. Here is an overview of how each option affects your API credit consumption:
Feature used | API credit cost/request |
---|---|
Rotating Proxy without JavaScript rendering | 1 |
Rotating Proxy with JavaScript rendering (default) | 5 |
Premium Proxy without JavaScript rendering | 10 |
Premium Proxy with JavaScript rendering | 25 |
Stealth Proxy with JavaScript rendering (only option available) | 75 |
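To make the table concrete, the sketch below estimates how many credits a scraping job would consume. The costs are copied from the table above; the dictionary and function names are illustrative only and are not part of ScrapingBee's API.

```python
# Credit cost per request, copied from the table above.
# Keys are (proxy_type, javascript_rendering); the names are illustrative only.
CREDIT_COST = {
    ("rotating", False): 1,
    ("rotating", True): 5,
    ("premium", False): 10,
    ("premium", True): 25,
    ("stealth", True): 75,  # Stealth Proxy is only listed with JavaScript rendering
}


def credits_for_job(requests: int, proxy: str = "rotating", render_js: bool = True) -> int:
    """Return the total API credits consumed by `requests` calls."""
    try:
        return requests * CREDIT_COST[(proxy, render_js)]
    except KeyError:
        raise ValueError(f"Unsupported combination: {proxy!r}, render_js={render_js}") from None


if __name__ == "__main__":
    # Credits needed for 1,000 requests under each configuration.
    for proxy, render_js in CREDIT_COST:
        print(proxy, render_js, credits_for_job(1_000, proxy, render_js))
```

With these figures, 1,000 requests on the default configuration consume 5,000 credits, while the same job on the Stealth Proxy consumes 75,000.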
ScrapingBee pricing
Pricing often plays a crucial role in choosing a service. Fortunately, ScrapingBee provides a freemium model that lets users try the service for free with 1,000 API credits. Paid plans range from $49/month to $599+/month for the business plans. The key distinction between plans is the allocation of API credits: the base plan offers 150,000 credits, while the business plans provide 8,000,000+ credits, depending on your needs. The more expensive plans also offer higher limits for concurrent requests and higher support levels.
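Combining the plan price with the per-request credit costs from the table in the previous section gives a rough price per request. The sketch below uses only the base plan figures quoted above ($49 for 150,000 credits); the function name is hypothetical, and the result assumes every credit in the plan is spent.

```python
def usd_per_thousand_requests(plan_price_usd: float, plan_credits: int, credits_per_request: int) -> float:
    """Approximate cost in USD of 1,000 requests on a given plan."""
    requests_in_plan = plan_credits // credits_per_request
    return round(plan_price_usd / requests_in_plan * 1_000, 2)


# Base plan from the article: $49/month for 150,000 credits.
configurations = [
    ("Rotating Proxy, no JS", 1),
    ("Rotating Proxy + JS (default)", 5),
    ("Premium Proxy + JS", 25),
    ("Stealth Proxy + JS", 75),
]
for label, credits in configurations:
    print(f"{label}: ${usd_per_thousand_requests(49, 150_000, credits)} per 1,000 requests")
```

On the base plan, the default configuration works out to roughly $1.63 per 1,000 requests, since 150,000 credits cover 30,000 requests at 5 credits each.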
ScrapingBee scraping test
ScrapingBee offers a versatile data extraction API as one of its primary services, allowing users to extract data from a wide range of web pages. To evaluate its capabilities, I decided to scrape Amazon, a well-known website notorious for implementing sophisticated anti-bot systems.
Navigating through ScrapingBee's API was straightforward, and the ScrapingBee documentation provided clear and updated information. With just a few lines of code, as shown in the example below, I successfully extracted the titles, prices, and links of the iPhones listed on the first page of Amazon.com:
from scrapingbee import ScrapingBeeClient  # Import ScrapingBee's Python client

client = ScrapingBeeClient(api_key='YOUR_API_KEY')

response = client.get(
    "https://www.amazon.com/s?k=iphone&crid=1BIGRK4NGFLDS&sprefix=ipho%2Caps%2C278&ref=nb_sb_noss_2",
    params={
        # Each rule maps an output key to a CSS selector
        'extract_rules': {
            "product-titles": {
                "selector": "div.a-section.a-spacing-none.puis-padding-right-small.s-title-instructions-style > h2 > a > span",
                "type": "list",
            },
            "product-prices": {
                "selector": "div.a-section.a-spacing-none.a-spacing-top-micro.puis-price-instructions-style > div > a > span > span.a-offscreen",
                "type": "list",
            },
            "product-links": {
                "selector": "div.a-section.a-spacing-none.puis-padding-right-small.s-title-instructions-style > h2 > a",
                "type": "list",
                "output": "@href",  # Return the href attribute instead of the text
            },
        }
    },
)

if response.ok:
    print(response.json())
If you want to test the provided code yourself, follow these steps:
- Create a ScrapingBee account.
- Install the ScrapingBee Python client (pip install scrapingbee).
- Replace the placeholder text in the code with your own ScrapingBee API key.
Once you have completed these steps and run the code, you can expect to see results similar to the example below printed to your terminal:
{
"product-titles":[
"Apple iPhone 11, 64GB, Black - Unlocked (Renewed)",
"Apple iPhone SE (2nd Generation), 64GB, Red - Unlocked (Renewed)",
"Apple iPhone 12, 64GB, White - Fully Unlocked (Renewed)",
"Apple iPhone 8, 64GB, Gold - Unlocked (Renewed)",
"Apple iPhone 12 Mini, 64GB, Black - Unlocked (Renewed)",
"Apple iPhone X, US Version, 64GB, Silver - Unlocked (Renewed)",
"Apple iPhone XR, 64GB, Black - Unlocked (Renewed)",
"Apple iPhone XS, US Version, 64GB, Space Gray - Unlocked (Renewed)",
"Apple iPhone 8 Plus, US Version, 64GB, Gold - Unlocked (Renewed)",
"Apple iPhone 14 Pro Max, 128GB, Space Black - Unlocked (Renewed)",
"Apple iPhone 13, 256GB, Midnight - Unlocked (Renewed)",
"Apple iPhone 11 Pro, 64GB, Midnight Green - Unlocked (Renewed)",
"iPhone 13 Mini, 128GB, Pink - Unlocked (Renewed)",
"Apple iPhone 12 Pro, 256GB, Gold - Fully Unlocked (Renewed)",
"Apple iPhone SE 3rd Gen, 64GB, Midnight - Unlocked (Renewed)",
"Apple iPhone 14, 512GB, Purple - Unlocked (Renewed Premium)"
],
"product-prices":[
"$305.55",
"$147.00",
"$394.95",
"$137.99",
"$308.99",
"$223.00",
"$214.75",
"$232.00",
"$189.99",
"$1,019.99",
"$629.99",
"$388.00",
"$494.99",
"$584.99",
"$257.99",
"$875.00"
],
"product-links":[
"/Apple-iPhone-11-64GB-Black/dp/B07ZPKN6YR/ref=sr_1_1?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-1",
"/Apple-iPhone-SE-64GB-Red/dp/B088N8TF64/ref=sr_1_2?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-2",
"/Apple-iPhone-12-64GB-White/dp/B08PPBQM23/ref=sr_1_3?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-3",
"/Apple-iPhone-Fully-Unlocked-64GB/dp/B0775717ZP/ref=sr_1_4?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-4",
"/Apple-iPhone-12-Mini-Black/dp/B08PPDJWC8/ref=sr_1_5?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-5",
"/Apple-iPhone-Fully-Unlocked-64GB/dp/B07C357FSJ/ref=sr_1_6?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-6",
"/Apple-iPhone-XR-Fully-Unlocked/dp/B07P6Y7954/ref=sr_1_7?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-7",
"/Apple-iPhone-64GB-Space-Gray/dp/B07SC58QBW/ref=sr_1_8?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-8",
"/Apple-iPhone-Plus-Fully-Unlocked/dp/B07757LZ1J/ref=sr_1_9?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-9",
"/Apple-iPhone-14-Pro-Max/dp/B0BN94DL3R/ref=sr_1_10?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-10",
"/Apple-iPhone-13-256GB-Midnight/dp/B09LNCVCKW/ref=sr_1_11?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-11",
"/Apple-iPhone-64GB-Midnight-Green/dp/B07ZQRMWVB/ref=sr_1_12?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-12",
"/Apple-iPhone-13-Mini-128GB/dp/B09LKF2RPP/ref=sr_1_13?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-13",
"/Apple-iPhone-Pro-256GB-Gold/dp/B08PN7R2MZ/ref=sr_1_14?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-14",
"/Apple-iPhone-SE-3rd-Midnight/dp/B0BDY71GRG/ref=sr_1_15?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-15",
"/Apple-iPhone-14-512GB-Purple/dp/B0BYKX35NT/ref=sr_1_16?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-16"
]
}
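The response holds three parallel lists, and the links are relative paths. A minimal post-processing sketch follows. It assumes the three lists line up index by index, which may not hold if a listing has no price, so it checks the lengths before pairing them. The helper name `to_records` is illustrative.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.amazon.com"


def to_records(data: dict) -> list[dict]:
    """Combine the parallel lists returned by extract_rules into one dict per product."""
    titles = data["product-titles"]
    prices = data["product-prices"]
    links = data["product-links"]
    if not (len(titles) == len(prices) == len(links)):
        raise ValueError("Lists differ in length; results cannot be paired by index")
    return [
        {"title": title, "price": price, "url": urljoin(BASE_URL, link)}
        for title, price, link in zip(titles, prices, links)
    ]


# Two entries taken from the sample output above.
sample = {
    "product-titles": [
        "Apple iPhone 11, 64GB, Black - Unlocked (Renewed)",
        "Apple iPhone SE (2nd Generation), 64GB, Red - Unlocked (Renewed)",
    ],
    "product-prices": ["$305.55", "$147.00"],
    "product-links": [
        "/Apple-iPhone-11-64GB-Black/dp/B07ZPKN6YR/ref=sr_1_1?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-1",
        "/Apple-iPhone-SE-64GB-Red/dp/B088N8TF64/ref=sr_1_2?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-2",
    ],
}

for record in to_records(sample):
    print(record)
```

In a real run, `to_records(response.json())` would take the place of the hard-coded sample.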
In this specific request, using ScrapingBee's API with the default configuration (Rotating Proxy and JavaScript rendering), I was charged 5 API credits. Despite making multiple requests to Amazon.com, I did not encounter any blocking issues with the API's default settings, which is a good sign of the service's reliability.
For other websites, or at very large scales, it might be necessary to enable the premium proxy. So, let’s see how to do that.
Using proxies in ScrapingBee
Enabling proxies in ScrapingBee is straightforward. To use a specific proxy type, include the corresponding parameter and set it to "True". For instance, to use the Premium Proxy, add "premium_proxy=True" to your request parameters, as shown below:
from scrapingbee import ScrapingBeeClient  # Import ScrapingBee's Python client

client = ScrapingBeeClient(api_key='YOUR_API_KEY')

response = client.get(
    "https://www.amazon.com/s?k=iphone&crid=1BIGRK4NGFLDS&sprefix=ipho%2Caps%2C278&ref=nb_sb_noss_2",
    params={
        # Choose the proxy type you want by adding the premium_proxy, stealth_proxy or own_proxy parameters
        'premium_proxy': 'True',
        'extract_rules': {
            "product-titles": {
                "selector": "div.a-section.a-spacing-none.puis-padding-right-small.s-title-instructions-style > h2 > a > span",
                "type": "list",
            },
            "product-prices": {
                "selector": "div.a-section.a-spacing-none.a-spacing-top-micro.puis-price-instructions-style > div > a > span > span.a-offscreen",
                "type": "list",
            },
            "product-links": {
                "selector": "div.a-section.a-spacing-none.puis-padding-right-small.s-title-instructions-style > h2 > a",
                "type": "list",
                "output": "@href",  # Return the href attribute instead of the text
            },
        }
    },
)

if response.ok:
    print(response.json())
Enabling this option can make data extraction more reliable by reducing the risk of the bot being blocked, but the improvement comes at a higher cost per request.
For instance, in my case, using the Premium Proxy and JavaScript rendering for this request consumed 25 credits, which is a fivefold increase compared to the 5 credits spent when using the default Proxy rotation configuration.
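Because the Premium Proxy costs five times as much per request, one way to limit spending is to start with the default configuration and escalate only when a request fails. The sketch below illustrates the idea. The escalation order and the helper name are assumptions made for illustration, not a ScrapingBee recommendation, and the code relies only on the `premium_proxy` and `stealth_proxy` parameters and the `response.ok` check already used above.

```python
def fetch_with_escalation(client, url, params=None):
    """Try the cheapest proxy tier first and move up only when a request fails.

    `client` is expected to behave like ScrapingBeeClient: client.get(url, params=...)
    returns a response object with an `ok` attribute.
    """
    tiers = [
        {},                         # default Rotating Proxy
        {"premium_proxy": "True"},  # Premium Proxy
        {"stealth_proxy": "True"},  # Stealth Proxy
    ]
    response = None
    for extra in tiers:
        # Merge the caller's parameters with the proxy setting for this tier.
        response = client.get(url, params={**(params or {}), **extra})
        if response.ok:
            break
    return response  # the last response, whether it succeeded or not


# Usage with the real client (requires an API key and network access):
#   from scrapingbee import ScrapingBeeClient
#   client = ScrapingBeeClient(api_key='YOUR_API_KEY')
#   response = fetch_with_escalation(client, "https://www.amazon.com/s?k=iphone")
```

The function returns the last response even when every tier fails, so the caller should still check `response.ok` before reading the data.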
Limitations of the ScrapingBee web scraping API
Although I was pleasantly surprised by how easy it was to extract the desired data and by the low incidence of blocked requests, I found it frustrating that the API was limiting for more complex operations. For instance, if I were building my own scraper, I could easily handle Amazon's pagination and extract data from all the search results while maintaining complete control over the scraper's behavior. How to achieve a similar outcome with ScrapingBee's API was not immediately apparent, and the documentation lacked information on the matter.
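One possible workaround is to handle pagination on the client side: build one URL per results page and send each as its own request. The sketch below only generates the URLs. It assumes the search page accepts a `page` query parameter, which was not verified in this article, and the helper name is hypothetical. Each generated URL would be a separate API call, charged separately.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def paged_urls(search_url: str, pages: int) -> list[str]:
    """Return one URL per results page by setting a `page` query parameter."""
    parts = urlsplit(search_url)
    # Drop any existing `page` value so it is not duplicated.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "page"
    ]
    return [
        urlunsplit(parts._replace(query=urlencode(query + [("page", str(number))])))
        for number in range(1, pages + 1)
    ]


for url in paged_urls("https://www.amazon.com/s?k=iphone", 3):
    print(url)
```

Each URL could then be passed to `client.get` with the same `extract_rules` used earlier, at the cost of one request per page.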
Furthermore, the simplicity of ScrapingBee's pricing system has both positive and negative aspects. It is reassuring to know the exact number of credits each request will cost based on the chosen parameters. However, I would have appreciated a more detailed breakdown of my usage and charges within ScrapingBee's dashboard for better transparency.
Giving credit where credit is due, ScrapingBee began 2024 with an update that notably included the launch of a brand new Log and Analytics dashboard. This development suggests we can anticipate further enhancements aimed at improving access to data on API runs' performance and costs in the near future.
Lastly, I missed having convenient access to an integrated cloud infrastructure like the ones Apify and Zyte offer. While I understand that this is not ScrapingBee's primary focus, an all-in-one solution for my web scraping needs would save considerable time and effort compared with searching and paying for separate services to host my data extraction workflows.
Conclusion and final considerations
In conclusion, the ScrapingBee Data Extraction API offers a reliable solution for developers seeking a straightforward method to extract data from websites without the complexities of building a scraper from scratch. However, if you require a more comprehensive solution with a wider range of pre-built features and greater control over your applications and data extraction process, relying solely on ScrapingBee may not fully meet your needs.
Finally, I want to emphasize that this post serves as an introductory analysis and guide to ScrapingBee's service, assisting developers in determining if it is the right choice for them. It is important to note that not all features provided by their API have been explored in this article.
If you find yourself intrigued by ScrapingBee, I encourage you to further explore the ScrapingBee documentation for a more in-depth understanding of the platform's capabilities.
This is the first in a series of articles we commissioned from an external developer (although Percival is a former Apifier). We want to create unbiased reviews of other web scraping platforms and companies as part of our continued evaluation of the web scraping industry. Want more like this? Check out ScraperAPI vs. Apify.