Hello there, web scraping enthusiast! Our September edition of updates offers you a great read on the legality of web scraping, two excellent use cases of web scraping for real estate research, a couple of handy guides on how to scrape Google and Instagram and how to set up a watchdog, as well as the first of many upcoming technical blog posts.
Is web scraping even legal?
There are many nuances to the legal side of web scraping, and we're here to clear them up. Read our blog post about the legality of web scraping if you've ever wondered whether it's on the dodgy side of web activities (spoiler: it's not). You can also expect a video on this topic next month. Stay tuned and keep it legal!
How to tune MongoDB performance
For Apify, MongoDB is a crucial element that can affect both UX and our platform's performance. In early 2021, our users started reporting degraded performance of our UI. The cause? Over-utilized drives in our MongoDB cluster. It was time to take action and improve our overall usage of MongoDB.
Read about some of the techniques and MongoDB Cloud features we've used to debug performance issues and expose sub-optimal queries.
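The post describes our own techniques and the MongoDB Cloud features we relied on. The following Python sketch is only a generic illustration of what "exposing a sub-optimal query" can look like, not the approach from the post. It inspects the document returned by MongoDB's explain command (for example via PyMongo's `Cursor.explain()`) and flags two common warning signs: a collection scan, and far more documents examined than returned. It assumes the classic query-planner layout (`queryPlanner.winningPlan` with nested `inputStage`/`inputStages`). Newer server versions may nest plans differently, and the `max_ratio` threshold of 10 is an arbitrary assumption.

```python
from typing import Any, Iterator


def iter_stages(plan: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Walk a query plan tree depth first, yielding every stage."""
    yield plan
    # Single-child stages keep their child under "inputStage";
    # multi-child stages (e.g. OR) keep a list under "inputStages".
    if "inputStage" in plan:
        yield from iter_stages(plan["inputStage"])
    for child in plan.get("inputStages", []):
        yield from iter_stages(child)


def flag_suboptimal(explain: dict[str, Any], max_ratio: float = 10.0) -> list[str]:
    """Return human-readable warnings for an explain() result."""
    warnings: list[str] = []
    winning = explain["queryPlanner"]["winningPlan"]
    # A COLLSCAN stage means the whole collection was read instead of using an index.
    if any(stage.get("stage") == "COLLSCAN" for stage in iter_stages(winning)):
        warnings.append("collection scan: no index was used")
    stats = explain.get("executionStats", {})
    examined = stats.get("totalDocsExamined", 0)
    returned = stats.get("nReturned", 0)
    # Examining far more documents than are returned hints at a missing or poor index.
    if examined > max_ratio * max(returned, 1):
        warnings.append(f"examined {examined} documents to return {returned}")
    return warnings


# A trimmed-down explain() result for a query on an unindexed field (made-up numbers).
sample = {
    "queryPlanner": {"winningPlan": {"stage": "COLLSCAN", "filter": {"userId": {"$eq": 42}}}},
    "executionStats": {"nReturned": 3, "totalDocsExamined": 50000},
}
print(flag_suboptimal(sample))
# ['collection scan: no index was used', 'examined 50000 documents to return 3']
```

The usual fix for both warnings is an index that matches the query's filter (and sort), after which the winning plan should show an `IXSCAN` stage instead.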
Scraping real estate for research
Looking at real estate prices may hurt, but not if you're armed with real data. Check out how one American student used our Zillow Scraper to analyze the real estate market in white picket fence areas of Boston, MA. Or see for yourself how easy it is to compare prices of thousands of cottages all over Czechia, just like these ladies from Czechitas did. This is some pretty impressive research done via data extraction, so try it yourself! 👨‍🔬👩‍🔬
How to scrape Instagram
You can stop scrolling now: with our tutorial and a newly polished Instagram scraper, you can extract data from thousands of posts and comments, as well as stories (provided you're logged in).
How to scrape Google
That's right, now you can scrape all of Google. If you're feeling lucky today, here are two tutorials: one on how to scrape Google SERPs (with a short trip down memory lane), and one on scraping Google Trending Searches to stay ahead of the curve, whatever your web-related goals are. There's also a collection of SEO-oriented Actors that can replace many SEO tools. Enjoy!
How to set up a content change watchdog for any website
Wouldn't you want to get a notification when your favorite item goes on sale? Or when your most loved jazz band plans a concert in your area? Read how, with just some basic JavaScript, you can set up a watchdog for new events or items going on sale in just 5 minutes.
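The tutorial builds its watchdog in JavaScript on the Apify platform. The core idea (fetch the page, reduce it to a fingerprint, compare it with the previous run) can be sketched in Python as below. This is our own illustrative sketch, not the tutorial's code. Hashing the whole page is a simplifying assumption: real pages often contain timestamps or ads that change on every load, so in practice you would fingerprint only the element you care about, such as a price. Persisting the previous fingerprint and sending the notification are left out.

```python
import hashlib
import re
import urllib.request
from typing import Optional


def fetch(url: str) -> str:
    """Download a page as text (plain HTTP, no JavaScript rendering)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def fingerprint(content: str) -> str:
    """SHA-256 digest of the content with whitespace runs collapsed,
    so purely cosmetic reformatting does not count as a change."""
    normalized = re.sub(r"\s+", " ", content).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def check_for_change(content: str, previous: Optional[str]) -> tuple[bool, str]:
    """Return (changed, new_fingerprint). The first run (previous is None)
    only records a baseline and is not reported as a change."""
    current = fingerprint(content)
    changed = previous is not None and current != previous
    return changed, current


# Simulated runs; in practice you would call fetch(url) on a schedule,
# store the fingerprint between runs and notify yourself when changed is True.
changed, digest = check_for_change("<p>Price: $20</p>", None)
changed, digest = check_for_change("<p>Price:   $20</p>", digest)
print(changed)  # False: only whitespace differs
changed, digest = check_for_change("<p>Price: $15</p>", digest)
print(changed)  # True: the price changed
```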
Actors updated and/or running at top speed
- Make good use of the IMDb scraper for anything cinema-related
- Use our GIF Scroll Animation Actor for testing UI or showcasing your work
- Check out our new Shopify scraper to keep an eye on the most precious items you'd like to add to your collection
As you can see, we've started a series of dev-oriented blog posts. Keep an eye on our blog, since we're planning to publish articles on the following subjects:
- How to use Apify from PHP
- Apify+Python=❤️
- Handling IPv4-mapped IPv6 addresses in Node.js
... and many more! We'll keep you posted.
Minor and major UI improvements
Last but not least, we also rolled out some tweaks to the app UI:
- The new Actor UI has been refined with the help of your feedback and is ready to be relaunched
- List inputs now support large amounts of data (by switching to the JSON editor)
- You can now access your own Actors and tasks by referring to them as ~resource-name
- Enabled validation when editing tokens
- You can now see a Python example in the API tab on public Actor pages
- We've added API endpoints for Actor environment variables
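To illustrate the last two API-related items, here is a small Python sketch that prepares a request for an Actor's environment variables, referring to one of your own Actors with the `~resource-name` shorthand. Treat it as a sketch under assumptions: the path `/v2/acts/{actor}/versions/{version}/env-vars` and the bearer-token header are our reading of the Apify API v2 layout and should be checked against the API reference, and `~my-actor`, version `0.1` and the token are hypothetical placeholders.

```python
import urllib.request
from urllib.parse import quote

API_BASE = "https://api.apify.com/v2"  # assumption: Apify API v2 base URL


def env_vars_url(actor_ref: str, version: str) -> str:
    """Build the (assumed) URL listing environment variables of one Actor version.
    actor_ref may be an Actor ID or the ~resource-name shorthand for your own Actor."""
    # "~" is an unreserved URL character, so quote() leaves it intact.
    return f"{API_BASE}/acts/{quote(actor_ref)}/versions/{quote(version)}/env-vars"


def build_request(actor_ref: str, version: str, token: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request authenticated with a bearer token."""
    return urllib.request.Request(
        env_vars_url(actor_ref, version),
        headers={"Authorization": f"Bearer {token}"},
    )


# Hypothetical Actor name, version and token, for illustration only.
req = build_request("~my-actor", "0.1", "MY_APIFY_TOKEN")
print(req.full_url)  # https://api.apify.com/v2/acts/~my-actor/versions/0.1/env-vars
# Send it with urllib.request.urlopen(req) once the path is confirmed in the API docs.
```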
Are you our next amazing teammate? 👨‍💻 👩‍💻 You might be exactly who we're looking for:
- DevOps/SRE (Site Reliability Engineer)
- Backend Engineer
- Head of Operations
We kicked off this month by making #HackerCamp happen. If you haven't heard of it yet, that's no surprise: this September was its first year, so perhaps you'll hear about it next time. You can find some snapshots of this epic getaway on our LinkedIn, but be warned: they don't capture the whole vibe. Do join us in 2022!
The only thing running faster than our scrapers this month was our amazing #VltavaRun team, who spent 32 hours running in total and took 71st place out of 272 teams. Look at them go!