How to never miss a beat on ever-changing websites
The web is constantly changing. Posts in discussions get deleted, online stores change prices on an hourly basis, and news sites update their articles even weeks after publication.
What can you do if you don’t want to miss a change?
Looking at an ever-changing website such as Hacker News Show, we can begin to see how much information we lose by not monitoring it constantly. A few new posts appear every hour, and each post then moves up and down the list based on its popularity and some black-box magic. It would be cool to visualize the flow of the posts to see the trends.
As I work for Apify, I will use our platform :). I will combine two services:
Crawler: to scrape the data from HN Show every 15 minutes
Actor: to merge the new data with the previous data after each crawler run
First, we will create a crawler that scrapes HN Show and returns the following JSON, where each link has a rank from 1 to 100 (from the lowest to the highest position):
and the following Page function to scrape the data:
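For illustration, the sketch below reproduces the core idea of such a page function in Python. It is not the original page function: it pulls post titles and URLs out of a listing page and assigns each one a rank. It assumes a simplified Hacker News markup in which each post title is the first link inside a span with class titleline, and it assumes the top post receives the highest rank, so a 100-post crawl would yield ranks 100 down to 1. TitleLinkParser, extract_ranked_links and the title/url/rank field names are hypothetical.

```python
import json
from html.parser import HTMLParser


class TitleLinkParser(HTMLParser):
    """Collects (title, url) pairs from the first link in each titleline span.

    Assumption: this follows a simplified Hacker News markup; adjust the
    tag/class checks if the real page structure differs.
    """

    def __init__(self):
        super().__init__()
        self.in_titleline = False  # inside <span class="titleline">, before its first link
        self.current_href = None   # href of the post link being read, if any
        self.current_text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "titleline" in classes:
            self.in_titleline = True
        elif tag == "a" and self.in_titleline and self.current_href is None:
            self.current_href = attrs.get("href") or ""
            self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current_href is not None:
            title = "".join(self.current_text).strip()
            self.links.append((title, self.current_href))
            self.current_href = None
            # Only the first link in a titleline is the post; ignore the rest
            # (e.g. the site link that follows the title).
            self.in_titleline = False
        elif tag == "span":
            self.in_titleline = False


def extract_ranked_links(html):
    """Return [{"title", "url", "rank"}] with the top post ranked highest."""
    parser = TitleLinkParser()
    parser.feed(html)
    total = len(parser.links)
    # Assumed ranking scheme: 1 = lowest position on the page, total = top post.
    return [
        {"title": title, "url": url, "rank": total - index}
        for index, (title, url) in enumerate(parser.links)
    ]


# Tiny hand-written sample page, not real Hacker News output.
sample = """
<table>
  <tr><td><span class="titleline"><a href="https://a.example">Show HN: A</a>
      <span class="sitebit comhead">(<a href="from?site=a.example">a.example</a>)</span></span></td></tr>
  <tr><td><span class="titleline"><a href="https://b.example">Show HN: B</a></span></td></tr>
</table>
"""
print(json.dumps(extract_ranked_links(sample), indent=2))
```

Ranking from the bottom up means a post climbing the list shows up as a rising line when the ranks are plotted over time.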
Then we need to set up a scheduler with the cron expression */15 * * * * to execute our crawler every 15 minutes.
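To make the schedule concrete: the first of the five cron fields is the minute, and */15 means every minute from 0 to 59 that is divisible by 15. The following minimal Python sketch expands a single cron field into the values it matches. expand_cron_field is a hypothetical helper that supports only *, single numbers, ranges, comma lists and a /step suffix, so it is not a complete cron parser.

```python
def expand_cron_field(field, low, high):
    """Return the sorted values matched by one cron field within [low, high].

    Supports "*", "N", "A-B", comma-separated lists and an optional "/step"
    suffix. A simplified sketch, not a complete cron implementation.
    """
    values = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(base)
        if step < 1 or not (low <= start <= end <= high):
            raise ValueError(f"invalid cron field part: {part!r}")
        values.update(range(start, end + 1, step))
    return sorted(values)


schedule = "*/15 * * * *"
minute_field = schedule.split()[0]  # minute is the first of the five fields
print(expand_cron_field(minute_field, 0, 59))  # [0, 15, 30, 45]
```

Since the remaining four fields are *, those four minutes apply to every hour of every day, so the crawler runs 96 times a day.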
Finally, to collect the data from each crawler run, we use our crawler’s finish webhook to start the mtrunkat/crawler-timeline act. This act simply takes the results of the last crawler execution and appends them as a new line to the previous ones. The outputs are saved to a key-value store in the user’s account in CSV and JSON formats:
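The merge step can be illustrated with a hypothetical Python sketch; this is not the actual code or output format of mtrunkat/crawler-timeline. It assumes each crawler run yields a list of items with url and rank keys, turns each run into one timeline row keyed by a timestamp, and serializes the accumulated timeline both as JSON and as CSV with one column per post URL. The function names append_run and timeline_to_csv and the row layout are assumptions.

```python
import csv
import io
import json


def append_run(timeline, run_results, timestamp):
    """Return a new timeline with one row added for this crawler run.

    timeline: list of dicts like {"time": ..., "ranks": {url: rank}} (assumed layout).
    run_results: items with "url" and "rank" keys from a single crawler run.
    """
    row = {"time": timestamp, "ranks": {item["url"]: item["rank"] for item in run_results}}
    return timeline + [row]


def timeline_to_csv(timeline):
    """One CSV line per run, one column per URL seen in any run.

    A post missing from a run (e.g. it fell off the list) gets an empty cell.
    """
    urls = sorted({url for row in timeline for url in row["ranks"]})
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["time"] + urls)
    for row in timeline:
        writer.writerow([row["time"]] + [row["ranks"].get(url, "") for url in urls])
    return buffer.getvalue()


# Two made-up runs 15 minutes apart.
timeline = []
timeline = append_run(timeline, [{"url": "https://a.example", "rank": 2},
                                 {"url": "https://b.example", "rank": 1}], "00:00")
timeline = append_run(timeline, [{"url": "https://b.example", "rank": 2},
                                 {"url": "https://c.example", "rank": 1}], "00:15")
print(json.dumps(timeline))
print(timeline_to_csv(timeline))
```

Keeping one column per URL makes the CSV directly plottable: each column is the rank history of a single post, with gaps where the post was not on the list.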
If we wait a few days for the crawler to collect data (a phenomenon called web scraping in 2024), we will be able to see the visualization of the flow of HN Show posts shown at the beginning of this article.
This way, you can monitor product prices on e-commerce sites, current stock prices, the occupancy of your favorite public pool, you name it. Apify’s services let you bring insights from the web into any project you can dream of.
CTO and one of the earliest Apifiers. Writing about challenges our development team faces when building and scaling the Apify platform, which automates millions of tasks every month.