How to turn any website into an RSS feed

Marek Trunkát

What if a website you want to integrate does not provide an RSS feed? In this article, we’ll show you how to build a simple crawler and publish its content in an RSS feed.

If you are like me and still follow the internet the good old way, through RSS, then every once in a while you come across an interesting website that has no RSS feed. A sad situation, but not an unsolvable one.

One of our main claims on the Apify website is that we turn websites into APIs. So we should know how to solve this kind of situation, right? The only missing ingredient for this article is a website without a proper RSS feed. Well, it's the cobbler's children that go barefoot, so let's use the Apify change log!

Change log · Apify
Keep up to date with the latest releases, fixes, and features from Apify.

We will be using our most popular generic scraper - Web Scraper (apify/web-scraper).

If you don't have an Apify account yet, create a free one, then open the scraper in Apify Console and create a task for it.

After you've created a new task, open the "Input and options" tab. Here we will have to configure three fields:

  • Start URLs - the URL where the scraper starts. Simply enter https://apify.com/change-log.
  • Pseudo-URLs - a pattern for the URLs you want the scraper to visit. Here they all have the form https://apify.com/change-log/[.*], where [.*] stands for any series of characters.
  • Page function - the code that runs on every visited page. This is where the programmer fun starts 🎡.
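To make the pseudo-URL idea concrete, here is a minimal sketch of how such a pattern could be turned into a regular expression. The helper `pseudoUrlToRegExp` is hypothetical and written only for illustration; it is not Apify's actual implementation. It assumes that text inside square brackets is a regular expression and that everything else must match literally.

```javascript
// Hypothetical helper (an illustration, not Apify's implementation):
// converts a pseudo-URL into a RegExp. Text inside [...] is treated as a
// regular expression; everything outside the brackets is matched literally.
function pseudoUrlToRegExp(pseudoUrl) {
    // Escape characters that have a special meaning in regular expressions.
    const escapeLiteral = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

    let pattern = '';
    let rest = pseudoUrl;
    while (rest.length > 0) {
        const open = rest.indexOf('[');
        const close = rest.indexOf(']', open + 1);
        if (open === -1 || close === -1) {
            // No more bracketed sections: the remainder is literal text.
            pattern += escapeLiteral(rest);
            break;
        }
        pattern += escapeLiteral(rest.slice(0, open));
        pattern += `(${rest.slice(open + 1, close)})`;
        rest = rest.slice(close + 1);
    }
    return new RegExp(`^${pattern}$`);
}

const changeLogDetail = pseudoUrlToRegExp('https://apify.com/change-log/[.*]');
const matchesPost = changeLogDetail.test('https://apify.com/change-log/some-post');
const matchesLanding = changeLogDetail.test('https://apify.com/change-log');
```

With this pattern, a detail URL such as https://apify.com/change-log/some-post matches, while the landing page itself does not, because it lacks the trailing slash that the pattern requires.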

We won't need any other configuration fields to accomplish our task. But to put together the page function, we will have to look more deeply into the HTML source code of the changelog page.

For more information on various inputs of Web Scraper, see its documentation.

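To summarize what we have just configured: the scraper starts at https://apify.com/change-log, visits every link that matches the pseudo-URL, and runs the page function on each page it opens.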

If you open the Wikipedia page about RSS, you will find an example item with the following fields. The RSS 2.0 specification only requires an item to have a title or a description, but we will fill in all five:

  <item>
    <title>Example entry</title>
    <description>Here is some text ...</description>
    <link>http://www.example.com/blog/post/1</link>
    <guid>7bd204c6-1655-4c27-aeee-53f933c5395f</guid>
    <pubDate>Sun, 06 Sep 2009 16:20:00 +0000</pubDate>
  </item>

This means that we need a page function that extracts the following from each changelog post (such as this one):

  • Title
  • Description - the body of the post
  • URL
  • Some unique ID
  • Publication date
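To show how these five values map onto the XML above, here is a small sketch that renders one record as an RSS item. Apify produces the RSS output for you, so you do not need this code for the tutorial. The helpers `escapeXml` and `toRssItem` and the record shape `{ title, description, url, guid, date }` are assumptions made for illustration.

```javascript
// Escape the five characters that are special in XML text.
function escapeXml(text) {
    return String(text)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&apos;');
}

// Hypothetical record shape: { title, description, url, guid, date }.
// Returns the record rendered as a single RSS <item> element.
function toRssItem(record) {
    return [
        '<item>',
        `  <title>${escapeXml(record.title)}</title>`,
        `  <description>${escapeXml(record.description)}</description>`,
        `  <link>${escapeXml(record.url)}</link>`,
        `  <guid>${escapeXml(record.guid)}</guid>`,
        `  <pubDate>${escapeXml(record.date)}</pubDate>`,
        '</item>',
    ].join('\n');
}

const exampleItem = toRssItem({
    title: 'Fixes & improvements',
    description: 'Here is some text ...',
    url: 'http://www.example.com/blog/post/1',
    guid: '7bd204c6-1655-4c27-aeee-53f933c5395f',
    date: 'Sun, 06 Sep 2009 16:20:00 GMT',
});
```

Escaping matters because titles and descriptions scraped from a page can contain characters such as & or <, which would otherwise produce invalid XML.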

Here you can see the HTML structure of a changelog detail page with the data we need:

I'll use jQuery, which Web Scraper makes available in the page function as context.jQuery, to extract the data. The whole code looks as follows:

async function pageFunction(context) {
    // Skip the landing page and extract data from detail pages only.
    if (context.request.url === 'https://apify.com/change-log') return;

    const $ = context.jQuery;
    const title = $('h1').first().text().trim();
    const date = $('.ChangeLogItem-date').text().trim();
    // There is one <div> between the header and the description.
    // trim() removes the surrounding whitespace.
    const description = $('.ChangeLogItem-header')
        .next().next() // Skip the <div> in between
        .text()
        .trim();
    // toUTCString() returns an RFC 1123 date such as
    // "Sun, 06 Sep 2009 16:20:00 GMT", the format used in <pubDate>.
    const pubDate = new Date(date).toUTCString();

    return {
        url: context.request.url,
        title,
        date: pubDate,
        // Assumes at most one post per date, so the date can serve as a unique ID.
        guid: pubDate,
        description,
    };
}
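One thing to watch in the page function is the date conversion: if the scraped text is in a format the JavaScript engine cannot parse, new Date(...).toUTCString() quietly returns the string "Invalid Date". A minimal sketch of a more defensive conversion follows; the helper name `toPubDate` is hypothetical and not part of Web Scraper.

```javascript
// Hypothetical helper: converts scraped date text to an RFC 1123 string,
// or returns null when the text cannot be parsed as a date.
function toPubDate(dateText) {
    const parsed = new Date(dateText);
    // An unparseable input produces a Date whose time value is NaN.
    if (Number.isNaN(parsed.getTime())) return null;
    return parsed.toUTCString();
}

const valid = toPubDate('2009-09-06T16:20:00Z'); // 'Sun, 06 Sep 2009 16:20:00 GMT'
const invalid = toPubDate('not a date');         // null
```

Returning null makes a parsing problem easy to spot in the dataset preview, instead of letting an "Invalid Date" string end up in the feed.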

Now let's run our task, and after it finishes, preview the data in the dataset. You should get the following results:

Next, configure Apify Scheduler to run your task every hour so that the feed picks up fresh updates. Finally, copy this API URL, which returns the results from the last run of your task in RSS format:

https://api.apify.com/v2/actor-tasks/[TASK_ID]/runs/last/dataset/items?token=[YOUR_API_TOKEN]&format=rss

Paste it into the RSS reader of your choice, and the changelog posts will show up there as feed entries.
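If you would rather build the feed URL in code than fill in the placeholders by hand, a small helper can do it. This is only a sketch: `buildFeedUrl` is a hypothetical name, and the task ID and token in the example are placeholders, not real credentials.

```javascript
// Hypothetical helper that fills the [TASK_ID] and [YOUR_API_TOKEN]
// placeholders of the feed URL shown in the article.
function buildFeedUrl(taskId, token) {
    const url = new URL(
        `https://api.apify.com/v2/actor-tasks/${encodeURIComponent(taskId)}/runs/last/dataset/items`,
    );
    url.searchParams.set('token', token);
    url.searchParams.set('format', 'rss');
    return url.toString();
}

// Placeholder values for illustration only.
const feedUrl = buildFeedUrl('my-task-id', 'my-token');
// 'https://api.apify.com/v2/actor-tasks/my-task-id/runs/last/dataset/items?token=my-token&format=rss'
```

Keep in mind that the resulting URL contains your API token, so share it only with tools you trust.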

There you go. Now that you know how to create an automatic RSS feed from any website resource, staying on top of the most important news updates is easy. Let us know how it worked for you on Twitter!


