How to turn any website into an RSS feed

Marek Trunkát

XML-based RSS feeds are a way of publishing frequently changing content, such as news or blog posts, in a format that news readers, aggregators, and content importers can consume.

But what if a website you want to integrate does not provide an RSS feed? Apifier recently introduced a feature to export crawled data in the RSS format. In this article, we’ll show you how to build a simple crawler and publish its content in an RSS feed.

For example, let’s say you want to keep up to date with the Apifier release notes. There is no RSS feed or mailing list to subscribe to, so we will build a crawler that publishes the page as an RSS channel. You can then add the channel to your favourite RSS reader or integrate it with Zapier to receive an email notification for every new entry.

First, let’s have a look at the HTML structure of the release notes page:

For each release note item we want to extract its date (see <h4> tag) and description (see <ul> tag), and export that data as the following RSS feed:

The RSS 2.0 specification only requires an <item> element to contain at least one of <title> or <description>, but we will give each item <title>, <description> and <link> child elements. We will also add <pubDate> and <guid>; the latter is often used along with <link> to recognize new items.
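To make the item structure concrete, here is a minimal Python sketch that builds one <item> with those five child elements using only the standard library. The function name build_item and all sample values are made up for illustration; this is not the code Apifier uses to generate its feed.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_item(title, description, link, published, guid):
    """Return an <item> element with the five child elements discussed above."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "description").text = description
    ET.SubElement(item, "link").text = link
    # RSS dates use the RFC 822 format, e.g. "Sun, 01 May 2016 00:00:00 +0000".
    ET.SubElement(item, "pubDate").text = format_datetime(published)
    # The guid here is not a URL, so it is marked as not being a permalink.
    ET.SubElement(item, "guid", isPermaLink="false").text = guid
    return item


# Made-up sample values, for illustration only.
example = build_item(
    title="2016-05-01",
    description="Added RSS export",
    link="http://apifier.com/release-notes",
    published=datetime(2016, 5, 1, tzinfo=timezone.utc),
    guid="release-notes-2016-05-01",
)
print(ET.tostring(example, encoding="unicode"))
```

A stable <guid> matters more than its exact form: as long as the same release note always produces the same value, a reader can tell old entries from new ones.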

Let’s add a new crawler and set its Custom ID to Apifier_Release_Notes. This value will be used for the <title> child element of the <channel>. Set Start URLs to http://apifier.com/release-notes and set Page function to:

All the objects returned in the result array will be added as <item> elements to the RSS feed.
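The extraction logic itself (one record per <h4> date, with the text of the <ul> that follows as the description) can be sketched outside the crawler. The following is a standalone Python illustration using the standard library, not the Apifier page function or its API. The class and function names, the sample HTML, and the choice to join list entries with "; " are all assumptions.

```python
from html.parser import HTMLParser


class ReleaseNotesParser(HTMLParser):
    """Collects one record per <h4> heading plus the <li> entries that follow it."""

    def __init__(self):
        super().__init__()
        self.items = []            # one dict per <h4> seen so far
        self._current_tag = None   # "h4" or "li" while inside one of those tags

    def handle_starttag(self, tag, attrs):
        if tag == "h4":
            self.items.append({"date": "", "notes": []})
            self._current_tag = "h4"
        elif tag == "li" and self.items:
            self.items[-1]["notes"].append("")
            self._current_tag = "li"

    def handle_endtag(self, tag):
        if tag == self._current_tag:
            self._current_tag = None

    def handle_data(self, data):
        if self._current_tag == "h4":
            self.items[-1]["date"] += data
        elif self._current_tag == "li":
            self.items[-1]["notes"][-1] += data


def extract_release_notes(html):
    """Return a list of {"date", "description"} dicts, one per release note."""
    parser = ReleaseNotesParser()
    parser.feed(html)
    parser.close()
    return [
        {
            "date": item["date"].strip(),
            "description": "; ".join(note.strip() for note in item["notes"]),
        }
        for item in parser.items
    ]


# Made-up HTML that mimics the <h4> + <ul> structure described in the article.
SAMPLE_HTML = """
<h4>2016-05-01</h4>
<ul><li>Added RSS export</li><li>Fixed <b>minor</b> bugs</li></ul>
<h4>2016-04-20</h4>
<ul><li>New scheduler</li></ul>
"""
notes = extract_release_notes(SAMPLE_HTML)
```

Each dict in notes corresponds to one object in the result array, and therefore to one <item> in the exported feed.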

Now let’s set up a schedule to update the RSS feed every hour (or every day, whatever you prefer). Go to the Schedules section in the main menu and add a new schedule with the following settings:

Then go back to the crawler settings and click the View runs button to show a list of your crawler runs. Open the detail of one of the runs by clicking the magnifying glass icon. In the results section you will find the link RSS (last execution), which opens a URL in the following format:

https://api.apifier.com/v1/[user_id]/crawlers/[crawler_id]/lastExec/results?token=[access_token]&format=rss&status=SUCCEEDED

As you can see, it’s the Apifier API endpoint for the results of the crawler’s last execution in RSS format, with the status=SUCCEEDED filter restricting it to successfully finished executions.
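If you need to assemble this URL in a script rather than copy it from the run detail, a small helper is enough. The sketch below simply fills in the URL format shown above; the function name last_exec_rss_url and the placeholder IDs are hypothetical.

```python
from urllib.parse import urlencode

API_BASE = "https://api.apifier.com/v1"


def last_exec_rss_url(user_id, crawler_id, access_token):
    """Build the RSS results URL for the last successful execution of a crawler."""
    query = urlencode({
        "token": access_token,
        "format": "rss",        # export the results as an RSS feed
        "status": "SUCCEEDED",  # only consider successfully finished executions
    })
    return f"{API_BASE}/{user_id}/crawlers/{crawler_id}/lastExec/results?{query}"


# Placeholder values; substitute your own user ID, crawler ID and token.
url = last_exec_rss_url("USER_ID", "CRAWLER_ID", "ACCESS_TOKEN")
```

Keep in mind that the access token is part of the URL, so anyone who has the URL can read the feed.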

And that’s it. Now you can simply add this URL to your favourite RSS reader. Here’s a working example of the RSS feed for Apifier release notes:

https://api.apifier.com/v1/hNNQbYhnwafECWc8f/crawlers/MFJcydfzBm6vzmksb/lastExec/results?token=y6QpCxWfHXHhJbSNhFXXLAeBM&format=rss&status=SUCCEEDED
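A feed reader or notification service decides what counts as a new entry by comparing identifiers between polls. The sketch below shows that idea on an already downloaded feed, falling back to <link> when an item has no <guid>. The function name new_items and the sample feed are made up for illustration, and fetching the URL is left out so the example runs offline.

```python
import xml.etree.ElementTree as ET


def new_items(rss_xml, seen_guids):
    """Return the items in rss_xml whose guid (or link) is not in seen_guids."""
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        # Prefer <guid>; fall back to <link> if the item has no guid.
        guid = item.findtext("guid") or item.findtext("link")
        if guid and guid not in seen_guids:
            fresh.append({"guid": guid, "title": item.findtext("title", default="")})
    return fresh


# Made-up feed content, for illustration only.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Apifier_Release_Notes</title>
<item><title>2016-05-01</title><guid>note-2</guid></item>
<item><title>2016-04-20</title><guid>note-1</guid></item>
</channel></rss>"""

unseen = new_items(SAMPLE_FEED, seen_guids={"note-1"})
```

After each poll, the caller would add the returned guids to its stored set so that the same entry is not reported twice.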

To find out more about the RSS results format, see the API Reference. For more information about RSS 2.0, see W3Schools.


