How to get data from Hacker News with an unofficial HN API

Tuğkan Cengiz

If you're into development, technology, or startups in general and want to keep up with what's going on in the ecosystem, Hacker News is definitely one of your daily tabs.

Hacker News Scraper acts as an unofficial HN API to help you extract items from Hacker News. It is optimized to run blazing fast and scrape as many listings as possible.

TL;DR

Hacker News Scraper · Apify
Scrape Y Combinator’s Hacker News based on any search criteria. Crawl the front page, Show HN, Ask HN, news, job listings, and historical data. Extract and download links, titles, comments, ratings and more.
Hacker News Scraper detail link on the Apify platform

Features

This Hacker News data scraper supports the following features:

  • Scrape front page listing: scrape homepage listings - any page you want.
  • Scrape newest listing: the latest news can be scraped right away from Hacker News.
  • Scrape historical data: if you're looking for historical data, pick any date you want and scrape it.
  • Scrape listings of Ask HN: if you're specifically looking for an “Ask HN” type of listing, you can target it.
  • Scrape listings of Show HN: if you are specifically looking for a “Show HN” type of listing, you can target it.
  • Scrape listing details: you can scrape a single listing.
  • Scrape job listings: you can scrape the latest job listings posted on Hacker News.

Upcoming changes

  • Implement nesting on comment replies.

Setup & usage

You can see how this actor works in these videos:

Using start URLs

Using Hacker News Scraper with start URLs

You can see the output of this example run here.

Using mode

Using Hacker News Scraper with Mode

You can see the output of this example run here.

Tips

When you want to scrape a specific listing URL, just copy and paste the link as one of the start URLs.

If you would like to scrape only the first page of a list, use the link for that page as a start URL and set endPage to 1.

With the same approach, you can also fetch any interval of pages. If you provide the 5th page of a list as the start URL and set the endPage parameter to 6, you’ll get only the 5th and 6th pages.
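To make the page-interval tip concrete, here is a minimal sketch of how such an input could be put together. Only endPage is named in this post; the startUrls field name, its list-of-objects shape, and the build_page_range_input helper are assumptions made for illustration, so check the actor's input schema before relying on them. The ?p= query parameter is how Hacker News paginates its front page.

```python
import json

# Base URL of the Hacker News front page listing.
HN_NEWS_URL = "https://news.ycombinator.com/news"


def build_page_range_input(first_page: int, last_page: int) -> dict:
    """Build a hypothetical actor input that covers first_page..last_page.

    "endPage" comes from the tips above; "startUrls" and its shape are
    assumptions, not a confirmed part of the actor's input schema.
    """
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    # Start from the first page of the interval...
    start_url = f"{HN_NEWS_URL}?p={first_page}"
    # ...and stop after the last page of the interval.
    return {"startUrls": [{"url": start_url}], "endPage": last_page}


if __name__ == "__main__":
    # Pages 5 and 6 only, as in the example above.
    print(json.dumps(build_page_range_input(5, 6), indent=2))
```

Calling build_page_range_input(1, 1) gives the "first page only" case from the previous tip.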

If you would like to scrape historical data (e.g. 2020-03-18), go to Hacker News, click on the “Past” tab, and find the URL that you're looking for. Then use the link as a start URL. Here's the format for historical data: https://news.ycombinator.com/front?day=2020-03-18
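If you need historical URLs for many dates, you don't have to click through the “Past” tab for each one. The sketch below builds the URL from a date using the format shown above; the historical_front_page_url helper name is made up for this example.

```python
from datetime import date

# URL format for historical front pages, as shown in the tip above.
HN_PAST_URL = "https://news.ycombinator.com/front"


def historical_front_page_url(day: date) -> str:
    """Return the Hacker News "Past" URL for the given day."""
    # date.isoformat() yields YYYY-MM-DD, which matches the day parameter.
    return f"{HN_PAST_URL}?day={day.isoformat()}"


if __name__ == "__main__":
    print(historical_front_page_url(date(2020, 3, 18)))
```

The resulting string can be pasted in as a start URL like any other listing link.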

Final words

Hacker News is one of my daily information sources and I initially developed this actor for myself. Since it's working quite nicely, I decided to open it to the public.

As with all my other actors, there are lots of new features on the roadmap and I am always open to new ideas. Please don’t hesitate to contact me if you have any feedback, feature requests, or totally new ideas that might be interesting to implement.

P.S. You should always use a proxy to get the best results.


