Unofficial HN API to extract posts, titles, links, usernames, points, Ask HN, Show HN, jobs, comments, and historical data from Hacker News.
If you're into development, technology, or startups in general and want to keep up with what's going on in the ecosystem, Hacker News is definitely one of your daily tabs.
Hacker News Scraper acts as an unofficial HN API to help you extract items from Hacker News. It is optimized for speed and scrapes as many listings as possible.
This Hacker News data scraper supports the following features:
- Scrape front page listing: scrape any page of the homepage listings.
- Scrape newest listing: the latest news can be scraped right away from Hacker News.
- Scrape historical data: if you're looking for historical data, pick any date you want and scrape it.
- Scrape listings of Ask HN: if you're specifically looking for an “Ask HN” type of listing, you can target it.
- Scrape listings of Show HN: if you are specifically looking for a “Show HN” type of listing, you can target it.
- Scrape listing details: you can scrape a single listing.
- Scrape job listings: you can scrape the latest job listings posted on Hacker News.
When you want to scrape a specific listing URL, just copy and paste the link as one of the start URLs.
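As a rough sketch, the listing types above correspond to the following public Hacker News URLs, any of which could be pasted in as a start URL. The dictionary keys, the `item_url` helper, and the `start_urls` variable are illustrative names, not part of the scraper itself.

```python
# Public Hacker News listing URLs; the dictionary keys are illustrative labels only.
HN = "https://news.ycombinator.com"

LISTING_URLS = {
    "front_page": f"{HN}/news",
    "newest": f"{HN}/newest",
    "ask_hn": f"{HN}/ask",
    "show_hn": f"{HN}/show",
    "jobs": f"{HN}/jobs",
}


def item_url(item_id: int) -> str:
    """Return the URL of a single listing (hypothetical helper)."""
    return f"{HN}/item?id={item_id}"


# Example: collect an Ask HN listing and one single item as start URLs.
start_urls = [LISTING_URLS["ask_hn"], item_url(1)]
print(start_urls)
```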
If you would like to scrape only the first page of a list, use the link for that page as a start URL and set endPage to 1.
The same approach lets you fetch any interval of pages. If you provide the 5th page of a list as the start URL and set the endPage parameter to 6, you’ll get the 5th and 6th pages only.
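To make the page-interval idea concrete, here is a minimal sketch of an input that would cover pages 5 and 6 of the front page. The `endPage` name comes from the description above; the `startUrls` key and its list-of-strings shape are assumptions, so check the actor's input form for the exact field names. The `build_input` helper is hypothetical.

```python
import json


def build_input(start_url: str, end_page: int) -> dict:
    """Build a hypothetical input payload for the scraper."""
    if end_page < 1:
        raise ValueError("end_page must be at least 1")
    return {
        "startUrls": [start_url],  # assumed key name and shape
        "endPage": end_page,       # parameter named in the text above
    }


# Start at page 5 of the front page and stop after page 6.
payload = build_input("https://news.ycombinator.com/news?p=5", 6)
print(json.dumps(payload, indent=2))
```

Setting `end_page` to 1 with the first page's link gives the single-page case described earlier.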
If you would like to scrape historical data (e.g. 2020-03-18), go to Hacker News, click on the “Past” tab, and find the URL that you're looking for. Then use the link as a start URL. Here's the format for historical data: https://news.ycombinator.com/front?day=2020-03-18
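If you need many dates, the historical URL can be generated rather than copied by hand. The sketch below follows the URL format given above; the `historical_url` function name is hypothetical.

```python
from datetime import date

BASE_URL = "https://news.ycombinator.com/front"


def historical_url(day: date) -> str:
    """Return the front-page URL for a given day (YYYY-MM-DD in the query)."""
    return f"{BASE_URL}?day={day.isoformat()}"


print(historical_url(date(2020, 3, 18)))
# prints https://news.ycombinator.com/front?day=2020-03-18
```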
Hacker News is one of my daily information sources and I initially developed this actor for myself. Since it's working quite nicely, I decided to open it to the public.
Like all my other actors, this one has lots of new features on the roadmap, and I am always open to new ideas. Please don’t hesitate to contact me if you have any feedback, feature requests, or totally new ideas that might be interesting to implement.
P.S. You should always use a proxy to get the best results.