How to scrape Reddit data with an unofficial Reddit API

Why Reddit is one of the biggest social sharing sites on the internet and how you can use web scraping and a Reddit web scraper to extract useful data from subreddits.


We're Apify. You can build, deploy, share, and monitor any scrapers on the Apify platform. Check us out.

Reddit bills itself as “the front page of the internet.” That’s a bold claim, but it is definitely true for a significant number of internet users, with the latest figures for 2023 showing that it has over 430 million monthly active users and over 100,000 active communities. Reddit was launched in 2005, but it is still popular and relevant, unlike many other early social media sites.

But the main question here for us is: can we scrape Reddit?

We certainly can! Read on and follow this guide to find out how, or watch our short YouTube tutorial if you prefer video📹🔴▶️

🤖 What is Reddit for?

It’s a huge social sharing site made up of smaller communities called subreddits, where any user can post links, stories, images, or videos. A post gets more or less visibility depending on how much engagement it receives.

Each subreddit also has moderators who make sure that the submissions are relevant to the topic of the subreddit, follow the rules, and aren’t just spam. Subreddits can have their own themes, and some look dramatically different from others. This interactive Reddit map gives you some idea of the scope of Reddit interests. Check it out for yourself and explore how subreddits connect.

🤔 Why scrape Reddit?

Now that you know how many users are on Reddit and how diverse their interests are, you might be starting to think of how you could gather useful data with a Reddit web scraping tool. Here are just some of the ways you can make use of all that user-generated content:

👁 Topic awareness. Keep track of how your brand, product, or topic is being discussed across the site. Get a sample of comments to assess the range of opinions.

❓ Customer engagement. Connect with your users and ensure their questions are answered quickly and effectively.

✨ Trend monitoring. Watch for new trends and attitudes, and avert potential PR disasters. Reddit often acts as an incubator for ideas, and how Redditors behave and think usually precedes mainstream channels by months or even years.

📰 News monitoring. Stay ahead of potential profits or losses resulting from Reddit activity, such as the GameStop stock price surge, and follow high-stakes subjects such as finance, politics, technology, and news in general.

📹 Content aggregation. Aggregate data, posts, images, or videos from multiple subreddits and present them in new and interesting ways for your users.

🕵️‍ What data can I get from Reddit?

  • Subreddits will get you the most popular posts and community details (URL, number of members, category, etc.).
  • Reddit posts will get you the Redditor's username, post title, text, post comments, and the number of votes.
  • Reddit comments will get you the time of the comments, points received, author usernames, original posts, and relevant URLs.
  • User details will get you comment history and recent posts.
  • Search will get you data from across Reddit when you specify keywords or search URLs.

🗿 What about getting data with the Reddit API?

Reddit has its own API (application programming interface) designed to let developers interact in lots of useful ways with the Reddit site. It’s a great resource and every dev interested in scraping Reddit should be familiar with what it offers. So why should you use a Reddit web scraper rather than the Reddit API?

Here are just some reasons why the official API might not be for you:

  1. Reddit requires you to be authenticated to scrape the Reddit website with their API.
  2. The use of Reddit's API for commercial use requires special authorization.
  3. Reddit requires developers to register to get a token and use the official API. While we can't say whether Reddit ever refuses to give someone a token, it might, since it is not particularly pro-scraping.
  4. Reddit has specific rules to follow for how one should use their API.
  5. Last but not least, following the example of Twitter from earlier this year, Reddit is planning to restrict free access to its API. The main concern behind the decision is the extensive use of Reddit’s user-generated content as training material for GPT-4 and other large language models (LLMs). While the API will stay free for developers and researchers who create apps that help people use Reddit, Reddit data will become less available.

So, if you have been wondering whether an unofficial Reddit API exists and whether there's an easy way to use it, this step-by-step tutorial is for you. The good news is: web scraping Reddit data is not all that difficult - even if you've never extracted data from websites before. We’ll use a free, ready-made tool called Reddit Scraper to get Reddit data.

🥾 Step-by-step guide to scraping Reddit

Step 1. Go to Reddit Scraper

Find the Reddit Scraper page and click the Try for free button.

Step 1. Go to Reddit Scraper in Apify Store

Now you're on the Apify sign-up page. If you don’t have an Apify account yet, you can easily sign up using your Google account, another email address, or your GitHub account. After you create an account, you’ll be redirected to Apify Console, your workspace for web crawlers and other web automation tools.

Create your account using just your email, Google, or GitHub

Step 2. Select subreddit, Reddit profile, post, or keyword to scrape

Now that you’re in your web scraping workspace, the first thing to do is tell Reddit Scraper what data you want to get from Reddit. Choose your starting point:

  • section 1 - URLs 🔗 of specific subreddits, separate posts, and profiles
  • section 2 - keywords 🗝 across the whole Reddit site

🔗🤖 How to scrape Reddit posts from subreddits

Head over to Reddit and find subreddits you want to extract data from. Then copy their URLs and paste them into the Start URLs fields. You can add as many as you want.

Copy and paste the URLs of one or more subreddits that interest you. You will be able to scrape all posts as well as user pages.

🗿
Subreddit data will include (for each scraped post): poster's username, any URLs included in the post, timestamp, scraping timestamp, comments, commenters' usernames, number of replies, upvotes, etc.

🔗💬 How to scrape Reddit comments from posts

Now, instead of subreddit URLs, pick specific posts on Reddit.

💬
Comment data will include (for each scraped post): poster's username, any URLs included in the post, timestamp, scraping timestamp, comments, commenters' usernames, number of replies, upvotes, etc.

🔑🔍 How to scrape Reddit by search term

Alternatively, you can scrape Reddit by search term instead of pasting a URL. There is no need to go to Reddit for this: just type one or more keywords into the Search field.

Then pick what you are searching for: posts, comments, users, or communities that contain this keyword. You can filter the results by date and decide how the search should be sorted: trending, novelty, number of comments, etc.

Insert the keyword and pick which category to search through
🔍
Search data will depend on what you're searching for: comments, posts, users, or communities.

Three final remarks before we start extracting data:

✅ In each section, you can add as many URLs or keywords as you want using the Add button or by pasting a prepared list into the Bulk edit field.

✅ You can fill in either the URL or the Search section; they are mutually exclusive.

✅ You can set how many results you want in the Limits field, including how many comments per post and how many posts per page.
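
If you later want to prepare the same settings outside the form, they can be written down as a plain input object. The sketch below is only an illustration: the field names (`startUrls`, `searches`, `maxItems`, `maxComments`) and the default limits are assumptions, not confirmed Reddit Scraper input names, so check the scraper's input page for the real schema. It mainly shows the rule that URLs and keywords are mutually exclusive.

```python
from typing import Optional


def build_scraper_input(
    start_urls: Optional[list[str]] = None,
    searches: Optional[list[str]] = None,
    max_items: int = 50,
    max_comments: int = 10,
) -> dict:
    """Build a scraper input from either URLs or keywords, never both.

    All field names in the returned dict are hypothetical placeholders.
    """
    start_urls = start_urls or []
    searches = searches or []
    # Exactly one of the two sections must be filled in.
    if bool(start_urls) == bool(searches):
        raise ValueError("Provide either start URLs or search keywords, not both or neither.")
    if max_items < 1 or max_comments < 0:
        raise ValueError("max_items must be at least 1 and max_comments must not be negative.")

    run_input: dict = {"maxItems": max_items, "maxComments": max_comments}
    if start_urls:
        run_input["startUrls"] = [{"url": url} for url in start_urls]
    else:
        run_input["searches"] = list(searches)
    return run_input


if __name__ == "__main__":
    print(build_scraper_input(start_urls=["https://www.reddit.com/r/webscraping/"]))
```

Rejecting the "both" and "neither" cases up front mirrors what the form does when it asks you to pick a single starting point.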

Step 3. Click Start ▶️

When you’re happy with how you’ve set up your scraping parameters, click the Start button. The actor will start scraping, and you’ll see that it has a status of Running. It might take a few minutes to complete the scraping run, but you should soon see that the actor has ☑️ Succeeded.

Step 4. Collect your Reddit data

Now click on the Export results button or go to the Storage tab. Storage contains your scraped data in many formats, including HTML table, JSON, JSONL, CSV, Excel, XML. You can preview data using the 👁 Preview button, and download it in a format that suits your needs.
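
If you download the JSON export and later need a spreadsheet-friendly file, the conversion is easy to do yourself. Here is a minimal sketch using only Python's standard library. The record fields (`username`, `title`, `upVotes`) are made up for illustration and will differ from the real export, and the function assumes flat records without nested objects.

```python
import csv
import io
import json


def items_to_csv(items: list[dict]) -> str:
    """Convert a list of flat JSON records to CSV text.

    Columns are the union of all keys in first-seen order; missing values
    become empty cells.
    """
    fieldnames: list[str] = []
    for item in items:
        for key in item:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical records standing in for a downloaded JSON export.
    sample = json.loads(
        '[{"username": "alice", "title": "Hello", "upVotes": 12},'
        ' {"username": "bob", "title": "Second post"}]'
    )
    print(items_to_csv(sample))
```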

🔗🤖 Sample of scraped posts and comments from subreddits

Sample of scraped posts and comments from subreddits

🔗💬 Sample of scraped comments from posts

Sample of scraped comments from posts

🔍🔑 Sample of scraped data by search term

Sample of scraped data by search term

Now that you know how to scrape Reddit with our free Reddit web scraper, you can play around with the settings and see what kind of data you can get.


❓ FAQ

⛳️ Does Reddit allow web scraping?

In general, yes, but it doesn't encourage it. Recently, Reddit has expressed concerns about businesses using scraped Reddit data to feed generative AI tools and large language models, hence the plan to make the official Reddit API paid and less accessible for scraping.

Scraping Reddit is legal as long as you respect regulations such as the GDPR and the CCPA, which cover personal data protection. It’s also important to only scrape publicly available content that is not protected by copyright. You can get more info about the legality of web scraping in this in-depth blog post from our lawyers.

💰 Is the Reddit API paid?

Starting from July 1st, 2023, the Reddit API is free to use as long as you have OAuth registration and make fewer than 100 queries per minute per OAuth ID. Without registration, you are limited to 10 queries per minute. In spring 2023, Reddit announced plans to make the Reddit API paid, justifying the decision by the increased use of Reddit data as training material for large language models such as OpenAI’s ChatGPT and GPT-4.
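
To make the limit concrete: a client that spaces its calls evenly at 100 queries per minute must wait at least 60 / 100 = 0.6 seconds between requests, or 6 seconds at the unregistered limit of 10 per minute. The sketch below shows one simple way to pace requests like that. The class name and the injectable clock and sleep functions are our own illustration, not part of any Reddit library, and you would pick a value slightly below the limit to stay safely under it.

```python
import time
from typing import Callable


class MinIntervalThrottle:
    """Space calls at least 60 / per_minute seconds apart."""

    def __init__(
        self,
        per_minute: int,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if per_minute < 1:
            raise ValueError("per_minute must be at least 1")
        self.interval = 60.0 / per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Schedule the next slot one interval after this call's slot.
        self._next_allowed = max(now, self._next_allowed) + self.interval
        return delay


if __name__ == "__main__":
    throttle = MinIntervalThrottle(per_minute=99)
    for i in range(3):
        throttle.wait()
        print(f"request {i} allowed")
```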

💸 Is there a free Reddit Scraper?

Yes, there is. If you just want to extract a small amount of Reddit data, try our super-fast Free Reddit Scraper 🔗. Compared to the high-scale Reddit Scraper 🔗, it has a longer trial period. And although the free version delivers fewer results, it is very fast and has low consumption. The steps of this tutorial can be easily replicated for both of these scraping tools.

🤖 Can I use AI to scrape Reddit?

AI is currently unable to scrape websites directly, but it can help generate code for scraping Reddit if you prompt it with the target elements you want to scrape. Note that the code may not be functional, and website structure and design changes may impact the targeted elements and attributes.

👷 Can I build a Reddit scraper of my own?

Yes, you can, and we can host it in the cloud for you. You can create your own Reddit crawler (or a crawler for any website, for that matter) directly on the Apify platform and keep production there. Alternatively, you can develop it locally on your computer and only push it to the Apify cloud during deployment.

🍪 Do I need to apply cookies to get behind login when scraping Reddit?

No. As of May 2023, Reddit keeps its data public and doesn’t apply the login wall.

🛡 Do you need proxies to scrape Reddit?

These days, absolutely. Although subreddits are publicly accessible and don’t require a login to fetch information, you will usually need some sort of proxy to scrape Reddit successfully. You can still get some results with just datacenter proxies, but we recommend residential proxies for all Reddit scraping. Luckily, our Free plan comes with a free trial of Apify Proxy, so that should help you get started.

📦 Can I export Reddit data via an API?

Yes. You can use the Apify API to manage, schedule, and run any Apify Actors, including the Reddit Scraper. You'll be able to access any datasets, monitor performance, fetch results, create and update versions, and more. To access the API using Node.js or Python, use the respective apify-client package. For full details, see Apify API reference docs or click on the API tab for code examples.
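
As a rough sketch of what this can look like with the Python `apify-client` package: start a run, wait for it to finish, then read the items from the run's default dataset. The actor ID and the input fields below are placeholders we made up for illustration, so copy the real values from the API tab. The helper takes the client as an argument so that it can be tried out with a stand-in object too.

```python
import os
from typing import Any


def run_actor_and_fetch(client: Any, actor_id: str, run_input: dict) -> list[dict]:
    """Start an actor run, wait for it to finish, and return its dataset items."""
    run = client.actor(actor_id).call(run_input=run_input)
    dataset_id = run["defaultDatasetId"]
    return list(client.dataset(dataset_id).iterate_items())


if __name__ == "__main__":
    # Requires `pip install apify-client` and an API token from Apify Console,
    # stored here in an environment variable named APIFY_TOKEN (our choice).
    from apify_client import ApifyClient

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    items = run_actor_and_fetch(
        client,
        actor_id="<actor-id-from-the-API-tab>",  # placeholder, not a real ID
        # Hypothetical input fields; check the scraper's input schema.
        run_input={"startUrls": [{"url": "https://www.reddit.com/r/webscraping/"}]},
    )
    print(f"Fetched {len(items)} items")
```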

🤝 Can I integrate my Reddit data with other services?

Yes. Reddit Scraper can be connected with almost any cloud service or web app thanks to integrations on the Apify platform. You can integrate with LangChain, Make, Trello, Zapier, Slack, Llama Hub, Airbyte, GitHub, Google Sheets, Google Drive, Asana, and more.


David Barton
Apifier since 2016 so learned about web scraping and automation from the experts. MSc in Computer Science from TCD. Former game designer and newspaper production manager. Now Head of Content at Apify.
Natasha Lekh
Crafting content that charms both readers and Google’s algorithms: readmes, blogs, and SEO secrets.
