How to scrape Instagram posts, comments, and photos

Jarda Hejlek

The official Instagram API allows you to programmatically access your own comments and posts on Instagram. However, it doesn’t let you list posts made by other people, the comments and photos on those posts, or the posts with a particular hashtag.

To extract that kind of information, you'll need to use our Instagram Scraper. The scraper is free to use and will let you download all of this public data from Instagram:

  • Profiles
  • Hashtags
  • Locations
  • Comments
  • Likes

Where to find the data

The easiest way to search and access content on Instagram is by using the mobile app or the website:

Medium on Instagram

Some of Instagram is accessible even without logging in, but not as much as in the past. Once you log in, you can browse through hashtags, profiles, locations, and posts. This is very encouraging, because if you can do something manually in a web browser, you can automate it on Apify 😉

Data available publicly without login

If you check the website in an incognito browser window, you’ll quickly find that there are some features that you can access freely and some that are either blocked or require you to log in. Here is what you can find without a login (Instagram keeps changing the rules, so some of this might have changed):

1) Search

You can search for profiles, hashtags, and places, and Instagram will return the top 100 results.

The search feature on Instagram.com

There is even a nice internal API endpoint that can be used to get the results in JSON format:

https://www.instagram.com/web/search/topsearch/?context=blended&query=avengers

The context query parameter serves as a filter and can be set to a location, user, or hashtag. The only limitation is that the endpoint returns just 100 results. If you need more, you'll have to narrow the search with a more specific query.
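As a rough illustration of the endpoint above, here is a minimal Python sketch that builds the search URL and counts the entries in a response. The helper names `build_search_url` and `count_results` are made up for this example, and the code deliberately assumes nothing about the response beyond it being a JSON object with some top-level lists. Instagram changes its rules often, so the endpoint may not answer an anonymous request at all.

```python
import json
from urllib.parse import urlencode

# Endpoint taken from the URL shown in the article.
SEARCH_ENDPOINT = "https://www.instagram.com/web/search/topsearch/"


def build_search_url(query: str, context: str = "blended") -> str:
    """Build the search URL for a query and a context filter."""
    params = urlencode({"context": context, "query": query})
    return f"{SEARCH_ENDPOINT}?{params}"


def count_results(raw_json: str) -> dict:
    """Count the entries in every top-level list of a JSON response.

    The exact response layout is not documented here, so this only
    reports how many items each top-level list contains.
    """
    payload = json.loads(raw_json)
    return {
        key: len(value)
        for key, value in payload.items()
        if isinstance(value, list)
    }


if __name__ == "__main__":
    print(build_search_url("avengers"))
```

Passing the downloaded response body to `count_results` gives a quick check of whether a query is hitting the 100-result ceiling.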

2) Posts from Profiles/Hashtags/Locations

When you open any public Instagram page that contains posts (e.g. a profile, hashtag, or location), Instagram returns an HTML page with the first few posts preloaded (probably using React server-side rendering). As you scroll down the page, Instagram loads more posts with XHR requests to its GraphQL endpoint. The endpoint is protected with a token, so it can’t really be called directly, and the only way to load more posts is to keep scrolling the page. Fortunately, infinite scrolling is easy to automate using headless Chrome with Puppeteer.

Posts visible on Instagram’s own profile page

Our tests haven’t found a limit to how many posts can be loaded using infinite scroll. There probably is one, but even up to a thousand posts were loaded during testing.
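The scrolling logic itself can be sketched without a browser. The Python below is only an illustration, not the scraper's actual code: `scroll_once` and `count_posts` are hypothetical callbacks that would wrap the real browser automation calls, and `FakeFeed` simulates a page so the loop can be run as is. The loop stops when enough posts are loaded or when several scrolls in a row bring nothing new.

```python
from typing import Callable


def scroll_until_done(
    scroll_once: Callable[[], None],
    count_posts: Callable[[], int],
    max_posts: int = 1000,
    max_idle_rounds: int = 3,
) -> int:
    """Scroll until max_posts are loaded or no new posts keep appearing."""
    loaded = count_posts()
    idle_rounds = 0
    while loaded < max_posts and idle_rounds < max_idle_rounds:
        scroll_once()
        current = count_posts()
        if current > loaded:
            idle_rounds = 0   # progress: reset the patience counter
            loaded = current
        else:
            idle_rounds += 1  # nothing new appeared after this scroll
    return loaded


class FakeFeed:
    """Stand-in for a page that reveals `batch` more posts per scroll."""

    def __init__(self, total: int, batch: int = 12) -> None:
        self.total = total
        self.batch = batch
        self.visible = min(batch, total)

    def scroll(self) -> None:
        self.visible = min(self.visible + self.batch, self.total)

    def count(self) -> int:
        return self.visible


if __name__ == "__main__":
    feed = FakeFeed(total=50)
    print(scroll_until_done(feed.scroll, feed.count))
```

The idle-round counter is a design choice for this sketch: a single scroll that loads nothing may just be a slow response, so the loop tolerates a few of them before giving up.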

3) Comments on posts

Every Instagram post has publicly visible comments and shows a Load more comments button if there are more comments that can be shown.

Comments on Medium.com’s Instagram post - “Medication” by @misscaseyjo

Clicking on the button fires off an XHR request to Instagram’s GraphQL endpoint. Again, we can easily automate this using Puppeteer’s page.click() function and then extract the content of the comments from the web page.
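The click-until-gone pattern can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the scraper's code: `has_more_button` and `click_button` are hypothetical callbacks standing in for whatever checks for the button and clicks it in a real browser session, and `max_clicks` is a safety cap added for this example.

```python
from typing import Callable


def load_all_comments(
    has_more_button: Callable[[], bool],
    click_button: Callable[[], None],
    max_clicks: int = 100,
) -> int:
    """Click the 'Load more comments' button until it disappears.

    Returns the number of clicks performed. The max_clicks cap keeps
    the loop from running forever on a post with a huge comment thread.
    """
    clicks = 0
    while clicks < max_clicks and has_more_button():
        click_button()
        clicks += 1
    return clicks
```

Once the loop returns, all the comments that could be loaded are in the page and can be extracted in a single pass.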

Data available only after login

Unfortunately, certain content can only be accessed if you’re logged in to your Instagram account, for example:

  • List of followers
  • List of people a user follows

Although it would be possible to log in to Instagram automatically in order to access this data, this approach is risky because Instagram may ban your account. Sure, you could create a fake Instagram account and use that instead, but that’s beyond the scope of this article.

Over time, Instagram has been increasingly limiting the data you can access without login, so you’ll need to test to see what you can scrape.

Using an Apify actor to scrape the data

Our Instagram Scraper is one of many actors available in Apify Store. Apify actors are cloud programs that accept input, perform their job, and generate some output. They can be run manually in the app, through the API, or by the scheduler.

The actor is written in Node.js and uses Apify SDK. On input, it takes an Instagram query or a list of direct profile URLs. It then runs the search and scrapes page details, posts, or comments from the search results and the direct URLs. All of the resulting data is stored in structured form in a dataset, from which you can download it in formats such as JSON, XML, Excel, CSV, etc. The source code for Instagram Scraper is available on GitHub, and pull requests and ideas for improvement are more than welcome!

Our Instagram scraper is free to use, although you will need to use residential proxies on Apify Proxy if you run it on the Apify platform. This is because Instagram changed the rules in 2021 and now you always need to use a residential proxy for scraping 😖

The free trial of Apify Proxy given to every new Apify user won’t be enough, as you’ll need a residential proxy and the free trial only includes data center proxies. But just email support@apify.com and tell them that you want a free trial of residential proxies so that you can scrape Instagram and they’ll sort you out!
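To make the input and the proxy requirement more concrete, here is a hedged Python sketch that prepares an actor input and an API request to start a run. Several things here are assumptions to check against the actor's own input schema and the Apify API documentation: the actor ID `apify~instagram-scraper`, the input field names (`search`, `directUrls`, `resultsType`, `resultsLimit`, `proxy`), and the `RESIDENTIAL` proxy group name. The request is only built, never sent.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "apify~instagram-scraper"  # assumed actor ID, verify in Apify Store


def build_actor_input(search=None, direct_urls=None,
                      results_type="posts", results_limit=50) -> dict:
    """Assemble an input object; all field names are assumptions."""
    if not search and not direct_urls:
        raise ValueError("Provide a search query or at least one direct URL")
    actor_input = {
        "resultsType": results_type,    # e.g. posts, comments, or details
        "resultsLimit": results_limit,
        # Residential proxy, since the article says Instagram requires one.
        "proxy": {"useApifyProxy": True, "apifyProxyGroups": ["RESIDENTIAL"]},
    }
    if search:
        actor_input["search"] = search
    if direct_urls:
        actor_input["directUrls"] = list(direct_urls)
    return actor_input


def build_run_request(actor_input: dict, token: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request that starts an actor run."""
    return urllib.request.Request(
        url=f"{API_BASE}/acts/{ACTOR_ID}/runs",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Sending the prepared request with `urllib.request.urlopen` would start a run on your account, so do that only once the input has been checked against the actor's documentation.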

Before you get started scraping Instagram, note that we do not consider scraping vast amounts of personal data ethical and discourage anyone from doing so.

Step-by-step guide to scraping Instagram

So you want to use our Instagram Scraper to extract data from Instagram? This tutorial should get you started in just a few minutes.

1. Go to the Apify website: https://apify.com

Visit apify.com to get started

2. Sign in at the top right corner using your email account, Google, or GitHub.

Log in with email, Google, or GitHub

3. Once you log in, you’ll find yourself in Apify Console. You can do a lot more here than just scrape Instagram, so make sure you check it out later on.

Apify Console is where you can control your actors

4. Now click on the Store button in one of the tabs on top. Apify Store is packed with free, ready-to-use web scraping and automation tools called actors. Search for Instagram.

Search for Instagram in Apify Store

5. By clicking on the Instagram Scraper card or dropdown result, you’ll be redirected to the scraper’s own page, where you can read more about how to customize the actor. When you’re ready, click the blue Try me button. You'll be redirected back to Apify Console.

The Instagram Scraper page

6. You should see a new Task automatically created for Instagram Scraper. But the task won't start running until you tell it to! You can just click Run with the default search term, or you can search for something else and then click Save & Run to start the actor.

Run the actor with the default parameters or choose your own

7. Your task will change status to show that it is Running. The scraper is now visiting Instagram and busily extracting data.

Instagram Scraper is running

8. As soon as you see that the status has changed to Succeeded, click on the Dataset tab to check your scraped search results.

The Dataset tab is where you can find your results

9. The Dataset tab contains your data in lots of versatile formats, including HTML table, JSON, CSV, Excel, XML, and RSS feed. You can open them by clicking on View in another tab, Preview or Download. You can then share the data, or upload it anywhere you like. Use it in spreadsheets, other programs or apps, or your own projects.

Preview of the data in JSON format
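The same results can also be fetched programmatically. The Python sketch below builds a download URL for a dataset in one of the export formats and converts already-downloaded items to CSV locally. The dataset ID and the column names in the test data are placeholders, the helper names are invented for this example, and a private dataset may also need an API token, so treat this as a starting point rather than a finished client.

```python
import csv
import io
from urllib.parse import urlencode

# Export formats corresponding to the ones listed in the Dataset tab.
EXPORT_FORMATS = {"json", "jsonl", "csv", "xlsx", "xml", "html", "rss"}


def build_dataset_url(dataset_id: str, fmt: str = "json") -> str:
    """Return the URL of a dataset's items in the requested format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"Unsupported format: {fmt}")
    query = urlencode({"format": fmt})
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"


def items_to_csv(items: list, columns: list) -> str:
    """Convert a list of item dicts to CSV text, keeping only `columns`."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for item in items:
        # Missing fields become empty cells; extra fields are dropped.
        writer.writerow({column: item.get(column, "") for column in columns})
    return buffer.getvalue()
```

Selecting columns explicitly keeps the CSV stable even if individual items carry different sets of fields.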

And that's all you need to get started scraping Instagram. If you run into any problems or need some advice, just email support@apify.com and we'll try to help. You can also join the conversation on our Discord server, where the Apify team regularly discuss solutions to tricky scraping problems.

If you need to scrape Instagram at scale or need end-to-end service, you can request a custom solution.


