How to scrape Twitter

David Barton

So why use web scraping on Twitter? What kind of information can we get from the notoriously fast-moving website and how easy is it to get that information?

Twitter has become one of the most popular online discussion platforms in the world

Twitter started off as a simple ‘microblogging’ system for users to share short posts called tweets. That straightforward idea of expressing your thoughts in just 140 characters (and now 280 characters) has made Twitter one of the most active discussion platforms on the internet. People engage and argue, both companies and individuals market their brands, and politicians even use it as a way to reach their voters.

Twitter has more than 340 million users and more than 500 million tweets are posted every day. As Twitter itself boasts: Twitter is what’s happening and what people are talking about right now.

Chart from https://www.statista.com/

As you might imagine, that means that there’s a lot of useful data just sitting around on Twitter, waiting to be used for other purposes.

A single tweet can give you information about:

  • the demographics of people who liked or retweeted the tweet
  • total clicks on a profile
  • how many people saw the tweet

And that’s just the tip of the proverbial iceberg.

For a Twitter user or marketer, access to data about how others engage with their tweets can be vital for developing a brand. For companies, gathering data across Twitter can provide a competitive advantage. Academic researchers and journalists can use the data to understand how people interact and to identify trends before they rise to the surface. Once you have the data, what you do with it is up to you.

The Twitter API is a great tool for developers. It gives you broad access to the platform underlying Twitter: you can use it to compose tweets, read profiles, access data about your followers, and get information on four main Twitter data points: Tweets, Entities, Places, and Users.

But we believe that web scraping can allow you to do more with Twitter than the API allows. Apify’s Twitter Scraper has the following advantages over the official API:

  • you do not need to have an account
  • our scraper is not rate-limited
  • you don’t need a registered app and API key

What technology does Apify use?

Some developers have asked us in the past why we don’t use Python for scraping Twitter. We built Apify with JavaScript, which really is the language of the web. We believe that JavaScript gives both our platform and our scrapers the speed and flexibility that modern web scraping needs.

And if you really want to use Python, don’t forget that Apify datasets can be downloaded in JSON and other formats that can easily be fed into other tools. Why reinvent the wheel and write a Python scraper for Twitter when you can use ours? 😉
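If you do want to work in Python, a minimal sketch of reading a downloaded dataset might look like this. It assumes the dataset was exported as a JSON array of objects. The `username` field used in the usage note is a hypothetical placeholder, not a confirmed field of the Twitter Scraper output, so check your own export for the real field names.

```python
import json
from collections import Counter
from pathlib import Path


def load_items(path):
    """Read a dataset exported as a JSON array and return it as a list of dicts."""
    with Path(path).open(encoding="utf-8") as f:
        items = json.load(f)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of items")
    return items


def count_by_field(items, field):
    """Count how often each value of `field` appears; items without the field are skipped."""
    return Counter(item[field] for item in items if field in item)
```

With a file saved as, say, `dataset.json` (a hypothetical name), `count_by_field(load_items("dataset.json"), "username").most_common(5)` would list the five most frequent values of that field.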

How to scrape data from Twitter — Apify’s step-by-step guide

The Apify platform has lots of ready-made scraping tools. The one we’re going to use is the Twitter Scraper.

1. First, you need to sign in to Apify at https://apify.com/

Apify: the one-stop shop for all your web scraping, data extraction, and robotic process automation (RPA) needs

2. You can log in or sign up with your email address, or with a Google or GitHub account.

You can log in using your email, GitHub, or Google

3. Once you log in, you’ll be redirected to the Apify app. Click on the Store button. Apify Store is where you can find ready-made web scraping and automation tools.

Your Apify Dashboard will look something like this

4. When you’re on Apify Store, you can search for the Twitter Scraper actor. Apify actors are serverless cloud programs running on the Apify platform that can perform arbitrary computing jobs such as sending an email or crawling a website with millions of pages.

The Apify Store is filled with actors — cloud programs to help you scrape and automate

5. On the actor page, click Try for free and it will automatically redirect you back to the Apify platform.

Our Twitter Scraper — revised and updated for 2021

6. An actor Task will have been created back on your Apify Dashboard. This will enable you to set the parameters for your Twitter scraping run.

The input fields you can use to customize the Twitter Scraper

7. So let’s go to Twitter and find something to scrape. How about the Apify profile? 😁 Just copy the URL or Twitter handle: @apify or https://twitter.com/apify

Apify’s Twitter — hey, why not go and follow us? Just go here: https://twitter.com/apify

8. Now you need to fill in the input fields for the scraper. There are lots of possible inputs, but we’ll keep it simple for this example, so you can just fill in a URL for a Twitter user, e.g. https://twitter.com/apify

Adding the start URL tells the scraper where to start scraping
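If you collect profiles as a mix of handles and URLs, it can help to normalize them before pasting them into the input. The following is only a sketch: the function name `to_profile_url` is ours, and the handle rule (letters, digits, and underscores, up to 15 characters) is an assumption about Twitter usernames rather than something the scraper enforces.

```python
import re
from urllib.parse import urlparse

# Assumed handle rule: ASCII letters, digits, and underscores, 1 to 15 characters.
HANDLE_RE = re.compile(r"[A-Za-z0-9_]{1,15}")


def to_profile_url(handle_or_url):
    """Turn '@apify', 'apify', or a twitter.com profile URL into a plain profile URL."""
    value = handle_or_url.strip()
    if value.startswith(("http://", "https://")):
        parsed = urlparse(value)
        if parsed.netloc.lower() not in ("twitter.com", "www.twitter.com"):
            raise ValueError(f"not a twitter.com URL: {value}")
        # The handle is the first path segment, e.g. /apify or /apify/with_replies.
        segments = [s for s in parsed.path.split("/") if s]
        handle = segments[0] if segments else ""
    else:
        handle = value.lstrip("@")
    if not HANDLE_RE.fullmatch(handle):
        raise ValueError(f"invalid handle: {handle_or_url!r}")
    return f"https://twitter.com/{handle}"
```

For example, both `to_profile_url("@apify")` and `to_profile_url("https://twitter.com/apify")` return `https://twitter.com/apify`.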

But you can also change the following parameters:

  • Fill in the username you want to scrape.
  • Limit the maximum number of tweets to make the run go faster.
  • Select the types of tweet you’re interested in.
  • Enter your credentials if you want to scrape a lot of information.

If you log in, you can get even more information out of Twitter
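To keep track of the settings you used for a run, you could record them as a small JSON object. The sketch below mirrors the fields described above, but every key name in it (`startUrls`, `maxTweets`, `tweetTypes`) is a hypothetical placeholder; the actor’s real input schema is documented in its readme and may differ.

```python
import json


def build_scraper_input(profile_url, max_tweets=100, tweet_types=("tweets",)):
    """Assemble a settings object mirroring the input form.

    All key names are hypothetical placeholders, not the actor's real schema.
    """
    if max_tweets <= 0:
        raise ValueError("max_tweets must be positive")
    return {
        "startUrls": [{"url": profile_url}],  # hypothetical key: where to start scraping
        "maxTweets": max_tweets,              # hypothetical key: cap on tweets per run
        "tweetTypes": list(tweet_types),      # hypothetical key: kinds of tweet to collect
    }


# Print the settings as JSON so they can be saved alongside the results.
print(json.dumps(build_scraper_input("https://twitter.com/apify", max_tweets=50), indent=2))
```

Lowering the tweet limit is the simplest way to get a quick test run before committing to a larger scrape.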

Once you’re ready, click on the “Save & Run” button and wait for the actor to finish its scraping run.

Just Save & Run to get going

9. As soon as you see that the run has “Succeeded”, you can check the results in the Dataset tab. In fact, you can even check the Dataset tab before the scraper has finished its run, if you’re curious to see how it’s doing 😁

Hurrah — another successful actor run!

10. The Dataset tab contains your data in lots of useful formats. You can access them by clicking on “View” or “Download”. You can share the data or use it however you want.

The Dataset tab is your window to lots of data in handy formats such as JSON, HTML and more

JSON format preview:

Data in JSON format

HTML format preview:

Data in HTML format
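If you download the JSON and later need a spreadsheet-friendly file, a short conversion with Python’s standard library is one option. This is a sketch only: the field names you pass in depend on what your dataset actually contains, and `username` and `full_text` in the usage note are hypothetical examples.

```python
import csv
import io


def items_to_csv(items, fields):
    """Render a list of dicts as CSV text, keeping only the listed fields."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for item in items:
        # Missing fields become empty cells; fields not listed are dropped.
        writer.writerow({name: item.get(name, "") for name in fields})
    return buffer.getvalue()
```

Calling `items_to_csv(items, ["username", "full_text"])` on a loaded dataset returns CSV text with one row per item, which you can then write to a file.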

And that’s it — you can learn lots more about how to scrape Twitter by studying the readme documentation over on the Twitter Scraper.

Have fun and happy scraping!

Did you know?

  • 82% of B2B content marketers used Twitter for organic content marketing in the last 12 months
  • 27% of B2B content marketers used Twitter ads in the last 12 months

Stats from https://blog.hootsuite.com/twitter-statistics/


