How to scrape Twitter (X.com) data using Python without Twitter API

Step-by-step guide to building a Python Twitter scraper without using Twitter API.


We're Apify. We've created 1,600+ data extraction tools and unofficial APIs for popular websites, including Twitter (X). Check us out.

Twitter (now X.com) started off as a simple ‘microblogging’ system for sharing short posts called tweets. Today it has more than 368 million users, and more than 500 million tweets are posted every day. As you might imagine, that means there’s a lot of useful data sitting behind those 280 characters.

How to scrape Twitter data using Python without using Twitter’s API

Creating a Python script to gather all that Twitter data can seem intimidating at first, especially considering the recent changes to the Twitter API. But with the right steps, building a Python Twitter scraper is a pretty straightforward process.

This tutorial will walk you through creating a web scraper using an open-source Python library. We'll start by setting up your environment, then move on to logging in and retrieving Twitter data, and finally storing and exporting this data.

🐦 Is Twitter scraping a good alternative to the official Twitter API?

Even though the official Twitter API provides structured data access, it also enforces rate limits, registration, authentication, and an API key; and since 2023, it comes with a hefty price. A Python Twitter library like Tweepy can simplify the authentication part, but what about the rest?

With the Twitter API having become less accessible, it makes more sense to rely on alternative methods to get data from Twitter. Web scraping allows you to do more with Twitter data than the API does.

Independent research has also found that web scraping has advantages over the Twitter API in terms of speed and flexibility.

Web scraping is faster than the Twitter API and more flexible in terms of obtaining data (source: Web Scraping versus Twitter API: A Comparison for a Credibility Analysis).

So what are our options for web scraping here? Well, we'll be exploring these two in our guide:

  1. Build your own Twitter crawler in Python. It will act as an unofficial API.
  2. Use a ready-made web crawler like Twitter Scraper. It already works as an unofficial API.

We're not looking for easy ways out. So let's get into the thick of it with the first one. Let's build a Python crawler for Twitter.

1. Pick the right web scraping library for X.com

Let's start with the basics: Twikit. Twikit is an open-source library dedicated to scraping Twitter data in Python. It's essentially an unofficial Python Twitter API created with multiple purposes in mind: from scraping tweets and user info to posting tweets, as well as liking or following users.

Now, you might be wondering: why are we starting off with a library? Aren't we creating a web scraper from scratch? Of course, you can write a Python Twitter scraper from scratch. But web scraping these days requires far more than just a few actions on a page.

The open-source library we'll be using as a base for our Python crawler for Twitter

If you want your Python Twitter scraper to succeed at its task, you need to strengthen it with something that can interact with web browsers and protect you from getting blocked: proxies, headers, IP address rotation, and so on. In other words, a whole infrastructure of things. That's exactly what the Twikit library will provide for us, so all we have left to do is build the scraper itself. Let's get started.

2. Set up your environment

Before you dive into writing the script, make sure you have Python installed on your computer. We also highly encourage you to create a virtual environment for the project.
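For example, on macOS or Linux you can create and activate a virtual environment like this (Windows commands differ slightly):

python -m venv venv
source venv/bin/activate

On Windows, activate it with venv\Scripts\activate instead.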

With the environment active, install the libraries for the project with this pip command:

pip install twikit pandas

3. Import libraries

Now that the installation is done, the first part of your script involves importing the libraries you'll be using: twikit is essential for interacting with Twitter data, json for JSON processing, and pandas for data handling.

from twikit import Client
import json
import pandas as pd

4. Initialize the client

To interact with Twitter data, you'll need to initialize a Client object from twikit. This object allows you to perform various operations like logging in, fetching tweets, etc. Make sure to replace 'en-US' with the relevant language/locale if necessary.

client = Client('en-US')

5. Login with provided user credentials

These days, you need to log in to access most data on Twitter. Logging in for the sole purpose of scraping is frowned upon and discouraged by Twitter, but it's impossible to get any real Twitter data from behind Twitter's login wall without doing so. So, for educational purposes, we're going to show you that it's possible with this open-source library:

client.login(auth_info_1='yourusername', password='yourpassword')
client.save_cookies('cookies.json')
client.load_cookies(path='cookies.json')

You will need to replace 'yourusername' and 'yourpassword' with your Twitter credentials.

🍪
If you want to know more about the topic, check out the Dealing with headers, cookies, and tokens section in our web scraping course.

📘
Tip: stay logged in and reuse login information.

In order not to endanger your account by needlessly logging in every time you need to scrape Twitter, you can opt for saving your cookies and reusing them each time. After logging in once and saving the cookies, you can comment out the login part and directly load the cookies in subsequent runs.
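Here's a minimal sketch of that pattern, assuming the cookies.json file from the code above: check whether the cookie file already exists, and only log in (and save fresh cookies) when it doesn't.

import os

if os.path.exists('cookies.json'):
    # Reuse the session saved during a previous run
    client.load_cookies(path='cookies.json')
else:
    # First run: log in once and persist the cookies
    client.login(auth_info_1='yourusername', password='yourpassword')
    client.save_cookies('cookies.json')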

📙
Tip: do not send too many requests.

Limit how many requests you send to avoid overloading the website, getting marked as suspicious, or getting blocked. You can read more about why it's advisable to stay logged in and get tips on how to behave on the website.
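A simple way to pace your scraper is to sleep between consecutive requests. This is a minimal sketch; the 2-5 second range is an arbitrary example, not a value recommended by Twikit:

import time
import random

# Pause for a random 2-5 seconds between requests to mimic human browsing
time.sleep(random.uniform(2, 5))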

6. Begin X.com scraping – scrape tweets with Python

Once you have an initialized Client, you can pick what data you want to scrape. The library offers many methods, including ones for creating new tweets, sending messages, and more. We're interested in the get part of the library.

To scrape tweets from a specific user, you'll need to use the get_user_by_screen_name method with the user's screen name as the parameter.

Then, you can extract a specified number of tweets using the get_tweets method. In our example, we're getting the last 5 tweets from the public account zelenskyyua.

user = client.get_user_by_screen_name('zelenskyyua')
tweets = user.get_tweets('Tweets', count=5)
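Besides 'Tweets', get_tweets accepts other tweet categories such as 'Replies' and 'Media' (parameter values per Twikit's documentation; verify them against your installed version):

replies = user.get_tweets('Replies', count=5)
media = user.get_tweets('Media', count=5)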

7. Store scraped X.com data

After getting the list of tweets from Twitter, you can loop through them and store the scraped tweet properties. Our example collects properties such as the creation date, favorite count, and full text of each tweet.

tweets_to_store = []
for tweet in tweets:
    tweets_to_store.append({
        'created_at': tweet.created_at,
        'favorite_count': tweet.favorite_count,
        'full_text': tweet.full_text,
    })
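If you need more than one page of results, Twikit's result object supports pagination: calling .next() should return the following page. This is a hedged sketch; check the method against your installed Twikit version:

# Fetch and store the next page of tweets
more_tweets = tweets.next()
for tweet in more_tweets:
    tweets_to_store.append({
        'created_at': tweet.created_at,
        'favorite_count': tweet.favorite_count,
        'full_text': tweet.full_text,
    })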

8. Analyze and save tweet data

Now that you have your tweet data, you can use pandas, the library we imported earlier, to convert it into a DataFrame. This makes it easier to sort, filter, and analyze the extracted Twitter data. Let's first save this Twitter data to a CSV file.

df = pd.DataFrame(tweets_to_store)
df.to_csv('tweets.csv', index=False)
print(df.sort_values(by='favorite_count', ascending=False))
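Filtering works just as easily. For example, to keep only tweets above an arbitrary like threshold:

# Keep only tweets with more than 100 likes (the threshold is just an example)
popular = df[df['favorite_count'] > 100]
print(popular)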

9. Export X.com data as JSON (or other format)

If you prefer the data in JSON format — say, for integration into web applications — you can convert your list to a JSON string and print it.

print(json.dumps(tweets_to_store, indent=4))

You can export in any format you want, actually: CSV, JSON, HTML.
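For example, pandas can write the same DataFrame to JSON or HTML directly:

df.to_json('tweets.json', orient='records', indent=4)
df.to_html('tweets.html', index=False)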

10. See the full code for building an X.com scraper in Python

So this is what the full code looks like. Pretty neat, isn't it? It would have been way longer without the libraries we installed. But that's the great thing about programming: you can share your tools and build even better ones together.

from twikit import Client
import json
import pandas as pd

client = Client('en-US')

# You can comment this `login` part out after the first time you run the script (once you have the `cookies.json` file)
client.login(
    auth_info_1='yourusername',
    password='yourpassword',
)

client.save_cookies('cookies.json')
client.load_cookies(path='cookies.json')

user = client.get_user_by_screen_name('zelenskyyua')
tweets = user.get_tweets('Tweets', count=5)

tweets_to_store = []

for tweet in tweets:
    tweets_to_store.append({
        'created_at': tweet.created_at,
        'favorite_count': tweet.favorite_count,
        'full_text': tweet.full_text,
    })

# We can make the data into a pandas dataframe and store it as a CSV file
df = pd.DataFrame(tweets_to_store)
df.to_csv('tweets.csv', index=False)

# Pandas also allows us to sort or filter the data
print(df.sort_values(by='favorite_count', ascending=False))

# We can also print the data as a JSON object
print(json.dumps(tweets_to_store, indent=4))

So there you have it, a step-by-step guide to scraping, storing, and exporting Twitter data using Python. Keep in mind that this script is simply a great starting point for developing more complex data analysis tools or integrating Twitter data into your applications.

11. Avoid getting blocked

So we're done, right? Not quite. The issue with scraping nowadays is that it requires much more than just a scraper. Websites are more than happy to block you for visiting way too often, even if all you're doing is copying some tweets into an Excel spreadsheet. That means you'll need to gear up your Python scraper.

For more on this, see Web scraping: 10 tips on how to crawl without getting blocked.
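As one example of gearing up, Twikit is built on top of httpx, and recent versions let you pass a proxy when creating the client. The exact keyword has varied between versions (proxy vs. proxies), and the URL below is a placeholder, so treat this as a sketch to verify against your installed version:

# Hypothetical proxy URL; replace with your own and confirm the keyword name for your Twikit version
client = Client('en-US', proxy='http://user:pass@proxy.example.com:8080')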

💡 How to scrape X.com data with a ready-made scraper

For reliable and convenient web scraping, you'll need proxies; for monitoring results and scheduling your scraper's runs, webhooks; and for plugging scraped data into other systems without too much hassle, your own API or integrations. Luckily, such comprehensive solutions already exist. Any scraper that lives on the Apify platform gets multiple lives.

With plenty of ready-made scraping tools at your fingertips, like the Twitter Scraper, just hop over to the Apify Store, choose a profile from X.com to scrape, and kick off the scraper. Once it's done, grab your extracted data in Excel, JSON, HTML, or CSV format, take a quick peek, and download it or stash it away for later. Feel free to tweak the input settings and explore the vast world of data waiting to be uncovered from tweets and other content on X.com!

And here's a short video demo showing you how to go about scraping Twitter the easy way:

❓FAQ

Can I scrape Twitter data using Python with the Twitter API?

Yes. If you have access to the official Twitter API, you can use it together with a wrapper library like Tweepy, which handles authentication and requests for you. Here's a basic example of using the Twitter API in Python with Tweepy to fetch tweets from a user's timeline:

import tweepy

# Authenticate to the Twitter API
consumer_key = 'your_consumer_key'
consumer_secret = 'your_consumer_secret'
access_token = 'your_access_token'
access_token_secret = 'your_access_token_secret'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth)

# Use the API to fetch recent tweets from a user
tweets = api.user_timeline(screen_name='zelenskyyua', count=5)

for tweet in tweets:
    print(tweet.text)

Can you scrape Twitter with Python?

Yes. You can build an independent web scraper or web crawler using Python for any website, including Twitter.

Is it legal to scrape Twitter?

You're not the first person to ask whether Twitter scraping is legal, and you probably won't be the last. Scraping essentially automates tasks that a human could do manually, so provided that you're only obtaining data that is openly available, the answer is yes.

On top of that, in 2022, the US Ninth Circuit Court of Appeals confirmed this with a ruling that scraping publicly accessible data is legal. The court's decision reinforces that publicly available content on the internet is fair game for crawling and scraping. Web scraping even made it into the public eye during the Johnny Depp v. Amber Heard defamation trial, where it was used as a method of investigation.

Does Twitter ban scrapers?

Yes. X.com prohibits scraping its platform without permission and actively bans scrapers that violate its terms of service. Although scraping isn't illegal, under Twitter's Terms of Use unauthorized scraping may result in suspension or termination of access.

📜
If you want to know more about the topic, check out the Are Terms of Use enforced? article written by our lawyers.

What can Twitter data be used for?

For a Twitter user or marketer, access to data about how others engage with their tweets can be vital for developing a brand. For companies, gathering data across Twitter can help them gain a competitive advantage. Academic researchers and journalists can use the data to understand how people interact and identify trends before they rise to the surface. Once you have the data, what you do with it is up to you.

🐦 Want to scrape X.com data in other ways?

Apify Store also offers other Twitter (X.com) scrapers to carry out smaller scraping tasks. You only need to insert a keyword or a URL and start your run to extract your results, including Twitter followers, profile photos, usernames, tweets, images, and more.

The full range of Twitter scrapers includes:

  * 🅇 X Twitter
  * 🐦 Twitter Scraper
  * 📱 Tweet Scraper
  * 🔗 Twitter URL Scraper
  * 🎥 Twitter Video Downloader
  * Best Twitter Scraper
  * 🧞‍♂️ Twitter Profile Scraper
  * 🛰️ Twitter Spaces Scraper
  * 🧭 Twitter Explorer
David Barton
An Apifier since 2016, David learned about web scraping and automation from the experts. MSc in Computer Science from TCD. Former game designer and newspaper production manager. Now Head of Content at Apify.
