In this guide, you will learn how to create a TikTok scraper in Python, step-by-step. It doesn’t require any prior experience in web scraping, and as long as you’re able to follow the steps, you will have built a Python TikTok scraper by the end of the guide.
Yes, you can scrape TikTok using Python. Scraping TikTok using Python involves writing code to automatically collect information from the platform, such as video details, user profiles, or comments. You can create Python scripts that navigate through TikTok, find the data you need, and save it for later use. This can help you gather large amounts of information quickly, and it’s especially useful for projects like analyzing trends, studying user behavior, research, or marketing.
Here are the steps you should follow to build your TikTok scraper.
Set up your environment
Understand TikTok’s web structure
Write the Scraper code
Deploy on Apify
1. Setting up your environment
Here are what you’d need to continue with this tutorial: • Python 3.5+: Make sure you have installed Python 3.5 or higher and set up the environment. • Installing and importing required libraries: Run the command below on the terminal to install the required libraries— selenium , pandas and webdriver-manager .
pip install selenium webdriver-manager pandas
To import the libraries to your script, use the following code
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd
# Importing built-in libraries
import re
import time
2. Understanding TikTok’s web structure
Before scraping any website, it’s important to understand its web structure and where you can scrape data from.
💡
To look at a website's web structure/HTML structure, right-click on any element and click “Inspect”. Or else, you can press the F12 key to open Developer Tools.
The two TikTok pages mainly contain useful information for web scraping purposes.
Data available: Trending videos, including their hashtags, uploaded users, number of likes, and views count.
Page structure:
This page consists of a card-like structure which consists of trending videos on TikTok. These are inside a <div>container with the class of tiktok-559e6k-DivItemContainer e1aajktk28
Inside each video card is a <strong> element which includes the total view count, with the class tiktok-ksk56u-StrongLikes e1aajktk9 .
The bottom of the video card includes the description, which includes the relevant hashtags with the class tiktok-1anth1x-DivVideoDescription.e1aajktk10. Please note that both the video description text and hashtags are included in one class, which could be a problem when they should be extracted separately.
The user name is nested within an <a> tag inside the parent <div>, with the <a> tag containing a <p> element having the attribute data-e2e="video-user-name" (XPath: ..//a/p[@data-e2e="video-user-name"]).
The like count is found within a nested span element, where the outer span has the class tiktok-10tcisz-SpanLikes e1aajktk13 and the inner span has the class tiktok-dqro2j-SpanLikeWrapper e1aajktk24 (XPath: ..//span[contains(@class, "tiktok-10tcisz-SpanLikes") and contains(@class, "e1aajktk13")]).
Remember: We use XPATH instead of normal CSS Selectors to navigate complex structures— like the username and likes count in the nested <div> container— more accurately and easily.
Data available: Follower count, following count, and likes count of a specific user.
Page structure:
Username is an <h1> element with the attribute css-1xo9k5n-H1ShareTitle e1457k4r8
Followers count is within an <div> element with the class css-1ldzp5s-DivNumber e1457k4r1.
The following count is inside an <strong> element with the attribute data-e2e="following-count".
The likes count is also in an <strong> element with the attribute data-e2e="likes-count".
The bio is located within an <h2> element with the attribute data-e2e="user-bio".
3. Writing the scraper code
Now that you know how to access the required elements on the TikTok website, we can start writing the code.
While the libraries requests and beautifulsoup are commonly used for web scraping, it’s not the the best option when it comes to websites with dynamically loaded content, like TikTok. That's why it’s best to use a web driver like ChromeDriver along with Selenium.
💡
A web driver is a tool that can be used to automate web browsers. It allows developers to control a web browser via code, to perform actions like scrolling, opening links and pages, etc.
You can either download ChromeDriver manually or use the following code, which includes the install() function to download and set up ChromeDriver. However, make sure you have the Chrome browser installed already.
# Initializing the WebDriver using webdriver-manager
service = Service(ChromeDriverManager().install())
options = webdriver.ChromeOptions()
driver = webdriver.Chrome(service=service, options=options)
After initializing the web driver, you can use the following function to extract data, using the selectors mentioned in the previous section.
Scraping the Trending Now page
As mentioned before, the information on this page has a bit of a complex structure. Therefore, we will need two types of selectors to scrape data: CSS selectors and XPath.
Furthermore, since the video description text and hashtags both have one class, you will need functions to do the following:
Detect and separate hashtags: Hashtags can be detected by the # symbol. If a word starts with # and there’s no space between the first and the rest of the characters, it’s a Hashtag.
Remove hashtag from description text: One detected the re library can replace hashtags with blank text in the description text.
Here are the two functions for the above tasks:
# Function to extract hashtags from description
def extract_hashtags(description):
hashtags = re.findall(r'#\\w+', description)
return hashtags
# Function to remove hashtags from description
def remove_hashtags(description):
return re.sub(r'#\\w+', '', description).strip()
Once the above functions are ready, you can write the rest of the code as below:
# Function to scrape trending videos
def scrape_trending_videos_with_selenium(url):
driver.get(url)
time.sleep(5) # Waiting for the page to load
videos = []
scroll_pause_time = 2 # Pausing to allow content to load
while len(videos) < 50:
video_description = driver.find_elements(By.CSS_SELECTOR, 'div.tiktok-1anth1x-DivVideoDescription.e1aajktk10')
for video in video_description[len(videos):]:
video_data = {}
description_element = video.find_element(By.XPATH, '..')
description_text = video.text
video_data['Description'] = description_text # Capture description with hashtags
# Extracting views
if description_element.find_elements(By.CSS_SELECTOR, 'strong.tiktok-ksk56u-StrongLikes.e1aajktk9'):
video_data['Views'] = description_element.find_element(By.CSS_SELECTOR, 'strong.tiktok-ksk56u-StrongLikes.e1aajktk9').text
else:
video_data['Views'] = 'N/A'
# Extracting username
try:
user_element = description_element.find_element(By.XPATH, '..//a/p[@data-e2e="video-user-name"]')
video_data['User'] = user_element.text
except Exception as e:
video_data['User'] = 'N/A'
# Extracting likes
try:
likes_element = description_element.find_element(By.XPATH, '..//span[contains(@class, "tiktok-10tcisz-SpanLikes") and contains(@class, "e1aajktk13")]')
video_data['Likes'] = likes_element.text.split()[-1] # Extract the last part assuming it's the like count
except Exception as e:
video_data['Likes'] = 'N/A'
videos.append(video_data)
if len(videos) >= 50:
break
# Scrolling down to load more videos
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(scroll_pause_time) # Wait for new videos to load
return videos
# URL for the page
trending_videos_url = 'https://www.tiktok.com/channel/trending-now?lang=en'
trending_videos = scrape_trending_videos_with_selenium(trending_videos_url)
# Convert to DataFrame and save to CSV
# (for easier conversion and better accessibility)
df_videos = pd.DataFrame(trending_videos)
df_videos.to_csv('trending_videos_selenium.csv', index=False)
print('Data scraped and saved to trending_videos_selenium.csv')
driver.quit()
Result:
Scraping user pages
Unlike the trending now page, TikTok user pages follow a simple structure that can be navigated easily using CSS selectors.
# Function to scrape user pages
def scrape_user_page_with_selenium(url):
driver.get(url)
time.sleep(5) # Waiting for the page to load
user_data = {}
# Extracting follower count
try:
follower_count = driver.find_element(By.CSS_SELECTOR, 'strong[data-e2e="followers-count"]').text
user_data['Follower Count'] = follower_count
except Exception as e:
user_data['Follower Count'] = 'N/A'
# Extracting following count
try:
following_count = driver.find_element(By.CSS_SELECTOR, 'strong[data-e2e="following-count"]').text
user_data['Following Count'] = following_count
except Exception as e:
user_data['Following Count'] = 'N/A'
# Extracting likes count
try:
likes_count = driver.find_element(By.CSS_SELECTOR, 'strong[data-e2e="likes-count"]').text
user_data['Likes Count'] = likes_count
except Exception as e:
user_data['Likes Count'] = 'N/A'
# Extracting bio
try:
bio = driver.find_element(By.CSS_SELECTOR, 'h2[data-e2e="user-bio"]').text
user_data['Bio'] = bio
except Exception as e:
user_data['Bio'] = 'N/A'
# Extracting username
try:
username = driver.find_element(By.CSS_SELECTOR, 'h1[data-e2e="user-title"]').text
user_data['Username'] = username
except Exception as e:
user_data['Username'] = 'N/A'
return user_data
# List of user page URLs to scrape
user_page_urls = [
'https://www.tiktok.com/@google',
'https://www.tiktok.com/@nba',
# Add more URLs if needed
]
# Scrape data for each user page
user_data_list = []
for url in user_page_urls:
user_data = scrape_user_page_with_selenium(url)
user_data['URL'] = url # Add URL to the data for reference
user_data_list.append(user_data)
# Convert to DataFrame and save to CSV
df_user = pd.DataFrame(user_data_list)
df_user.to_csv('user_pages_selenium.csv', index=False)
print('Data scraped and saved to user_pages_selenium.csv')
driver.quit()
Result:
4. Deploying to Apify
There are several reasons for deploying your scraper code to a platform like Apify. In this specific case, deploying code to Apify helps you automate the data extraction process efficiently by regulating data collection and it also offers a convenient way to store and download your data.
Note that you would have to make some changes to the previous script to make it Apify-friendly. Such changes are: importing Apify SDK, adding headless options for web drivers, and updating input/output handling. You can find the modified script on GitHub.
#4. Create the Dockerfile and requirements.txt
Dockerfile:
FROM apify/actor-python
# Install system dependencies
RUN apt-get update && apt-get install -y \\
wget \\
unzip \\
libnss3 \\
libgconf-2-4 \\
libxss1 \\
libappindicator1 \\
fonts-liberation \\
libasound2 \\
libatk-bridge2.0-0 \\
libatk1.0-0 \\
libcups2 \\
libgbm1 \\
libgtk-3-0 \\
libxkbcommon0 \\
xdg-utils \\
libu2f-udev \\
libvulkan1 \\
&& rm -rf /var/lib/apt/lists/*
# Install ChromeDriver
RUN wget -q -O /tmp/chromedriver.zip <https://chromedriver.storage.googleapis.com/114.0.5735.90/chromedriver_linux64.zip> \\
&& unzip /tmp/chromedriver.zip -d /usr/local/bin/ \\
&& rm /tmp/chromedriver.zip
# Install Google Chrome
RUN wget -q -O - <https://dl-ssl.google.com/linux/linux_signing_key.pub> | apt-key add - \\
&& sh -c 'echo "deb [arch=amd64] <http://dl.google.com/linux/chrome/deb/> stable main" >> /etc/apt/sources.list.d/google-chrome.list' \\
&& apt-get update \\
&& apt-get install -y google-chrome-stable \\
&& rm -rf /var/lib/apt/lists/*
# Copy all files to the working directory
COPY . ./
# Install necessary packages
RUN pip install --no-cache-dir -r requirements.txt
# Set the entry point to your script
CMD ["python", "main.py"]
requirements.txt:
selenium
webdriver_manager
apify-client
#5. Deploy to Apify
Type apify login to log in to your account. You can either use the API token or verify through your browser.
Once logged in, type apify push and you’re good to go.
To run the deployed Actor, go to Apify Console > Actors > Your Actor. Then click the “Start” button, and the Actor will start to build and run.
To view and download output, click “Export Output”.
If required, you can select/omit sections, and download the data in different formats such as CSV, JSON, Excel, etc.
And that’s it. You have successfully built and deployed a TikTok web scraper.
Does TikTok have a Python API?
TikTok does have a Python API, but unfortunately, it has some restrictions and other drawbacks, especially when it comes to data access and functionality. For example, the Python API for TikTok has a rate limit and a data scope limit (no access to comments, and other user interactions).
However, using a ready-made Actor on Apify will save you the hassle. It does not have the limitations of rate limits or data access, providing more comprehensive data scraping capabilities.
from apify_client import ApifyClient
client = ApifyClient('YOUR_API_TOKEN')
You can find your API Token in Apify Console > Settings > Integrations.
Give the Actor input and create a new run for the Actor.
run_input = {
# Feel free to add other queries.
"hashtags": ["api"],
"resultsPerPage": 100,
}
# Run the Actor and wait for it to finish
run = client.actor("clockworks/free-tiktok-scraper").call(run_input=run_input)
Print the results from the dataset:
dataset_items = client.dataset(run["defaultDatasetId"]).list_items()
for item in dataset_items.items:
print(item)
Give the Actor input and create a new run for the Actor.
Once run, the above code would scrape TikTok video information with the hashtag “#api”. You can also add other parameters if needed.
Scrape TikTok with ready-made Actors
As you can see, using an Actor in Apify Console is even easier than using the APIs, and it doesn’t need code either.
Here's a list of ready-made, no-code-required Actors to easily scrape TikTok.
For more information on how to use these no-code solutions, visit Apify Docs.
A quick recap
We've demonstrated two effective methods of scraping TikTok data: 1) using Python code, and 2) using an Apify TikTok scraper via API. Whichever approach you prefer, what you've learned here will surely help you in your own TikTok scraping projects. Good luck!