Yes, you read that right - it is now possible to scrape video content as well! YouTube probably needs no introduction, since:
- An average viewer aged 18+ spends around 40 minutes on YouTube every day
- YouTube has over 2 billion monthly logged-in users
- 500+ hours of content are uploaded to YouTube every minute
- The platform is available in 100+ countries around the world, in 80 languages
YouTube has become an essential tool for creators and marketers. It is the go-to platform for information sharing and product or service promotions in video format. It is understandable why more and more people are diving into the video trend. Free online video editors are empowering newbies and pros alike to make YouTube videos with utmost ease and convenience.
The huge amount of content uploaded and watched ranges from DIYs, movie reviews, and live streams to controversial debates, lectures, and pure random entertainment. It’s a treasure trove of data in visual form just waiting to be discovered. And the best news here is: now all that YouTube data is scrapable and we’ll show you how it’s done.
Why scrape YouTube?
Here are a couple of reasons why collecting data from YouTube is useful and versatile:
- The right data makes it easy to calculate the frequency of brand mentions, audience reach, and audience reaction. For example, businesses can use the info to calculate ROI for advertisements or referrals from YouTube channels and scale their marketing campaigns accordingly. They can also simply monitor YouTube for brand awareness and general web reputation.
- Dissecting big news topics and analyzing sentiment - we’re mostly talking about the infamous YouTube comment section here. Of course, this only captures the reaction of people who are active online, not the population as a whole, but a noticeable reaction on the web can still be telling.
- By the same principle, you can search for, pick out, and analyze fake news, bot activity, and illegal or harmful content, and help delay their spread.
- Collect data for any kind of research; identify and follow emerging trends or topics and even predict new ones: globally or by country and language.
- Following similar logic, you as a consumer can find reviews of products and services you are considering buying and make better choices: this is true for anything from pianos to gardening supplies.
Is it legal to scrape YouTube?
Most data found on YouTube is accessible to the general public, which generally makes it legal to scrape. But it’s still important to comply with regulations that deal with personal data and copyright protection. To learn more about the legal context of web scraping, check out our blog article on the subject.
What about the YouTube API?
YouTube does have its own API that lets you run basic content searches and collect data about each video. However, the YouTube API has significant limitations: the data you can get through it is limited to video data, subscriptions, recommendations, ranking, and ads. In addition, YouTube has a strong anti-scraping system in place, and the API requires you to log in and imposes quota limits.
It seems like what you need is some kind of scraper that is flexible enough for various parameters, simple enough to use, but also strong enough to withstand the anti-bot blocking. Sure, you can try your hand at coding your own scraper. But why reinvent the wheel when you can try our ready-made tools, like our YouTube Scraper?
Our YouTube Scraper will scrape your searches easily by cherry-picking data from the YouTube page at the selected URL. It’s powered by our Puppeteer scraper tool and will enable you to scrape channels, all their videos, and their details, as well as fine-tune your search. A unique new feature here is that you can now scrape not only basic video- and algorithm-related info, but also comments and subtitles, which opens a whole new dimension to analyzing video data. You can scrape both auto-generated and manually added captions in srt format in various languages, making your scraping possibilities close to limitless.
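To give an idea of what you can do with captions in srt format, here is a minimal Python sketch that turns SRT text into timed cues and a plain transcript. It relies only on the standard SRT layout (a counter line, a timing line, then the caption text). The sample captions are made up, and the names parse_srt and Cue are ours for illustration; how exactly the scraper delivers the subtitle text is not covered here, so treat the input string as an assumption.

```python
import re
from typing import NamedTuple

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,200".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


class Cue(NamedTuple):
    start: float  # seconds from the start of the video
    end: float
    text: str


def to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt: str) -> list:
    """Split SRT text into cues; blocks without a timing line are skipped."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                # Everything after the timing line is caption text.
                text = " ".join(part.strip() for part in lines[i + 1:])
                cues.append(Cue(to_seconds(*g[:4]), to_seconds(*g[4:]), text))
                break
    return cues


# Made-up sample captions, for illustration only.
SAMPLE_SRT = """1
00:00:01,000 --> 00:00:04,200
Welcome to this introduction
to Chopin's nocturnes.

2
00:00:04,500 --> 00:00:07,000
Let's start with Op. 9 No. 2.
"""

cues = parse_srt(SAMPLE_SRT)
transcript = " ".join(cue.text for cue in cues)
print(transcript)
```

Once the captions are in this shape, the transcript can go straight into keyword counting or sentiment analysis, while the timestamps tell you where in the video something was said.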
So here’s a short step-by-step tutorial on how our Apify tools can carry out web scraping on YouTube. In this article, we’ll be using the terms tool, API, scraper and actor interchangeably as they all mean the same thing. We’ll be using a ready-made tool called YouTube Scraper which was created specifically to scan and extract the data we need.
Step-by-step tutorial on how to get data from YouTube
1. Start by going to the actor's page and clicking the Try for free button. You will be redirected to Apify Console, which is your workspace to run tasks for your scrapers. If you already have an Apify account and are logged in, go to Step 3.
2. If you are not signed in, you’ll find yourself on the sign-up page. Sign up using your email account, Google, or GitHub. You will then be redirected to the scraper’s page in Apify Console.
3. Let's say you are looking for Chopin-related content. You can type the keyword "Chopin" into the actor's input field.
Alternatively, you can open YouTube in a separate browser window, search for the keyword "Chopin" and, once you have the results, copy the URL from the browser and paste it into the actor's URL field. There are other fields you can play with, such as the maximum number of results, subtitles, etc.
4. Once you are all set, click the Start button. Your task's status will change to Running, so wait for the scraper's run to finish. It should take about a minute before you see the status switch to Succeeded.
5. Move to the Dataset tab to see the results of your scraping. The scraped data is available there in many formats, including HTML table, JSON, CSV, Excel, XML, and RSS feed.
6. Preview the data by clicking the Preview button, or view it in a new tab if the dataset is too large. You can also download it to your computer for use in spreadsheets, other apps, or your own projects.
You can then go back and run the scraper again, with different parameters this time, and see what kind of data you can catch. Try pasting a URL as described in Step 3, and experiment with input configurations to make your searches more specific.
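Pasting a search URL works because the keyword lives in the URL's query string: a YouTube results page looks like https://www.youtube.com/results?search_query=Chopin. If you want to prepare many such URLs instead of copying them by hand, a small Python sketch along these lines may help. The helper names search_url and keyword_from_url are ours, not part of any Apify tool.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

YOUTUBE_SEARCH = "https://www.youtube.com/results"


def search_url(keyword: str) -> str:
    """Build the URL of a YouTube search results page for a keyword."""
    # urlencode takes care of spaces and special characters.
    return f"{YOUTUBE_SEARCH}?{urlencode({'search_query': keyword})}"


def keyword_from_url(url: str) -> Optional[str]:
    """Recover the search keyword from a results URL, or None if absent."""
    values = parse_qs(urlparse(url).query).get("search_query")
    return values[0] if values else None


for keyword in ["Chopin", "Chopin nocturne op. 9"]:
    print(search_url(keyword))
```

The second helper is handy for checking which keyword a pasted URL actually corresponds to before you start a run.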
Here are some input parameters you can configure before running the scraper:
- searchKeywords - query to search YouTube for
- maxResults - how many videos to load from each search or channel; the default is 50
- postsFromDate - how far back in YouTube’s history to go; the default is 5 years ago. You can also specify the period in minutes, hours, days, weeks, or months
- startUrls - initial YouTube URLs; you can provide search, channel, or video URLs
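As a rough illustration, the parameters above could be assembled into a JSON input like the one built by this Python sketch. The four field names and the defaults (50 results, 5 years ago) come from the list above; everything else is an assumption on our part: the build_input helper is hypothetical, the "<number> <unit> ago" syntax for postsFromDate is a guess at the expected wording, and wrapping each start URL in a {"url": ...} object is an assumed shape. Check the actor's input tab for the exact format it expects.

```python
import json
import re
from typing import List, Optional

# ASSUMPTION: relative dates are written as "<number> <unit> ago".
RELATIVE_DATE = re.compile(r"^\d+ (minute|hour|day|week|month|year)s? ago$")


def build_input(
    search_keywords: Optional[str] = None,
    start_urls: Optional[List[str]] = None,
    max_results: int = 50,
    posts_from_date: str = "5 years ago",
) -> dict:
    """Assemble and sanity-check an input object for a scraper run."""
    if not search_keywords and not start_urls:
        raise ValueError("provide searchKeywords or startUrls")
    if max_results < 1:
        raise ValueError("maxResults must be at least 1")
    if not RELATIVE_DATE.match(posts_from_date):
        raise ValueError(f"unexpected postsFromDate: {posts_from_date!r}")

    run_input = {"maxResults": max_results, "postsFromDate": posts_from_date}
    if search_keywords:
        run_input["searchKeywords"] = search_keywords
    if start_urls:
        # ASSUMPTION: each start URL is wrapped in an object with a "url" key.
        run_input["startUrls"] = [{"url": url} for url in start_urls]
    return run_input


example = build_input(
    search_keywords="Chopin",
    max_results=20,
    posts_from_date="6 months ago",
)
print(json.dumps(example, indent=2))
```

Validating the input before a run is cheap, and it catches typos such as a misspelled time unit before they cost you a run.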
Depending on how you set up your inputs, here are some of the possible outputs you might get in your dataset as a result:
- viewCount - number of views of each video you’ve scraped
- uploadDate - when the video was uploaded, from the oldest to the newest
- likesCount or dislikesCount
- durationStr - how long the video lasts
- description - the short description shown under the video
- subtitles - closed captions, both auto-generated and manually added
- comments and more - comment section scraping
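Once you have downloaded the dataset as JSON, a few lines of Python are enough to get a first overview. The sketch below uses only field names from the list above (viewCount, likesCount, durationStr, description). The sample records are made up, the summarize helper is ours, and the "MM:SS" or "HH:MM:SS" layout of durationStr is an assumption, so adjust it to what your dataset actually contains.

```python
def duration_seconds(duration_str: str) -> int:
    """Convert "MM:SS" or "HH:MM:SS" (ASSUMED format) to seconds."""
    seconds = 0
    for part in duration_str.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def summarize(items: list) -> dict:
    """Aggregate views, likes, and running time over scraped videos."""
    # Skip records where the view count is missing.
    usable = [item for item in items if item.get("viewCount") is not None]
    ranked = sorted(usable, key=lambda item: item["viewCount"], reverse=True)
    total_views = sum(item["viewCount"] for item in ranked)
    total_likes = sum(item.get("likesCount") or 0 for item in ranked)
    return {
        "videos": len(ranked),
        "totalViews": total_views,
        "likesPer1000Views": (
            round(1000 * total_likes / total_views, 1) if total_views else 0.0
        ),
        "mostViewed": ranked[0].get("description") if ranked else None,
        "totalSeconds": sum(
            duration_seconds(item["durationStr"])
            for item in ranked
            if item.get("durationStr")
        ),
    }


# Made-up records for illustration; real items come from your dataset export,
# e.g. items = json.load(open("dataset.json")) with a file name of your choice.
items = [
    {"viewCount": 80000, "likesCount": 1000, "durationStr": "1:02:15",
     "description": "Complete Ballades"},
    {"viewCount": 120000, "likesCount": 3000, "durationStr": "04:30",
     "description": "Nocturne Op. 9 No. 2"},
    {"viewCount": None, "likesCount": None, "durationStr": "03:10",
     "description": "Record with missing counts"},
]

summary = summarize(items)
print(summary)
```

The same pattern extends to other fields: for example, you could group by uploadDate to see how interest in a topic changes over time.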
Important note: you will usually need to use a proxy to scrape YouTube; otherwise, the actor might get blocked. Luckily, your free Apify account comes with a free trial of Apify Proxy, so that should help you to get started with web scraping YouTube.