Yeah, you’ve read it right - it is now possible to scrape video content as well! You probably don’t need an introduction to YouTube, since:
- An average viewer aged 18+ spends around 40 minutes on YouTube every day
- YouTube has over 2 billion monthly logged-in users
- 500+ hours of content are uploaded to YouTube every minute
- The platform is represented in 100+ countries around the world, in 80 languages
The huge amount of content uploaded and watched ranges from DIYs, movie reviews and live streams to controversial debates, lectures and pure random entertainment. It’s a treasure trove of data in visual form just waiting to be discovered. And the best news here is: now all that YouTube data is scrapable and we’ll show you how it’s done.
Why scrape YouTube?
Here are a few reasons why collecting data from YouTube is useful:
- The right data makes it easy to measure the frequency of brand mentions, audience reach and audience reaction. For example, businesses can use the info to calculate ROI for advertisements or referrals from YouTube channels and scale their marketing campaigns accordingly. Or they can simply monitor YouTube for brand awareness and general web reputation.
- Dissecting big news topics and analyzing sentiment - we’re mostly talking about the infamous YouTube comment section here. Of course, comments reflect only the people who are active online rather than the public as a whole, but a noticeable reaction on the web can still be telling.
- By the same principle, you can find, analyze and help slow the spread of fake news, bot activity, and illegal or harmful content.
- Collect data for any kind of research; identify and follow emerging trends or topics and even predict new ones: globally or by country and language.
- Following similar logic, you as a consumer can find reviews of products and services you’re considering buying and make better choices: this is true for anything from pianos to gardening supplies.
What about the YouTube API?
YouTube does have its own API, which lets you do some basic content search and collect data from each video. However, the YouTube API has significant limitations: the data you can get through it is limited to video data, subscriptions, recommendations, ranking, and ads. In addition, it requires you to log in and imposes quota limits, and YouTube itself has a strong anti-scraping system in place.
It seems like what you need is a scraper that is flexible enough to handle various parameters, simple enough to use, and strong enough to withstand anti-bot blocking. Sure, you can try your hand at coding your own scraper. But why reinvent the wheel when you can use a ready-made tool like our YouTube Scraper?
Our YouTube Scraper makes scraping your searches easy by cherry-picking data from the YouTube page URL you select. It’s powered by our Puppeteer scraper tool and lets you scrape channels, all their videos and their details, as well as fine-tune your search. A unique new feature here is that you can now scrape not only basic video- and algorithm-related info, but also comments and subtitles, which opens a whole new dimension for analyzing video data. You can scrape both auto-generated and manually added captions in SRT format in various languages, making your scraping possibilities close to limitless.
So here’s a short step-by-step tutorial on how our Apify tools can carry out web scraping on YouTube. In this article, we’ll be using the terms tool, API, scraper and actor interchangeably as they all mean the same thing. We’ll be using a ready-made tool called YouTube Scraper which was created specifically to scan and extract the data we need.
Step-by-step tutorial on how to get data from YouTube
1. Go to Apify’s website: http://apify.com
2. Sign in at the top-right corner using your email account, Google, or GitHub.
3. Once you’re all set with the account, head over to Apify Store in the Solutions tab.
4. You’ll be redirected to Apify Store, our collection of ready-made scraping tools called actors.
5. Find YouTube Scraper in the Videos section or by typing “youtube” into the search bar. You can always come back to the Store later on to explore other useful actor tools. Click on the YouTube Scraper card.
6. By clicking on the YouTube card, you’ll be redirected to this scraper’s own page, where you can see the actor’s description and main features in its Readme, customizable parameters and even source code.
7. When you’re ready, find the blue Try me button and click on it.
8. You’ll be redirected to your Apify Dashboard. You can explore it later once you’ve learned how to scrape YouTube.
9. Notice the task automatically created for your YouTube actor. Now, think of your search query. Let’s say you’re looking for everything Chopin-related on YouTube. Type Chopin into the first field.
10. Alternatively, search for Chopin on YouTube itself, then copy the URL of the results page and paste it into the field.
11. When you’re done setting up your scraping parameters, scroll down and click the Save & Run button. The actor will start the scraping process and you’ll see its status change to Running.
12. It might take a few minutes to complete the scraping process, so go grab an apple or something. But upon your return, you should see that the actor has Succeeded. You can then click the Dataset tab to see what you’ve got there.
13. Now you can see and download your scraping results. It might not look like it at first glance, but this is a nice little library of mentions of Chopin all over YouTube. Nice job!
14. We’ve picked JSON format for a preview as it’s the most universal one. You can choose to export your results in other formats as well: an Excel spreadsheet, HTML table, XML, CSV, RSS. Just pick the one you prefer and click Download.
You can then return and make another run for this scraper, with different parameters this time, and see what kind of data you can catch. Use the pasting URL step and experiment with input configurations to make your searches more specific.
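If you prefer to generate the search URL rather than copy it from the browser, the pasting-URL step can be sketched in a few lines of Python. This is a minimal illustration, not part of the scraper itself: the helper name build_search_url is made up for this example, and it relies on YouTube's results page accepting the query in the search_query URL parameter.

```python
from urllib.parse import urlencode

# Base address of YouTube's search results page.
YOUTUBE_SEARCH_BASE = "https://www.youtube.com/results"


def build_search_url(query: str) -> str:
    """Return the results-page URL for a search query (hypothetical helper)."""
    query = query.strip()
    if not query:
        raise ValueError("query must not be empty")
    # urlencode escapes spaces and special characters for us.
    return f"{YOUTUBE_SEARCH_BASE}?{urlencode({'search_query': query})}"


if __name__ == "__main__":
    for q in ["Chopin", "Chopin nocturne op. 9"]:
        print(build_search_url(q))
```

The printed URLs can then be pasted into the scraper's URL field in the same way as a URL copied from the browser's address bar.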
Here are some input parameters you can configure before running the scraper:
- searchKeywords - the query to search YouTube for
- maxResults - how many videos should be loaded from each search or channel; the default is 50
- postsFromDate - how far back in YouTube’s history to go; the default is 5 years ago. You can also specify the range in minutes, hours, days, weeks or months
- startUrls - initial YouTube URLs; you can provide search, channel or video URLs
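To make the parameters above more concrete, here is a minimal sketch of how such an input could be assembled as JSON. Only the four parameter names come from the list above. Everything else is an assumption for illustration: the helper build_scraper_input is hypothetical, the literal string "5 years ago" is a guess at how the default is written, and wrapping each start URL in an object with a url key follows a common Apify convention that you should check against the actor's Readme.

```python
import json
from typing import List, Optional


def build_scraper_input(
    search_keywords: Optional[str] = None,
    start_urls: Optional[List[str]] = None,
    max_results: int = 50,                 # default mentioned in the list above
    posts_from_date: str = "5 years ago",  # assumed spelling of the default
) -> dict:
    """Assemble an input object from the four documented parameters."""
    if not search_keywords and not start_urls:
        raise ValueError("provide searchKeywords or at least one start URL")
    if max_results < 1:
        raise ValueError("maxResults must be a positive integer")

    run_input = {"maxResults": max_results, "postsFromDate": posts_from_date}
    if search_keywords:
        run_input["searchKeywords"] = search_keywords
    if start_urls:
        # Assumption: each URL is wrapped as {"url": ...}.
        run_input["startUrls"] = [{"url": url} for url in start_urls]
    return run_input


if __name__ == "__main__":
    config = build_scraper_input(search_keywords="Chopin", max_results=10)
    print(json.dumps(config, indent=2))
```

Keeping the configuration in a small function like this makes it easy to rerun the scraper with different keywords or limits and compare the resulting datasets.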
Depending on how you set up your inputs, here are some of the possible output fields you might get in your dataset as a result:
- viewCount - the number of views of each scraped video
- uploadDate - when each video was uploaded, so you can order results from the oldest to the newest
- likesCount or dislikesCount
- durationStr - how long the video lasts
- description - the text of the short description under the video
- subtitles - closed captions scraping - both auto-generated and added manually
- comments and more - comment section scraping
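Once you have downloaded the dataset as JSON, a few lines of Python are enough to get a quick overview of it. The sketch below is only an illustration under stated assumptions: the three sample records are invented placeholders rather than real scraping results, the title field is hypothetical, viewCount is assumed to be a number, and durationStr is assumed to look like "4:30" or "1:02:05".

```python
import json

# Invented placeholder records that mimic the shape of a JSON export.
SAMPLE_EXPORT = """
[
  {"title": "Sample video A", "viewCount": 1200, "likesCount": 45, "durationStr": "4:30"},
  {"title": "Sample video B", "viewCount": 300, "likesCount": null, "durationStr": "1:02:05"},
  {"title": "Sample video C", "viewCount": 7500, "likesCount": 610, "durationStr": "12:00"}
]
"""


def duration_to_seconds(duration: str) -> int:
    """Convert 'M:SS' or 'H:MM:SS' (assumed format) into seconds."""
    seconds = 0
    for part in duration.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def summarize(items: list) -> dict:
    """Return simple totals for a list of scraped video records."""
    if not items:
        return {"videos": 0, "totalViews": 0, "totalSeconds": 0, "mostViewed": None}
    # Treat missing or null view counts as zero.
    views = [item.get("viewCount") or 0 for item in items]
    durations = [
        duration_to_seconds(item["durationStr"])
        for item in items
        if item.get("durationStr")
    ]
    most_viewed = max(items, key=lambda item: item.get("viewCount") or 0)
    return {
        "videos": len(items),
        "totalViews": sum(views),
        "totalSeconds": sum(durations),
        "mostViewed": most_viewed.get("title"),
    }


if __name__ == "__main__":
    print(summarize(json.loads(SAMPLE_EXPORT)))
```

To run it on your own results, replace SAMPLE_EXPORT with the contents of the downloaded file and adjust the field names to whatever your dataset actually contains.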
Important note: usually you will need to use a proxy to scrape YouTube or the actor might get blocked. Luckily, your free Apify account comes with a free trial of Apify Proxy, so that should help you to get started with web scraping YouTube.