We're Apify. You can build, deploy, share, and monitor any scrapers on the Apify platform. Check us out.
In this article, you’ll learn what a headless browser is, how to use it, and what tasks it’s suitable for.
What is a headless browser?
A headless browser is a web browser without a GUI (graphical user interface). That means you won’t see any graphical components such as buttons or clickable windows.
In more technical terms, instead of rendering web pages on screen, a headless browser exposes page content and functionality through a command-line interface (CLI) or an application programming interface (API). In practice, you usually work with it through specialized libraries that wrap those APIs. But more about that later.
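To make the command-line route concrete, here is a minimal Python sketch that asks a headless Chromium build to print a page's rendered DOM. The `--headless` and `--dump-dom` flags are Chromium's own; the binary name `chromium` and the helper names are assumptions for illustration, since the executable is named differently on different systems (for example `google-chrome`).

```python
import subprocess


def build_dump_dom_command(url: str, binary: str = "chromium") -> list[str]:
    """Build the argument list that runs Chromium headless and prints the DOM.

    `binary` is an assumption: adjust it to the executable name on your system.
    """
    return [binary, "--headless", "--dump-dom", url]


def dump_dom(url: str, binary: str = "chromium", timeout: float = 30.0) -> str:
    """Run the headless browser and return the serialized DOM as text."""
    result = subprocess.run(
        build_dump_dom_command(url, binary),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if the browser exits with an error
    )
    return result.stdout
```

Calling `dump_dom("https://example.com")` on a machine with Chromium installed should return the page's HTML after JavaScript has run, without a window ever opening.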
What is a headless browser used for?
Headless browsers are generally faster and use fewer resources than normal browsers because they skip drawing the interface on screen. That's why software developers love to use them to simplify their work. Here are the most common use cases.
1. Automated testing
You can use headless browsers as an automation tool for executing test cases. That means a headless browser can behave like a real user: click with the mouse, simulate keyboard typing, or even fill in and submit forms.
2. Performance testing
Performance is another thing you can test easily with a headless browser. If you have performance tasks that don't require UI interaction, you can run them from the command line using only a headless browser.
3. Web scraping and data extraction
Headless browser scraping is one of the most efficient ways to extract data from websites with dynamic content. It can load JavaScript, interact with elements, and simulate user actions, making it ideal for dynamic pages. However, for static websites, simpler tools like Requests and Beautiful Soup are often quicker and more resource-efficient since they don't need to render the entire website.
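For a static page, no browser is needed at all. The sketch below uses only Python's standard library (`html.parser`) instead of the Requests and Beautiful Soup tools named above, so it runs without extra installs. The sample HTML, the `LinkCollector` class, and the `extract_links` helper are made up for illustration.

```python
from html.parser import HTMLParser
from typing import Optional


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs from <a> tags in static HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._current_href: Optional[str] = None
        self._text_parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Anchors without an href leave _current_href as None and are skipped.
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


# Hypothetical static markup standing in for a downloaded page.
SAMPLE_HTML = """
<ul>
  <li><a href="/docs">Documentation</a></li>
  <li><a href="/blog">Blog</a></li>
</ul>
"""


def extract_links(html: str) -> list[tuple[str, str]]:
    """Parse static HTML and return every link found in it."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

This only sees what is already in the HTML the server sent. Anything that JavaScript adds to the page afterwards never reaches a parser like this, which is where a headless browser comes in.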
Examples of headless web browsers
- Chromium is probably the most popular browser that can run headless. It wasn't the first headless browser, but it was the first full-featured one. Since many current browsers like Edge or Brave are based on Chromium, it means that you can run those headless as well.
- Google Chrome is built on Chromium, meaning it can also run headless. It's a little bulkier, and new headless features reach it later, but it's easier to mimic a real user with it.
- Firefox can also run headless. Thanks to its native privacy settings, it's a great choice for blending in with other traffic or testing privacy features.
- WebKit is the open-source browser engine behind Apple Safari. It can also run headless, which makes it especially useful for testing whether your website runs well on Safari.
- Splash is a headless web browser designed specifically for web scraping. It offers an HTTP API, Lua scripting support, and a built-in web-based IDE. While it’s not as widely adopted due to its limited browser emulation capabilities, Splash is effective for many web scraping scenarios, especially when used with Scrapy.
- HtmlUnit is a headless browser in Java used mostly to test e-commerce websites. However, it can also serve as a great tool for any project written in Java. It can simulate and test clicking, logging in, submitting forms, and more.
- PhantomJS was one of the first headless browsers that could be scripted with JavaScript. However, its development was suspended in 2018, and it's no longer maintained.
Libraries that control headless browsers (drivers)
Remember the specialized libraries mentioned at the beginning, the ones that give you access to a headless browser's API? Here's what they are.
Headless browsers access web pages via specialized libraries that encapsulate the headless browser's API into a format that's easier to work with.
These libraries are sometimes called drivers and are often confused with headless browsers. Let's take a look at the top three.
- Playwright
- Selenium
- Puppeteer
- Playwright is an open-source library built by Microsoft to automate Chromium, WebKit, and Firefox browsers with a unified API. It's available in many programming languages, including JavaScript (Node.js), Python, Java, and more. In our opinion, Playwright is by far the best library (aka driver) for running headless browsers these days. It’s exceptionally useful for testing because it can run tests concurrently across multiple browsers.
Check also: How to scrape the web with Playwright
- Selenium is an open-source suite of tools to automate web browsers across multiple platforms. Selenium is the king in terms of usage and community. You can use it in Java, Python, JavaScript, C#, Ruby, and even Perl. In comparison with Playwright, Selenium can be slow and clunky sometimes.
- Puppeteer is an open-source Node.js library by Google that automates Chromium and Chrome. Puppeteer is maintained by people close to the Chromium team. With Puppeteer, you can easily write in JavaScript with your preferred IDE (integrated development environment). You should not forget about Puppeteer, especially if you need to take screenshots, as it’s the best-equipped library for that. You can use specialized tools such as Puppeteer-screenshot-tester to compare the screenshots you take.
Headless browser testing
What is headless browser testing?
Use headless browser testing when you need to perform end-to-end tests without displaying the user interface. It’s a fast way to find out whether all the required functionality of your application works.
When to use headless browser testing?
You should add headless browser testing to your testing kit, particularly in the following situations:
- Taking screenshots
- Scraping website content
- Handling Ajax requests
- Executing JavaScript
- Network monitoring
- Automating HTML responses
Limitations of headless browser testing
- Visual elements
Of course, you can’t test visual elements without a GUI. If you need to do so, you should try UI-driven tests.
- Different bugs for headless browsers and normal browsers
Without rendering UI elements, the headless browser might sometimes behave differently than the traditional browser. It’s crucial to differentiate between important bugs that need your attention and minor ones that won’t occur when using a normal browser.
How to execute headless browser testing
Headless browser testing execution differs depending on the language and library you use. However, when using the ones listed above, running tests headless won’t be a problem.
For example, let’s take a look at how to execute headless browser testing with Playwright.
Are there any extra steps you need to take? Not at all. Tests in Playwright run in headless mode by default. So the only command you need to write is:
npx playwright test
Web scraping with headless browsers
When web scraping with a headless browser, you can do much more in comparison to making HTTP requests for static content.
You can programmatically do anything a human could do with a browser (click elements, take screenshots, type into text areas), and since headless browsers are capable of loading the JavaScript contained on a website, dynamic content can be rendered, interacted with, and scraped.
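As a rough sketch of those human-like actions, the Python function below types into a search box, clicks a button, waits for JavaScript-rendered results, and takes a screenshot using Playwright's synchronous page API. The selectors (`#search`, `#submit`, `.result`), the screenshot file name, and the function names are hypothetical and would need to match a real site.

```python
def search_and_capture(page, query: str, screenshot_path: str) -> list[str]:
    """Drive a Playwright `page` like a user would and return the result texts.

    All selectors are hypothetical placeholders for illustration.
    """
    page.fill("#search", query)            # type into a text field
    page.click("#submit")                  # click a button
    page.wait_for_selector(".result")      # wait for JavaScript-rendered content
    page.screenshot(path=screenshot_path)  # capture what the browser rendered
    return page.locator(".result").all_inner_texts()


def run(url: str, query: str) -> list[str]:
    """Launch headless Chromium, open `url`, and run the interaction above."""
    # Imported here so search_and_capture can be reused without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url)
            return search_and_capture(page, query, "results.png")
        finally:
            browser.close()
```

Keeping the interaction in a function that only receives a `page` makes it easy to reuse the same steps with any browser Playwright supports, or to switch `headless=True` to `False` while debugging.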
How to detect headless browsers
Scraping publicly available data is generally legal and can be done ethically, but that does not prevent website owners from employing anti-scraping measures in order to protect their data from being extracted.
Understandably, they only want real users using real web browsers on their websites. Headless browsers are one of the ways you can get your bots to emulate real users, which is why websites try to detect bots acting under the guise of a headless browser.
Websites detect that you’re using headless Chrome or a similar headless browser by finding small discrepancies in your browser’s behavior. Here are the most common strategies:
- They check the user-agent header your browser is sending
- They check other HTTP headers and header consistency
- They check your browser's JavaScript web APIs, such as Navigator
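The first two checks can be sketched from the website's side in a few lines of Python. This is a deliberately naive heuristic, not how any particular anti-bot product works: the function name and the specific rules are assumptions for illustration. It relies on the fact that headless Chrome's default user agent contains the token `HeadlessChrome`, and on the observation that regular browsers normally send an `Accept-Language` header.

```python
def looks_headless(headers: dict[str, str]) -> list[str]:
    """Return the reasons a request looks automated (empty list if none)."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    user_agent = normalized.get("user-agent", "")
    reasons = []

    if not user_agent:
        reasons.append("missing User-Agent")
    elif "headlesschrome" in user_agent.lower():
        reasons.append("User-Agent contains HeadlessChrome")

    # Consistency check: regular browsers normally send Accept-Language.
    if "accept-language" not in normalized:
        reasons.append("missing Accept-Language")

    return reasons
```

The third check runs inside the page itself rather than on the server. For example, the standard `navigator.webdriver` property is `true` in a browser that is being controlled by automation.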
Can you make a headless browser undetectable?
There are two ways you can solve the above problems and make Chrome Headless or any other headless browser undetectable.
- Modify the user-agent
- Change your browser fingerprint
Modifying the user agent is fast and easy, and it can work against naive or old anti-scraping protections. However, in most cases, you'll have to pair it with updating the fingerprint as well. You can do this easily with Crawlee, an open-source web scraping library that provides this feature with zero configuration necessary.
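For the first option, here is a minimal sketch of overriding the user agent with Playwright's Python API, where `new_context` accepts `user_agent`, `locale`, and `viewport` options. The user-agent string, the viewport size, and the helper name are example values chosen for illustration. Note that this changes only what the browser reports about itself; it does not alter the rest of the fingerprint.

```python
# Example user-agent string of a regular desktop Chrome (an assumption for
# illustration); pick one that matches the browser version you actually run.
DESKTOP_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def new_disguised_page(browser, user_agent: str = DESKTOP_USER_AGENT):
    """Open a page in a fresh browser context that reports `user_agent`."""
    context = browser.new_context(
        user_agent=user_agent,                    # replaces the default headless UA
        locale="en-US",                           # keeps language headers consistent
        viewport={"width": 1366, "height": 768},  # a common desktop window size
    )
    return context.new_page()
```

In a real script, `browser` would come from something like `sync_playwright().start().chromium.launch(headless=True)`, and you could confirm the override with `page.evaluate("navigator.userAgent")`.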
FAQ
Which headless browser is best?
The best headless browsers are:
- Chromium
- Google Chrome
- Firefox
- WebKit
- Splash
- HtmlUnit
Each of them can serve you well for automated testing or web scraping. Which one you choose depends entirely on the nature of your project.
Is a headless browser faster?
Yes, a headless browser is usually faster than a regular one. When using headless browsers, pages are processed without a GUI being drawn on screen. Thanks to that, headless browsers are faster and, therefore, are used widely for automated testing and web scraping.
Is Chromium a headless browser?
Yes, Chromium can run as a headless browser. Moreover, it’s probably the most popular one, and other browsers, such as Edge or Brave, are based on it.
Is Chrome a headless browser?
Yes, Chrome can also be run as a headless browser because it’s based on Chromium. However, you can decide whether to run Chrome as a headless browser or a normal one.
Can a headless browser be detected?
Yes, a headless browser can be detected. Websites detect headless browsers thanks to small differences between the behavior of a headless browser and that of a regular browser used by a human.
How do I know if my browser is headless?
To find out whether your browser is headless, check for the presence of a GUI. If it’s missing, your browser is running headless. You can also check whether it appears in the list of browsers that support headless mode above.
What is the difference between a headless browser and a browser?
The difference between a headless browser and a browser is that a headless browser doesn’t render a GUI (graphical user interface). That means no graphical icons, buttons, or widgets are displayed.
What is the difference between Chrome and Chrome Headless?
The difference between Chrome and Chrome Headless is that Chrome Headless lets you load pages without a GUI (graphical user interface). It allows you to programmatically access web page content faster, but nothing is displayed on screen.
What is a headless browser: Conclusion
A headless browser is a faster type of browser that doesn’t display a graphical interface. This makes it a great tool for software developers’ tests and web scraping projects.
Examples of popular headless browsers include Headless Chrome and Splash, but almost any modern browser can now be run headless. The most popular libraries for controlling headless browsers are Puppeteer, Playwright, and Selenium.
Headless browsers can be detected. However, you can modify the user agent or change your browser fingerprint to avoid that.