Having the ability to capture screenshots and generate PDFs automatically is a valuable skill for developers, whether you're web scraping, making reports, testing, monitoring content, or archiving web pages. In this article, you'll learn how to use Puppeteer, a powerful Node.js package, to easily capture screenshots and generate PDFs from web pages.
What you need to get started
All you need is a basic knowledge of Node.js and JavaScript. If you don't have Node.js installed on your computer, download and install the latest stable version from the official Node.js website. You can also follow this guide to help you properly install Node.
Why choose Puppeteer?
Before getting into the technical details, let's have a look at why Puppeteer is such a great tool for capturing screenshots and generating PDFs. Puppeteer is a headless browser library that provides a high-level API for controlling Chrome or Chromium browsers. Because of its versatility and ease of use, it is a popular tool for automating browser tasks.
Puppeteer provides a perfect environment for such jobs, whether you need to navigate web pages, interact with elements, or perform specific actions.
6 reasons to use Puppeteer to take screenshots and make PDFs
Automated testing and quality assurance: Developers can call Puppeteer's screenshot method at different stages of a web application's user flows. These screenshots can feed automated tests that check whether user interface elements, layouts, and interactions stay consistent across the devices and browsers Puppeteer can drive or emulate.
Visual regression testing: When making updates to a website, it's critical to avoid unintentional visual changes. Puppeteer's ability to take screenshots makes it a good fit for visual regression tests. Developers can quickly spot visual inconsistencies by comparing screenshots taken before and after a change.
Content monitoring and archiving: Puppeteer allows developers to capture screenshots or produce PDF snapshots of web pages on a regular basis. This is especially useful for content monitoring, tracking changes in online material, and building web page archives for historical purposes.
Reporting and documentation: Puppeteer makes it easier to generate PDF reports from web pages. By capturing the relevant data and content, developers can produce informative and visually appealing reports. These PDF reports are useful for presenting insights, analytics, and summary information.
Regulatory compliance and legal records: Some industries require web content records to be kept for regulatory or legal purposes. Puppeteer can be used to generate PDF snapshots of web pages, so that an accurate record of the content is kept.
E-commerce and catalog management: Online retailers frequently capture product pages, catalog listings, or shopping carts for a variety of purposes, such as producing marketing materials, checking inventories, or creating PDF catalogs.
The Puppeteer screenshot and Puppeteer PDF generation capabilities provide developers with a robust toolkit for a variety of applications such as testing, monitoring, documentation, and data extraction. Its automation features help to streamline operations, increase efficiency, and improve user experiences across sectors and use cases.
💻 Installing Puppeteer
Let's begin by installing Puppeteer and setting up your environment. Follow these steps:
First, create a folder for your project. If you already have a project, open your terminal and change your working directory to that project's folder instead, so that Puppeteer is installed where your code can find it.
Example: puppeteer-project
mkdir puppeteer-project
cd puppeteer-project
npm init
The commands above create a folder named puppeteer-project and initialize a Node.js project in it. Follow the prompts that appear after you run npm init. Then, use the command below to install Puppeteer:
npm install puppeteer
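Test installation: verify that Puppeteer is correctly installed by writing a simple script in the project folder. The script starts a browser session, loads the blog.apify.com website, and closes the browser window after three seconds.
Name the file simple.js, and add the code below to it. Because the script uses import and top-level await, Node.js has to treat it as an ES module: either add "type": "module" to your package.json or name the file simple.mjs instead.
import puppeteer from "puppeteer";
// launch a new browser instance with a visible window
const browser = await puppeteer.launch({
  headless: false
});
// create a new page
const page = await browser.newPage();
// navigate to a sample website
await page.goto('https://blog.apify.com');
// wait for 3 seconds before closing the browser
// (page.waitForTimeout has been removed from recent Puppeteer versions, so a plain timer is used)
await new Promise((resolve) => setTimeout(resolve, 3000));
// close the browser
await browser.close();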
To run the script, open your code editor's terminal and use this command: node simple.js
Note: Run both the install command and the test script from inside the project folder, so that the Puppeteer package is installed there and your code can reference it.
📂 Taking screenshots with Puppeteer
Taking screenshots with Puppeteer is easy. Let's go over the steps one by one:
Browser and website instance: Before you can use Puppeteer to perform any action on a site, you need to create a browser instance and load the website you want to work on. Only then can you take the screenshot. As a starting point, create a new file named screenshot.js and add the code below to it:
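A minimal sketch of what screenshot.js could contain is shown here. The helper names screenshotOptions and takeScreenshot, the 1280x720 viewport, and the networkidle2 wait condition are assumptions made for illustration, not part of Puppeteer itself. Puppeteer is imported lazily inside the function so the options helper can be reused without starting a browser.

```javascript
// Hypothetical helper: derive page.screenshot() options from the output filename.
function screenshotOptions(path) {
  const ext = path.split('.').pop().toLowerCase();
  const type = ext === 'jpg' ? 'jpeg' : ext;
  if (!['jpeg', 'png', 'webp'].includes(type)) {
    throw new Error(`Unsupported screenshot format: ${ext}`);
  }
  return { path, type };
}

// Launch a browser, load the page, save the screenshot, and always close the browser.
async function takeScreenshot(url, path = 'screenshot.jpeg') {
  // Loaded lazily so this file can be loaded without starting Puppeteer.
  const { default: puppeteer } = await import('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Assumed viewport size; adjust it to your needs.
    await page.setViewport({ width: 1280, height: 720 });
    await page.goto(url, { waitUntil: 'networkidle2' });
    await page.screenshot(screenshotOptions(path));
  } finally {
    await browser.close();
  }
}

// Example call (needs Puppeteer installed and network access):
// takeScreenshot('https://blog.apify.com').catch(console.error);
```

Uncomment the last line and run node screenshot.js to write screenshot.jpeg into the project folder.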
screenshot.jpeg is the filename and extension you want to save the image as. You can use jpeg, png, or webp. One thing to keep in mind is that taking screenshots in jpeg format is generally faster than in png format.
Note: If no path is specified, the image will not be saved to disk. The page.screenshot method still returns the image data, so you can handle it in your own code.
Customizing screenshot options
Puppeteer offers various options to customize your screenshot:
Full-page screenshot: to capture the entire web page, use the fullPage option. When the option is true, Puppeteer takes a screenshot of the full scrollable page instead of only the visible viewport. It defaults to false.
To see the full-page screenshot option in action, comment out the viewport lines and uncomment the full-page option.
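As a rough sketch of how these options can be combined, the example below contrasts a full-page capture with a clipped region. The helper name customScreenshotOptions, the output filenames, and the 800x400 region are illustrative assumptions. The sketch relies on two Puppeteer rules: clip and fullPage cannot be used together, and quality does not apply to PNG images.

```javascript
// Hypothetical helper: build options for page.screenshot() and reject
// combinations that Puppeteer does not accept.
function customScreenshotOptions({ path, fullPage = false, clip, quality }) {
  if (fullPage && clip) {
    throw new Error('fullPage and clip are mutually exclusive');
  }
  const options = { path, fullPage };
  if (clip) options.clip = clip; // region of the page: { x, y, width, height }
  if (quality !== undefined) {
    if (path.toLowerCase().endsWith('.png')) {
      throw new Error('quality only applies to jpeg and webp screenshots');
    }
    options.quality = quality; // 0-100
  }
  return options;
}

// `page` is an already opened Puppeteer page.
async function captureVariants(page) {
  // Entire scrollable page
  await page.screenshot(customScreenshotOptions({ path: 'full-page.png', fullPage: true }));
  // Only the top-left 800x400 region, saved as a compressed JPEG
  await page.screenshot(
    customScreenshotOptions({
      path: 'header.jpeg',
      clip: { x: 0, y: 0, width: 800, height: 400 },
      quality: 80,
    })
  );
}
```

Validating the options before calling page.screenshot keeps a mistake from surfacing only after the browser has already loaded the page.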
In this section, you have learned how to take a screenshot of a website and how to adjust the options that the screenshot method accepts in Puppeteer. The next part of the article covers how to create PDF files with Puppeteer.
📕 Generating PDFs with Puppeteer
Generating PDFs is another powerful feature of Puppeteer. Let's explore how to convert web pages into PDFs.
Converting web pages to PDFs: Similar to capturing screenshots, you'll start by navigating to the desired web page. Then, use the page.pdf method to generate a PDF:
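A minimal sketch of this step might look as follows. The helper names pdfOptions and generatePdf, the default filename page.pdf, and the chosen defaults (A4 paper, printed backgrounds, the margin sizes) are assumptions for illustration. The option names themselves (path, format, printBackground, margin, landscape) are ones that page.pdf accepts.

```javascript
// Hypothetical helper: merge caller overrides into assumed default page.pdf() options.
function pdfOptions(path, overrides = {}) {
  return {
    path,                  // where the PDF is written
    format: 'A4',          // paper size
    printBackground: true, // include CSS background colors and images
    margin: { top: '20mm', right: '15mm', bottom: '20mm', left: '15mm' },
    ...overrides,
  };
}

// Launch a browser, load the page, write the PDF, and always close the browser.
async function generatePdf(url, path = 'page.pdf', overrides = {}) {
  // Loaded lazily so this file can be loaded without starting Puppeteer.
  const { default: puppeteer } = await import('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle0' });
    await page.pdf(pdfOptions(path, overrides));
  } finally {
    await browser.close();
  }
}

// Example call (needs Puppeteer installed and network access):
// generatePdf('https://blog.apify.com', 'blog.pdf', { landscape: true }).catch(console.error);
```

Passing overrides as a separate object keeps the defaults in one place while still letting a single call switch to landscape or another paper size.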
As you become more comfortable with Puppeteer, you can explore advanced techniques to enhance your capabilities:
Handling dynamic content: Some web pages load content dynamically. Use the page.waitForSelector method to make sure the element you need has been added to the page before taking a screenshot or generating a PDF.
await page.waitForSelector('.dynamic-element');
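To show how that wait could be combined with a capture, here is a small sketch. The function name screenshotWhenReady and the 10-second default timeout are assumptions, and the error check assumes that Puppeteer's timeout errors carry the name TimeoutError.

```javascript
// `page` is an already opened Puppeteer page; `selector` and `path` are placeholders.
async function screenshotWhenReady(page, selector, path, timeout = 10000) {
  try {
    // Resolves once an element matching the selector is present and visible.
    await page.waitForSelector(selector, { visible: true, timeout });
  } catch (error) {
    // Assumption: Puppeteer's timeout errors are named 'TimeoutError'.
    if (error.name === 'TimeoutError') {
      console.warn(`"${selector}" did not appear within ${timeout} ms; skipping ${path}`);
      return false;
    }
    throw error; // anything else is unexpected, so let it propagate
  }
  await page.screenshot({ path, fullPage: true });
  return true;
}
```

Returning false instead of throwing on a timeout lets a batch job skip a slow page and continue with the next one.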
Automating batch jobs: To automate capturing screenshots or generating PDFs for multiple web pages, create a loop that iterates through an array of URLs.
For the purpose of this example, create a separate folder called images. All the screenshots will be saved into that folder.
import puppeteer from 'puppeteer';
const urlArr = ['https://blog.apify.com', 'https://blog.apify.com/puppeteer-submit-forms', 'https://blog.apify.com/puppeteer-web-scraping-tutorial'];
// Launch one browser and reuse a single page for every URL
const browser = await puppeteer.launch();
const page = await browser.newPage();
for (let i = 0; i < urlArr.length; i++) {
  const site_url = urlArr[i];
  // Open URL in current page and wait until the network is idle
  await page.goto(site_url, {
    waitUntil: 'networkidle0'
  });
  // Capture a full-page screenshot into the images folder
  await page.screenshot({ path: `images/screenshot_${i + 1}.png`, fullPage: true });
}
// Close the browser once every URL has been processed
await browser.close();
The code above visits each URL in the array in turn, takes a full-page screenshot, and saves it to the images folder.
Dealing with authentication: If a web page requires HTTP basic authentication, call Puppeteer's page.authenticate method before navigating to the page.
//setup your basic authentication credential
const username = 'myUsername';
const password = 'password123';
// set the Authentication credentials
await page.authenticate({
username, password
});
// go to the website where you want to perform Authentication
await page.goto('https://website-url/auth-page');
// perform further actions on the page
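Or use the page.type and page.click methods to fill in and submit the login form manually.
await page.goto('https://warehouse-theme-metal.myshopify.com/account/login');
// Type into the email field, matched by a partial ID selector
await page.type('input[id*="customer"]', 'demo@username.com', {delay: 100});
// Type into the password field
await page.type('input[type=password]', 'demo_password', {delay: 100});
// click the login button and wait for the navigation it triggers
// (assumes that submitting the form loads a new page)
await Promise.all([
  page.waitForNavigation(),
  page.click('.form__submit.button--full')
]);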
Troubleshooting and tips
Even the best developers encounter challenges. Here are some troubleshooting tips to help you overcome common issues:
Content not loading: Ensure you're waiting for the necessary elements to load using waitForSelector or waitForNavigation.
Stale element reference: If you're interacting with elements before taking a screenshot or generating a PDF, ensure those elements are still valid.
Great, you've unlocked Puppeteer's potential for taking screenshots and generating PDFs! You now know how to install Puppeteer, capture screenshots, configure options, and convert web pages to PDFs. But is Puppeteer the right choice for you? Check out Playwright vs. Puppeteer and Puppeteer vs. Selenium to find out about two handy alternatives.
Ayodele is a Developer Relations engineer with experience in a few other areas of tech, such as frontend development, technical writing, early-stage startup advisory, product management, and consulting.