Having the ability to capture screenshots and generate PDFs automatically is a valuable skill for developers, whether you're web scraping, making reports, testing, monitoring content, or archiving web pages. In this article, you'll learn how to use Puppeteer, a powerful Node.js package, to easily capture screenshots and generate PDFs from web pages.
What you need to get started
All you need is a basic knowledge of Node.js and JavaScript. If you don't have Node.js installed on your computer, head over to the official Node.js website to download and install the latest stable version. You can also follow this guide to help you properly install Node.
Why choose Puppeteer?
Before getting into the technical details, let's look at why Puppeteer is such a good fit for capturing screenshots and generating PDFs. Puppeteer is a Node.js library that provides a high-level API for controlling Chrome or Chromium, which it runs in headless mode (without a visible window) by default. Because of its versatility and ease of use, it is a popular tool for automating browser tasks.
Puppeteer provides a perfect environment for such jobs, whether you need to navigate web pages, interact with elements, or perform specific actions.
6 reasons to use Puppeteer to take screenshots and make PDFs
- Automated testing and quality assurance
Developers can capture screenshots with Puppeteer at various stages of a web application's user flows. These screenshots can be used in automated tests to check that user interface elements, layouts, and interactions stay consistent across viewport sizes and devices.
- Visual regression testing
When making updates to a website, it's critical to avoid unintentional visual changes. Puppeteer's ability to take screenshots makes it a good tool for visual regression tests. Developers can quickly spot visual inconsistencies by comparing screenshots taken before and after a change.
- Content monitoring and archiving
Puppeteer allows developers to capture screenshots or produce PDF snapshots of web pages on a regular basis. This is especially useful for content monitoring, tracking changes in online material, and building web page archives for historical purposes.
- Reporting and documentation
Puppeteer makes it easier to generate PDF reports from web pages. By capturing pertinent data and content, developers can produce informative and visually appealing reports. These PDF reports are useful for presenting insights, analytics, and summary information.
- Regulatory compliance and legal records
Some industries require records of web content to be kept for regulatory or legal purposes. Puppeteer can be used to generate PDF snapshots of web pages so that accurate records are kept.
- E-commerce and catalog management
Online retailers frequently capture product pages, catalog listings, or shopping carts for a variety of purposes, such as producing marketing materials, checking inventories, or creating PDF catalogs.
The Puppeteer screenshot and Puppeteer PDF generation capabilities provide developers with a robust toolkit for a variety of applications such as testing, monitoring, documentation, and data extraction. Its automation features help to streamline operations, increase efficiency, and improve user experiences across sectors and use cases.
💻 Installing Puppeteer
Let's begin by installing and configuring Puppeteer on your machine.
First, open your terminal and create a folder for your project, for example puppeteer-project. Then move into it and initialize a Node.js project. If you already have a project, change your working directory to its folder instead.
mkdir puppeteer-project
cd puppeteer-project
npm init
The commands above create a folder named puppeteer-project and initialize a Node.js project in it. Follow the prompts that appear after you run npm init. Then, use the command below to install Puppeteer:
npm install puppeteer
- Test the installation: verify that Puppeteer is correctly installed by writing a simple script in the project folder. The script starts a browser session, loads the blog.apify.com website, and closes the browser window after three seconds.
Name the file simple.js and add the code below to it:
import puppeteer from "puppeteer";
// launch a new browser instance with a visible window
const browser = await puppeteer.launch({
  headless: false
});
// create a new page
const page = await browser.newPage();
// navigate to a sample website
await page.goto('https://blog.apify.com');
// wait for 3 seconds before closing the browser
// (page.waitForTimeout was removed in recent Puppeteer versions, so use a plain timer)
await new Promise((resolve) => setTimeout(resolve, 3000));
// close the browser
await browser.close();
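To run the script, open a terminal in the project folder (your code editor's built-in terminal works fine) and run: node simple.js
Note: The scripts in this article use import statements and top-level await, so Node.js must treat them as ES modules. Either add "type": "module" to your package.json or save the files with the .mjs extension.
Note: Run the install command and the test script inside the project folder so that the Puppeteer package is installed there and can be resolved by your code.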
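If a script throws an error before it reaches browser.close(), the browser process can be left running. One way to guard against that is to wrap the work in try/finally. The sketch below is only an illustration: withBrowser is a hypothetical helper name (it is not part of Puppeteer), and launch stands for any function that resolves to a browser, such as () => puppeteer.launch().

```javascript
// Hypothetical helper (not part of Puppeteer): runs `task` with a browser
// and always closes the browser afterwards, even if `task` throws.
// `launch` is any function that resolves to a browser object,
// for example () => puppeteer.launch().
async function withBrowser(launch, task) {
  const browser = await launch();
  try {
    // Return whatever the task produces so callers can use the result.
    return await task(browser);
  } finally {
    // Runs on success and on failure alike.
    await browser.close();
  }
}
```

In simple.js, the page handling could then move into the task callback, for example: await withBrowser(() => puppeteer.launch(), async (browser) => { const page = await browser.newPage(); /* ... */ });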
📂Taking screenshots with Puppeteer
Taking screenshots with Puppeteer is easy. Let's go over the steps one by one:
- Browser and website instance: before you can use Puppeteer to perform any action on a site, you need to create a browser instance and load the website you want to work on. Only then can you take the screenshot. As a starting point, create a new file named screenshot.js and add the code below to it:
import puppeteer from "puppeteer"
const browser = await puppeteer.launch({
  headless: false
});
const page = await browser.newPage();
await page.goto('https://blog.apify.com');
// Continue with screenshot code
The code above launches a Chrome browser with a visible window (because headless is set to false), creates a new page, and opens blog.apify.com in it.
- Taking a screenshot:
Once you're on the webpage, taking a screenshot requires only one line of code:
await page.screenshot({ path: 'screenshot.jpeg' });
screenshot.jpeg is the filename, including the extension, under which the image is saved. You can use jpeg, png, or webp. One thing to keep in mind is that taking screenshots in jpeg format is faster than in png format.
Note: If no path is specified, the image will not be saved to disk.
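Besides path, the screenshot method accepts a type option (jpeg, png, or webp) and, for jpeg and webp only, a quality option from 0 to 100. The sketch below shows one way to derive those options from a filename. The helper name screenshotOptionsFor and the default quality of 80 are assumptions made for illustration, not part of Puppeteer.

```javascript
// Hypothetical helper: builds page.screenshot() options from a file path.
// The `type` and `quality` options belong to Puppeteer's screenshot API;
// `quality` (0-100) applies to jpeg and webp only, not to png.
function screenshotOptionsFor(path, quality = 80) {
  const extension = path.split('.').pop().toLowerCase();
  // Treat the common .jpg extension as jpeg.
  const type = extension === 'jpg' ? 'jpeg' : extension;
  if (!['jpeg', 'png', 'webp'].includes(type)) {
    throw new Error(`Unsupported screenshot format: ${extension}`);
  }
  const options = { path, type };
  if (type !== 'png') {
    options.quality = quality;
  }
  return options;
}
```

With a page already open, this could be used as: await page.screenshot(screenshotOptionsFor('screenshot.jpeg', 70));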
Customizing screenshot options
Puppeteer offers various options to customize your screenshot:
- Full-page screenshot: to capture the entire webpage, use the fullPage option. When the option is set to true, Puppeteer takes a screenshot of the full scrollable page:
await page.screenshot({ path: 'fullpage.png', fullPage: true });
- Specified viewport size: to control the dimensions of the captured area, define a viewport size before taking the screenshot:
await page.setViewport({ width: 800, height: 600 });
await page.screenshot({ path: 'viewport.png' });
For more screenshot options, check out the Puppeteer docs.
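To capture a single element rather than the whole viewport, you can wait for the element and call screenshot on its handle. The following is a minimal sketch under stated assumptions: captureElement is a hypothetical helper name, and the selector you pass in depends entirely on the page you are working with.

```javascript
// Hypothetical helper: screenshots a single element instead of the page.
// page.waitForSelector() resolves to an element handle once the selector
// matches, and elementHandle.screenshot() captures just that element.
async function captureElement(page, selector, path) {
  const element = await page.waitForSelector(selector);
  await element.screenshot({ path });
  return path;
}
```

For example, with a page already loaded: await captureElement(page, 'header', 'header.png'); where 'header' is only a placeholder selector.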
Complete code for this section
import puppeteer from "puppeteer"
const browser = await puppeteer.launch({
  headless: false
});
const page = await browser.newPage();
await page.goto('https://blog.apify.com');
//screenshot code, fullpage option
// await page.screenshot({ path: 'apify.jpeg', fullPage: true });
//specified viewport
await page.setViewport({ width: 800, height: 600 });
await page.screenshot({ path: 'apifyView.png' });
//close the browser
await browser.close()
To see the full-page screenshot option in action, comment out the viewport lines and uncomment the full-page option.
In this section, you learned how to take a screenshot of a website and how to adjust the options available in Puppeteer's screenshot method. The next part of the article covers how to generate PDF files with Puppeteer.
📕Generating PDFs with Puppeteer
Generating PDFs is another powerful feature of Puppeteer. Let's explore how to convert web pages into PDFs.
- Converting web pages to PDFs:
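Similar to capturing screenshots, you start by navigating to the desired web page. Then, use the page.pdf method to generate a PDF. By default, the PDF is rendered with the page's print CSS. Older Puppeteer documentation notes that PDF generation is only supported in headless Chrome, so if page.pdf fails with headless set to false, launch the browser in headless mode instead.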
await page.goto('https://blog.apify.com');
await page.pdf({ path: 'apify.pdf' });
Adjusting PDF options
Puppeteer allows you to customize the PDF output by adjusting various options:
- Page format and margins:
await page.pdf({
  path: 'formatted.pdf',
  format: 'A4',
  margin: { top: '40px', right: '20px', bottom: '40px', left: '20px' },
});
For more PDF options, check out the Puppeteer docs.
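Because page.pdf uses print styles by default, the PDF can look different from what you see in the browser. One possible approach is to switch to screen styles with page.emulateMediaType and keep background graphics with the printBackground option. The helper name savePdfAsOnScreen below is hypothetical, and the A4 format is just carried over from the example above.

```javascript
// Hypothetical helper: saves a PDF that resembles the on-screen page.
// page.pdf() uses print CSS by default; page.emulateMediaType('screen')
// switches to screen styles, and printBackground keeps background graphics.
async function savePdfAsOnScreen(page, path) {
  await page.emulateMediaType('screen');
  await page.pdf({
    path,
    format: 'A4',
    printBackground: true,
  });
  return path;
}
```

With a page already loaded, this could be called as: await savePdfAsOnScreen(page, 'screen-style.pdf');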
Complete code for this section
import puppeteer from "puppeteer"
const browser = await puppeteer.launch({
  headless: false
});
const page = await browser.newPage();
await page.goto('https://blog.apify.com');
//generate pdf
// await page.pdf({ path: 'page.pdf' });
//format pdf options
await page.pdf({
  path: 'formatted.pdf',
  format: 'A4',
  margin: { top: '40px', right: '20px', bottom: '40px', left: '20px' },
});
//close the browser
await browser.close()
🔑Advanced Puppeteer techniques
As you become more comfortable with Puppeteer, you can explore advanced techniques to enhance your capabilities:
- Handling dynamic content
Some web pages load content dynamically. Use the waitForSelector function to ensure the content has loaded before taking a screenshot or generating a PDF.
await page.waitForSelector('.dynamic-element');
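If the element never appears, waitForSelector eventually rejects, which would stop the whole script. The sketch below shows one way to turn that into a simple yes/no answer. It uses the visible and timeout options of waitForSelector; the helper name contentIsReady and the 10-second default are hypothetical, and the check on the error name assumes that a timeout surfaces as an error named TimeoutError.

```javascript
// Hypothetical helper: waits for `selector` and reports whether it appeared.
// `visible` and `timeout` are waitForSelector options.
// Assumption: a timeout surfaces as an error whose name is 'TimeoutError'.
async function contentIsReady(page, selector, timeoutMs = 10000) {
  try {
    await page.waitForSelector(selector, { visible: true, timeout: timeoutMs });
    return true;
  } catch (error) {
    if (error.name === 'TimeoutError') {
      return false;
    }
    // Anything else is unexpected, so let the caller see it.
    throw error;
  }
}
```

A script could then decide what to do when the content is missing, for example: if (await contentIsReady(page, '.dynamic-element')) { await page.screenshot({ path: 'dynamic.png' }); }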
- Automating batch jobs
To automate capturing screenshots or generating PDFs for multiple web pages, create a loop that iterates through an array of URLs.
For this example, create a separate folder called images in your project. All the screenshots are saved into that folder, which must exist before the script runs.
const urlArr = [
  'https://blog.apify.com',
  'https://blog.apify.com/puppeteer-submit-forms',
  'https://blog.apify.com/puppeteer-web-scraping-tutorial'
];
for (let i = 0; i < urlArr.length; i++) {
  const siteUrl = urlArr[i];
  // open the URL in the current page and wait until the network is idle
  await page.goto(siteUrl, {
    waitUntil: 'networkidle0'
  });
  // capture a full-page screenshot, numbered from 1
  await page.screenshot({ path: `images/screenshot_${i + 1}.png`, fullPage: true });
}
The code above loops over each URL in the array, taking a full-page screenshot of each page and saving it to the images folder.
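In a longer batch, a single failing URL would stop the loop, and numbered filenames make it hard to tell which image belongs to which page. The sketch below is one possible variation that names each file after its URL and records failures instead of stopping. Both helper names (screenshotPathFor and captureAll) are hypothetical, and the images folder is still assumed to exist already.

```javascript
// Hypothetical helper: derives a readable file name from a URL by joining
// host and path and replacing unsafe characters with underscores.
function screenshotPathFor(url, folder = 'images') {
  const { hostname, pathname } = new URL(url);
  const slug = `${hostname}${pathname}`
    .replace(/[^a-z0-9]+/gi, '_')
    .replace(/^_+|_+$/g, '');
  return `${folder}/${slug}.png`;
}

// Hypothetical helper: visits each URL in turn with the same page.
// A failure on one URL is recorded in the results and does not stop the batch.
async function captureAll(page, urls, folder = 'images') {
  const results = [];
  for (const url of urls) {
    const path = screenshotPathFor(url, folder);
    try {
      await page.goto(url, { waitUntil: 'networkidle0' });
      await page.screenshot({ path, fullPage: true });
      results.push({ url, path, ok: true });
    } catch (error) {
      results.push({ url, path, ok: false, error: error.message });
    }
  }
  return results;
}
```

Called as const results = await captureAll(page, urlArr); the returned array can be filtered for entries where ok is false to see which pages need another attempt.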
- Dealing with authentication
If a webpage requires HTTP basic authentication, use Puppeteer's page.authenticate method before navigating to the page.
// set up your basic authentication credentials
const username = 'myUsername';
const password = 'password123';
// set the authentication credentials
await page.authenticate({
  username, password
});
// go to the website that requires authentication
await page.goto('https://website-url/auth-page');
// perform further actions on the page
Or, if the site uses a login form, use the page.type and page.click methods to fill in and submit the form.
await page.goto('https://warehouse-theme-metal.myshopify.com/account/login');
// type into the email and password fields, matched by CSS selectors
await page.type('input[id*="customer"]', 'demo@username.com', {delay: 100});
await page.type('input[type=password]', 'demo_password', {delay: 100});
// click the login button (awaited so that errors are not silently dropped)
await page.click('.form__submit.button--full');
Troubleshooting and tips
Even the best developers encounter challenges. Here are some troubleshooting tips to help you overcome common issues:
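- Content not loading:
Ensure you're waiting for the necessary elements to load using waitForSelector or waitForNavigation.
- Stale element reference:
If you're interacting with elements before taking a screenshot or generating a PDF, ensure those elements are still valid. An element handle obtained before a navigation or re-render may no longer point to an element on the page, so query the element again if in doubt.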
Link to the complete code examples on GitHub
Screenshot.js
Pdf.js
Batch.js
Simple.js
Alternatives to Puppeteer
Great, you've unlocked Puppeteer's potential for taking screenshots and generating PDFs! You now know how to install Puppeteer, capture screenshots, configure options, and convert web pages to PDFs. But is Puppeteer the right choice for you? Check out Playwright vs. Puppeteer and Puppeteer vs. Selenium to find out about two handy alternatives.
Continue learning about Puppeteer
- Puppeteer tutorial: submitting forms, clicking buttons, and handling inputs
- How to scrape the web with Puppeteer in 2023
❓FAQ
Can Puppeteer capture screenshots of specific elements on a page?
Yes, Puppeteer allows you to target specific elements by using their selectors. You can then capture screenshots of these individual elements.
How can I capture a screenshot of a dynamically loaded element?
Use the waitForSelector function to wait for the element to be fully loaded before capturing the screenshot.
Is it possible to generate a PDF from a protected web page that requires login?
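Yes. For pages protected by HTTP basic authentication, use Puppeteer's page.authenticate method to provide credentials before navigating to the protected page. For pages with a login form, fill in and submit the form with page.type and page.click first.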
Can Puppeteer generate PDFs from multiple web pages in a single batch job?
Absolutely! You can create a loop that iterates through an array of URLs, capturing screenshots or generating PDFs for each page.