Web Scraping Using Puppeteer: A Beginner’s Guide

Ever needed data from a website but found that it isn't available in a structured format? Web scraping solves this kind of problem. It is a technique for extracting data from a public website and then using that data, either by storing it locally or by showing it live in our application. In this process, we send a crawler that automatically collects the data from the given website. Node.js offers many libraries for web scraping, such as Axios for fetching API responses, and Nightmare or Puppeteer for advanced scraping tasks such as automation or dealing with captchas. In this blog, we will discuss how to use Puppeteer, a free web scraping tool, to scrape data from the web.

NOTE: Kindly make sure you’re doing web scraping on websites that allow it without disturbing any company norms or privacy measures.

What is Puppeteer & Why is It Used?

Puppeteer is a Node.js library used for web scraping. It is developed by Google and provides a high-level API for controlling Chrome or Chromium browsers in headless or headful mode (it runs in headless mode by default). Apart from scraping, this library is also used for taking screenshots, automating tasks such as navigating to the next page, and generating PDFs from website content.

Before diving in further, you should have a basic knowledge of Node.js and the HTML DOM, as both are needed to work with Puppeteer.

How To Use Puppeteer For Web Scraping?

Follow the given instructions to use Puppeteer for Web Scraping: 

Step 1: First, install Puppeteer in your Node.js project using npm.

npm install puppeteer

Step 2: Once the package is installed successfully, you can require it in your JS file.

const puppeteer = require('puppeteer');

Step 3: Now you can launch a browser, create pages in that browser, go to the website you wish to work with via its URL, and interact with it to extract any information.

To begin with, let’s take a look at a simple web scraping example that will open the desired web page and extract its title.

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.google.com/');
  const pageTitle = await page.title();
  console.log(`Title: ${pageTitle}`);
  await browser.close();
})();

Let’s understand what the above code does:

  • The launch() function launches a Chromium browser. By default, the browser is launched in headless mode. If you want to see the browser window on your system, pass the following option to the launch function:

const browser = await puppeteer.launch({ headless: false });

  • Now we want to open a page in the browser we launched. The newPage() function is used to do so.
  • To navigate to the required website, we use the goto() function and pass the URL of that website. In the above code, we are navigating to the homepage of Google.
  • Now we can write our queries depending on the data we want to extract.
  • In the above code, we wanted the title of the webpage we navigated to, so we used page.title(), a predefined function that returns the title of the page, and printed the result to the console.
  • Lastly, we close the browser window.
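
The steps above can be collected into one reusable function. The following is a minimal sketch rather than a definitive implementation: the function name getPageTitle is hypothetical, and the Puppeteer module is passed in as an argument (an assumption made here so the function is easy to try out with a stand-in object). The try/finally block makes sure the browser is closed even if the navigation fails.

```javascript
// Minimal sketch: open a page, read its title, and always close the browser.
// `getPageTitle` is a hypothetical helper name, not part of Puppeteer.
async function getPageTitle(puppeteer, url) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    return await page.title();
  } finally {
    // Runs on success and on error, so no browser process is left behind.
    await browser.close();
  }
}

// Example usage (assumes Puppeteer is installed in the project):
// getPageTitle(require('puppeteer'), 'https://www.google.com/')
//   .then((title) => console.log(`Title: ${title}`));
```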

Some Web Scraping Techniques in Puppeteer

The Puppeteer library has several built-in functions that we can use right away. They help us interact with a website automatically and extract data, make PDFs, or take screenshots, depending on our needs. Some of them are:

  • page.setViewport(): to set the width and height of the page's viewport.
  • page.screenshot({ path: 'path' }): to take a screenshot of the page and store it at the path provided.
  • page.pdf(): to generate a PDF of the webpage.
  • page.click(): to click on the element of the page that matches the selector passed as the parameter.
  • page.type(): to type text automatically into the element that matches the selector passed.
  • page.url(): to get the URL of the page.
  • page.waitForNavigation(): this function is used to handle navigations correctly. It waits until the page navigates to a new URL or reloads. By default it waits up to 30 seconds and throws an error if the page has not navigated within that time. You can change the waiting time with the timeout option, given in milliseconds.

syntax: await page.waitForNavigation({ timeout: 60000 });

  • page.waitForSelector(): this function works in a similar way to waitForNavigation(). The difference is that page.waitForSelector() waits until an element matching the given selector appears in the page. If no such element is found within the timeout, it throws an error. You can change the waiting time in the same way as for page.waitForNavigation().
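
As an illustration of how several of these functions can be combined, here is a hedged sketch. The helper names snapshotPage and submitAndWait, the viewport size, the timeout values, and the output file names are all hypothetical choices. Both helpers expect a page object created with browser.newPage().

```javascript
// Hypothetical helper: set the viewport, then save a screenshot and a PDF.
async function snapshotPage(page, baseName) {
  await page.setViewport({ width: 1280, height: 800 });
  await page.screenshot({ path: `${baseName}.png`, fullPage: true });
  await page.pdf({ path: `${baseName}.pdf`, format: 'A4' });
  return page.url(); // url() is synchronous and returns the current URL
}

// Hypothetical helper: fill a field, submit, and wait for the result page.
async function submitAndWait(page, { inputSelector, text, buttonSelector, resultSelector }) {
  await page.type(inputSelector, text);
  // Start waiting before clicking so the navigation is not missed.
  await Promise.all([
    page.waitForNavigation({ timeout: 60000 }),
    page.click(buttonSelector),
  ]);
  // Throws if no matching element appears within the timeout.
  await page.waitForSelector(resultSelector, { timeout: 60000 });
  return page.url();
}
```

Starting waitForNavigation() before the click, and awaiting both together, avoids the case where the page navigates before the script has begun waiting.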

These are some of the basic yet important functions used while scraping or interacting with web pages using Puppeteer. Apart from these, Puppeteer also lets you evaluate code in the page and access data through the HTML DOM, using query selectors or XPath expressions. The official Puppeteer documentation covers all the other functions and more ways to use Puppeteer for web scraping.
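
To make the idea of reading data through DOM query selectors concrete, here is a small sketch based on page.$$eval(), which runs a function inside the page against all elements that match a selector. The helper name extractLinks and the example selector are hypothetical.

```javascript
// Hypothetical helper: collect the text and URL of every matching link.
async function extractLinks(page, selector) {
  // The callback runs inside the browser page, so it must not rely on
  // variables from this Node.js scope.
  return page.$$eval(selector, (anchors) =>
    anchors.map((a) => ({
      text: a.textContent.trim(),
      href: a.href,
    }))
  );
}

// Example usage with a page that has already navigated somewhere:
// const links = await extractLinks(page, 'a');
// console.log(links);
```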


Puppeteer is a fantastic library for web scraping, for automating UI interactions with web pages, and for extracting information from them. Its ready-to-use features make it much easier to scrape and to perform complex tasks such as capturing screenshots, making PDFs, and storing data in the structure we want so that it is easy to access. Be ethical and responsible: scrape only those sites that allow it, without violating any norms or privacy measures. Otherwise, the website owner can take legal action against you.

Happy Scraping!
