Building a Web Scraper with Next.js API Routes and Cheerio on Vercel

Introduction: In this guide, we will explore how to use Next.js API Routes together with Cheerio to scrape web pages. We will focus on the methodology of building web scrapers and on leveraging Next.js serverless functions. Please note that while the original context for this guide involved scraping piracy sites, we will focus strictly on the technical aspects and exclude any piracy-related content. This guide is intended for educational purposes, to help developers and startups understand how to run scrapers as serverless functions on Vercel with Next.js.

Prerequisites: Before diving into the implementation, it is recommended to have a basic understanding of Next.js, serverless functions, and Cheerio. Familiarity with JavaScript and Node.js is also beneficial.

Step 1: Setting up the Project

Create a new Next.js project by running the following command:

npx create-next-app my-scraper-app
cd my-scraper-app

Install the Cheerio package by running the following command:

npm install cheerio

Create a new file called pages/api/scraper.js to define our Next.js API route.
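
Before adding any scraping logic, you can verify that the route is wired up correctly with a minimal placeholder handler. This stub is just a sketch; the response body is arbitrary:

export default function handler(req, res) {
  // Temporary placeholder: confirms the API route responds before any scraping logic is added
  res.status(200).json({ status: 'ok' });
}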

Step 2: Writing the Scraper

In the scraper.js file, we will define our serverless function that performs the web scraping using Cheerio.

import * as cheerio from 'cheerio';

export default async function handler(req, res) {
  // Fetch the HTML content of the web page to be scraped
  const response = await fetch('https://example.com');
  const html = await response.text();

  // Load the HTML content into Cheerio
  const $ = cheerio.load(html);

  // Use Cheerio selectors to extract the desired data
  const title = $('h1').text();
  const links = $('a').map((i, el) => $(el).attr('href')).get();

  // Return the scraped data as a JSON response
  res.status(200).json({ title, links });
}
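
In practice, the target site may be slow, unreachable, or return an error page, so some basic error handling is worth adding. The variation below is a minimal sketch of one approach; the 502 and 500 status codes and the error messages are arbitrary choices, not requirements:

import * as cheerio from 'cheerio';

export default async function handler(req, res) {
  try {
    const response = await fetch('https://example.com');

    // Bail out early if the target site did not respond successfully
    if (!response.ok) {
      return res.status(502).json({ error: `Upstream responded with status ${response.status}` });
    }

    const html = await response.text();
    const $ = cheerio.load(html);

    const title = $('h1').text();
    const links = $('a').map((i, el) => $(el).attr('href')).get();

    return res.status(200).json({ title, links });
  } catch (error) {
    // Network failures and unexpected parsing errors end up here
    return res.status(500).json({ error: 'Failed to scrape the page' });
  }
}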

Step 3: Testing the Scraper

To test the scraper, start the Next.js development server by running the following command:

npm run dev

You can then access the scraper API at http://localhost:3000/api/scraper. Remember to replace https://example.com in the fetch call with the URL of the web page you want to scrape.
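
Instead of hardcoding the target URL, you can also pass it to the API route as a query parameter. The sketch below assumes a query parameter named url; that name and the simple validation are illustrative choices, not part of any Next.js convention:

import * as cheerio from 'cheerio';

export default async function handler(req, res) {
  // Read the target URL from the query string, e.g. /api/scraper?url=https://example.com
  const { url } = req.query;

  if (typeof url !== 'string' || !url.startsWith('http')) {
    return res.status(400).json({ error: 'A valid url query parameter is required' });
  }

  const response = await fetch(url);
  const html = await response.text();
  const $ = cheerio.load(html);

  const title = $('h1').text();
  const links = $('a').map((i, el) => $(el).attr('href')).get();

  res.status(200).json({ title, links });
}

With this variant, you could test locally against a URL such as http://localhost:3000/api/scraper?url=https://example.com (URL-encode the parameter if it contains special characters).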

Step 4: Deploying to Vercel

To deploy the scraper to Vercel, follow these steps:

  1. Sign up for a Vercel account and install the Vercel CLI.

  2. Run the following command to deploy your Next.js project:

vercel

  3. Follow the prompts to configure your deployment settings.

  4. After the deployment is complete, you will receive a URL where your scraper API is accessible; see the usage example below.
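
Once deployed, the API can be called from any HTTP client, including your own front-end code. In the sketch below, the deployment URL my-scraper-app.vercel.app is hypothetical; substitute the URL Vercel assigns to your project:

async function loadScrapedData() {
  // Hypothetical deployment URL; replace with the one Vercel gives you
  const res = await fetch('https://my-scraper-app.vercel.app/api/scraper');
  const data = await res.json();

  console.log(data.title); // the scraped page's <h1> text
  console.log(data.links); // array of href values found on the page
}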

Conclusion: In this guide, we have explored the process of building web scrapers using Next.js API Routes and Cheerio. We covered setting up a Next.js project, writing the scraper with Cheerio, testing it locally, and deploying it to Vercel. With this knowledge, you can create web scrapers for purposes such as data analysis and automation. Remember to respect legal and ethical guidelines when scraping websites and to ensure you have the necessary permissions.
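
As one small example of scraping politely, you could check the target site's robots.txt before fetching a page. The helper below is a deliberately naive sketch: it treats every Disallow line as applying to all user agents and does not implement the full robots.txt specification:

// Naive robots.txt check: returns false if the given path matches a Disallow rule.
// Illustrative sketch only, not a complete robots.txt parser.
async function isAllowedByRobots(origin, path) {
  const response = await fetch(`${origin}/robots.txt`);
  if (!response.ok) {
    // No robots.txt available; nothing is explicitly disallowed
    return true;
  }

  const robotsTxt = await response.text();
  const disallowedPaths = robotsTxt
    .split('\n')
    .filter((line) => line.toLowerCase().startsWith('disallow:'))
    .map((line) => line.slice('disallow:'.length).trim());

  return !disallowedPaths.some((rule) => rule && path.startsWith(rule));
}

For example, await isAllowedByRobots('https://example.com', '/some-page') would return false if /some-page fell under a Disallow rule.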
