Building a Page to PDF Creator with Puppeteer & Google Cloud Functions

Posted on 6th July 2020 by Mark Tiddy.

I recently needed to let users of a website I was building for a client download content as a PDF. It took quite a bit of working out, so I thought I’d write about how I finally managed it on the server side using a Firebase Function…and share a couple of failed attempts!

The Ways that Didn’t Work
Before I settled on the method below I had a couple of failed attempts…these were:

  1. html2pdf, jsPDF – There is a well-documented way of turning a page into a PDF using html2canvas and jsPDF (bundled together as html2pdf). This runs on the client side and was the first method I implemented. It sort of worked, but it didn’t work on all browsers or devices and occasionally gave weird results (such as flipping images 180 degrees!).
  2. React-to-PDF – There is a React library out there called React-to-PDF, but it requires writing your PDF content in a separate syntax closer to React Native. As I wanted to convert an existing page this didn’t meet my needs, but it is worth checking out.

What Worked…and how I did it
My final working solution was a Firebase Function (although this would also work on Lambda with some tweaks) running an Express API. On a particular endpoint it uses Puppeteer (a library that drives a headless browser) to navigate to the part of the site I needed as a PDF and then returns that PDF to the client. (I then simply linked to this endpoint in my frontend code.)

So, what are the stages of doing this?

  1. Set up a new folder for your project (e.g. page-to-pdf)

Run

npm init -y

This adds a package.json. Then run

firebase init

This sets up Firebase with your project…I’m assuming you’ve previously installed the Firebase CLI and logged in. (Note: see the Firebase documentation for setting your computer up the first time, including logging into your Firebase account.)

  2. At this point Firebase has created a ‘functions’ folder for you. cd into it and install Express, Puppeteer and body-parser:
npm install --save express puppeteer body-parser
  3. Now that our packages are installed, create an index.js file in your functions folder. We’re going to add the following code just to get us started.
const functions = require('firebase-functions')
const server = require('./server')

// runWith sets the resources for this function: headless Chrome needs plenty of memory
const api = functions.runWith({ memory: '2GB', timeoutSeconds: 120 }).https.onRequest(server)

module.exports = {
  api
}

In the code above we required firebase-functions (which we need for Firebase Functions to work). We then imported our server (which we’ll create in a second) and set up our cloud function with a couple of preferences: 2GB of memory, because running a headless browser is memory-hungry, and a 120-second timeout to give slow pages time to load.

  4. Next we need to create our actual Express server and set up our commands to Puppeteer. So, create a server.js file and add the following code:
const express = require('express')
const bodyParser = require('body-parser')
const puppeteer = require('puppeteer')

const app = express()
app.use(bodyParser.json()).use(bodyParser.urlencoded({ extended: false }));

// Launch the browser once, when the function instance starts, and keep the promise
let browserPromise = puppeteer.launch({
  args: ['--no-sandbox']
})

Above we simply set up our server by creating an instance of express called ‘app’ (which we imported into index.js in stage 3) and then added some bodyParser middleware.

The last statement launches Puppeteer’s headless browser and stores the resulting promise in a variable, passing the ‘--no-sandbox’ argument…if we don’t add this then Puppeteer doesn’t work on Cloud Functions.
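Because the launch promise lives at module level, every request handled by the same function instance shares one browser. If you would rather launch on first use and retry after a failed launch, one possible variation is sketched below. The helper name createBrowserGetter is made up for this example and is not part of Puppeteer; it takes the launch function as a parameter so it can be tried out without a real browser.

```javascript
// Hypothetical helper (not a Puppeteer API): wraps a launch function so the
// browser is started on first use and the same promise is reused afterwards.
function createBrowserGetter(launch) {
  let browserPromise = null;

  return function getBrowser() {
    if (!browserPromise) {
      browserPromise = launch().catch((err) => {
        // Forget a failed launch so the next request can try again
        browserPromise = null;
        throw err;
      });
    }
    return browserPromise;
  };
}

// Assumed wiring inside server.js:
//   const getBrowser = createBrowserGetter(() =>
//     puppeteer.launch({ args: ['--no-sandbox'] }));
//   const browser = await getBrowser();
```

The trade-off is that the first request pays the launch cost instead of the cold start, so whether this is an improvement depends on your traffic.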

  5. Next, below that code we need to set up our endpoints. I’m going to set up something fairly simple for the purpose of this tutorial: a GET request which will include a url added by a user.
app.get('/turn-website-to-pdf', async (req, res) => {
  // The page to render, e.g. ?url=http://google.com
  const url = req.query.url;

  // PDF settings: A4 paper, with background colours and images included
  const options = {
    format: 'A4',
    printBackground: true
  };

  // Reuse the browser launched above, but give each request its own incognito context
  const browser = await browserPromise;
  const context = await browser.createIncognitoBrowserContext();
  const page = await context.newPage();
  await page.goto(url, {
    waitUntil: 'networkidle0',
  });
  const pdf = await page.pdf(options);
  res.setHeader('Content-Type', 'application/pdf');
  res.send(pdf);
  await context.close();
})

There’s a reasonable amount of code there so let me explain what we did after creating our asynchronous API endpoint.

First we grab our url that we submitted with the request…this might look something like this if we wanted a pdf of Google

http://myapi.com/turn-website-to-pdf?url=http://google.com
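A target address that contains its own query string needs encoding before it goes into the url parameter, and since the endpoint will visit whatever it is given, it is worth rejecting anything that isn’t a normal web address. Here is a rough sketch of both halves. The function names buildPdfUrl and parseTargetUrl are invented for this example, and the check is deliberately minimal (it does not, for instance, block internal network addresses).

```javascript
// Client side (hypothetical helper): add the target page to the endpoint URL,
// letting URLSearchParams take care of the encoding.
function buildPdfUrl(endpoint, targetUrl) {
  const requestUrl = new URL(endpoint);
  requestUrl.searchParams.set('url', targetUrl);
  return requestUrl.toString();
}

// Server side (hypothetical helper): return a normalised absolute http(s) URL,
// or null if the value is missing, malformed or uses another scheme.
function parseTargetUrl(raw) {
  if (typeof raw !== 'string') return null;
  try {
    const parsed = new URL(raw);
    return ['http:', 'https:'].includes(parsed.protocol) ? parsed.toString() : null;
  } catch (err) {
    return null;
  }
}

// Assumed use at the top of the route handler:
//   const url = parseTargetUrl(req.query.url);
//   if (!url) return res.status(400).send('Please supply a valid http(s) url');
```

On the frontend you would then link to something like buildPdfUrl('http://myapi.com/turn-website-to-pdf', 'http://google.com').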

Secondly, we set up some options for when we turn our page into the PDF. In this case we want an A4 format with the background included.

Next, we carry out a series of steps with Puppeteer.

  1. We wait for the browserPromise we created earlier and assign the result to a variable called ‘browser’.
  2. We create a new context of that browser using Puppeteer’s ‘createIncognitoBrowserContext()’ function. We use incognito mode so that cookies and cache aren’t shared between requests and we get the latest version of the website we’re visiting.
  3. We then create a new page and visit the url we passed in (in our case Google). Each Puppeteer call returns a promise, which is why we use the await keyword so often inside our asynchronous function. We also pass a second argument to goto setting waitUntil to ‘networkidle0’…this means we don’t move on until the page has stopped making network requests, i.e. it has fully loaded.
  4. We then create a new variable called ‘pdf’ and assign it the result of Puppeteer’s pdf function, passing in the options we set earlier.
  5. Then, we send it all back to the user, first setting a header telling the browser we’re sending back a PDF and then sending back the PDF itself. Lastly we close the incognito context.
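One thing the steps above don’t cover is what happens if the page fails to load: the incognito context would be left open. A possible way round this, sketched here rather than taken from my production code, is to move the Puppeteer steps into a helper with a try/finally. The name renderPdf is made up for this example, and the browser is passed in as a parameter.

```javascript
// Hypothetical helper: runs the Puppeteer steps and always closes the
// incognito context, even when navigation or PDF generation throws.
async function renderPdf(browser, url, options) {
  const context = await browser.createIncognitoBrowserContext();
  try {
    const page = await context.newPage();
    await page.goto(url, { waitUntil: 'networkidle0' });
    return await page.pdf(options);
  } finally {
    await context.close();
  }
}

// Assumed use inside the route handler:
//   try {
//     const pdf = await renderPdf(await browserPromise, url, options);
//     res.setHeader('Content-Type', 'application/pdf');
//     res.send(pdf);
//   } catch (err) {
//     res.status(500).send('Could not create the PDF');
//   }
```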

Finally, at the bottom of your code add this line so our index.js can actually access it

module.exports = app;

And that’s it…we’re ready to test

To test it locally just navigate to the functions folder in your terminal and run the ‘firebase serve’ command, which will provide you with a local link to your API.

Once it’s all working you can then run ‘firebase deploy’ to send it to your Firebase project. You can find the link to your cloud function in the Firebase console (console.firebase.google.com) under ‘Functions’ in your project.

And you’re done!!!

If you want some extras and some troubleshooting notes read on…

Extras
The above is a pretty simple example but when I coded this I needed to visit a React SPA, log into an account and then access some content before saving as a PDF. I used a couple of extra Puppeteer functions to do this

await page.click('#somethingtoclick')
await page.focus('#somethingtofocuson')
await page.type('#textbox', 'my text')
await page.waitFor(1000)

The above functions (in order) let you click an item with that ID (e.g. a button), focus on something like a text input, type something into that input and wait before moving on (This last one was essential for me as my React app had some animations I needed to finish before creating a PDF)
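To give a feel for how these fit together, here is a rough sketch of a login step that runs before the PDF is created. Everything specific in it is an assumption for illustration: the logIn name, the shape of the credentials object and the selectors (#email, #password, #login-button, #content) are invented and would need swapping for whatever your own app uses. It waits for elements with page.waitForSelector rather than pausing for a fixed time.

```javascript
// Hypothetical login step: all selectors below are placeholders.
async function logIn(page, credentials) {
  // Wait for the login form to be rendered by the SPA
  await page.waitForSelector('#email');
  await page.type('#email', credentials.email);
  await page.type('#password', credentials.password);
  await page.click('#login-button');
  // Wait until the content we want in the PDF has appeared
  await page.waitForSelector('#content');
}

// Assumed use, after page.goto(...) and before page.pdf(...):
//   await logIn(page, { email: 'user@example.com', password: 'secret' });
```

Waiting for a selector won’t help with animations, so a fixed pause afterwards may still be needed.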

Troubleshooting
If you’re finding that your function doesn’t perform as you expect then it’s worth setting the headless option to false when you launch your browser. This means when you run it locally you can see the browser open up and the magic happening…you can also then see where it gets stuck.

To do this we add a second property, headless, to the options object we pass in when we create our browser promise.

let browserPromise = puppeteer.launch({
  args: ['--no-sandbox'],
  headless: false
})
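You won’t want a visible browser once the function is deployed, so one option is to switch it on only while debugging. The sketch below does that with an environment variable. PDF_DEBUG is a name I’ve invented for this example (it isn’t something Firebase or Puppeteer sets), and launchOptions is likewise a hypothetical helper.

```javascript
// Hypothetical helper: show the browser only when PDF_DEBUG=true is set.
function launchOptions(env) {
  const debug = env.PDF_DEBUG === 'true';
  return {
    args: ['--no-sandbox'],
    headless: !debug
  };
}

// Assumed use in server.js:
//   let browserPromise = puppeteer.launch(launchOptions(process.env));
```

Locally you could then start the server with PDF_DEBUG=true set in your terminal and leave it unset everywhere else.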

That’s it! I hope that’s helped you out if this is what you were looking for!



© Mark Tiddy 2020