I am using Puppeteer to generate PDF files from HTML strings. Reading the documentation, I found two ways of generating the PDF files:

The first is passing a URL and calling the goto method:

await page.goto('https://example.com');
await page.pdf({format: 'A4'});

The second, which is my case, is calling the setContent method:

await page.setContent('<p>Hello, world!</p>');
await page.pdf({format: 'A4'});

My problem is that the client sends 3 different HTML strings, and I want to generate a single PDF file with 3 pages (one per HTML string).

Is there a way to do this with Puppeteer? I am open to other suggestions, but I need to use headless Chrome.

  • I would approach this as follows: (1) write a Puppeteer script that does three separate page.goto calls, (2) keep a variable for each of the 3 HTML strings scraped from those pages, and (3) generate 3 separate PDF files at the end. I'm not sure you can merge PDF documents with Puppeteer. If you find a way to do it, please post your solution here. Commented Jan 31, 2018 at 4:51

3 Answers

I was able to do this by doing the following:

  1. Generate 3 separate PDFs with Puppeteer. You can either save each file locally or keep it in a variable.

  2. I saved the files locally because the PDF merge packages I found only accept file locations and do not accept buffers. After generating the PDFs one after another, I merged them using easy-pdf-merge.

The code is like this:

const path = require('path');
const puppeteer = require('puppeteer');
const pdfMerge = require('easy-pdf-merge');

// Each variable holds the HTML for one page of the final document.
const page1 = '<h1>HTML from page1</h1>';
const page2 = '<h1>HTML from page2</h1>';
const page3 = '<h1>HTML from page3</h1>';

(async () => {
  const browser = await puppeteer.launch();
  const tab = await browser.newPage();

  // Render each HTML string and save it as its own PDF.
  await tab.setContent(page1);
  await tab.pdf({ path: './page1.pdf' });

  await tab.setContent(page2);
  await tab.pdf({ path: './page2.pdf' });

  await tab.setContent(page3);
  await tab.pdf({ path: './page3.pdf' });

  await browser.close();

  // Merge the three files into a single PDF.
  pdfMerge(
    ['./page1.pdf', './page2.pdf', './page3.pdf'],
    path.join(__dirname, 'mergedFile.pdf'),
    (err) => {
      if (err) return console.log(err);
      console.log('Successfully merged!');
    }
  );
})();
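
If the three strings are plain body fragments rather than complete HTML documents, the merge step might be avoidable by joining them into one document and forcing a page break between them with CSS. The following is only a sketch under that assumption. combineHtml and renderCombinedPdf are hypothetical helper names, not part of Puppeteer, and the pdf-page class name is made up for illustration.

```javascript
// Sketch only: assumes each string is a body fragment, not a full HTML document.
// combineHtml and renderCombinedPdf are illustrative names, not Puppeteer APIs.

// Wrap every fragment in its own <section>; each section except the last
// ends with a forced page break when printed.
function combineHtml(fragments) {
  const sections = fragments
    .map((html) => `<section class="pdf-page">${html}</section>`)
    .join('\n');
  return [
    '<!DOCTYPE html>',
    '<html><head><style>',
    '.pdf-page { break-after: page; page-break-after: always; }',
    '.pdf-page:last-child { break-after: auto; page-break-after: auto; }',
    '</style></head><body>',
    sections,
    '</body></html>',
  ].join('\n');
}

async function renderCombinedPdf(fragments, outputPath) {
  // Loaded lazily so combineHtml can be used without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.setContent(combineHtml(fragments));
    await page.pdf({ path: outputPath, format: 'A4' });
  } finally {
    await browser.close();
  }
}
```

With the variables above, this would be called as renderCombinedPdf([page1, page2, page3], './combined.pdf'). Note that a fragment longer than one page still flows onto additional pages, and that all fragments share one document, so their styles can affect each other.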

3 Comments

What are page1, page2, and page3? Are they three different URLs or browser.newPage() instances? Also, the method name changed from setContent to content as per the latest documentation.
page1, page2, and page3 contain the HTML of three different pages. They are strings. Thanks for the tip on content. I'll update it.
Is there an open source library for PDF merging? As far as I can see, easy-pdf-merge is not open source.

pdf-merger-js is another option. The example below uses page.goto, but page.setContent should work as a drop-in replacement:

const PDFMerger = require("pdf-merger-js"); // ^4.2.1
const puppeteer = require("puppeteer"); // ^19.7.2

const urls = [
  "https://news.ycombinator.com",
  "https://www.example.com",
  "https://en.wikipedia.org",
  // ...
];
const filename = "merged.pdf";

let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  const merger = new PDFMerger();

  for (const url of urls) {
    await page.goto(url);
    await merger.add(await page.pdf());
  }

  await merger.save(filename);
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());
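
For the HTML-string case from the question, the same loop can be sketched with page.setContent and an in-memory result. This assumes the CommonJS 4.x release of pdf-merger-js pinned above. htmlStringsToPdf and its optional deps parameter are hypothetical names introduced here so that the libraries can be swapped out; they are not part of either package.

```javascript
// Sketch only: assumes puppeteer and pdf-merger-js 4.x (CommonJS) are installed.
// htmlStringsToPdf and the deps parameter are illustrative, not library APIs.

async function htmlStringsToPdf(htmlPages, deps = {}) {
  // Libraries are loaded lazily so replacements can be passed in via deps.
  const puppeteer = deps.puppeteer ?? require("puppeteer");
  const PDFMerger = deps.PDFMerger ?? require("pdf-merger-js");

  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    const merger = new PDFMerger();

    for (const html of htmlPages) {
      // Render one HTML string, then append its PDF bytes to the merger.
      await page.setContent(html);
      const pdfBytes = await page.pdf({ format: "A4" });
      await merger.add(Buffer.from(pdfBytes));
    }

    // Return the merged document as a Buffer instead of writing a file.
    return await merger.saveAsBuffer();
  } finally {
    await browser.close();
  }
}
```

A call such as htmlStringsToPdf([page1, page2, page3]) resolves to a Buffer that can be sent in an HTTP response or written to disk with fs.writeFile.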

2 Comments

It also worked for me with this package. I think it's the only one still being maintained.
This also has a .saveAsBuffer() method, which returns the PDF as a Buffer rather than saving it to your hard drive.

I was able to generate PDFs from multiple URLs and merge them with the code below:

package.json

{
 ............
 ............

 "dependencies": {
    "puppeteer": "^1.1.1",
    "easy-pdf-merge": "0.1.3"
 }

 ..............
 ..............
}

index.js

const puppeteer = require('puppeteer');
const merge = require('easy-pdf-merge');

var pdfUrls = ["http://www.google.com","http://www.yahoo.com"];

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  var pdfFiles=[];

  for(var i=0; i<pdfUrls.length; i++){
    await page.goto(pdfUrls[i], {waitUntil: 'networkidle2'});
    var pdfFileName =  'sample'+(i+1)+'.pdf';
    pdfFiles.push(pdfFileName);
    await page.pdf({path: pdfFileName, format: 'A4'});
  }

  await browser.close();

  await mergeMultiplePDF(pdfFiles);
})();

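// Wrap the callback-based merge in a Promise so it can be awaited.
const mergeMultiplePDF = (pdfFiles) => {
    return new Promise((resolve, reject) => {
        merge(pdfFiles,'samplefinal.pdf',function(err){

            if(err){
                console.log(err);
                // Return here so a failed merge does not also report success.
                return reject(err);
            }

            console.log('Success');
            resolve();
        });
    });
};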

RUN Command: node index.js

2 Comments

Is there an open source library for PDF merging? As far as I can see, easy-pdf-merge is not open source.
It requires Java :(
