I'm trying to add an "export to pdf" button to a webpage.

I'm aware that Puppeteer is mostly used as a Node.js library, but the documentation for Puppeteer specifically mentions that it's possible to run Puppeteer in the browser and that a supported feature in this mode is "Document Manipulation: Generate PDFs and screenshots of the current web page." However, it's unclear to me how to connect to the browser in question. The example shows a browser being connected to with a wsUrl, but there's no indication where that wsUrl comes from:

import puppeteer from 'puppeteer-core/lib/esm/puppeteer/puppeteer-core-browser.js';

const browser = await puppeteer.connect({
  browserWSEndpoint: wsUrl,
});

alert('Browser has ' + (await browser.pages()).length + ' pages');

browser.disconnect();

In my investigation, I ran across an older puppeteer-web example with a hard-coded endpoint, but they mentioned starting the browser with a remote-debugging-port, which isn't feasible to expect people using the website to do.

So I guess my question is, am I missing something? Is there an alternate way to connect to the browser or even just the current page? Technically all the properties in ConnectOptions are optional, but the first documentation page I linked says, "Do not forget to include a valid browser WebSocket endpoint when connecting to an instance," so it seems likely that browserWSEndpoint is conditionally required.

  • Puppeteer is a browser automation library; a browser is a prerequisite for using it. The WebSocket feature on that page says: "Establish connections to existing browser instances using WebSockets. Launching or downloading browsers directly is not supported as it relies on Node.js APIs." So if you have a browser already running, say on a server somewhere, you can connect Puppeteer to it via WebSocket. "Current web page" means the page that browser is connected to, the one on the server in this example. Commented Sep 5 at 23:14
  • Related: Javascript call programmatically the "Save as PDF" feature of Chrome dialog print and libraries like html2pdf.js Commented Sep 5 at 23:39
  • Related: How to run Puppeteer code in any web browser? Commented Sep 10 at 22:32

2 Answers

Running Puppeteer inside a browser

Puppeteer is a browser automation library; a browser is a prerequisite for using it [1]. More specifically, it connects to a browser's automation interfaces via a remote debugging protocol. Per the same docs page (emphasis my own):

WebSocket Connections: Establish connections to existing browser instances using WebSockets. Launching or downloading browsers directly is not supported as it relies on Node.js APIs.

The Puppeteer-in-browser feature is designed to connect to an existing browser running remotely, which could be on localhost or on a server somewhere. Instead of launching a new browser instance, as you can with the Node.js version, you can only connect to one that is already running. Otherwise, the WebSocket [2] connection is generally [3] the same: this feature changes where Puppeteer runs, not how it talks to the browser.

In this context, the phrase "current web page" means the page that the remote browser is currently on (not the page the Puppeteer-in-browser is running in).

> The example shows a browser being connected to with a wsUrl, but there's no indication where that wsUrl comes from

> In my investigation, I ran across an older puppeteer-web example with a hard-coded endpoint, but they mentioned starting the browser with a remote-debugging-port

> Technically all the properties in ConnectOptions are optional, but the first documentation page I linked says, "Do not forget to include a valid browser WebSocket endpoint when connecting to an instance," so it seems likely that browserWSEndpoint is conditionally required.

The documentation isn't explicit here, but you are correct that a remote debugging port is required. Later in that issue thread, the same maintainer refers to the "modern CDP sessions that Puppeteer requires"; "CDP" stands for Chrome DevTools Protocol, and a link is provided a little further down in the thread.
For reference, the protocol option of ConnectOptions is either cdp or webDriverBiDi (a vendor-neutral interface), both of which are implemented over WebSocket.

Note that the Browser Management page also mentions the same WebSocket URL:

Connecting to a running browser

If you launched a browser outside of Puppeteer, you can connect to it using the connect method. Usually, you can grab a WebSocket endpoint URL from the browser output:

const browser = await puppeteer.connect({
  browserWSEndpoint: 'ws://127.0.0.1:9222/...',
});

const page = await browser.newPage();

browser.disconnect();

Connecting to the same browser instance as the one running Puppeteer-in-browser could be done, for example, by opening a port on localhost and connecting to it; indeed, the example above uses the loopback address 127.0.0.1.
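
As a rough sketch of where such a URL can come from: a Chromium-based browser started with --remote-debugging-port exposes a small HTTP interface on that port, and its /json/version response includes a webSocketDebuggerUrl field. The helper name getBrowserWSEndpoint, the port 9222, and the injectable fetchImpl parameter are assumptions for illustration, and whether a web page is actually allowed to reach that port depends on the browser's cross-origin and debugging restrictions.

```javascript
// Sketch only: look up the browser-level WebSocket endpoint from the
// remote-debugging HTTP interface. Assumes the browser was started with
// --remote-debugging-port=9222 (the port number is an assumption).
async function getBrowserWSEndpoint(
  debugBaseUrl = 'http://127.0.0.1:9222',
  fetchImpl = fetch // injectable so the lookup can be exercised without a browser
) {
  const response = await fetchImpl(`${debugBaseUrl}/json/version`);
  if (!response.ok) {
    throw new Error(`Debugging endpoint returned HTTP ${response.status}`);
  }
  const info = await response.json();
  if (!info.webSocketDebuggerUrl) {
    throw new Error('Response did not contain webSocketDebuggerUrl');
  }
  return info.webSocketDebuggerUrl;
}
```

The returned string is the value that would be passed as browserWSEndpoint to puppeteer.connect.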

> which isn't feasible to expect people using the website to do

I agree that this is a tall order for the average browser user.

Alternatives

That being said, if your goal is to let users save a PDF of the page they are viewing, there are ways of achieving that without Puppeteer or other browser automation tools.

For example, your "Export to PDF" button could call window.print(), which opens the browser's print dialog. From there, the user can choose "Save as PDF".
However, as per Javascript call programmatically the "Save as PDF" feature of Chrome dialog print, you cannot select that option for your users; you can only open the print dialog.
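
A minimal sketch of the window.print() approach might look like the following. The function name attachPrintButton and the element id export-pdf are hypothetical, and the win parameter exists only so the wiring can be exercised outside a browser.

```javascript
// Sketch only: wire an "Export to PDF" button to the browser's print dialog.
// The user still has to pick "Save as PDF" in the dialog themselves.
function attachPrintButton(button, win = window) {
  button.addEventListener('click', () => win.print());
}

// In a page (the element id is hypothetical):
// attachPrintButton(document.getElementById('export-pdf'));
```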

There are other options too, such as using a library like html2pdf.js to convert the current HTML into a PDF with in-browser JavaScript, which could then be saved client-side. See also: Is it possible to save HTML page as PDF using JavaScript or jquery?

Some other options would be to provide a pre-generated static PDF (if the content is known ahead of time) or generate a PDF dynamically server-side which the user can download.
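
For the server-side option, the client half could be sketched roughly as below. The endpoint /api/export-pdf and the helper names fetchPdfBlob and saveBlob are assumptions; the server behind that endpoint is whatever generates the PDF.

```javascript
// Sketch only: request a server-generated PDF.
// '/api/export-pdf' is a hypothetical endpoint; fetchImpl is injectable for testing.
async function fetchPdfBlob(endpoint = '/api/export-pdf', fetchImpl = fetch) {
  const response = await fetchImpl(endpoint, { headers: { Accept: 'application/pdf' } });
  if (!response.ok) {
    throw new Error(`PDF export failed with HTTP ${response.status}`);
  }
  return response.blob();
}

// Browser-only helper: offer the blob as a download via a temporary link.
function saveBlob(blob, filename, doc = document) {
  const url = URL.createObjectURL(blob);
  const link = doc.createElement('a');
  link.href = url;
  link.download = filename;
  doc.body.appendChild(link);
  link.click();
  link.remove();
  URL.revokeObjectURL(url);
}

// In a page: fetchPdfBlob().then((blob) => saveBlob(blob, 'page.pdf'));
```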


  1. Or it can download a browser for you.
  2. Or custom transport.
  3. There are more features available in the Node.js variant than the in-browser variant.

I found this documentation page to be misleading, so I opened a clarifying PR #14191 which has been merged. This PR makes it clear that you're connecting to a remote browser and simply using the Puppeteer API to automate the remote browser, which is either running locally or on a server you're hosting. The difference is that your Puppeteer code happens to be running in a client browser rather than Node.

For the sentence you quoted, "Document Manipulation: Generate PDFs and screenshots of the current web page", the language is now "Document Manipulation: Generate PDFs and screenshots of the remote browser page".

This reinforces this answer, which nicely clarifies that, at the time of writing, Puppeteer can't be used to turn the current page into a PDF. That answer suggests tools better suited to the job of purely client-side PDF generation.

To be honest, I don't see much of a use case for "running Puppeteer in the browser" if it can't automate the current page. Access to Puppeteer from the client to a server would usually be through an API of some sort that exposes a set of restricted operations, like generating PDFs, without exposing the whole Puppeteer API.
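
As an illustration of such a restricted operation, a server could expose only something like the function below rather than a raw browser connection. This is a sketch under assumptions: renderPdf, ALLOWED_PATHS, and the example paths are hypothetical, and browser is taken to be a Puppeteer Browser the server has already launched or connected to.

```javascript
// Sketch only: a restricted server-side operation that renders PDFs of
// allowlisted pages. The allowlist and its paths are hypothetical.
const ALLOWED_PATHS = new Set(['/report', '/invoice']);

async function renderPdf(browser, siteOrigin, path) {
  if (!ALLOWED_PATHS.has(path)) {
    throw new Error(`Path not allowed: ${path}`);
  }
  const page = await browser.newPage();
  try {
    // Load the page on the server's browser and wait for the network to go idle.
    await page.goto(new URL(path, siteOrigin).href, { waitUntil: 'networkidle0' });
    // page.pdf() resolves with the PDF bytes.
    return await page.pdf({ format: 'A4', printBackground: true });
  } finally {
    // Always close the page, even if navigation or rendering fails.
    await page.close();
  }
}
```

An HTTP route on the server would call renderPdf and stream the result back, so clients never see the Puppeteer API itself.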

See How to run Puppeteer code in any web browser? for further details on the broader issue of running Puppeteer from a client, which is often misunderstood.

1 Comment

Thanks for getting the documentation updated, that's awesome!
