Running Puppeteer inside a browser
Puppeteer is a browser automation library; a browser is a prerequisite for using it [1]. More specifically, it connects to a browser's automation interface via a remote debugging protocol. Per the same docs page (emphasis my own):
WebSocket Connections: Establish connections to existing browser instances using WebSockets. Launching or downloading browsers directly is not supported as it relies on Node.js APIs.
The Puppeteer-in-browser feature is designed to connect to an existing browser running remotely, whether on localhost or on a server somewhere. Instead of launching a new browser instance, as you can with the Node.js version, you can only connect to one that is already running. Otherwise, the connection via WebSocket [2] is generally [3] the same: this feature changes where Puppeteer runs, not how it talks to the browser.
In this context, the phrase "current web page" means the page that the remote browser is currently on, not the page that Puppeteer-in-browser itself is running in.
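To make that distinction concrete, here is a minimal sketch of connecting to a remote browser and listing the pages it has open. The function name listRemotePageUrls is hypothetical, the puppeteer argument is assumed to be whatever object the browser-compatible Puppeteer bundle exports, and wsUrl still has to be supplied from somewhere else.

```javascript
// Hypothetical helper: `puppeteer` is assumed to be the object exported by the
// browser-compatible Puppeteer bundle; `wsUrl` must point at a running browser.
async function listRemotePageUrls(puppeteer, wsUrl) {
  // Attach to the already running browser; nothing is launched here.
  const browser = await puppeteer.connect({ browserWSEndpoint: wsUrl });
  try {
    // These are the pages open in the remote browser, not the page
    // that this script happens to be running in.
    const pages = await browser.pages();
    return pages.map((page) => page.url());
  } finally {
    // Detach without closing the remote browser.
    await browser.disconnect();
  }
}
```

The URLs returned belong to the browser on the other end of the WebSocket, which is why the page hosting this script only shows up if that same browser is the one being connected to.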
"The example shows a browser being connected to with a wsUrl, but there's no indication where that wsUrl comes from"
"In my investigation, I ran across an older puppeteer-web example with a hard-coded endpoint, but they mentioned starting the browser with a remote-debugging-port"
Technically all the properties in ConnectOptions are optional, but the first documentation page I linked says, "Do not forget to include a valid browser WebSocket endpoint when connecting to an instance," so it seems likely that browserWSEndpoint is conditionally required.
The documentation isn't explicit here, but your understanding is correct: it requires a remote debugging port. Later in that issue thread, the same maintainer mentions the "modern CDP sessions that Puppeteer requires". "CDP" stands for Chrome DevTools Protocol, and a link to it is provided a little further down in the thread.
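As a rough sketch of where such an endpoint can come from: a Chromium-based browser started with --remote-debugging-port serves a small HTTP endpoint at /json/version, and its JSON response includes a webSocketDebuggerUrl field. The helper names below are hypothetical, and port 9222 is only an assumption that matches the documentation's example address.

```javascript
// Hypothetical helper: pull the browser-level WebSocket URL out of the JSON
// served at /json/version by a browser started with --remote-debugging-port.
function extractWsEndpoint(versionInfo) {
  const url = versionInfo && versionInfo.webSocketDebuggerUrl;
  if (typeof url !== 'string' || !/^wss?:\/\//.test(url)) {
    throw new Error('No webSocketDebuggerUrl in /json/version response');
  }
  return url;
}

// Hypothetical helper: fetch the version info and return the endpoint.
// The base URL (port 9222) is an assumption; `fetchImpl` is injectable so the
// function can be exercised without a real browser.
async function discoverWsEndpoint(baseUrl = 'http://127.0.0.1:9222', fetchImpl = fetch) {
  const response = await fetchImpl(new URL('/json/version', baseUrl));
  if (!response.ok) {
    throw new Error(`Unexpected status ${response.status} from ${baseUrl}`);
  }
  return extractWsEndpoint(await response.json());
}
```

The returned string is what would be passed as browserWSEndpoint. Note that this lookup is likely easier from Node.js or a server than from a web page, since a page's request to that local HTTP endpoint may well be blocked by the browser's cross-origin rules.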
For reference, the protocol property of ConnectOptions is either cdp or webDriverBiDi (a vendor-neutral interface), both of which run over WebSocket.
Note that the Browser Management page also mentions the same WebSocket URL:
Connecting to a running browser
If you launched a browser outside of Puppeteer, you can connect to it using the connect method. Usually, you can grab a WebSocket endpoint URL from the browser output:
const browser = await puppeteer.connect({
  browserWSEndpoint: 'ws://127.0.0.1:9222/...',
});
const page = await browser.newPage();
browser.disconnect();
Connecting to the same browser instance that is running Puppeteer-in-browser could be done, for example, by having that browser expose a remote debugging port on localhost and connecting to it; indeed, the example above uses the loopback address 127.0.0.1.
"...which isn't feasible to expect people using the website to do"
I agree that this is a tall ask for the average browser user, though.
Alternatives
That being said, if your goal is to save a PDF of the page your users are on, there are alternative ways of achieving that without Puppeteer or other browser automation tools.
For example, your "Export to PDF" button could run window.print(), which will open the print dialog in the browser. From there, a user can select to "Save as PDF".
However, as per 'Javascript call programmatically the "Save as PDF" feature of Chrome dialog print', you cannot select that option for your users; you can only open the print dialog.
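A minimal sketch of the window.print() approach follows. The helper name attachPrintHandler and the #export-pdf selector are hypothetical; the only browser API relied on is window.print(), which opens the dialog and leaves the choice of "Save as PDF" to the user.

```javascript
// Hypothetical helper: open the browser's print dialog when `button` is clicked.
// `win` defaults to the global window and is a parameter only to keep the
// function easy to exercise outside a browser.
function attachPrintHandler(button, win = window) {
  button.addEventListener('click', () => {
    // Opens the print dialog; the page cannot pre-select a destination.
    win.print();
  });
}

// Usage in a page, assuming a button with the hypothetical id "export-pdf".
if (typeof document !== 'undefined') {
  const exportButton = document.querySelector('#export-pdf');
  if (exportButton) {
    attachPrintHandler(exportButton);
  }
}
```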
There are other options too, such as using a library like html2pdf.js to convert the current HTML into a PDF with in-browser JavaScript, which could then be saved client-side. See also: "Is it possible to save HTML page as PDF using JavaScript or jquery?"
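As an illustration of the html2pdf.js route, the sketch below uses the library's chained set/from/save calls. It assumes html2pdf.js has already been loaded on the page as the global html2pdf; the helper name exportElementToPdf and the default file name are hypothetical.

```javascript
// Hypothetical helper: render `element` to a PDF and trigger a client-side save.
// Assumes html2pdf.js is loaded as the global `html2pdf`; the implementation is
// a parameter only so the call chain can be exercised without the library.
function exportElementToPdf(element, filename = 'page.pdf', html2pdfImpl = globalThis.html2pdf) {
  if (typeof html2pdfImpl !== 'function') {
    throw new Error('html2pdf.js does not appear to be loaded');
  }
  // Configure the output file name, pick the source element, then save.
  return html2pdfImpl().set({ filename }).from(element).save();
}
```

In a page, this could be called as exportElementToPdf(document.body) from a button's click handler. The output is rendered from the DOM by the library, so it may not match the browser's own print output exactly.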
Some other options would be to provide a pre-generated static PDF (if the content is known ahead of time) or generate a PDF dynamically server-side which the user can download.
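For the server-side option, one possible approach (an assumption on my part, not the only way) is to use the Node.js variant of Puppeteer, which can launch its own browser. The sketch below takes the puppeteer module as an argument; the helper name renderPageToPdf and the A4 format are illustrative choices.

```javascript
// Hypothetical server-side helper: `puppeteer` is assumed to be the Node.js
// Puppeteer module, passed in by the caller.
async function renderPageToPdf(puppeteer, url) {
  // Unlike the in-browser variant, the Node.js variant can launch a browser.
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait until network activity settles so late-loading content is included.
    await page.goto(url, { waitUntil: 'networkidle0' });
    // Resolves with the PDF bytes, which the server can send as a download.
    return await page.pdf({ format: 'A4' });
  } finally {
    await browser.close();
  }
}
```

An HTTP handler could call this with the URL of the page to export and respond with the result using a Content-Type of application/pdf.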
[1] Or it can download a browser for you.
[2] Or custom transport.
[3] There are more features available in the Node.js variant than the in-browser variant.