
I'm using Puppeteer and JS to write a web scraper. The site I'm scraping is pretty intense, so I need to use a local Chrome instance and a residential proxy service to get it working. Here's my basic setup.

const { spawn } = require("child_process");
const puppeteer = require("puppeteer-core");

// Launch a local Chrome with remote debugging enabled, then attach Puppeteer to it.
const chromeProcess = spawn(chromePath, [
  `--remote-debugging-port=${PORT}`,
  `--user-data-dir=${userDataDir}`,
  `--proxy-server=${proxyUrl}`,
  "--no-first-run",
  "--no-default-browser-check",
  "--disable-extensions",
  "--start-maximized",
  "--disable-features=IsolateOrigins,site-per-process"
], { stdio: "ignore" });

const browser = await puppeteer.connect({ browserURL: `http://127.0.0.1:${PORT}` });
const page = await browser.newPage();
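(For context: one pitfall with this setup is that puppeteer.connect() can race the freshly spawned Chrome before its DevTools endpoint is listening. A small retry helper, which is my own illustrative sketch and not part of the setup above, makes the connect step deterministic.)

```javascript
// Sketch: retry an async operation until it succeeds or attempts run out.
// Names and defaults here are illustrative assumptions, not from the post.
async function retry(fn, { attempts = 20, delayMs = 250 } = {}) {
  let lastErr;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn(); // succeed as soon as fn resolves
    } catch (err) {
      lastErr = err;
      await new Promise((r) => setTimeout(r, delayMs));
    }
  }
  throw lastErr; // all attempts exhausted

}

// Usage with the setup above (assumed variables PORT, puppeteer):
//   const browser = await retry(() =>
//     puppeteer.connect({ browserURL: `http://127.0.0.1:${PORT}` })
//   );
```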

However, I've been getting a multitude of errors trying to get the proxy service working (like net::ERR_NO_SUPPORTED_PROXIES): the page won't load, or shows a "page not found" error in the browser. I've tried tunneling through mitmproxy with no luck, so I'm just not sure what's possible at this point.

Does anyone have any insight into using proxies with a local Chrome instance? Is this even possible?

  • Can you elaborate on "intense"? What errors are you seeing specifically? Are they isolated to a single site or do they happen on all sites? How can someone repro this? Thanks. Commented Sep 6 at 0:30
  • I'm getting a net::ERR_NO_SUPPORTED_PROXIES error when I try to connect with a proxy as shown above, and it happens on all sites. The site is intense enough that it tracks IPs, so I need the proxy service. But this is really a general question about how to get proxies working with a local Chrome instance, not a question about any specific website: I've heard that Puppeteer has problems with proxy + local Chrome instance, but I could not find any solutions for getting it working. Commented Sep 6 at 1:03
  • Makes sense, thanks for clarifying. Commented Sep 6 at 1:30
  • Sorry, this isn't a full answer, but I ran into a similar problem setting up my project. I tried several different proxy services before I found one that worked. Most have a free version, so you could experiment risk-free: OxyLabs and Webshare.io, to name a couple. Also, try scraping a simple site first, like Wikipedia. If you get the same error, then the problem has nothing to do with the site you are scraping. Commented Sep 6 at 19:12
  • I'm actually using OxyLabs right now and it usually works great, but I think it's more of a general issue with using proxies with a local Chrome instance. It's definitely nothing to do with the particular website I'm scraping, just that this website is secure enough that I'm forced to use Puppeteer + a proxy service rather than something more lightweight like I usually do. Commented Sep 7 at 2:33

1 Answer


One possible solution I've found with OxyLabs is to whitelist your IP rather than authenticate with a username/password. You can also set up a system-wide proxy if you're on a Windows machine to avoid having to authenticate through Chrome directly.
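Worth noting: Chrome ignores credentials embedded in the --proxy-server URL, which is a common cause of net::ERR_NO_SUPPORTED_PROXIES. If whitelisting isn't an option, one approach is to strip the credentials out of the URL before passing it to Chrome and supply them to Puppeteer's page.authenticate() instead. The helper below is an illustrative sketch; parseProxyUrl is my own name, not an OxyLabs or Puppeteer API.

```javascript
// Sketch: split "scheme://user:pass@host:port" into the credential-free server
// string Chrome expects and the credentials page.authenticate() needs.
function parseProxyUrl(proxyUrl) {
  const u = new URL(proxyUrl);
  return {
    // What --proxy-server should receive: scheme://host:port, no credentials.
    server: `${u.protocol}//${u.host}`,
    // What page.authenticate() should receive.
    username: decodeURIComponent(u.username),
    password: decodeURIComponent(u.password),
  };
}

// Usage with the question's setup (assumed variables):
//   const { server, username, password } = parseProxyUrl(proxyUrl);
//   // spawn Chrome with `--proxy-server=${server}` instead of the full URL
//   const page = await browser.newPage();
//   await page.authenticate({ username, password });
```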
