
I set up a self-hosted Firecrawl instance and I want to crawl my internal intranet site (e.g. https://intranet.xxx.gov.tr/).

I can access the site directly both from the host machine and from inside the container using curl:

# host
curl -v https://intranet.xxx.gov.tr/

# inside container
curl -v https://intranet.xxx.gov.tr/

Both return the page content successfully.

However, when I make a request to the Firecrawl API, I get an error:

curl -X POST http://localhost:3002/v0/crawl \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://intranet.xxx.gov.tr/"
  }'

Firecrawl log output:

Connection violated security rules.
SCRAPE_ALL_ENGINES_FAILED
All scraping engines failed! -- Double check the URL...

The Playwright engine doesn’t work either: its /html endpoint returns 404. The Fetch engine fails with the security rules violation shown above.


My questions:

  • Why does Firecrawl block access to intranet domains (like *.intranet.*)?
  • How can I bypass this safeFetch security rule, or whitelist my intranet domain?

Notes:

  • Firecrawl is running self-hosted in Docker.
  • The intranet domain is accessible fine from both the host and inside the container.

1 Answer


Firecrawl refuses to crawl anything that resolves to a private IP address. The check is implemented in safeFetch.js, in the isIPv4Private function. There does not appear to be any config parameter to disable this behavior. Instead, you can patch isIPv4Private to always return false. Note that this disables Firecrawl's SSRF protection entirely, so only do it on a trusted network.
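For illustration, here is a hypothetical sketch of the kind of check such a function performs, together with the "always return false" workaround. The actual implementation in Firecrawl's safeFetch.js may differ; the ranges below are just the standard RFC 1918 and special-use IPv4 blocks.

```javascript
// Sketch of a private-IPv4 check (assumed shape; not Firecrawl's exact code).
function isIPv4Private(ip) {
  const parts = ip.split(".").map(Number);
  if (parts.length !== 4 || parts.some((n) => Number.isNaN(n) || n < 0 || n > 255)) {
    return false; // not a valid dotted-quad IPv4 address
  }
  const [a, b] = parts;
  return (
    a === 10 ||                          // 10.0.0.0/8     (RFC 1918)
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12  (RFC 1918)
    (a === 192 && b === 168) ||          // 192.168.0.0/16 (RFC 1918)
    a === 127 ||                         // 127.0.0.0/8    loopback
    (a === 169 && b === 254)             // 169.254.0.0/16 link-local
  );
}

// The workaround described above: make the check always pass.
// WARNING: this disables SSRF protection for every request.
// function isIPv4Private(ip) { return false; }

console.log(isIPv4Private("192.168.1.10")); // true
console.log(isIPv4Private("8.8.8.8"));      // false
```

A typical intranet host like the one in the question resolves to an address in one of these ranges, which is why every engine that goes through this check refuses the request.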

Unfortunately, the documentation does not mention this behavior at all.
