Problem

I’m using Docusaurus with Typesense and the docsearch-typesense-scraper to index my documentation site. The scraper runs without errors: the sitemap is found and records are produced. However, all indexed records come from the homepage, even though the sitemap includes many distinct documentation URLs.

Setup

  • Docusaurus version: 3.9
  • Scraper: docsearch-typesense-scraper
  • Serving method: static build (npm run build), served on port 80
  • Sitemap is reachable and correct

I followed the official Typesense DocSearch guide:

What Happens

When running the scraper:

The sitemap URLs are fetched correctly and records are created successfully, but every record contains homepage content rather than the content of the corresponding documentation page.

Question

Has anyone seen this behavior with docsearch-typesense-scraper and Docusaurus? Why would all sitemap URLs return the homepage’s HTML to the scraper, even though each page serves distinct HTML content when accessed manually?

What I Have Tried

I used the Docusaurus scraper configuration from the guide and made the adjustments it lists.

I also tried a minimal selector configuration, but all the records still contain homepage content instead of the documentation content.

Finally, I hosted the built Docusaurus site publicly and ran the scraper against it with a very minimal selector configuration, but it still indexes only homepage content.
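
For anyone trying to reproduce this, a check along the following lines could show whether the server itself returns identical raw HTML for every sitemap URL when no JavaScript is executed. This is only a sketch, not something taken from the guide or the scraper: the function names (`extract_locs`, `group_by_body`, `check_sitemap`) and the `sitemap-check` user agent are made up for illustration, and the sitemap is assumed to be a plain `urlset` rather than a sitemap index.

```python
import hashlib
import urllib.request
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_locs(sitemap_xml: bytes) -> list[str]:
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]


def group_by_body(pages: dict[str, bytes]) -> list[list[str]]:
    """Return groups of URLs whose response bodies are byte-for-byte identical."""
    groups = defaultdict(list)
    for url, body in pages.items():
        groups[hashlib.sha256(body).hexdigest()].append(url)
    # Only groups with more than one URL indicate duplicated content.
    return [urls for urls in groups.values() if len(urls) > 1]


def fetch(url: str) -> bytes:
    """Fetch the raw bytes the server returns, without running any JavaScript."""
    req = urllib.request.Request(url, headers={"User-Agent": "sitemap-check"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()


def check_sitemap(sitemap_url: str) -> list[list[str]]:
    """Download a sitemap, fetch each listed page, and report identical bodies."""
    locs = extract_locs(fetch(sitemap_url))
    return group_by_body({loc: fetch(loc) for loc in locs})
```

Calling `check_sitemap` with the sitemap URL of the served build returns the groups of URLs that share an identical body. If most documentation URLs end up in the same group as the homepage, the server is probably answering those requests with a fallback page; if every body is distinct, the cause more likely lies in the scraper configuration. It may also be worth comparing the host in the sitemap's `<loc>` entries with the host the scraper is pointed at.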
