
We are trying to enable full-text search. Our application stores PDF files in Azure Blob Storage, which is the data source for Azure Search. Most of this works fine; however, the indexer is not able to extract text from a couple of the PDFs. Are there specific kinds of PDFs that the Azure Search indexer can extract? If yes, what are they?

Any information or help in this regard is greatly appreciated.

3 Answers


Azure Search can extract all text from PDF text elements. Extracting text from embedded images (which requires OCR) or tables is not yet integrated in Azure Search, but it is on the roadmap.

If your PDFs contain images and you want to extract text from those as well, then you can try following the steps here.


1 Comment

Thank you for taking the time to respond. I will try this approach and let you know if it works for me.

Are there any specific kinds of PDFs that Azure Search Indexer can extract?

In my experience, there is no specific kind of PDF that the Azure Search indexer can't extract. Based on your description, I suspect you are hitting an Azure Search limit. For more detailed information, please refer to Indexing Documents in Azure Blob Storage with Azure Search.

Azure Search limits how much text it extracts depending on the pricing tier: 32,000 characters for Free tier, 64,000 for Basic, and 4 million for Standard, Standard S2 and Standard S3 tiers. A warning is included in the indexer status response for truncated documents.
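To see which documents are affected, one option is to read the warnings out of the indexer status response. The following is only a rough sketch: the character limits are the ones quoted above, while the status URL shape, the `api-key` header, and the `lastResult` / `warnings` / `key` / `message` field names are my assumptions about the REST response and should be checked against the current API reference. The function names are hypothetical, and the API version is left for the caller to supply.

```python
import json
import urllib.request

# Extraction limits quoted above, in characters per document.
EXTRACTION_LIMITS = {
    "free": 32_000,
    "basic": 64_000,
    "standard": 4_000_000,
}


def would_be_truncated(char_count, tier):
    """Return True if a document with char_count characters exceeds the tier limit."""
    return char_count > EXTRACTION_LIMITS[tier.lower()]


def fetch_indexer_status(service, indexer, api_key, api_version):
    """Fetch the indexer status as a dict.

    Assumption: the URL shape and the 'api-key' header follow the
    Azure Search REST conventions; verify against the docs.
    """
    url = (
        f"https://{service}.search.windows.net"
        f"/indexers/{indexer}/status?api-version={api_version}"
    )
    request = urllib.request.Request(url, headers={"api-key": api_key})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def collect_warnings(status):
    """Return (document key, message) pairs from the last indexer run.

    Assumption: warnings live under status['lastResult']['warnings'],
    each with 'key' and 'message' fields.
    """
    last_result = status.get("lastResult") or {}
    return [
        (item.get("key"), item.get("message"))
        for item in last_result.get("warnings") or []
    ]
```

Calling `collect_warnings(fetch_indexer_status(...))` with your own service name, indexer name, key, and API version should then list the blobs that were truncated or skipped, along with the reason the indexer reported.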

1 Comment

I have 1,000 PDFs in blob storage, but only around 900 are being processed. For the others, I get a warning that the file format is not supported.

I recently wrote a blog post about my experience with this. I ended up using a Python-based script running in a Docker container within Azure. It is somewhat complicated, but the blog post lays it out pretty clearly, and the results have been very good as far as OCR and searchability go.

http://martyice.github.io/docker-in-azure/

1 Comment

Thank you for sharing your blog post!!
