1

I´m trying to create an Azure Search vector index as well in the Azure ML UI (Prompt flow) portal but having an error in the component "LLM - Crack and Chunk Data": My Flow Error Image

The error says: User program failed with BaseRagServiceError: Rag system error

Part of the logs is:

input_data=/mnt/azureml/cr/j/60652b595f69/cap/data-capability/wd/INPUT_input_data
input_glob=**/*
allowed_extensions=.txt,.md,.html,.htm,.py,.pdf,.ppt,.pptx,.doc,.docx,.xls,.xlsx,.csv,.json
chunk_size=1024
chunk_overlap=0
output_chunks=/mnt/azureml/cr/j/606547e361134e058c4829792b595f69/cap/data-capability/wd/output_chunks
data_source_url=azureml://locations/XXXXX/workspaces/04XXXX0/data/vector-index-input-1734572551882/versions/1
document_path_replacement_regex=None
max_sample_files=-1
use_rcts=True
output_format=jsonl
custom_loader=None
doc_intel_connection_id=None
output_title_chunk=None
openai_api_version=None
openai_api_type=None
[2024-12-19 01:43:28] INFO     azureml.rag.crack_and_chunk.crack_and_chunk - ActivityStarted, crack_and_chunk (activity.py:108)
[2024-12-19 01:43:28] INFO     azureml.rag.crack_and_chunk - Processing file: What is prompt flow.pdf (crack_and_chunk.py:127)
/azureml-envs/rag-embeddings/lib/python3.9/site-packages/pypdf/_crypt_providers/_cryptography.py:32: CryptographyDeprecationWarning: ARC4 has been moved to cryptography.hazmat.decrepit.ciphers.algorithms.ARC4 and will be removed from cryptography.hazmat.primitives.ciphers.algorithms in 48.0.0.
  from cryptography.hazmat.primitives.ciphers.algorithms import AES, ARC4
[2024-12-19 01:43:31] INFO     azureml.rag.azureml.rag.documents.chunking - No file_chunks to yield, continuing (chunking.py:237)
[2024-12-19 01:43:31] INFO     azureml.rag.azureml.rag.documents.chunking - No file_chunks to yield, continuing (chunking.py:237)
[2024-12-19 01:43:31] INFO     azureml.rag.crack_and_chunk - [DocumentChunksIterator::filter_extensions] Filtered 0 files out of 1 (crack_and_chunk.py:129)
[2024-12-19 01:43:31] INFO     azureml.rag.crack_and_chunk - [DocumentChunksIterator::filter_extensions] Skipped extensions: {} (crack_and_chunk.py:130)
[2024-12-19 01:43:31] INFO     azureml.rag.crack_and_chunk - [DocumentChunksIterator::filter_extensions] Kept extensions: {
  ".pdf": 1
} (crack_and_chunk.py:133)
[2024-12-19 01:43:31] INFO     azureml.rag.azureml.rag.documents.cracking - [DocumentChunksIterator::crack_documents] Total time to load files: 0.30446887016296387
{
  ".txt": 0.0,
  ".md": 0.0,
  ".html": 0.0,
  ".htm": 0.0,
  ".py": 0.0,
  ".pdf": 1.0,
  ".ppt": 0.0,
  ".pptx": 0.0,
  ".doc": 0.0,
  ".docx": 0.0,
  ".xls": 0.0,
  ".xlsx": 0.0,
  ".csv": 0.0,
  ".json": 0.0
} (cracking.py:381)
[2024-12-19 01:43:31] INFO     azureml.rag.azureml.rag.documents.cracking - [DocumentChunksIterator::crack_documents] Total time to load files: 0.30446887016296387
{
  ".txt": 0.0,
  ".md": 0.0,
  ".html": 0.0,
  ".htm": 0.0,
  ".py": 0.0,
  ".pdf": 1.0,
  ".ppt": 0.0,
  ".pptx": 0.0,
  ".doc": 0.0,
  ".docx": 0.0,
  ".xls": 0.0,
  ".xlsx": 0.0,
  ".csv": 0.0,
  ".json": 0.0
} (cracking.py:381)
[2024-12-19 01:43:31] INFO     azureml.rag.azureml.rag.documents.chunking - [DocumentChunksIterator::split_documents] Total time to split 1 documents into 0 chunks: 0.9676399230957031 (chunking.py:247)
[2024-12-19 01:43:31] INFO     azureml.rag.azureml.rag.documents.chunking - [DocumentChunksIterator::split_documents] Total time to split 1 documents into 0 chunks: 0.9676399230957031 (chunking.py:247)
[2024-12-19 01:43:31] INFO     azureml.rag.crack_and_chunk - Processed 0 files (crack_and_chunk.py:208)
[2024-12-19 01:43:31] INFO     azureml.rag.crack_and_chunk - No chunked documents found in /mnt/azureml/cr/j/606547e361134e058c4829792b595f69/cap/data-capability/wd/INPUT_input_data with glob **/* (crack_and_chunk.py:215)
[2024-12-19 01:43:31] ERROR    azureml.rag.crack_and_chunk.crack_and_chunk - ServiceError: intepreted error = Rag system error, original error = No chunked documents found in /mnt/azureml/cr/j/606547e361134e058c4829792b595f69/cap/data-capability/wd/INPUT_input_data with glob **/*. (exceptions.py:124)
[2024-12-19 01:43:36] ERROR    azureml.rag.crack_and_chunk.crack_and_chunk - crack_and_chunk failed with exception: Traceback (most recent call last):
  File "/azureml-envs/rag-embeddings/lib/python3.9/site-packages/azureml/rag/tasks/crack_and_chunk.py", line 229, in main_wrapper
    map_exceptions(main, activity_logger, args, logger, activity_logger)
  File "/azureml-envs/rag-embeddings/lib/python3.9/site-packages/azureml/rag/utils/exceptions.py", line 126, in map_exceptions
    raise e
  File "/azureml-envs/rag-embeddings/lib/python3.9/site-packages/azureml/rag/utils/exceptions.py", line 118, in map_exceptions
    return func(*func_args, **kwargs)
  File "/azureml-envs/rag-embeddings/lib/python3.9/site-packages/azureml/rag/tasks/crack_and_chunk.py", line 220, in main
    raise ValueError(f"No chunked documents found in {args.input_data} with glob {args.input_glob}.")
ValueError: No chunked documents found in /mnt/azureml/cr/j/606547e361134e058c4829792b595f69/cap/data-capability/wd/INPUT_input_data with glob **/*.
 (crack_and_chunk.py:231) ...................................

I tried with Serverless and Compute instance and is the same result. It seems the chunk is not doing nothing. My file is PDF format file with only one page without images to let it more easy.

Someone has a suggestion? thank you in advanced!!

1
  • add the configuration you done for this. also check if the file present in the given source. Commented Dec 20, 2024 at 9:47

1 Answer 1

0

This kind of error comes when there is no content to chunk the document.

Even i got the same error.

enter image description here

I have two text files New Text Document.txt and New Text Document (2).txt, both are empty no content in those and got the error.

You said you have a single page pdf file, the possible reason is the content is not being extracted properly.

So, you try with 3-4 pdf files with proper content also make sure the file is not password protected.

Sign up to request clarification or add additional context in comments.

2 Comments

JayashankarGS, Thank you for your suggestion!! I will try it again with other pdf and post the result :))
Hello JayashankarGS, your solution works!! I have added another PDF with more pages and the job finished. Thank you so much!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.