I have been tasked with building a production-level RAG application over CSV files. Possible approaches:
- Embed the CSV rows --> store in a vector DB --> take the user query --> similarity or hybrid search --> LLM --> result
- Load the CSV into a pandas DataFrame --> ask the LLM for Python code that answers the user prompt --> run that code against the DataFrame --> give the output to the LLM for analysis --> result
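For concreteness, the second pipeline can be sketched roughly as below. This is a minimal sketch, not a production design: the sample data, the column names, and `fake_llm_generate_code` are hypothetical stand-ins (a real version would call an actual model), and the restricted `eval` is only an illustration, not a real sandbox.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real CSV file.
CSV_TEXT = """region,product,revenue
north,widget,120
south,widget,80
north,gadget,200
"""


def fake_llm_generate_code(question: str, schema: dict) -> str:
    """Hypothetical stand-in for the LLM call.

    A real implementation would send the question and the schema
    (column names and dtypes, not the data itself) to a model and
    get back a pandas expression. Here the answer is hard-coded.
    """
    return "df.groupby('region')['revenue'].sum()"


def run_generated_code(code: str, df: pd.DataFrame):
    # Evaluate the generated expression with only `df` and `pd` in scope.
    # Assumption: this is NOT a secure sandbox; untrusted code needs
    # proper isolation (separate process, resource limits, etc.).
    return eval(code, {"__builtins__": {}}, {"df": df, "pd": pd})


def answer(question: str, df: pd.DataFrame):
    # Only the schema is passed to the (stubbed) model.
    schema = {col: str(dtype) for col, dtype in df.dtypes.items()}
    code = fake_llm_generate_code(question, schema)
    result = run_generated_code(code, df)
    return code, result


df = pd.read_csv(io.StringIO(CSV_TEXT))
code, result = answer("What is the total revenue per region?", df)
print(code)
print(result)
```

In this sketch the model only ever sees the schema, so prompt size does not grow with the number of rows; the cost that does grow with the data is loading and querying the DataFrame itself.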
The first approach gives vague answers, because it applies an unstructured retrieval technique to structured data. The second works very well, but I doubt it will scale. I need suggestions.
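A toy illustration of the problem with the first approach on aggregate questions: top-k retrieval only hands the LLM k rows, so any total computed from them can be incomplete. Everything here is an assumption for illustration: the rows are made up, and `overlap_score` is a crude token-overlap stand-in for real embedding similarity.

```python
# Hypothetical rows standing in for a CSV file.
rows = [
    {"region": "north", "product": "widget", "revenue": 120},
    {"region": "south", "product": "widget", "revenue": 80},
    {"region": "north", "product": "gadget", "revenue": 200},
    {"region": "north", "product": "gizmo", "revenue": 50},
]


def to_text(row: dict) -> str:
    # Serialize a row to text, as would be done before embedding it.
    return f"region {row['region']} product {row['product']} revenue {row['revenue']}"


def overlap_score(query: str, text: str) -> int:
    # Toy stand-in for embedding similarity: number of shared tokens.
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve_top_k(query: str, rows: list, k: int) -> list:
    # Rank rows by similarity to the query and keep only the best k.
    ranked = sorted(rows, key=lambda r: overlap_score(query, to_text(r)), reverse=True)
    return ranked[:k]


query = "total revenue for region north"
retrieved = retrieve_top_k(query, rows, k=2)

# What the LLM could compute from the retrieved context alone.
sum_from_retrieved = sum(r["revenue"] for r in retrieved if r["region"] == "north")

# What a full scan of the structured data gives.
true_total = sum(r["revenue"] for r in rows if r["region"] == "north")

print(sum_from_retrieved, true_total)
```

With three matching rows and k=2, the retrieved context cannot contain every row the answer depends on, which is one plausible reason the answers come out vague or wrong.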