I’m working on a hybrid RAG (Retrieval-Augmented Generation) system that combines:
- Structured data from PostgreSQL
- A Neo4j graph database
- LightRAG for hybrid (graph + vector) search
I want to use the structured data I already have in PostgreSQL to build a clean, schema-controlled graph inside Neo4j, while still letting LightRAG handle embeddings and semantic queries.
If I feed this data through LightRAG’s default ingestion, the LLM tries to infer the graph structure from text and often hallucinates nodes or relationships, such as:
- Creating nodes for “start date” or “end date”
- Adding irrelevant relationships
I want to bypass the automatic extraction and insert the JSON data into Neo4j exactly according to my schema, while still allowing LightRAG to:
- Generate embeddings for relevant text fields (event descriptions, names, etc.)
- Use the vector database for semantic search later.
What’s the correct approach or best practice to:
- Use structured JSON from PostgreSQL as the source of truth,
- Populate Neo4j with nodes and relationships following a strict schema (no LLM inference),
- Still let LightRAG (or another RAG tool) create embeddings and fill the vector database for semantic queries?
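To make the goal concrete, here is a minimal sketch of the deterministic half of what I have in mind: turning JSON rows into parameterized Cypher `MERGE` statements from a fixed schema, and separately picking out the text fields that should be embedded. Everything named here (`SCHEMA`, `ALLOWED_RELS`, the `Person`/`Event` labels, the `ATTENDED` relationship, the field names) is hypothetical and only for illustration. It is not LightRAG's API, and how best to hand the embedding text to LightRAG is exactly the open part of my question.

```python
# Hypothetical schema: for each label, the key property, the properties allowed
# on the node, and the text fields that should be embedded for semantic search.
SCHEMA = {
    "Person": {"key": "id", "props": ["id", "name"], "embed": ["name"]},
    "Event": {
        "key": "id",
        "props": ["id", "name", "description", "start_date", "end_date"],
        "embed": ["name", "description"],
    },
}

# Hypothetical whitelist of (source label, relationship type, target label).
ALLOWED_RELS = {("Person", "ATTENDED", "Event")}


def node_statement(label, record):
    """Build a Cypher MERGE for one record; dates stay properties, not nodes."""
    spec = SCHEMA[label]  # raises KeyError for labels outside the schema
    props = {k: record[k] for k in spec["props"] if k in record}
    # Labels cannot be passed as query parameters, so they are interpolated.
    # That is only safe here because they come from the fixed SCHEMA dict.
    query = f"MERGE (n:{label} {{{spec['key']}: $key}}) SET n += $props"
    return query, {"key": record[spec["key"]], "props": props}


def rel_statement(src_label, rel_type, dst_label, src_key, dst_key):
    """Build a Cypher MERGE for one relationship, rejecting unknown types."""
    if (src_label, rel_type, dst_label) not in ALLOWED_RELS:
        raise ValueError(f"relationship not in schema: {rel_type}")
    s = SCHEMA[src_label]["key"]
    d = SCHEMA[dst_label]["key"]
    query = (
        f"MATCH (a:{src_label} {{{s}: $src}}), (b:{dst_label} {{{d}: $dst}}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )
    return query, {"src": src_key, "dst": dst_key}


def embedding_text(label, record):
    """Join only the schema-approved text fields of a record for embedding."""
    fields = SCHEMA[label]["embed"]
    return "\n".join(str(record[f]) for f in fields if record.get(f))


# Example record as it might come out of PostgreSQL (made-up values).
event = {
    "id": "e1",
    "name": "Launch",
    "description": "Product launch",
    "start_date": "2024-01-01",
    "end_date": "2024-01-02",
    "internal_note": "not part of the schema",
}
event_query, event_params = node_statement("Event", event)
rel_query, rel_params = rel_statement("Person", "ATTENDED", "Event", "p1", "e1")
event_text = embedding_text("Event", event)
```

Each `(query, params)` pair could then be executed with the official Neo4j Python driver, for example `session.run(query, params)`, so nothing in the graph depends on LLM inference. The strings returned by `embedding_text` are what I would like to pass to LightRAG (or another tool) for the vector store.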
Note: I have already edited LightRAG's prompt to strictly follow my JSON file format, but it still adds irrelevant nodes.