$\begingroup$

I’m working on a hybrid RAG (Retrieval-Augmented Generation) system that combines:

  • Structured data from PostgreSQL
  • A Neo4j graph database
  • LightRAG for hybrid (graph + vector) search

I want to use the structured data I already have in PostgreSQL to build a clean, schema-controlled graph inside Neo4j, while still letting LightRAG handle embeddings and semantic queries.

If I feed this data through LightRAG’s default ingestion, the LLM tries to infer the graph structure from text and often hallucinates nodes or relationships, such as:

  • Creating nodes for “start date” or “end date”
  • Adding irrelevant relationships

I want to bypass the automatic extraction and insert the JSON data into Neo4j exactly according to my schema, while still allowing LightRAG to:

  • Generate embeddings for relevant text fields (event descriptions, names, etc.)
  • Use the vector database for semantic search later.
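
To make the "exactly according to my schema" requirement concrete, here is a minimal sketch of a deterministic mapping from one JSON record to a parameterised Cypher `MERGE`. The `Event`/`Person` labels, the `ATTENDED` relationship, and all field names are hypothetical stand-ins for the real PostgreSQL schema; nothing here calls an LLM, so dates can only ever end up as node properties.

```python
import json

# Hypothetical schema: allowed labels, their key property, and the only
# properties a node of that label may carry. Adjust to the real tables.
NODE_SCHEMA = {
    "Event": {"key": "id",
              "props": {"id", "name", "description", "start_date", "end_date"}},
    "Person": {"key": "id", "props": {"id", "name"}},
}
# Hypothetical whitelist of (source label, relationship type, target label).
REL_SCHEMA = {("Person", "ATTENDED", "Event")}


def node_to_cypher(label, record):
    """Return (query, params) that MERGEs one node of a whitelisted label."""
    if label not in NODE_SCHEMA:
        raise ValueError(f"label not in schema: {label}")
    spec = NODE_SCHEMA[label]
    unknown = set(record) - spec["props"]
    if unknown:
        raise ValueError(f"fields not in schema for {label}: {sorted(unknown)}")
    key = spec["key"]
    # Labels cannot be passed as Cypher parameters, so they are interpolated
    # only after the whitelist check above; all values stay parameterised.
    query = f"MERGE (n:{label} {{{key}: $key}}) SET n += $props"
    return query, {"key": record[key], "props": dict(record)}


def rel_to_cypher(src_label, rel_type, dst_label, src_id, dst_id):
    """Return (query, params) that MERGEs one whitelisted relationship."""
    if (src_label, rel_type, dst_label) not in REL_SCHEMA:
        raise ValueError(f"relationship not in schema: {rel_type}")
    src_key = NODE_SCHEMA[src_label]["key"]
    dst_key = NODE_SCHEMA[dst_label]["key"]
    query = (
        f"MATCH (a:{src_label} {{{src_key}: $src}}), "
        f"(b:{dst_label} {{{dst_key}: $dst}}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )
    return query, {"src": src_id, "dst": dst_id}


if __name__ == "__main__":
    row = json.loads('{"id": 1, "name": "Launch", "start_date": "2024-01-01"}')
    query, params = node_to_cypher("Event", row)
    print(query)
    print(params)
```

Each `(query, params)` pair could then be handed to the official Neo4j Python driver, for example with `session.run(query, params)`. Using `MERGE` on the key property keeps repeated loads idempotent.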

What’s the correct approach or best practice to:

  • Use structured JSON from PostgreSQL as the source of truth,
  • Populate Neo4j with nodes and relationships following a strict schema (no LLM inference),
  • Still let LightRAG (or another RAG tool) create embeddings and fill the vector database for semantic queries?
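
One direction that might cover all three points is handing LightRAG a pre-built graph instead of raw text. The sketch below only builds a plain dictionary from already-structured rows. The `chunks` / `entities` / `relationships` layout and its key names follow my reading of the custom knowledge graph example in the LightRAG README (`insert_custom_kg`), which is an assumption that should be checked against the installed version. The row shapes and the `build_custom_kg` helper are hypothetical.

```python
def build_custom_kg(events, people, attendances):
    """Map structured rows to a chunks/entities/relationships dictionary.

    Only whitelisted entity types are produced; start/end dates are folded
    into the description text instead of becoming separate entities.
    """
    chunks, entities, relationships = [], [], []
    event_names, person_names = {}, {}

    for ev in events:
        source_id = f"event-{ev['id']}"
        text = f"{ev['description']} ({ev['start_date']} to {ev['end_date']})"
        event_names[ev["id"]] = ev["name"]
        # The chunk text is what would be embedded for semantic search.
        chunks.append({"content": text, "source_id": source_id})
        entities.append({
            "entity_name": ev["name"],
            "entity_type": "Event",
            "description": text,
            "source_id": source_id,
        })

    for person in people:
        person_names[person["id"]] = person["name"]
        entities.append({
            "entity_name": person["name"],
            "entity_type": "Person",
            "description": person["name"],
            "source_id": f"person-{person['id']}",
        })

    for att in attendances:
        src = person_names[att["person_id"]]
        tgt = event_names[att["event_id"]]
        relationships.append({
            "src_id": src,
            "tgt_id": tgt,
            "description": f"{src} attended {tgt}",
            "keywords": "attended",
            "weight": 1.0,
            "source_id": f"event-{att['event_id']}",
        })

    return {"chunks": chunks, "entities": entities, "relationships": relationships}


if __name__ == "__main__":
    kg = build_custom_kg(
        events=[{"id": 1, "name": "Launch", "description": "Product launch",
                 "start_date": "2024-01-01", "end_date": "2024-01-02"}],
        people=[{"id": 7, "name": "Alice"}],
        attendances=[{"person_id": 7, "event_id": 1}],
    )
    # Assumed usage, not verified here: rag.insert_custom_kg(kg)
    print(kg["relationships"])
```

If that entry point behaves as assumed, the LLM extraction step would be skipped entirely, while the chunk and description text would still be available for embedding.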

Note: I have already edited the LightRAG extraction prompt to strictly follow my JSON format, but it still adds irrelevant nodes.

$\endgroup$
