
I am using this model to embed a product catalog for a RAG pipeline. In the product catalog there are no red shirts for men, but there are red shirts for women. How can I make sure the model doesn't return women's shirts for men-oriented queries?

Here is an example product entry:

{
    "ColorDesc": "Brown",
    "styleName": "Scarf",
    "productType": "mens accessories",
    "tags": {
      "colorTag": [
        {
          "type": "color",
          "value": "Brown"
        }
      ],
      "newStyleTag": [
        {
          "type": "style",
          "value": "Scarves"
        }
      ],
      "depttTag": [
        {
          "type": "department",
          "value": "Men"
        }
      ]
    },
    "gender": "Men"
}

When I prompt "Looking for a brown scarf for women", the model returns this product instead of returning nothing. Is there any way to strictly apply filters in RAG so that it retrieves only matching products and outputs nothing when no product is available for the prompt? I am using FAISS for the vector store and Ollama for the LLM.

1 Answer


If you want to make sure the model doesn't return men's products when someone asks for women's items, run the semantic search first and then filter by gender in your application code.

So even if FAISS retrieves something similar, such as a men's brown scarf for a "brown scarf for women" query, you can filter it out afterwards.

Here's a simple example:

def filter_by_gender(results, gender):
    # If your FAISS store returns LangChain Documents rather than raw
    # dicts, use product.metadata.get("gender", "") here instead
    return [product for product in results
            if product.get("gender", "").lower() == gender.lower()]

# Step 1: Over-fetch similar results from FAISS so that post-filtering
# still leaves enough candidates
retrieved_results = vectorstore.similarity_search(query, k=100)

# Step 2: Keep only products with the correct gender
filtered_results = filter_by_gender(retrieved_results, gender="Women")

# Step 3: Handle the output
if not filtered_results:
    print("No matching product found.")
else:
    for product in filtered_results:
        print(product)
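Alternatively, if your FAISS index is wrapped in LangChain, `similarity_search` also accepts a `filter` argument that post-filters retrieved documents on their metadata, so the gender check happens inside the retriever call. A minimal sketch of the dict-filter semantics (the commented-out `vectorstore` call is hypothetical and assumes your product fields were stored as `Document` metadata when the index was built):

```python
def matches_filter(metadata, flt):
    # Rough sketch of what a dict filter does in LangChain's FAISS
    # wrapper: keep a document only if every key/value pair matches
    return all(metadata.get(key) == value for key, value in flt.items())

# Hypothetical retriever call with the built-in filter:
# results = vectorstore.similarity_search(
#     "brown scarf", k=100, filter={"gender": "Women"}
# )

print(matches_filter({"gender": "Men"}, {"gender": "Women"}))    # False
print(matches_filter({"gender": "Women"}, {"gender": "Women"}))  # True
```

With this approach an empty result list directly means "no matching product", and you can skip the separate filtering step.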

To extract the gender from a query, you can use substring matching or an LLM.

You can ask the LLM something like:

Extract structured filters from this user query:
"Looking for a brown scarf for women"

Return the result as JSON. Only include fields like gender if they are clearly mentioned.

The LLM might respond with something like:

{
  "gender": "Women"
}
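If the extra LLM call is a latency concern, a lightweight rule-based extractor is often enough for a fixed catalog vocabulary. A minimal sketch, assuming the catalog only uses the `Men`/`Women` department values (`GENDER_PATTERNS` is a hypothetical map you would extend to your actual tags):

```python
import re

# Hypothetical keyword map; extend to your catalog's department values.
# The \b word boundaries keep "men" from matching inside "women".
GENDER_PATTERNS = {
    "Women": r"\b(women|woman|womens|ladies|female)\b",
    "Men": r"\b(men|man|mens|male)\b",
}

def extract_gender(query):
    q = query.lower()
    for gender, pattern in GENDER_PATTERNS.items():
        if re.search(pattern, q):
            return gender
    return None  # no gender mentioned, so skip the filter

print(extract_gender("Looking for a brown scarf for women"))  # Women
```

This runs in microseconds per query, so it avoids the extra backend call entirely; returning `None` lets you fall back to unfiltered retrieval when the query mentions no gender.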

1 Comment

But this will increase the response time quite a bit: after fetching the products, we make another LLM call in the backend just for the gender filter, which is exactly what I want to avoid. I could use a simple NLP model for classification, but that is still another call that adds latency.
