
Current Behavior

I am using pgvector with ASP.NET Core 9.0 to personalize the user experience based on each user's preferences.

I have a posts controller that returns posts (paginated using cursor pagination) to the user based on their interests:

// expensive call (remote embedding model)
var userInterestsEmbedding = await GetUserInterests(); 

The result of this call is cached for 1 hour per user, so I don't have to repeat it for every page fetched during pagination.
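A minimal sketch of that per-user cache, assuming a single app instance so an in-process cache is enough. ComputeEmbedding is a hypothetical stand-in for GetUserInterests, and the clock is passed in as now only so expiry can be checked without waiting an hour. In ASP.NET Core, IMemoryCache.GetOrCreateAsync with AbsoluteExpirationRelativeToNow set to one hour should do the same job; with several instances a distributed cache would be needed instead.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical stand-in for the expensive remote embedding call (GetUserInterests).
// It counts invocations so the caching behaviour is visible.
int remoteCalls = 0;
float[] ComputeEmbedding(string userId)
{
    remoteCalls++;
    return new float[] { userId.Length, 1f, 0f };
}

var ttl = TimeSpan.FromHours(1);
var cache = new ConcurrentDictionary<string, (float[] Embedding, DateTimeOffset ExpiresAt)>();

// Returns the cached embedding for a user and recomputes it only once the entry has expired.
// 'now' is a parameter so expiry can be exercised without waiting an hour.
// Note: two concurrent misses for the same user may both call the model; fine for a sketch.
float[] GetCachedEmbedding(string userId, DateTimeOffset now)
{
    if (cache.TryGetValue(userId, out var entry) && now < entry.ExpiresAt)
        return entry.Embedding;

    var embedding = ComputeEmbedding(userId);
    cache[userId] = (embedding, now + ttl);
    return embedding;
}

var t0 = DateTimeOffset.UtcNow;
GetCachedEmbedding("alice", t0);                  // miss: calls the model
GetCachedEmbedding("alice", t0.AddMinutes(30));   // hit: e.g. the user fetches page 2
GetCachedEmbedding("alice", t0.AddMinutes(61));   // expired: calls the model again
Console.WriteLine(remoteCalls); // 2
```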

This embedding is then used in the query like so:

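// date = CreatedAt of the previous page's last entry (sorted by CreatedAt)
var result = Posts
    .Where(p => p.CreatedAt < date)
    // smaller cosine distance = more similar, so ascending order puts the best matches first
    .OrderBy(p => p.Embedding.CosineDistance(userInterestsEmbedding))
    .ThenByDescending(p => p.CreatedAt)
    .Take(limit + 1); // the extra row signals whether another page exists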

The problem with this query is that every time the next page is requested, the cosine distance is recalculated for every remaining row in the table. And because the rows are ordered by that computed distance rather than by CreatedAt, the query can't make efficient use of the CreatedAt index the way normal cursor pagination does.

Improvements?

  1. How can I improve the performance of this query and reduce full scans of the Posts table?
  2. What if I add more sort options, such as most comments?
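For a sort key that is stored on the row, such as a comment count, ordinary keyset pagination still applies as long as the order is made total with a unique tiebreaker like Id. A minimal in-memory sketch, assuming a hypothetical denormalized CommentCount column (in the database this would want a composite index on (CommentCount, Id)):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up posts: Id and a hypothetical stored CommentCount column.
var posts = new List<(long Id, int CommentCount)>
{
    (1, 5), (2, 12), (3, 5), (4, 0), (5, 12), (6, 7), (7, 5)
};

// One keyset page for "most comments first", ordered by (CommentCount DESC, Id DESC).
// 'after' is the (CommentCount, Id) of the last post on the previous page, or null for page 1.
// Id breaks ties, so every post lands on exactly one page.
List<(long Id, int CommentCount)> NextPage((int CommentCount, long Id)? after, int limit) =>
    posts
        .Where(p => after is null
            || p.CommentCount < after.Value.CommentCount
            || (p.CommentCount == after.Value.CommentCount && p.Id < after.Value.Id))
        .OrderByDescending(p => p.CommentCount)
        .ThenByDescending(p => p.Id)
        .Take(limit)
        .ToList();

var seen = new List<long>();
(int CommentCount, long Id)? pageCursor = null;
while (true)
{
    var page = NextPage(pageCursor, 3);
    if (page.Count == 0) break;
    seen.AddRange(page.Select(p => p.Id));
    pageCursor = (page[^1].CommentCount, page[^1].Id);
}
Console.WriteLine(string.Join(", ", seen)); // 5, 2, 6, 7, 3, 1, 4
```

Because comment counts change over time, a post whose count changes between page requests can be skipped or shown twice; the cursor only guarantees a stable walk over rows whose keys stay put.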
  • With real cursor pagination, the DB holds on to the resulting row set for the lifetime of the cursor, so there's no risk of re-scanning or recalculating anything. The order is fixed unless you fetch everything from it into some sort of temp table and re-order that, or change the query in the cursor to begin with. Commented Oct 22 at 6:25
  • Thing is, I suspect the cursor pagination you're referring to is this sort of thing, not this. The 2nd link uses "cursor pagination" for the method that actually uses a DB cursor, while the method you're using is called keyset pagination. Real "DB cursor" pagination could solve your problem. Commented Oct 22 at 6:27
  • Keyset pagination makes sense when the keys it operates on are persisted/pre-calculated and possibly indexed, at least for however long you need to retrieve pages. Your query calculates one of the keys you wish to paginate over (the cosine distance), but since it isn't saved, it has to be re-evaluated every time unless you set up a DB-side cursor or cache it some other way (a temp table, a matview). Here's a .NET app using refcursor. Commented Oct 22 at 6:47
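In the spirit of caching the computed key somewhere, here is an application-side variant: rank all posts once per user, keep the ordered IDs, and serve each page as a slice of that list. This is a swapped-in technique, not a DB cursor, temp table, or materialized view, and it assumes the candidate set is small enough to rank in memory. The embeddings and the CosineDistance helper are hypothetical; the helper computes 1 minus cosine similarity, which is how pgvector defines cosine distance.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up posts with tiny 2-D embeddings; real ones come from the Posts table.
var posts = new List<(long Id, float[] Embedding)>
{
    (1, new[] { 1f, 0f }), (2, new[] { 0f, 1f }), (3, new[] { 1f, 1f }),
    (4, new[] { -1f, 0f }), (5, new[] { 0.9f, 0.1f })
};

// Cosine distance = 1 - cosine similarity.
double CosineDistance(float[] a, float[] b)
{
    double dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    return 1 - dot / (Math.Sqrt(na) * Math.Sqrt(nb));
}

// userId -> post IDs ranked by relevance, computed once and reused for every page.
var snapshots = new Dictionary<string, List<long>>();

List<long> GetPage(string userId, float[] userEmbedding, int offset, int limit)
{
    if (!snapshots.TryGetValue(userId, out var ranked))
    {
        // The expensive full pass happens only on the first page request.
        ranked = posts
            .OrderBy(p => CosineDistance(p.Embedding, userEmbedding))
            .ThenBy(p => p.Id)
            .Select(p => p.Id)
            .ToList();
        snapshots[userId] = ranked;
    }
    // Later pages are slices of a fixed list, so the order cannot shift between requests.
    return ranked.Skip(offset).Take(limit).ToList();
}

var aliceEmbedding = new[] { 1f, 0f };
var page1 = GetPage("alice", aliceEmbedding, 0, 2);
var page2 = GetPage("alice", aliceEmbedding, 2, 2);
Console.WriteLine(string.Join(", ", page1.Concat(page2))); // 1, 5, 3, 2
```

The trade-off is staleness: posts created after the snapshot won't show up until it is rebuilt, so its lifetime would naturally match the 1-hour embedding cache.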

1 Answer


So here is what I came up with after thinking about it all morning:

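I think it's impractical to use a date cursor while sorting posts by relevance (cosine distance): the cursor filters on CreatedAt, but the rows are ordered by distance, so "older than the last post shown" is not the same as "after the last post shown" in the sorted order.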
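A tiny in-memory illustration with made-up data, using a day number as a hypothetical stand-in for CreatedAt and taking the oldest CreatedAt on the page as the cursor (one reading of "last entry sorted by CreatedAt"). The page is ordered by distance but the next page is filtered by date, so posts can drop out entirely:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up posts: Id, Day (a stand-in for CreatedAt), and cosine distance to the user's interests.
var posts = new List<(int Id, int Day, double Distance)>
{
    (1, 1, 0.10), (2, 2, 0.90), (3, 3, 0.20), (4, 4, 0.80), (5, 5, 0.30)
};

// Same shape as the question's query: filter by the date cursor, order by relevance.
List<(int Id, int Day, double Distance)> Page(int beforeDay, int limit) =>
    posts.Where(p => p.Day < beforeDay)
         .OrderBy(p => p.Distance)
         .ThenByDescending(p => p.Day)
         .Take(limit)
         .ToList();

var first = Page(int.MaxValue, 2);     // the two most relevant posts: 1 and 3
var cursorDay = first.Min(p => p.Day); // oldest CreatedAt on the page (post 1, day 1)
var second = Page(cursorDay, 2);       // only posts older than day 1 qualify: none

var never = posts.Select(p => p.Id).Except(first.Concat(second).Select(p => p.Id));
Console.WriteLine($"page 1: {string.Join(", ", first.Select(p => p.Id))}");  // page 1: 1, 3
Console.WriteLine($"page 2: {string.Join(", ", second.Select(p => p.Id))}"); // page 2 is empty
Console.WriteLine($"never shown: {string.Join(", ", never)}");              // never shown: 2, 4, 5
```

Using the page's last row in relevance order instead (post 3, day 3) doesn't help: the second page would then be posts 1 and 2, repeating post 1 and still skipping posts 4 and 5.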

What I can do instead is sort by cosine distance alone, use the last distance on the previous page as the cursor (a "last distance" cursor), and eliminate the date factor entirely.
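A minimal in-memory sketch of the distance cursor, assuming the distances are already computed (the values are made up). The cursor carries Id as well as the distance because two posts can have exactly the same distance. The cursor is also only meaningful while the same user embedding is in use, so it shouldn't outlive the cached embedding.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up posts with their cosine distance to the user's interests (lower = more relevant).
// Posts 2 and 4 have the same distance on purpose.
var posts = new List<(long Id, double Distance)>
{
    (1, 0.40), (2, 0.25), (3, 0.10), (4, 0.25), (5, 0.70), (6, 0.05)
};

// One keyset page ordered by (Distance ASC, Id ASC).
// 'after' is the (Distance, Id) of the last post on the previous page, or null for page 1.
// Id is a tiebreaker: with a distance-only cursor (Distance > 0.25), post 4 would be
// skipped after a page that ends at post 2.
List<(long Id, double Distance)> NextPage((double Distance, long Id)? after, int limit) =>
    posts
        .Where(p => after is null
            || p.Distance > after.Value.Distance
            || (p.Distance == after.Value.Distance && p.Id > after.Value.Id))
        .OrderBy(p => p.Distance)
        .ThenBy(p => p.Id)
        .Take(limit)
        .ToList();

var order = new List<long>();
(double Distance, long Id)? lastSeen = null;
while (true)
{
    var page = NextPage(lastSeen, 3);
    if (page.Count == 0) break;
    order.AddRange(page.Select(p => p.Id));
    lastSeen = (page[^1].Distance, page[^1].Id);
}
Console.WriteLine(string.Join(", ", order)); // 6, 3, 2, 4, 1, 5
```

Against the database, the same predicate would presumably use p.Embedding.CosineDistance(userInterestsEmbedding) in place of Distance. How EF Core translates that, and whether PostgreSQL can still use a pgvector HNSW or IVFFlat index with the extra distance filter, are assumptions worth checking with EXPLAIN.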
