2

I am trying to implement hybrid search in postgresql with pgvector and sqlalchemy.

Below is the table schema:

class Project_images(Base):
    __tablename__ = "project_images"
    id = Column(Integer, Sequence("project_image_id_seq"), primary_key=True)
    image_link = Column(String(255))
    image_vector = Column(Vector(512))
    keywords = Column(String(255))
    keyword_vector = Column(Vector(768))

And this is the function I call to perform the search:

def query_db(
        image_encoding,
        image_search_weight,
        keyword_encoding,
        keyword_search_weight,
    ):
        search_query = text(
            """
                SELECT *, 
                ((:image_encoding <=> image_vector) * :image_search_weight + (:keyword_encoding <=> keyword_vector) * :keyword_search_weight) 
                AS vector_sum
                FROM project_images
                ORDER BY vector_sum
                LIMIT 50

            """
        )

        params = {
            "image_encoding": image_encoding,
            "image_search_weight": image_search_weight,
            "keyword_encoding": keyword_encoding,
            "keyword_search_weight": keyword_search_weight,
        }

        with session_class() as session:
            result = session.execute(search_query, params)
            return result

Trying to call query_db gives me the following error:

sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedFunction) operator does not exist: record <=> vector LINE 3: ...02891638, 0.08573333, -0.011385784, -0.020549707) <=> image_... ^ HINT: No operator matches the given name and argument types. You might need to add explicit type casts.

I have tried session.execute(text("CREATE EXTENSION IF NOT EXISTS vector")), but I still run into the same error.

1 Answer 1

3

I don't use this library but I was able to get your code to execute using the libraries it seems you are using. It wasn't clear to me if <=> is commutative or not. The python wrapping for the extension seems to support using the operation as a method and that seemed to solve the problem.

class ProjectImage(Base):
    __tablename__ = "project_images"
    id = Column(Integer, Sequence("project_image_id_seq"), primary_key=True)
    image_link = Column(String(255))
    image_vector = Column(Vector(512))
    keywords = Column(String(255))
    keyword_vector = Column(Vector(768))


metadata.create_all(engine)


def query_db(
        image_encoding,
        image_search_weight,
        keyword_encoding,
        keyword_search_weight,
    ):
        # You can put this in the select() but I define it
        # first so you can see it.
        computed_col = (
            (ProjectImage.image_vector.cosine_distance(image_encoding)*image_search_weight)
            + (ProjectImage.keyword_vector.cosine_distance(keyword_encoding)*keyword_search_weight)
        ).label("vector_sum")

        search_query = select(
            computed_col
        ).order_by(
            # Reference the computed column.
            text("vector_sum")
        ).limit(50)

        with Session(engine) as session:
            result = session.execute(search_query)
            return result
# This is just dummy data I put in which appears to match what would be passed in.
query_db([1]*512, 1, [1]*768, 1)

This is output of the query with echo=True set on the engine.

SELECT (project_images.image_vector <=> %(image_vector_1)s) * %(param_1)s + (project_images.keyword_vector <=> %(keyword_vector_1)s) * %(param_2)s AS vector_sum 
FROM project_images ORDER BY vector_sum 
 LIMIT %(param_3)s

If you want the ProjectImage and the vector_sum you should be able to do this:

    search_query = select(
        ProjectImage, computed_col
    )
   # Then later...
   for (project_image, vector_sum) in session.execute(search_query):
       print(project_image, vector_sum)

I used this pgvector-python-sqlalchemy to find the cosine distance method and other examples.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.