
How can I change the to_tsvector configuration to use a simple tokenization rule like:

  • lowercase
  • split by spaces only

Executing the following query:

SELECT to_tsvector('english', 'birthday=19770531 Name=John-Oliver Age=44 Code=AAA-345')

I get these lexemes:

'-345':9 '19770531':2 '44':6 'aaa':8 'age':5 'birthday':1 'code':7 'john':4 'name':3
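To see where these lexemes come from, you can use the built-in ts_debug function. It lists every token the parser produced, its token type (alias), and the lexemes the dictionaries turned it into. On the same input it should show that the default parser treats the = signs as separators rather than as part of a word:

```sql
-- Show each token the default parser emits, its token type,
-- and the lexemes produced for it by the 'english' configuration.
SELECT alias, token, lexemes
FROM ts_debug('english', 'birthday=19770531 Name=John-Oliver Age=44 Code=AAA-345');
```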

The kind of search I want to run looks like this:

(!birthday | birthday=19770531) & (code=AAA-345)

That is: find all records that contain the text "birthday=19770531" or do not contain "birthday" at all, and that also contain the text "code=AAA-345". With the lexemes generated as above, this is not possible. I was expecting something like this:

'age=44':3 'birthday=19770531':1 'code=aaa-345':4 'name=john-oliver':2

1 Answer

You would have to write a custom text search parser, and that can only be done in C.
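The splitting happens in the parser, not in the dictionary, so switching configurations alone doesn't help. Here is a quick check, assuming a stock PostgreSQL installation. The simple configuration lowercases without stemming, but it uses the same default parser, so the = signs still break the text apart. The catalog will typically list default as the only parser available:

```sql
-- 'simple' changes the dictionary (lowercase only, no stemming),
-- but the text is still tokenized by the 'default' parser.
SELECT to_tsvector('simple', 'birthday=19770531 Name=John-Oliver Age=44 Code=AAA-345');

-- List the text search parsers known to this database.
SELECT prsname FROM pg_ts_parser;
```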

However, you might be able to use the existing test parser module, test_parser. It recognizes words separated by spaces and seems to do what you want. If it doesn't, it would at least be a good starting point.
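Below is a minimal sketch of wiring test_parser into a configuration, assuming the module is already installed on the server. testcfg is a hypothetical configuration name. testparser is the parser the module's extension script defines, and word is the token type it emits for each space-separated chunk. Mapping word to the simple dictionary adds the lowercasing step:

```sql
-- Assumes the test_parser module has been built and installed on the server.
CREATE EXTENSION test_parser;

-- testcfg is a hypothetical name; 'testparser' splits text on spaces only.
CREATE TEXT SEARCH CONFIGURATION testcfg (PARSER = testparser);

-- Lowercase each word via the 'simple' dictionary (no stemming).
-- 'blank' tokens have no mapping, so they are dropped.
ALTER TEXT SEARCH CONFIGURATION testcfg ADD MAPPING FOR word WITH simple;

SELECT to_tsvector('testcfg', 'birthday=19770531 Name=John-Oliver Age=44 Code=AAA-345');

-- The original search; the negated prefix match (:*) stands in for
-- "has no birthday=... lexeme at all".
SELECT to_tsvector('testcfg', 'birthday=19770531 Name=John-Oliver Age=44 Code=AAA-345')
       @@ to_tsquery('testcfg', '(!birthday=:* | birthday=19770531) & code=AAA-345');
```

The !birthday=:* operand is only one possible reading of "no birthday at all". With this parser, a bare birthday lexeme never exists. The negated prefix match therefore asks for records with no lexeme starting with birthday= instead.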

The problem may be that it lives in src/test/modules/, and I don't think it ships with most packaged installations, so getting it installed might take some effort. How much depends on your OS, PostgreSQL version, and package manager.
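Before going further, check whether the server can see the module at all. pg_available_extensions lists the extensions whose control files are installed. If the query below returns no row, test_parser has to be built and installed first, for example from a PostgreSQL source tree of the same major version as the server:

```sql
-- No row here means the test_parser files are not installed for this server.
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name = 'test_parser';
```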
