Simple `to_tsvector` configuration - postgres

Question

How can I change the to_tsvector configuration to use a simple tokenization rule like:

lowercase
split by spaces only

Executing the following query:

SELECT to_tsvector('english', 'birthday=19770531 Name=John-Oliver Age=44 Code=AAA-345')

I get these lexemes:

'-345':9 '19770531':2 '44':6 'aaa':8 'age':5 'birthday':1 'code':7 'john':4 'name':3
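
For reference, `ts_debug` shows how the default parser splits and classifies each piece of the input, which helps explain the lexemes above. A minimal sketch, using the same configuration and text as the query:

```sql
-- Show the token type (alias), raw token, and resulting lexemes
-- produced by the 'english' configuration for the sample text.
SELECT alias, token, lexemes
FROM ts_debug('english', 'birthday=19770531 Name=John-Oliver Age=44 Code=AAA-345');
```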

The kind of search I want to run looks like this:

(!birthday | birthday=19770531) & (code=AAA-345)

In other words: get me all records that either contain the text "birthday=19770531" or don't contain "birthday" at all, and that also contain the text "code=AAA-345". With the way the lexemes are currently created, this is not possible. I was expecting something like this:

'birthday=19770531':1 'age=44':3 'code=aaa-345':4 'name=john-oliver':2

jjanes · Accepted Answer · 2022-01-13 14:51:20Z

1

You would have to write a custom parser, which can only be done in C.

However, you might be able to use the existing test parser, test_parser. It seems to do what you want; if not, it would at least be a good starting point.

The catch is that it lives in src/test/modules/, and I don't think most installation packages ship it, so it might take some effort to get it installed. How much effort depends on your OS, PostgreSQL version, and package manager.
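
If the module can be built and installed, wiring it into a text search configuration could look roughly like the sketch below. It assumes the extension is available under the name test_parser and registers a parser called testparser with a token type named word, as in the PostgreSQL source tree; the configuration name space_cfg is made up for illustration. The built-in simple dictionary lowercases tokens without stemming.

```sql
-- Assumes the test_parser module has been built and installed on the server.
CREATE EXTENSION test_parser;

-- Hypothetical configuration name; it uses the space-splitting test parser.
CREATE TEXT SEARCH CONFIGURATION space_cfg (PARSER = testparser);

-- Map the parser's "word" tokens to the built-in "simple" dictionary,
-- which lowercases them and does no stemming.
ALTER TEXT SEARCH CONFIGURATION space_cfg
    ADD MAPPING FOR word WITH simple;

-- Inspect the raw tokens first, then the resulting tsvector.
SELECT * FROM ts_parse('testparser', 'birthday=19770531 Name=John-Oliver Age=44 Code=AAA-345');
SELECT to_tsvector('space_cfg', 'birthday=19770531 Name=John-Oliver Age=44 Code=AAA-345');
```

Queries would need to be built with the same configuration. Whether to_tsquery accepts operands such as birthday=19770531 without quoting is worth checking on your version.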
