
I use tsearch2 in PostgreSQL to extract URLs from text. Everything works fine with the default tools, but there is a problem with YouTube links: the URLs I get from the parser are all lowercased, and YouTube links cannot be lowercased because their video IDs are case-sensitive.

I did a little research and found that there is no option to disable the lowercasing; all I could do is write my own parser.

Am I right? Is there some magic way to make the parser case-sensitive? If not, has anybody written a suitable parser? And if not, do you have any advice on how to do it properly? :)

Thanks for your help, xaru

1 Answer


You can recheck the tsearch result with LIKE, which is case-sensitive. If there are not too many false matches, this solution should be fast:

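SELECT * FROM (SELECT url
                  FROM your_tab
                 WHERE to_tsvector(..) @@ to_tsquery(..)
                 OFFSET 0) s   -- OFFSET 0 stops the planner from flattening the subquery
  WHERE s.url LIKE '%Bbx%';    -- case-sensitive recheck on the original text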
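A minimal self-contained sketch of the same two-stage query, for trying it out. The temp table `your_tab` with columns `id` and `url`, the two sample rows, the built-in `simple` configuration, and the search term `watch` are all hypothetical stand-ins for the `..` placeholders above:

```sql
-- Hypothetical demo table standing in for your_tab.
CREATE TEMP TABLE your_tab (id int PRIMARY KEY, url text);

-- Two rows that differ only in the case of the video ID.
INSERT INTO your_tab VALUES
    (1, 'watch this http://www.youtube.com/watch?v=Bbx'),
    (2, 'watch this http://www.youtube.com/watch?v=bBX');

-- Stage 1 (inner query): the full-text match ignores case, so both rows qualify.
-- Stage 2 (outer query): LIKE is case-sensitive, so only row 1 survives.
SELECT *
  FROM (SELECT id, url
          FROM your_tab
         WHERE to_tsvector('simple', url) @@ to_tsquery('simple', 'watch')
        OFFSET 0) s
 WHERE s.url LIKE '%Bbx%';
```

The cheap, index-friendly tsearch condition narrows the candidates first; LIKE then only has to scan the few rows that survive, which is why the approach stays fast as long as the full-text stage produces few false matches.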

2 Comments

I'm afraid it won't work. I use the following code to extract URLs: SELECT id, to_tsvector('public.urls_extraction', content) AS url FROM pages... I store these tsvectors in a separate table, from which I fetch them when I need to query something. If I use your code, I get this error: ERROR: operator does not exist: tsvector ~~* unknown
You cannot apply LIKE to a tsvector value; apply it to the original string instead.
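Applied to the setup described in the comment above, one way to do that is to filter on the stored tsvectors and then join back to the original text for the case-sensitive recheck. This is a sketch under assumptions: the table name `page_vectors`, the sample rows, and the built-in `simple` configuration (used in place of `public.urls_extraction`, which is not shown) are hypothetical:

```sql
-- Hypothetical stand-ins: pages holds the original text,
-- page_vectors holds the precomputed tsvectors (the separate table from the comment).
CREATE TEMP TABLE pages (id int PRIMARY KEY, content text);
CREATE TEMP TABLE page_vectors (id int PRIMARY KEY, url tsvector);

INSERT INTO pages VALUES
    (1, 'see http://www.youtube.com/watch?v=Bbx'),
    (2, 'see http://www.youtube.com/watch?v=bBX');

-- 'simple' is an assumed substitute for public.urls_extraction.
INSERT INTO page_vectors
SELECT id, to_tsvector('simple', content) FROM pages;

-- Match on the stored tsvector first (case-insensitive), then run LIKE
-- against pages.content, which is text, so no tsvector ~~ operator is needed.
SELECT p.id, p.content
  FROM (SELECT v.id
          FROM page_vectors v
         WHERE v.url @@ to_tsquery('simple', 'www.youtube.com')
        OFFSET 0) s
  JOIN pages p ON p.id = s.id
 WHERE p.content LIKE '%Bbx%';
```

Because LIKE only ever sees the text column, the "operator does not exist: tsvector ~~* unknown" error cannot arise; the tsvector is used purely for the first, coarse filter.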
