Let's say we want to extract the substring of a URL up to the second occurrence of / (not counting the // after the scheme).

For example, for https://abc.def.com/abc?102/ the extracted string should be abc.def.com/abc, without ?102.

For http://abc.def/jkl/ghi/ the extracted string should be abc.def/jkl.

I want to achieve this without using regexp_substr/regexp_replace, which I have already tried.

  • Why do you specifically want to avoid regexp functions? Commented Jan 27, 2020 at 14:42
  • My observation: REGEXP is quite costly. On a large table of 100+ GB it will take a lot of time. Commented Jan 27, 2020 at 14:43

1 Answer

If you specifically want to avoid regexes, you could use split_part() twice:

select split_part(url, '/', 1) || '/' || split_part(url, '/', 2)

I am unsure, however, that this would perform better than a regex-based solution. You would need to benchmark this against your real dataset.
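One point worth checking against the sample URLs from the question: when a URL includes a scheme, splitting on '/' puts the scheme (`https:`) in field 1 and an empty string in field 2, so the host lands in field 3 and the first path segment in field 4. The sketch below shifts the indices accordingly and uses a nested split_part() on '?' to drop the query string. It assumes PostgreSQL/Redshift-style split_part() and that every URL carries a scheme; the `urls` CTE is only illustrative sample data, not part of the original question.

```sql
-- Illustrative sample rows taken from the question (the CTE name "urls" is hypothetical).
WITH urls AS (
    SELECT 'https://abc.def.com/abc?102/' AS url
    UNION ALL
    SELECT 'http://abc.def/jkl/ghi/'
)
SELECT url,
       split_part(url, '/', 3)                        -- host: third '/'-delimited field
       || '/' ||
       split_part(split_part(url, '/', 4), '?', 1)    -- first path segment, query string removed
       AS extracted
FROM urls;
-- https://abc.def.com/abc?102/  ->  abc.def.com/abc
-- http://abc.def/jkl/ghi/       ->  abc.def/jkl
```

If some rows have no scheme, the field positions differ (host in field 1, path in field 2), so the data would need to be normalized or handled with a CASE expression first.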

2 Comments

@SayedAweshRahman: as explained in my answer, it is hard to tell beforehand. You would probably need to test it (I'd be actually interested to know which solution performs better).
Using REGEXP is actually very costly. I benchmarked with 20M rows: REGEXP took 12 min 30 sec, whereas SPLIT_PART took just 2 min 09 sec to return the result.
