I have a field called Website with examples that look like:

https://ivalore.com.br
https://cornstalk.com
https://penny.co

I am trying to use REGEXP_SUBSTR to isolate the domain: REGEXP_SUBSTR("Website", '[^https://]+')

Some of the results are correct but others are not. For instance, I expect cornstalk.com and penny.co, but this is what I actually get:

ivalore.com.br
corn
enny.co

Any help would be appreciated.

  • Why not a simple replace(col,'https://','')? Commented Nov 19, 2021 at 17:02

2 Answers

Using built-in PARSE_URL:

Returns a JSON object consisting of all the components (fragment, host, path, port, query, scheme) in a valid input URL/URI.

WITH cte(url) AS (
  SELECT 'https://ivalore.com.br' UNION ALL
  SELECT 'https://cornstalk.com' UNION ALL 
  SELECT 'https://penny.co'
)
SELECT url, PARSE_URL(url):"host"::TEXT AS host
FROM cte;
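If some rows in "Website" might not be valid URLs (for example, a value stored without a scheme), the optional second argument of PARSE_URL may help. The sketch below assumes Snowflake's permissive mode (second argument set to 1), where a parse failure yields an object with an "error" field instead of failing the query. The extra sample row and the column aliases are made up for illustration.

```sql
-- Sketch: permissive PARSE_URL so that a malformed value does not abort the query.
-- The third sample row (no scheme) is a hypothetical addition, not from the question.
WITH cte(url) AS (
  SELECT 'https://cornstalk.com' UNION ALL
  SELECT 'https://penny.co' UNION ALL
  SELECT 'cornstalk.com'
)
SELECT url,
       PARSE_URL(url, 1):"host"::TEXT  AS host,        -- NULL when the URL cannot be parsed
       PARSE_URL(url, 1):"error"::TEXT AS parse_error  -- set only when parsing failed
FROM cte;
```

Rows with a non-NULL parse_error can then be cleaned up separately or passed to one of the regex-based approaches.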

You can use

SELECT REGEXP_SUBSTR("Website", '^(https?://)?(.*)', 1, 1, 'e', 2)

Details:

  • ^ - start of string
  • (https?://)? - an optional Group 1: http:// or https://
  • (.*) - Group 2: the rest of the string.

The last argument (2), together with the 'e' parameter just before it, makes the function return the Group 2 value.

However, REGEXP_REPLACE might be better here:

SELECT REGEXP_REPLACE("Website", '^https?://', '')

That is, just remove the http:// or https:// from the start of a string.
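For context on why the original pattern misbehaves: '[^https://]+' is a negated character class, so it matches the first run of characters that are not h, t, p, s, : or /. That is why cornstalk.com is cut at the s and penny.co loses its leading p. The sketch below puts the three expressions side by side on the sample values from the question; the CTE and the column aliases are only for illustration.

```sql
-- Sketch: compare the original pattern with the two suggested fixes.
WITH cte(url) AS (
  SELECT 'https://ivalore.com.br' UNION ALL
  SELECT 'https://cornstalk.com' UNION ALL
  SELECT 'https://penny.co'
)
SELECT url,
       -- Original attempt: a character class, not the literal prefix 'https://'
       REGEXP_SUBSTR(url, '[^https://]+')                       AS original_attempt,
       -- Capture everything after an optional scheme and return group 2
       REGEXP_SUBSTR(url, '^(https?://)?(.*)', 1, 1, 'e', 2)    AS via_substr,
       -- Strip the scheme from the start of the string
       REGEXP_REPLACE(url, '^https?://', '')                    AS via_replace
FROM cte;
```

Note that both fixes only remove the scheme, so any path or query string after the domain would be kept in the result.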

1 Comment

Thanks for this. Confirming that the REGEXP_SUBSTR worked, and great point about the REGEXP_REPLACE. I also appreciate you breaking things down.
