Given a url like https://xyz.abc.yahoo.com/issues/80483987/tasks/1

How can I write a SQL query that replaces or extracts the integer ID (here, 80483987) from URLs like this?

Using REGEXP_REPLACE(mystr, r'[^\d]+', ' ') gives me 80483987 1

The trailing 1 is the problem.

  • Which SQL version? Commented Jun 20, 2019 at 23:11
  • What makes 80483987 any different than any other number(s) ? No use trying to help you if you can't answer the question that is the core. Commented Jun 20, 2019 at 23:41

2 Answers

You can simply use REGEXP_EXTRACT to capture the digits that follow the /issues/ substring:

REGEXP_EXTRACT(mystr, r'/issues/([0-9]+)')

See the regex demo.

The literal /issues/ is matched first, then ([0-9]+) captures one or more digits into capturing group 1, which is the value REGEXP_EXTRACT returns.
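Since the question does not say which SQL dialect is in use, here is a minimal Python sketch of the same pattern for trying it outside the database. The helper name extract_issue_id is hypothetical, and Python's re module only stands in for the SQL engine's regex library; for this simple pattern the two should behave alike.

```python
import re

# Hypothetical helper that mimics REGEXP_EXTRACT(mystr, r'/issues/([0-9]+)').
# Assumption: the SQL function returns the first capturing group, or NULL
# when nothing matches; None plays the role of NULL here.
def extract_issue_id(url):
    match = re.search(r"/issues/([0-9]+)", url)
    return match.group(1) if match else None

print(extract_issue_id("https://xyz.abc.yahoo.com/issues/80483987/tasks/1"))  # 80483987
print(extract_issue_id("https://xyz.abc.yahoo.com/tasks/1"))                  # None
```

Because the pattern is anchored on /issues/, the trailing 1 in /tasks/1 is never considered.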

Apply REGEXP_REPLACE twice:

REGEXP_REPLACE(REGEXP_REPLACE(mystr, r'^[^\d]+', ''), r'/.*$', '')

Explanation

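The inner call removes all leading non-digits; the outer call removes everything from the first slash after the ID to the end of the string. ^ and $ are so-called anchors: they do not match a character but the abstract (zero-length) positions at the beginning and end of the test string. This works in most common regex flavors and engines, although engines without the \d shorthand inside brackets (such as POSIX ERE) need [^0-9] instead.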

The solution relies on the numeric ID being the first run of digits in the URL and forming a path segment of its own.

Note that the approach is fragile: for example, it fails for URLs that contain a port number, because the port's digits come before the ID.
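The two-step replacement and its failure mode can be sketched in Python as follows. The function name strip_to_id and the port 8080 are made up for illustration, and re.sub stands in for REGEXP_REPLACE on the assumption that both replace every match of the pattern.

```python
import re

# Hypothetical helper mirroring
# REGEXP_REPLACE(REGEXP_REPLACE(mystr, r'^[^\d]+', ''), r'/.*$', '')
def strip_to_id(url):
    # Inner call: drop everything before the first digit.
    without_prefix = re.sub(r"^[^\d]+", "", url)
    # Outer call: drop everything from the first remaining slash onward.
    return re.sub(r"/.*$", "", without_prefix)

print(strip_to_id("https://xyz.abc.yahoo.com/issues/80483987/tasks/1"))       # 80483987
# With a port, the first run of digits is the port, not the ID.
print(strip_to_id("https://xyz.abc.yahoo.com:8080/issues/80483987/tasks/1"))  # 8080
```

The second call shows why anchoring on a known path component such as /issues/ is usually the safer choice.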
