2

I am trying to extract the product names from the URLs below using regex in Google BigQuery:

https://www.example.com/en/uk/product/clothing/trousers-leggings/cool-grey-joggers-9800977

Product name should be: cool-grey-joggers

https://www.example.com/en/uk/product/hot-denim-dress-7842666

Product name should be: hot-denim-dress

I have tried the following regular expression:

'/product/(.+)-[0-9]+$'

but this seems to be greedy matching and will return:

clothing/trousers-leggings/cool-grey-joggers

hot-denim-dress

I have tried numerous variations of the above but all are either not matching anything or capturing too much.

How can I modify so it captures the product name only and not the the parent sub folders as well?

2 Answers 2

3

If /product/ has to be part of the path:

/product/(?:[^/]+/)*([^/]+)-[0-9]+$

The pattern matches:

  • /product/ Match literally
  • (?:[^/]+/)* Optionally repeat matching parts that end with a /
  • ([^/]+) Capture group 1, match 1+ occurrence of any char except /
  • -[0-9]+$ Match - and 1+ digits till the end of the string

Regex demo

Sign up to request clarification or add additional context in comments.

Comments

1

Try this one:

SELECT REGEXP_EXTRACT('https://www.example.com/en/uk/product/clothing/trousers-leggings/cool-grey-joggers-9800977', r'([^/]+)-[0-9]+$')
UNION ALL
SELECT REGEXP_EXTRACT('https://www.example.com/en/uk/product/hot-denim-dress-7842666', r'([^/]+)-[0-9]+$')

enter image description here

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.