I have a column of URLs and I'd like to extract the string after the last "/". For example, given *www.site.com/string1/string2/string3/**string4***, how would I extract just string4? All the URLs are different lengths, so I'm looking for something dynamic. Thank you!
1 Answer
SELECT
    column1 AS url,
    -- arguments after the pattern: start position 1, first occurrence,
    -- 'e' = extract a sub-match, capture group 1
    regexp_substr(url, '/([^/]+)$', 1, 1, 'e', 1) AS tail
FROM VALUES
    ('https://stackoverflow.com/questions/72397941/finding-substring-based-on-pattern'),
    ('9/30/21 22:30'),
    ('http://www.example.com/string1/string2/string3/**string4***--how')
;
gives:
| URL | TAIL |
|---|---|
| https://stackoverflow.com/questions/72397941/finding-substring-based-on-pattern | finding-substring-based-on-pattern |
| 9/30/21 22:30 | 21 22:30 |
| http://www.example.com/string1/string2/string3/\*\*string4\*\*\*--how | \*\*string4\*\*\*--how |
Haha, I left a date string from a prior question in there, but it still shows the tail match working.
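If you want to sanity-check the pattern without running a query, here is a minimal sketch in Python. It assumes Python's `re` module treats this simple pattern the same way Snowflake does (a slash, then one or more non-slash characters, anchored at the end). The helper name `tail` is made up for illustration and is not part of any Snowflake API.

```python
import re

# Same pattern as in the query: a slash, then one or more non-slash
# characters captured as group 1, anchored at the end of the string.
TAIL = re.compile(r"/([^/]+)$")

def tail(url):
    """Return the text after the last '/', or None when nothing matches."""
    m = TAIL.search(url)
    return m.group(1) if m else None

samples = [
    "https://stackoverflow.com/questions/72397941/finding-substring-based-on-pattern",
    "9/30/21 22:30",
    "http://www.example.com/string1/string2/string3/**string4***--how",
]
tails = [tail(s) for s in samples]
for s, t in zip(samples, tails):
    print(s, "->", t)
```

Note that a value ending in "/" or containing no "/" at all yields no match, which corresponds to NULL on the SQL side.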
4 Comments
doubledribble
Thanks, Simeon. What if I only want to extract from the URLs that end in a document extension? For example, www.site.com/string1/string2/string3/string4.pdf would give string4.pdf.
Simeon Pilgrim
regexp_substr(url, '/([^/]+(\\.pdf))$',1,1,'e',1) as pdfs
doubledribble
What if it's .html, .ics, etc..? Is it possible to make it dynamic?
Simeon Pilgrim
This is becoming a "how do regex" question. Statically:
regexp_substr(url, '/([^/]+((\\.pdf)|(\\.html)|(\\.ics)))$',1,1,'e',1)
Dynamically, yes, but then you are building strings, which is doable but very out of scope for this question.
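To give a rough idea of what "building strings" means here, the sketch below assembles the alternation from a list of extensions. It is written in Python purely to show the idea and is not a Snowflake solution. The helper `doc_tail` and the extension list are assumptions for illustration, and it uses a non-capturing group `(?:...)` where the SQL pattern uses plain parentheses.

```python
import re

def doc_tail(url, extensions):
    """Return the last path segment if it ends in one of `extensions`, else None."""
    # re.escape guards against regex metacharacters inside an extension.
    alternation = "|".join(re.escape(ext) for ext in extensions)
    # Builds e.g. /([^/]+\.(?:pdf|html|ics))$ from the list.
    pattern = r"/([^/]+\.(?:" + alternation + r"))$"
    m = re.search(pattern, url)
    return m.group(1) if m else None

exts = ["pdf", "html", "ics"]  # hypothetical list of document extensions
print(doc_tail("www.site.com/string1/string2/string3/string4.pdf", exts))
print(doc_tail("www.site.com/string1/string2/string3/string4", exts))
```

The match is case-sensitive as written, so a URL ending in ".PDF" would not be picked up unless you add the upper-case variants or a case-insensitive flag.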