Using REGEXP to extract specific text between slashes from URL

Question

We started using Google Data Studio to visualize our data, and we need RegEx to extract specific pieces of text from the URLs in our system.

URL example:

/town/articletype/46646-this-is-an-example-article

What we need to extract from the URL using RegEx:

/town/ (without the slashes and, if possible, with the first letter capitalized)
/articletype/ (also without the slashes)
/46646- (without the / and the -; this is the article ID, which we also need)
-this-is-an-example-article (without the "-" characters and with the first letter capitalized)

We tried numerous regular expressions and managed to extract the town from the URL using the following Calculated Field:

REGEXP_EXTRACT(Page , '/(.*?)(/)')

If we can assume the URL always follows that format, this expression gives you four groups, one for each part of the URL: ^\/([a-z]+)\/([a-z]+)\/([0-9]+)-(\S+)$. As for removing the - and capitalizing, I suggest doing that with something other than regular expressions.
– user745235, Commented May 7, 2020 at 10:06
Gerep, thanks, this is great! I don't think we can use groups for our needs; our best option is to use a separate expression for each part of the URL. I managed to make it work for the town, the article ID, and the article name, but I can't make it work for the article type. I'm not sure what I'm missing here. Just to clarify: TOWN: ^\/([a-z]+)\/ --- ARTICLE ID: ([0-9]+) --- ARTICLE NAME: -(\S+)$.
– Dan, Commented May 7, 2020 at 10:32
Because town and article type have the same pattern, the best approach is to match them together, as suggested by @Gerep. Or, if you must extract them separately, you can use the town pattern with the g flag and take the second match.
– Ikechukwu Eze, Commented May 7, 2020 at 11:21
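
The single-expression approach from the first comment can be checked outside Data Studio. Below is a minimal Python sketch, assuming every URL follows the /town/articletype/id-slug shape from the question. The helper name split_url is hypothetical, and Python's re module stands in for the regex engine Data Studio uses, so treat it as an illustration rather than a drop-in Calculated Field.

```python
import re

# Pattern from the comment: town, article type, numeric ID, slug.
URL_PATTERN = re.compile(r"^/([a-z]+)/([a-z]+)/([0-9]+)-(\S+)$")


def split_url(path):
    """Return (town, article_type, article_id, slug), or None if the path does not match."""
    match = URL_PATTERN.match(path)
    return match.groups() if match else None


parts = split_url("/town/articletype/46646-this-is-an-example-article")
print(parts)
# ('town', 'articletype', '46646', 'this-is-an-example-article')
```

Because town and article type share the same [a-z]+ shape, it is their position in the path that tells them apart, which is why matching them in one anchored expression is simpler than writing an isolated pattern for the article type.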

Nimantha · Accepted Answer · 2020-05-07 11:41:07Z

The 4 Calculated Fields below do the trick:

1) Town

CONCAT(UPPER(REGEXP_EXTRACT(Page , "^/(\\w{1})")), LOWER(REGEXP_EXTRACT(Page , "^/\\w{1}([^/]*)")))

2) articletype

REGEXP_EXTRACT(Page , "^/\\w+/([^/]*)")

3) 46646

REGEXP_EXTRACT(Page , "^/\\w+/\\w+/([^-]*)")

4) This is an example article

CONCAT(UPPER(REGEXP_EXTRACT(Page , "/\\w+/\\w+/\\d+-(\\w{1}).*$")), LOWER(REGEXP_REPLACE(REGEXP_EXTRACT(Page , "/\\w+/\\w+/\\d+-\\w{1}(.*)$"), "-", " ")))
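
To sanity-check the four patterns before pasting them into Data Studio, they can be replayed in Python. This is only a sketch: regexp_extract is a hypothetical helper that roughly mimics REGEXP_EXTRACT by returning the first capture group, Python's re module is assumed to behave like Data Studio's engine for these simple patterns, and the input is assumed to match (a non-matching URL would raise an error here instead of yielding null).

```python
import re


def regexp_extract(text, pattern):
    """Rough stand-in for REGEXP_EXTRACT: first capture group of the first match."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


def town(page):
    # Upper-case the first character, lower-case the rest of the first segment.
    first = regexp_extract(page, r"^/(\w{1})")
    rest = regexp_extract(page, r"^/\w{1}([^/]*)")
    return first.upper() + rest.lower()


def article_type(page):
    # Skip the first segment, capture everything up to the next slash.
    return regexp_extract(page, r"^/\w+/([^/]*)")


def article_id(page):
    # Skip two segments, capture everything up to the first hyphen.
    return regexp_extract(page, r"^/\w+/\w+/([^-]*)")


def article_name(page):
    # Capitalize the first slug character and turn the remaining hyphens into spaces.
    first = regexp_extract(page, r"/\w+/\w+/\d+-(\w{1}).*$")
    rest = regexp_extract(page, r"/\w+/\w+/\d+-\w{1}(.*)$")
    return first.upper() + rest.replace("-", " ").lower()


page = "/town/articletype/46646-this-is-an-example-article"
print(town(page))          # Town
print(article_type(page))  # articletype
print(article_id(page))    # 46646
print(article_name(page))  # This is an example article
```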

Google Data Studio Report and a GIF to elaborate:

edited May 7, 2020 at 11:41

answered May 7, 2020 at 11:18

Nimantha
