We started using Google Data Studio to visualize our data, and we need a regular expression to extract specific pieces of text from the list of URLs in our system.

URL example:

/town/articletype/46646-this-is-an-example-article

What we need to extract from the URL using RegEx:

  • /town/ (without the slashes, and with the first letter capitalized, if possible)
  • /articletype/ (also without the slashes)
  • /46646- (without the / and the -; this is the article ID, which we also need)
  • -this-is-an-example-article (without the "-" characters, and with the first letter capitalized)

We tried numerous regular expressions and managed to extract /town/ from the URL using the following Calculated Field:

REGEXP_EXTRACT(Page , '/(.*?)(/)')

  • If we can assume the URL will always follow that format, this expression will give you 4 groups, one for each part of your URL: ^\/([a-z]+)\/([a-z]+)\/([0-9]+)-(\S+)$. As for removing the - and capitalizing, I suggest you do that with something other than regular expressions. Commented May 7, 2020 at 10:06
  • Gerep, thanks, this is great! I don't think we can use groups for our needs; our best option is to use a separate expression for each part of the URL. I managed to make it work for the town, the article ID and the article name, but I can't make it work for the article type. I'm not sure what I am missing here. Just to clarify: TOWN: ^\/([a-z]+)\/ --- ARTICLE ID: ([0-9]+) --- ARTICLE NAME: -(\S+)$. Commented May 7, 2020 at 10:32
  • Because town and article type have the same pattern, the best thing is to match them at once, as suggested by @Gerep. Or, if you must do them separately, you can use the town pattern with the g flag and take the second matched item. Commented May 7, 2020 at 11:21
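
The two suggestions in the comments above can be tried outside Data Studio. The following is a minimal Python sketch, not a Data Studio formula: the function names `split_url` and `second_segment` are hypothetical, and the second function uses a variant of the town pattern with the `^` anchor and the trailing slash dropped (an assumption made here so that consecutive segments can both match).

```python
import re

# Four-group pattern from the first comment: town, article type, numeric ID, slug.
URL_PATTERN = re.compile(r"^/([a-z]+)/([a-z]+)/([0-9]+)-(\S+)$")


def split_url(path):
    """Return (town, article_type, article_id, slug), or None if the path does not match."""
    match = URL_PATTERN.match(path)
    return match.groups() if match else None


def second_segment(path):
    """Find every lowercase path segment and return the second one (the 'g flag' idea)."""
    # Assumption: no ^ anchor and no trailing slash, so both segments can be found.
    segments = re.findall(r"/([a-z]+)", path)
    return segments[1] if len(segments) > 1 else None


example = "/town/articletype/46646-this-is-an-example-article"
print(split_url(example))       # ('town', 'articletype', '46646', 'this-is-an-example-article')
print(second_segment(example))  # articletype
```

Both approaches assume the URL always follows the format of the example; a path with a different shape makes `split_url` return None.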

1 Answer

The 4 Calculated Fields below do the trick:

1) Town

CONCAT(UPPER(REGEXP_EXTRACT(Page , "^/(\\w{1})")), LOWER(REGEXP_EXTRACT(Page , "^/\\w{1}([^/]*)")))

2) articletype

REGEXP_EXTRACT(Page , "^/\\w+/([^/]*)")

3) 46646

REGEXP_EXTRACT(Page , "^/\\w+/\\w+/([^-]*)")

4) This is an example article

CONCAT(UPPER(REGEXP_EXTRACT(Page , "/\\w+/\\w+/\\d+-(\\w{1}).*$")), LOWER(REGEXP_REPLACE(REGEXP_EXTRACT(Page , "/\\w+/\\w+/\\d+-\\w{1}(.*)$"), "-", " ")))
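
To check the four patterns against sample URLs before pasting them into Data Studio, they can be mirrored in Python. This is only a sketch under stated assumptions: `extract` and `parse_page` are hypothetical helper names, `extract` approximates REGEXP_EXTRACT by returning the first capture group, and `re.ASCII` is passed to keep `\w` and `\d` limited to ASCII characters. The double backslashes in the Data Studio string literals become single backslashes in Python raw strings.

```python
import re


def extract(pattern, text):
    """Approximate REGEXP_EXTRACT: first capture group of the first match, or None."""
    match = re.search(pattern, text, flags=re.ASCII)
    return match.group(1) if match else None


def parse_page(page):
    """Apply the four patterns from the answer to a single Page value.

    Assumes the page follows the /town/articletype/ID-slug format;
    a non-matching value would raise an AttributeError on None.
    """
    # 1) Town: upper-case first letter plus lower-case remainder of the first segment.
    town = extract(r"^/(\w{1})", page).upper() + extract(r"^/\w{1}([^/]*)", page).lower()
    # 2) Article type: everything in the second segment.
    article_type = extract(r"^/\w+/([^/]*)", page)
    # 3) Article ID: everything in the third segment up to the first hyphen.
    article_id = extract(r"^/\w+/\w+/([^-]*)", page)
    # 4) Article name: capitalize the first letter, replace hyphens with spaces.
    first = extract(r"/\w+/\w+/\d+-(\w{1}).*$", page)
    rest = extract(r"/\w+/\w+/\d+-\w{1}(.*)$", page)
    title = first.upper() + rest.replace("-", " ").lower()
    return town, article_type, article_id, title


print(parse_page("/town/articletype/46646-this-is-an-example-article"))
# ('Town', 'articletype', '46646', 'This is an example article')
```

Because `\w` does not match a hyphen, a town or article type that itself contains a hyphen would not match patterns 2 to 4; this applies to the sketch and to the patterns it mirrors.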

Google Data Studio Report and a GIF to elaborate:
