
I am trying to write a Ruby regex that returns a set of named matches. If the first element (elements are delimited by slashes) appears again later in the string, the match should return everything from that second occurrence onward; otherwise it should return the whole string. The closest I've gotten is (?<p1>top_\w+).*?(?<hier>\k<p1>.*), which doesn't work for the 3rd item. I've tried regex if-then-else constructs, but Rubular says they're invalid. I've also tried (?<p1>[\w\/]+?)(?<hier>\k<p1>.*), which correctly splits the 1st and 4th lines but doesn't work for the others. Please note: I want every result returned under the same named reference so I can iterate through "hier".

Input:

top_cat/mouse/dog/top_cat/mouse/dog/elephant/horse
top_ab12/hat[1]/top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool
top_bat/car[0]
top_2/top_1/top_3/top_4/top_2/top_1/top_3/top_4/dog

Output:

hier = top_cat/mouse/dog/elephant/horse
hier = top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool
hier = top_bat/car[0]
hier = top_2/top_1/top_3/top_4/dog

2 Answers


Problem

The second line does not match because the second instance of hat is not followed by a slash, while the first instance is.

Solution

Specify that there must be a slash between the first and second match.

Regex

(top_.*)/(\1.*$)|(^.*$)

Replacement

hier = \2\3

Example

Regex101 Permalink
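
As a rough sketch, the same pattern can also be applied from Ruby rather than through a search-and-replace tool. The helper name `hier_for` is made up for illustration and is not part of the answer; it returns group 2 when the repeated prefix is found and group 3 (the whole line) otherwise, mirroring the `\2\3` replacement. The sample inputs are the four lines from the question.

```ruby
# Pattern from the answer: group 1 is the leading "top_..." prefix,
# group 2 starts at its repetition, group 3 is the whole-line fallback.
PATTERN = %r{(top_.*)/(\1.*$)|(^.*$)}

# Hypothetical helper, named here only for illustration.
def hier_for(path)
  md = PATTERN.match(path)
  # Exactly one of the two groups participates in any match.
  md[2] || md[3]
end

inputs = [
  "top_cat/mouse/dog/top_cat/mouse/dog/elephant/horse",
  "top_ab12/hat[1]/top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool",
  "top_bat/car[0]",
  "top_2/top_1/top_3/top_4/top_2/top_1/top_3/top_4/dog"
]

inputs.each { |path| puts "hier = #{hier_for(path)}" }
```

Because the pattern hardcodes `top_`, this sketch assumes every path of interest starts with that prefix.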


More info on the Alternation token

To explain how the | token works in regex, consider the example abc|def.
In plain English, this regex means:

  • Match either the regex below (attempting the next alternative only if this one fails)
    • Match the characters abc literally
  • Or match the regex below (the entire match attempt fails if this one fails to match)
    • Match the characters def literally

Example
Regex: alpha|alphabet
If we had a phrase "I know the alphabet", only the word alpha would be matched.
However, if we changed the regex to alphabet|alpha, we would match alphabet.

So you can see, alternation works in a left-to-right fashion.
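
The left-to-right behaviour described above can be checked with a short Ruby snippet. This is only an illustration using the phrase from the example; the variable names are arbitrary.

```ruby
phrase = "I know the alphabet"

# The leftmost alternative that succeeds at a given position wins,
# even if a later alternative would have matched more text.
first  = phrase[/alpha|alphabet/]
second = phrase[/alphabet|alpha/]

puts first   # alpha
puts second  # alphabet
```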


4 Comments

Well, this wasn't exactly what I wanted but it led me down the path to fixing the regex (your use of the | ). (?<p1>[\w\/\[\]]+)?(?<hier>(\k<p1>.*)|(^.*$)) will always store the desired match into the named group "hier".
The alternation is simply there in case the first bit doesn't match, but hey, if that works - awesome! :)
Excellent answer. It gave me new insight into the use of |.
Thanks! I've just added even more about that token for others to see.
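
For readers who want to try the named-group pattern from the first comment above, here is a minimal Ruby sketch run against the four inputs from the question. The constant `HIER` and the helper `hier_of` are names invented for this illustration; only the pattern itself comes from the comment.

```ruby
# Pattern quoted from the comment: an optional prefix p1, then "hier",
# which is either a repeat of p1 plus the rest, or the whole line.
HIER = /(?<p1>[\w\/\[\]]+)?(?<hier>(\k<p1>.*)|(^.*$))/

# Hypothetical helper: returns the "hier" capture, or nil on no match.
def hier_of(path)
  md = HIER.match(path)
  md && md[:hier]
end

[
  "top_cat/mouse/dog/top_cat/mouse/dog/elephant/horse",
  "top_ab12/hat[1]/top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool",
  "top_bat/car[0]",
  "top_2/top_1/top_3/top_4/top_2/top_1/top_3/top_4/dog"
].each { |path| puts "hier = #{hier_of(path)}" }
```

Since every result lands in the same named group, the caller can read `md[:hier]` without checking which alternative matched.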
1
paths = %w(
  top_cat/mouse/dog/top_cat/mouse/dog/elephant/horse
  top_ab12/hat/top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool
  top_bat/car[0]
  top_2/top_1/top_3/top_4/top_2/top_1/top_3/top_4/dog
  test/test
)

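paths.each do |path|
  # Group 1 is the first element; group 2 starts at a later copy of it that
  # is itself a whole element (followed by a slash or the end of the string).
  md = path.match(/^([^\/]*).*\/(\1(\/.*|$))/)
  # Fall back to the whole path when the first element never repeats.
  hier = md ? md[2] : path
  puts hier
end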

Output:

top_cat/mouse/dog/elephant/horse
top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool
top_bat/car[0]
top_2/top_1/top_3/top_4/dog
test
