1

I have the following string:

nothing to match
<-
this rocks should match as should this still and this rocks and still
->
should not match still or rocks
<- no matches here ->

I want to find all matches of 'rocks' and 'still', but only when they are within <- ->.

The purpose is to mark up glossary words, but only in areas of text that the editor has defined.

I currently have:

<-.*?(rocks|still).*?->

Unfortunately, this matches only the first 'rocks'; it ignores every subsequent instance and all the 'still's.

I have this in a Rubular

The usage of this will be something like:

 Regexp.new( '<-.*?(' + self.all.map{ |gt| gt.name }.join("|") + ').*?->', Regexp::IGNORECASE | Regexp::MULTILINE )

Thanks in advance for any help

3 Answers

1

There may be a way to do this with a single regex, but it will probably be simpler to just do it in two steps. First match all of the markups, and then search the markups for the glossary words:

text = <<END
nothing to match
<-
this rocks should match as should this still and this rocks and still
->
should not match still or rocks
<- no matches here ->
END

text.scan(/<-.*?->/m).each do |match| 
    print match.scan(/rocks|still/), "\n"
end
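Since the stated goal is to mark up the glossary words, the same two-step idea can be sketched with nested gsub calls. This is only a sketch: the method names glossary_pattern and mark_up, the <em> tag, and the word list are assumptions for illustration, not part of the question.

```ruby
# Build one case-insensitive pattern from glossary names (hypothetical helper).
# Regexp.union escapes each name, so regex metacharacters in a name stay literal.
def glossary_pattern(names)
  /\b(?:#{Regexp.union(names).source})\b/i
end

# Wrap glossary words in <em> tags (an arbitrary choice of markup),
# but only inside <- ... -> regions.
def mark_up(text, names)
  words = glossary_pattern(names)
  # Outer gsub visits each region; inner gsub rewrites words within it.
  text.gsub(/<-.*?->/m) do |region|
    region.gsub(words) { |word| "<em>#{word}</em>" }
  end
end

puts mark_up("out rocks <- in rocks, still -> out still", %w[rocks still])
# prints: out rocks <- in <em>rocks</em>, <em>still</em> -> out still
```

Text outside the regions is never passed to the inner gsub, so it comes back unchanged.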

Note also that a regex is a good solution here only if the markup is never nested (<-...<-...->...->) and there is never an escaped <- or ->, whether inside or outside of a markup.
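Under exactly those assumptions (well-formed, non-nested, unescaped delimiters), a single regex is possible with a lookahead: a word counts only if a -> follows it before any further <-. The method name words_in_regions and the hard-coded word list are illustrative assumptions.

```ruby
# Return every glossary word that sits inside a <- ... -> region.
# Assumes delimiters are well-formed, never nested, and never escaped.
def words_in_regions(text)
  # (?:(?!<-).)*? walks forward one character at a time, refusing to step
  # onto a "<-", until it reaches "->". /m lets "." cross newlines.
  inside = /\b(?:rocks|still)\b(?=(?:(?!<-).)*?->)/mi
  text.scan(inside)
end

p words_in_regions("out rocks <- in rocks, still -> out still")
# prints: ["rocks", "still"]
```

The lookahead rescans the rest of the region for every candidate word, so on long regions the two-step version is likely to be faster as well as easier to read.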


1

Don't forget Ruby's string methods. Try them first, before reaching for more complex regular expressions:

$ ruby -0777 -ne '$_.split("->").each{|x| inside = x.split("<-", 2)[1]; puts inside.scan(/rocks|still/).inspect if inside }' file


0

In Ruby, it depends on what you want to do with the regexp. You're matching a regular expression against a string, so you'll be using String methods. Some of these act on all matches (e.g. gsub or scan); others act on only a single match (e.g. index and =~ find the first match, while rindex and rpartition find the last).

If you're working with any of the latter (those that return only one match), you'll want a loop that calls the method again, starting from a later offset each time. For example:

# Print the starting index of every match, scanning left to right
def print_match_indices(string, regex)
  i = string.index(regex)
  while i
    puts i
    # Resume one character past the previous match start (this also finds overlapping matches)
    i = string.index(regex, i + 1)
  end
end

(Yes, you can use split first, but I expect that a regex loop like the foregoing would require fewer system resources.)
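For comparison, the same offsets can be collected without a manual loop, because scan with a block sets the last-match data for each match. This is a sketch, and the method name match_indices is an assumption. Unlike a loop that resumes at i + 1, scan reports only non-overlapping matches.

```ruby
# Collect the starting index of every non-overlapping match.
def match_indices(string, regex)
  indices = []
  # Inside the block, Regexp.last_match is the MatchData for the current match.
  string.scan(regex) { indices << Regexp.last_match.begin(0) }
  indices
end

p match_indices("a rocks b still rocks", /rocks|still/)
# prints: [2, 10, 16]
```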

