2

How would I do the following? I tried doing this with gsub, but I can't figure out what is really efficient if the strings_to_highlight array is large. Cheers!

  string = "Roses are red, violets are blue"

  strings_to_highlight = ['red', 'blue']

  # ALGORITHM HERE

  resulting_string = "Roses are (red), violets are (blue)"
2
  • 1
    Possible duplicate of How to replace text in a ruby string Commented Oct 19, 2016 at 17:41
  • Welcome to Stack Overflow. Please read "minimal reproducible example". We'd like to see your effort toward solving the problem, not just a framework with "# ALGORITHM HERE". We're here to help fix problems with your code, not write the code. Commented Oct 20, 2016 at 0:59

4 Answers

5

Regexp has a helpful union function for combining regular expressions together. Stick with regexp until you have a performance problem:

string = "Roses are red, violets are blue"
strings_to_highlight = ['red', 'blue']

def highlight(str, words)
  matcher = Regexp.union words.map { |w| /\b(#{Regexp.escape(w)})\b/ }
  str.gsub(matcher) { |word| "(#{word})" }
end

puts highlight(string, strings_to_highlight)

3 Comments

Great use of Regexp.union here, but do try to link through to the current version of the documentation, and also use the [...](...) notation to avoid ugly bare URLs.
Be really careful with Regexp.union words.map { |w| /\b(#{Regexp.escape(w)})\b/ } # => /(?-mix:\b(red)\b)|(?-mix:\b(blue)\b)/. The pattern generated has potential pitfalls in it. Plus the pattern generated could be a lot more simple.
@theTinMan what are the pitfalls? The resulting regexp is overly complicated because it preserves the options of all the component patterns (the ?-mix means m, i, and x are all not enabled for each sub-pattern), but I'd still rather use Regexp.union over building a big pattern with regexp string interpolation and a literal "|".
4
strings_to_highlight = ['red', 'blue']
string = "Roses are red, violets are blue"

strings_to_highlight.each { |i| string.gsub!(/\b#{Regexp.escape(i)}\b/, "(#{i})") }

8 Comments

What happens when string = 'redeem'?
give it a try, but I would assume that you would get (red)eem
@CdotStrifeVII please fix your answer so that it works as required.
This works but gets geometrically slower on longer arrays and longer strings.
Hey @sagarpandya82, so far as I can tell, the requested feature is an algorithm that replaces substrings in a string that match strings in an array, and my solution does that. What were you referring to?
2

I suggest using the form of String#gsub that employs a hash for making substitutions.

strings_to_highlight = ['red', 'blue']

First construct the hash.

h = strings_to_highlight.each_with_object({}) do |s,h|
  h[s] = "(#{s})"
  ss = "#{s[0].swapcase}#{s[1..-1]}"
  h[ss] = "(#{ss})"
end
  #=> {"red"=>"(red)", "Red"=>"(Red)", "blue"=>"(blue)", "Blue"=>"(Blue)"}

Next define a default proc for it:

h.default_proc = ->(h,k) { k }

so that if h does not have a key k, h[k] returns k (e.g., h["cat"] #=> "cat").

Ready to go!

string = "Roses are Red, violets are blue"

string.gsub(/[[:alpha:]]+/, h)
  #=> "Roses are (Red), violets are (blue)"

This should be relatively efficient as only one pass through the string is needed and hash lookups are very fast.
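If you want to check the efficiency claim on your own data, here is a rough sketch that runs the hash form and a single combined regexp over the same input and confirms they agree. The word list, the text size, and the timed helper are all made up for illustration, and the timings will vary by machine and Ruby version, so treat it as a starting point and not a proper benchmark.

```ruby
# Hypothetical inputs, chosen only for illustration: 676 two-letter "words"
# and a text made of those words repeated.
words = ('aa'..'zz').to_a
text  = (words * 50).join(' ')

# Hash form: one lookup per alphabetic token, unknown tokens map to themselves.
h = words.to_h { |w| [w, "(#{w})"] }
h.default_proc = ->(_hash, key) { key }

# Regexp form: a single alternation built with Regexp.union.
pattern = /\b(?:#{Regexp.union(words).source})\b/

# Small helper (an assumption of this sketch): returns [result, seconds].
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

via_hash,   hash_secs   = timed { text.gsub(/[[:alpha:]]+/, h) }
via_regexp, regexp_secs = timed { text.gsub(pattern) { |m| "(#{m})" } }

puts "same result: #{via_hash == via_regexp}"
puts format("hash: %.4fs, regexp: %.4fs", hash_secs, regexp_secs)
```

Note that the two forms only agree when every entry in the list is a whole alphabetic token; the hash form cannot match phrases or words containing digits or punctuation.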


1

I'd use:

string = "Roses are red, violets are blue"
strings_to_highlight = ['red', 'blue']

string.gsub(/\b(#{Regexp.union(strings_to_highlight).source})\b/) { |s| "(#{s})" } # => "Roses are (red), violets are (blue)"

Here's how it breaks down:

/\b(#{Regexp.union(strings_to_highlight).source})\b/ # => /\b(red|blue)\b/

It's important to use source when embedding a pattern. Without it, the result is:

/\b(#{Regexp.union(strings_to_highlight)})\b/ # => /\b((?-mix:red|blue))\b/

and that (?-mix:...) part can cause problems if you don't understand what it means in regex-ese. The Regexp documentation explains the flags, but embedding a pattern without source can lead to a really hard-to-diagnose bug if you're not aware of the problem.
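A small sketch of the kind of bug this can cause, assuming you later add a case-insensitive flag to the outer pattern (the /i flag and the variable names are my additions, not part of the example above). The embedded (?-mix:...) group switches i back off for the alternation, so the flag silently has no effect there:

```ruby
words = ['red', 'blue']

# Interpolating the Regexp itself embeds (?-mix:red|blue), which turns the
# outer /i flag off again inside the group.
with_flags = /\b(#{Regexp.union(words)})\b/i

# Interpolating only the source leaves the outer flags in charge.
with_source = /\b(#{Regexp.union(words).source})\b/i

text = "Roses are RED"

p with_flags.match?(text)   # => false
p with_source.match?(text)  # => true
```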

\b tells the engine to match words, not substrings. Without that you could end up with:

string = "Fred, bluette"
strings_to_highlight = ['red', 'blue']
string.gsub(/(#{Regexp.union(strings_to_highlight).source})/) { |s| "(#{s})" } 
# => "F(red), (blue)tte"

Using a block with gsub allows us to perform calculations on the matched values.
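As a quick illustration of that last point, here is a sketch where the block does a little more than wrap the match. Upcasing the word is an arbitrary choice for the example; the block's return value is whatever replaces the match.

```ruby
string = "Roses are red, violets are blue"
strings_to_highlight = ['red', 'blue']

pattern = /\b(#{Regexp.union(strings_to_highlight).source})\b/

# The block receives each matched word; its return value replaces the match.
shouted = string.gsub(pattern) { |s| "(#{s.upcase})" }

puts shouted  # => Roses are (RED), violets are (BLUE)
```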

