I have a specific situation: I am trying to replace some words in a string. I have two example strings:

string1 = "aaabbb aaa bbb" 
string2 = "a. bbb"

In string1 I want to replace the whole word "aaa" with "ccc", so I do it like this:

translation = "aaa"
string1.gsub(/\b#{translation}\b/, "ccc") # => "aaabbb ccc bbb"

So it works and I am happy, but when I try to replace "a." with "aaa" the same way, it does not work and it returns string2 unchanged.

I also tried this:

translation = "a."
string2.gsub(translation, "aaa") # => "aaa bbb"

But when I use the same String-based gsub on string1 (replacing "aaa" with "ccc"), I get "cccbbb ccc bbb". Sorry for my English, but I hope I explained it clearly enough. Thanks for all the answers.
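A minimal script reproducing the four results described above. string1, string2 and translation are the OP's names; the result variable names are added only for illustration.

```ruby
string1 = "aaabbb aaa bbb"
string2 = "a. bbb"

# Regexp with \b: works for the plain word "aaa"
translation = "aaa"
regex_aaa = string1.gsub(/\b#{translation}\b/, "ccc")   # => "aaabbb ccc bbb"

# Same Regexp approach with "a.": nothing is replaced
translation = "a."
regex_a_dot = string2.gsub(/\b#{translation}\b/, "aaa") # => "a. bbb"

# Plain String pattern: "a." is replaced literally...
string_a_dot = string2.gsub("a.", "aaa")                # => "aaa bbb"
# ...but "aaa" is now also replaced inside "aaabbb"
string_aaa = string1.gsub("aaa", "ccc")                 # => "cccbbb ccc bbb"

p regex_aaa, regex_a_dot, string_a_dot, string_aaa
```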

3 Answers

Try

string1.gsub(/\b#{Regexp.escape(translation)}\b/, "ccc")

In a regex, '.' means "any character". By calling escape, you turn 'a.' into 'a\.', which means "a followed by a literal period".
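A small sketch of what Regexp.escape changes here. The sample string "ax bbb" is made up for illustration; it only exists to show that an unescaped "." matches any character.

```ruby
translation = "a."
escaped = Regexp.escape(translation)
puts escaped   # prints a\.

# Unescaped, "." matches any character, so "ax" also matches
unescaped_hit = "ax bbb".match?(/#{translation}/)  # => true

# Escaped, only a literal "a." matches
escaped_miss = "ax bbb".match?(/#{escaped}/)       # => false
escaped_hit  = "a. bbb".match?(/#{escaped}/)       # => true
```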


Update
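As @Daniel has noted in the comments, word boundaries have some subtleties: \b only matches between a word character and a non-word character, so there is no boundary between the "." of "a." and the space after it. So for the above to work with "a.", you need to replace the \b with look-aheads and look-behinds: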

    string2.gsub(/(?<!\w)#{Regexp.escape(translation)}(?!\w)/, "ccc")
    # => "ccc bbb"
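The same lookaround pattern wrapped in a small helper, applied to both of the OP's strings. replace_whole_word is a hypothetical name, not part of Ruby. It uses the block form of gsub so that the replacement text is inserted verbatim; with a plain replacement string, sequences like "\\1" would be read as backreferences.

```ruby
# Hypothetical helper: replace `word` only when it is not directly
# preceded or followed by a word character (\w).
def replace_whole_word(text, word, replacement)
  pattern = /(?<!\w)#{Regexp.escape(word)}(?!\w)/
  # Block form: the replacement is inserted verbatim, so backslash
  # sequences such as "\\1" in it are not treated as backreferences.
  text.gsub(pattern) { replacement }
end

string1 = "aaabbb aaa bbb"
string2 = "a. bbb"

p replace_whole_word(string1, "aaa", "ccc")  # => "aaabbb ccc bbb"
p replace_whole_word(string2, "a.", "ccc")   # => "ccc bbb"
```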

8 Comments

Sure, but "a. bbb".gsub(/\b#{Regexp.escape("a.")}\b/, "ccc") does not replace "a." with "ccc" as it should. I think \b does not match after the ".", since "." is not a word character. So this does not really answer the question of replacing "a.".
@DaniëlKnippers I updated my answer to work with such curious cases.
@UriAgassi Good, I was about to post an answer myself using lookahead and lookbehind, but now I'll give you +1 ;) Btw, there is still a typo: you need } after (translation) to close the #{ in both your code blocks.
@UriAgassi Please note that the !\w lookarounds are not suitable for cases like this: "b.a. bbb".gsub(/(?<!\w)#{Regexp.escape("a.")}(?!\w)/, "ccc") #=> "b.ccc bbb". I don't think this effect is wanted. Maybe it is better to check for whitespace or the start/end of the line.
@mdesantis I must have read your comment as 'start/end of string', my bad. We agree then :).

Since \w excludes dots, which I guess the OP wants to treat as part of a token, I propose a whitelist lookaround approach:

string = "a. b.a. a. bbb"
translation = "a."

# With (?<!\w) and (?!\w), b.a. is not treated as a single token
string.gsub(/(?<!\w)#{Regexp.escape(translation)}(?!\w)/, "ccc")
# Notice b.ccc
#=> "ccc b.ccc ccc bbb"

# With whitespace (or line start/end) lookarounds, b.a. is treated as a single token
string.gsub(/(?<=^|\s)#{Regexp.escape(translation)}(?=\s|$)/, "ccc")
# Notice b.a. is left untouched
#=> "ccc b.a. ccc bbb"

Anyway, which approach is right depends on the OP's needs ;-)
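The whitelist lookarounds (?<=^|\s) and (?=\s|$) can equivalently be written as the negated forms (?<!\S) and (?!\S): "not preceded/followed by a non-whitespace character" covers the start and end of the string, and since a newline is itself whitespace, line starts and ends are covered too. The sketch below uses a hypothetical helper, replace_token, and also shows the trade-off of the stricter whitelist: punctuation attached to the token (the made-up example "see a., then") prevents the match.

```ruby
# Hypothetical helper: replace `token` only when it is delimited by
# whitespace or by the start/end of the string.
def replace_token(text, token, replacement)
  pattern = /(?<!\S)#{Regexp.escape(token)}(?!\S)/
  text.gsub(pattern) { replacement }
end

whitelist = replace_token("a. b.a. a. bbb", "a.", "ccc")
p whitelist      # => "ccc b.a. ccc bbb"

# Trade-off: a token followed by punctuation is no longer matched...
punctuated = replace_token("see a., then", "a.", "ccc")
p punctuated     # => "see a., then"

# ...whereas the (?<!\w)/(?!\w) version does replace it
word_based = "see a., then".gsub(/(?<!\w)a\.(?!\w)/, "ccc")
p word_based     # => "see ccc, then"
```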

The . (dot) has a special meaning in regexes: it matches any character.

You should escape it with \.
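A quick check, using the question's string2, that escaping fixes the meaning of "." but does not by itself make a \b-wrapped pattern replace "a.": the trailing \b needs a word character on one side, and both "." and the following space are non-word characters. The "a.b" string is made up for illustration.

```ruby
string2 = "a. bbb"
escaped = Regexp.escape("a.")

# Escaped, but still wrapped in \b: no boundary between "." and " "
with_b = string2.gsub(/\b#{escaped}\b/, "ccc")
p with_b                 # => "a. bbb" (unchanged)

# A boundary does exist after "." when a word character follows it
boundary_after_dot = "a.b".match?(/\b#{escaped}\b/)
p boundary_after_dot     # => true
```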

1 Comment

While your statements are true, this alone does not solve the problems with the replacement due to the behavior of the word boundary \b in the Regexp. See the discussion at Uri Agassi's answer.
