I'm writing a globalization module for a web application, and I need a regexp that replaces all instances of a word with another word (the translation), except where the word appears inside a URL/URI.

EDIT: I forgot to mention that I'm using Ruby, so I can't use lookbehind.

  • Doing translation by word replacement is doomed to failure. Commented Jan 29, 2010 at 15:30
  • I've tried using this: '/((?<=>|^)[^<]*)(\bfoo\b)([^<]*(?=<|$))/i', but it requires lookbehind, which Ruby doesn't support. Commented Jan 29, 2010 at 15:40
  • Ruby 1.9 supports lookbehind. Are you using 1.8? Commented Jan 29, 2010 at 17:08

3 Answers

  • Split the text on a URI regular expression; include the URIs in the result.
  • For each piece:
    • if it is a URI, leave it alone
    • otherwise, do word replacement
  • Join the pieces

Code:

# From RFC 3986 Appendix B, with these modifications:
#   o Spaces disallowed
#   o All groups non-capturing, except for added outermost group
#   o Not anchored
#   o Scheme required
#   o Authority required
URI_REGEX = %r"((?:(?:[^ :/?#]+):)(?://(?:[^ /?#]*))(?:[^ ?#]*)(?:\?(?:[^ #]*))?(?:#(?:[^ ]*))?)"

def replace_except_uris(text, old, new)
  text.split(URI_REGEX).collect do |s|
    if s =~ URI_REGEX
      s
    else
      s.gsub(old, new)
    end
  end.join
end

text = <<END
stack http://www.stackoverflow.com stack
stack http://www.somewhere.come/stack?stack=stack#stack stack
END

puts replace_except_uris(text, /stack/, 'LINKED-LIST')

# => LINKED-LIST http://www.stackoverflow.com LINKED-LIST
# => LINKED-LIST http://www.somewhere.come/stack?stack=stack#stack LINKED-LIST

You can probably use something like:

(?<!://[^ ]*)\bfoo\b

This probably isn't perfect, though. It only checks that the word is not preceded by :// within the same run of non-space characters, so it treats any such run containing :// before the word as a URL.

PS Home:\> "foo foobar http://foo_bar/baz?gak=foobar baz foo" -replace '(?<!://[^ ]*)\bfoo\b', 'FOO'
FOO foobar http://foo_bar/baz?gak=foobar baz FOO
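
The session above is PowerShell, whose .NET regex engine accepts a variable-length lookbehind such as (?<!://[^ ]*). As far as I know Ruby's engine does not, even in 1.9, so here is a minimal Ruby sketch of the same idea without lookbehind: let the pattern match either a whole URL-like token or the word, and only replace when the match is the word. The method name replace_outside_urls is made up for illustration, and "a run of non-space characters containing ://" is only a rough stand-in for a real URI check.

```ruby
# Hypothetical helper (not from the answer above): replace whole-word
# occurrences of `word`, skipping any whitespace-delimited token that
# contains "://". That URL test is an assumption, not a real URI parser.
def replace_outside_urls(text, word, replacement)
  # First alternative consumes an entire URL-like token, so the word
  # alternative never gets a chance to match inside it.
  pattern = /\S*:\/\/\S*|\b#{Regexp.escape(word)}\b/
  text.gsub(pattern) do |match|
    match.include?("://") ? match : replacement
  end
end

puts replace_outside_urls("foo foobar http://foo_bar/baz?gak=foobar baz foo", "foo", "FOO")
# => FOO foobar http://foo_bar/baz?gak=foobar baz FOO
```

Because the URL alternative comes first and is tried at every position, a URL is always swallowed whole before the word alternative can see its contents.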

Have you tried splitting your text into words and iterating over them? Then you can examine each word, determine whether it's a URI, and translate it only if it isn't.
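
A rough sketch of that idea in Ruby, under a couple of assumptions: words are whatever sits between runs of whitespace, and a token counts as a URI if it contains "://". The method name translate_words is hypothetical, and a stricter URI test could be swapped in.

```ruby
# Hypothetical helper illustrating the split-and-iterate suggestion.
# Assumption: any token containing "://" is treated as a URI.
def translate_words(text, old, new)
  # Splitting on a captured whitespace group keeps the separators in the
  # result, so the original spacing survives the final join.
  text.split(/(\s+)/).map do |token|
    if token.include?("://")
      token                                             # URI: leave alone
    else
      token.gsub(/\b#{Regexp.escape(old)}\b/, new)      # plain text: translate
    end
  end.join
end

puts translate_words("foo foobar http://foo_bar/baz?gak=foo baz foo", "foo", "FOO")
# => FOO foobar http://foo_bar/baz?gak=foo baz FOO
```

Using gsub on each non-URI token, rather than comparing the token for equality, means a word followed by punctuation (for example "foo,") is still translated.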
