I want to append </tag> to each line where it's missing:

text = '<tag>line 1</tag>
        <tag>line2         # no closing tag, append
        <tag>line3         # no closing tag, append
             line4</tag>   # no opening tag, but has a closing tag, so ignore
        <tag>line5</tag>'

I tried to create a regular expression to match this, but I know it's wrong:

text.gsub! /.*?(<\/tag>)Z/, '</tag>'

How can I create a regular expression that conditionally appends to each line?

  • Are you absolutely sure that each line contains exactly one tag? Are there going to be nested tags? Seeing how support for negative lookbehind seems to be a bit funky in Ruby, it might be easier just to split these lines and look for a </tag> substring and append one if you can't find it. Commented Aug 23, 2013 at 21:51
  • In my example there should always be a </tag> at the end of a line. Commented Aug 23, 2013 at 21:56
  • @NullUserException - what's funky about ruby lookbehind? I think you're imagining pre-1.9 scenarios. Commented Aug 24, 2013 at 0:36

4 Answers

Here you go:

text.gsub!(%r{(?<!</tag>)$}, "</tag>")

Explanation:

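$ matches at the end of each line, and \z matches only at the very end of the string. \Z is like \z, except that it also matches just before a trailing newline at the end of the string.

(?<!...) is a negative lookbehind: the match succeeds only if the text immediately before the current position is not the enclosed pattern, here </tag>.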
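
To make the difference between $ and \z concrete, here is a minimal sketch. The sample string is the question's input with the inline annotations and indentation removed, which is an assumption about what the real data looks like; the variable names fixed and end_only are only for illustration.

```ruby
# Sample input adapted from the question (annotations and indentation dropped).
text = "<tag>line 1</tag>\n<tag>line2\n<tag>line3\nline4</tag>\n<tag>line5</tag>"

# $ is tried at the end of every line, so each line that does not
# already end in </tag> gets one appended.
fixed = text.gsub(%r{(?<!</tag>)$}, "</tag>")

# \z is tried only at the very end of the string, so the unclosed
# lines in the middle are left alone.
end_only = text.gsub(%r{(?<!</tag>)\z}, "</tag>")

puts fixed
```

Note that the lookbehind only inspects the six characters right before the line end, so a line with trailing whitespace after </tag> would still get a second closing tag unless it is stripped first.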

1 Comment

Thanks! I've deleted my comments and moved the explanation into the answer so it is easier for other people to see.

Given the example provided, I'd just do something like this:

text.split(/<\/?tag>/).
     reject {|t| t.strip.length == 0 }.
     map {|t| "<tag>%s</tag>" % t.strip }.
     join("\n")

You're basically treating either <tag> or </tag> as a record delimiter, so you can just split on them, reject any blank records, then construct a new combined string from the extracted values. This works nicely when you can't count on newlines being record delimiters and will generally be tolerant of missing tags.
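
As a rough illustration of the split approach on input that has no newlines at all, here is a small sketch. The input string is made up for this example and is not from the question.

```ruby
# Hypothetical input: three records on one line, the second one unclosed.
raw = "<tag>a</tag><tag>b<tag>c</tag>"

records = raw.split(/<\/?tag>/)                 # split on either delimiter
             .reject { |t| t.strip.empty? }     # drop the blank fragments
             .map { |t| "<tag>%s</tag>" % t.strip }
             .join("\n")

puts records
```

One thing to keep in mind: any text that sits between two delimiters becomes a single record, so an unclosed <tag> line followed by a line that has only a closing </tag> would be merged into one record rather than kept as two lines.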

If you're insistent on a pure regex solution, though, and your data format will always match the given format (one record per line), you can use a negative lookbehind:

text.strip.gsub(/(?<!<\/tag>)(\n|$)/, "</tag>\\1")
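
Here is a short sketch of how that expression behaves, assuming one record per line and a trailing newline on the input. The sample data is invented for illustration.

```ruby
# Hypothetical input with a trailing newline.
input = "<tag>a</tag>\n<tag>b\n<tag>c\n"

# strip removes the trailing newline first, so no extra </tag> is
# appended after it. The captured group (\n or the empty match at $)
# is put back via \\1, so line breaks are preserved.
closed = input.strip.gsub(/(?<!<\/tag>)(\n|$)/, "</tag>\\1")

puts closed
```

Unlike gsub!, this returns a new string and leaves the original untouched, so the result has to be assigned somewhere.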

One that could work is:

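/(<tag>[^\n ]+[^>])[\s]*(\n)/

This matches each <tag> line whose newline is not preceded by ">", and captures the line's content in the first group.

Replace each match with the captured content followed by "</tag>\n", i.e.

text.gsub!( /(<tag>[^\n ]+[^>])[\s]*(\n)/ , "\\1</tag>\n")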

For more polishing, try http://rubular.com/

text = '<tag>line 1</tag>
        <tag>line2        
        <tag>line3
        line4</tag>
        <tag>line5</tag>'

result = ""

text.each_line do |line|
  line.rstrip!
  line << "</tag>" unless line.end_with?("</tag>")
  result << line << "\n"
end

puts result

--output:--
<tag>line 1</tag>
        <tag>line2</tag>
        <tag>line3</tag>
        line4</tag>
        <tag>line5</tag>
