Ruby REGEX split, any issues with the code

Question

I am new to regular expressions in Ruby. I read some tutorials and came up with a piece of code. Please let me know if I can do it in a better way.

Here is my text, which needs to be split at {{iwsection(*)}} and {{usersection}}:

    t='{{iwsection(1)}}
    This has some sample text 1 - line 1
    This has some sample text 1 - line 2
    {{iwsection(2)}}
    This has some sample text 2
    {{iwsection(3)}}
    This has some sample text 3
    {{usersection}}
    This is a user section.
    This has some sample text
    This has some sample text'

Here is the Ruby regex code I was able to come up with:

    t.split(/^({{[i|u][wsection]\w*...}})/)

Thank You.

The desired output is an array:

    [ '{{iwsection(1)}}', 'This has some sample text 1 - line 1\nThis has some sample text 1 - line 2',
    '{{iwsection(2)}}', 'This has some sample text 2',
    '{{iwsection(3)}}', 'This has some sample text 3',
    '{{usersection}}', 'This is a user section.\nThis has some sample text\nThis has some sample text']

With this I will build a Hash:

    { 
    '{{iwsection(1)}}' => 'This has some sample text 1 - line 1\nThis has some sample text 1 - line 2',
    '{{iwsection(2)}}' => 'This has some sample text 2',
    '{{iwsection(3)}}' => 'This has some sample text 3',
    '{{usersection}}' => 'This is a user section.\nThis has some sample text\nThis has some sample text'
    }

Edit: .....

The code.

    section_array = text.chomp.split(/\r\n|\n/).inject([]) do |a, v|
      if v =~ /{{.*}}/
        a << [v.gsub(/^{{|}}$/, ""), []]
      else
        a.last[1] << v
      end
      a
    end.select { |k, v| k.start_with?("iwsection") || k.start_with?("usersection") }.map { |k, v| ["{{#{k}}}", v.join("\n")] }

What's your desired output array? Please post an example of what you would want the results to look like.
– Cody Caughlan, Commented Aug 17, 2014 at 20:05
You shouldn't have the Rails tag here, as this is a pure-Ruby question. Having a superfluous tag may cause some to waste time, others (who filter out Rails questions) to not see the question.
– Cary Swoveland, Commented Aug 17, 2014 at 20:22
Depending on what you are actually trying to do, it looks like either a config parser (e.g. parseconfig) or a templating solution (e.g. Mustache) could possibly solve your problem in a cleaner way.
– Mark Thomas, Commented Aug 18, 2014 at 13:44

konsolebox · Accepted Answer · 2014-08-18 14:37:57Z


Using String#scan:

> t.scan(/{{([^}]*)}}\r?\n(.*?)\r?(?=\n{{|\n?$)/)
=> [["iwsection(1)", "This has some sample text 1"], ["iwsection(2)", "This has some sample text 2"], ["iwsection(3)", "This has some sample text 3"], ["usersection", "This is a user section."]]

> h = t.scan(/{{([^}]*)}}\r?\n(.*?)\r?(?=\n{{|\n?$)/).to_h
=> {"iwsection(1)"=>"This has some sample text 1", "iwsection(2)"=>"This has some sample text 2", "iwsection(3)"=>"This has some sample text 3", "usersection"=>"This is a user section."}

> h.values
=> ["This has some sample text 1", "This has some sample text 2", "This has some sample text 3", "This is a user section."]

> h.keys
=> ["iwsection(1)", "iwsection(2)", "iwsection(3)", "usersection"]

> h["usersection"]
=> "This is a user section."

Update:

#!/usr/bin/env ruby
t = "{{iwsection(1)}}\nThis has some sample text 1 - line 1\nThis has some sample text 1 - line 2\n{{iwsection(2)}}\nThis has some sample text 2\n{{iwsection(3)}}\nThis has some sample text 3\nThis has some sample text\nThis has some sample text\n{{usersection}}\nThis is a user section.\nThis has some sample text\nThis has some sample text"
h = t.chomp.split(/\n/).inject([]) do |a, v|
  if v =~ /{{.*}}/
    a << [v.gsub(/^{{|}}$/, ""), []]
  else
    a.last[1] << v
  end
  a
end.select{ |k, v| k.start_with? "iwsection" or k === "usersection" }.map{ |k, v| [k, v.join("\n")] }.to_h
puts h.inspect

Output:

{"iwsection(1)"=>"This has some sample text 1 - line 1\nThis has some sample text 1 - line 2", "iwsection(2)"=>"This has some sample text 2", "iwsection(3)"=>"This has some sample text 3\nThis has some sample text\nThis has some sample text", "usersection"=>"This is a user section.\nThis has some sample text\nThis has some sample text"}

edited Aug 18, 2014 at 14:37

answered Aug 17, 2014 at 20:39


8 Comments

rupeshj Over a year ago

Wow..!! I did not know about this method in String class. I will give it a try. This actually fits my desired output.

rupeshj Over a year ago

Looks like this scans any text between {{}} and prepares an array/hash. Can we limit this to only text with "iwsection" and "usersection"?

konsolebox Over a year ago

@rupeshj Rather than making the regex dirty, just select() the needed text instead: t.scan(/{{([^}]*)}}\r?\n(.*?)\r?(?=\n{{|\n?$)/).select{ |k, v| k.start_with?("iwsection") || k == "usersection" }

rupeshj Over a year ago

Sure, Thanks. I will use select.

rupeshj Over a year ago

This one captures only one line, I am not getting the result when I have multiple lines in the section. :(
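
A possible way to make the scan approach pick up multi-line sections is the `/m` flag, which lets `.` match newlines. This is only a sketch, not the code from the answer: the sample string is an unindented stand-in for the question's text, and it assumes every marker is followed by at least one line of content.

```ruby
# Sample text, assumed to mirror the question's input without indentation.
t = "{{iwsection(1)}}\nText 1 - line 1\nText 1 - line 2\n" \
    "{{iwsection(2)}}\nText 2\n" \
    "{{usersection}}\nUser section.\nText\n"

# /m lets `.` match newlines. The lazy body stops just before the next
# line that starts with "{{", or before trailing whitespace at the very
# end of the string.
pairs = t.scan(/\{\{([^}]*)\}\}\r?\n(.*?)(?=\r?\n\{\{|\s*\z)/m)

h = pairs.to_h
p h
```

The same `select` shown in the comment above can still be applied to `pairs` before calling `to_h` if other `{{...}}` markers need to be filtered out.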


Cary Swoveland · Accepted Answer · 2014-08-18 21:47:38Z

You can do that like this:

t.split(/{{iwsection\(\d+\)}}|{{usersection}}/)
  #=> ["", "\n    This has some sample text 1\n    ",
  #    "\n    This has some sample text 2\n    ",
  #    "\n    This has some sample text 3\n    ",
  #    "\n    This is a user section."]

That's what you asked for, but if you want to clean that up, add .map(&:strip):

t.split(/{{iwsection\(\d+\)}}|{{usersection}}/).map(&:strip)
  #=> ["", "This has some sample text 1", "This has some sample text 2",
  #    "This has some sample text 3", "This is a user section."]

You may not want the empty string at offset zero, but that's how String#split works when you are splitting on a substring that is at the beginning of the string. Suppose the string were instead:

t =
'Some text here{{iwsection(1)}}
This has some sample text 1
{{iwsection(2)}}
This has some sample text 2'

t.split(/{{iwsection\(\d+\)}}|{{usersection}}/).map(&:strip)
  #=> ["Some text here", "This has some sample text 1",
  #    "This has some sample text 2"]

Here you want "Some text here", so you can't just delete the first element of the array.
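
One way to handle both cases is to drop the first element only when it is empty. The following is a minimal sketch; `MARKER` and `section_bodies` are hypothetical names introduced for illustration, and the pattern is the one used above with the braces escaped.

```ruby
# Hypothetical constant and helper names; the pattern matches the same
# markers as the split calls above.
MARKER = /\{\{iwsection\(\d+\)\}\}|\{\{usersection\}\}/

def section_bodies(text)
  parts = text.split(MARKER).map(&:strip)
  # Drop the first element only when it is the empty string produced by a
  # marker at the very start; keep any real leading text.
  parts.shift if parts.first == ""
  parts
end

p section_bodies("{{iwsection(1)}}\nA\n{{usersection}}\nB")
p section_bodies("Some text here{{iwsection(1)}}\nA")
```

The first call discards the empty leading string, while the second keeps "Some text here" as its first element.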

Additional requirements

To satisfy your added requirement, you could do this:

t='{{iwsection(1)}}
Text 1 - line 1
Text 1 - line 2
{{iwsection(2)}}
Text 2
{{iwsection(3)}}
Text 3
{{usersection}}
User section.
Text
Text' 

h = t.scan(/(?:{{iwsection\(\d+\)}}|{{usersection}})/)
     .zip(t.split(/{{iwsection\(\d+\)}}|{{usersection}}/)[1..-1])
     .map { |s1,s2| [s1, s2.strip
                           .lines
                           .map(&:strip)
                           .join("\n")] }
     .to_h
  #=> {"{{iwsection(1)}}"=>"Text 1 - line 1\nText 1 - line 2",
  #    "{{iwsection(2)}}"=>"Text 2",
  #    "{{iwsection(3)}}"=>"Text 3",
  #    "{{usersection}}"=>"User section.\nText\nText"}

Note that this formatting (a method chain continued on lines that begin with a dot) may not be understood by IRB or PRY, but will work fine when run as a script from the command line.
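
If the chain has to be pasted into an interactive session, one option is to put the dot at the end of each line instead, so that every line is visibly incomplete. A sketch of that variant, assuming an unindented sample string and a local variable `marker` introduced here for brevity:

```ruby
t = "{{iwsection(1)}}\nText 1 - line 1\nText 1 - line 2\n" \
    "{{iwsection(2)}}\nText 2\n" \
    "{{usersection}}\nUser section.\nText\nText"

# Same markers as above, with the braces escaped.
marker = /\{\{iwsection\(\d+\)\}\}|\{\{usersection\}\}/

# A trailing dot tells the parser the expression continues on the next line.
h = t.scan(marker).
      zip(t.split(marker)[1..-1]).
      map { |s1, s2| [s1, s2.strip.lines.map(&:strip).join("\n")] }.
      to_h

p h
```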

Explanation

a = t.scan(/(?:{{iwsection\(\d+\)}}|{{usersection}})/)
  #=> ["{{iwsection(1)}}", "{{iwsection(2)}}", "{{iwsection(3)}}", "{{usersection}}"]
b = t.split(/{{iwsection\(\d+\)}}|{{usersection}}/)
  #=> ["", "\n    Text 1 - line 1\n    Text 1 - line 2\n    ",
  #    "\n    Text 2\n    ", "\n    Text 3\n    ",
  #    "\n    User section.\n    Text\n    Text"]
c = b[1..-1]
  #=> ["\n    Text 1 - line 1\n    Text 1 - line 2\n    ",
  #    "\n    Text 2\n    ", "\n    Text 3\n    ",
  #    "\n    User section.\n    Text\n    Text"]
h = a.zip(c)
  #=> [["{{iwsection(1)}}", "\n    Text 1 - line 1\n    Text 1 - line 2\n    "],
  #    ["{{iwsection(2)}}", "\n    Text 2\n    "],
  #    ["{{iwsection(3)}}", "\n    Text 3\n    "],
  #    ["{{usersection}}", "\n    User section.\n    Text\n    Text"]]
d = h.map { |s1,s2| [s1, s2.strip
                           .lines
                           .map(&:strip)
                           .join("\n")] }
  #=> [["{{iwsection(1)}}", "Text 1 - line 1\nText 1 - line 2"],
  #    ["{{iwsection(2)}}", "Text 2"], ["{{iwsection(3)}}", "Text 3"],
  #    ["{{usersection}}", "User section.\nText\nText"]]
d.to_h
  #=> {"{{iwsection(1)}}"=>"Text 1 - line 1\nText 1 - line 2",
  #    "{{iwsection(2)}}"=>"Text 2",
  #    "{{iwsection(3)}}"=>"Text 3",
  #    "{{usersection}}"=>"User section.\nText\nText"}

How can I retain the iwsection marker, or at least the number inside iwsection/usersection, when I split it?
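
One possible approach, sketched here under the assumption that the text starts with a marker: when the pattern passed to `String#split` contains a capture group, the captured delimiters are returned in the result array as well, so the markers can be paired with the text that follows them. The sample string and variable names are illustrative only.

```ruby
t = "{{iwsection(1)}}\nText 1 - line 1\nText 1 - line 2\n" \
    "{{iwsection(2)}}\nText 2\n" \
    "{{usersection}}\nUser section.\nText\nText"

# The capture group makes split return each marker alongside the text.
parts = t.split(/(\{\{iwsection\(\d+\)\}\}|\{\{usersection\}\})/)

# parts[0] is whatever precedes the first marker (empty here), so skip it
# and pair each marker with the text that follows it.
h = parts.drop(1).each_slice(2).map { |marker, body| [marker, body.to_s.strip] }.to_h

# The number inside a marker, or nil when there is none (usersection).
numbers = h.keys.map { |k| k[/\d+/] }

p h
p numbers
```

`body.to_s` guards against a marker at the very end of the text, where split drops the trailing empty string and the last slice has no body.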
