Converting a multi line string to an array in Ruby using line breaks as delimiters

Question

I would like to turn this string

"P07091 MMCNEFFEG

P06870 IVGGWECEQHS

SP0A8M0 VVPVADVLQGR

P01019 VIHNESTCEQ"

into an array that looks like this in Ruby:

["P07091 MMCNEFFEG", "P06870 IVGGWECEQHS", "SP0A8M0 VVPVADVLQGR", "P01019 VIHNESTCEQ"]

Using split doesn't return what I would like because of the line breaks.

What does split return? This information may be important.

– Darek Nędza, Commented Jun 23, 2014 at 18:45

Cary Swoveland · Accepted Answer · 2021-12-02 18:22:42Z

43

This is one way to deal with blank lines:

string.split(/\n+/)

For example,

string = "P07091 MMCNEFFEG

P06870 IVGGWECEQHS

SP0A8M0 VVPVADVLQGR




P01019 VIHNESTCEQ"

string.split(/\n+/)
  #=> ["P07091 MMCNEFFEG", "P06870 IVGGWECEQHS",
  #    "SP0A8M0 VVPVADVLQGR", "P01019 VIHNESTCEQ"]

To accommodate files created under Windows (having line terminators \r\n) replace the regular expression with /(?:\r?\n)+/.
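As a quick check of the Windows-friendly pattern, here is a minimal sketch. The sample string `windows_text` is made up for illustration and is not taken from the question.

```ruby
# Hypothetical sample mixing Windows (\r\n) and Unix (\n) terminators,
# with blank lines in between.
windows_text = "P07091 MMCNEFFEG\r\n\r\nP06870 IVGGWECEQHS\n\nP01019 VIHNESTCEQ\r\n"

# One or more consecutive line terminators, each either \r\n or \n.
lines = windows_text.split(/(?:\r?\n)+/)
p lines
#=> ["P07091 MMCNEFFEG", "P06870 IVGGWECEQHS", "P01019 VIHNESTCEQ"]
```

The trailing terminator does not produce an empty element because split drops trailing empty strings by default. A string that starts with a line break would still yield a leading "", which reject(&:empty?) can remove.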

edited Dec 2, 2021 at 18:22

answered Jun 23, 2014 at 17:02

Cary Swoveland


2 Comments

inquisitive Over a year ago

what if we want to split in such a way that for each line the first word becomes the key and remaining (whatever no of words) becomes value as an array?

Cary Swoveland Over a year ago

Suppose string = "Now is the time\nfor all good people". You could do this: string.split(/\n+/).map { |line| line.split(/\s+/, 2) }.to_h #=> {"Now"=>"is the time", "for"=>"all good people"}. This uses the form of String#split that takes a "limit" argument. Note that if another line begins with (say) "for", the value stored earlier under "for" will be overwritten.
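The one-liner above keeps the rest of each line as a single string. If the values should be arrays of words, as the first comment asks, one possible sketch is the following. The variable names `as_strings` and `as_arrays` are only illustrative.

```ruby
string = "Now is the time\nfor all good people"

# First word as key, rest of the line as one string (the form shown above).
as_strings = string.split(/\n+/).map { |line| line.split(/\s+/, 2) }.to_h

# First word as key, remaining words as an array.
as_arrays = string.split(/\n+/).map do |line|
  key, *rest = line.split # split on whitespace; the splat collects the tail
  [key, rest]
end.to_h

p as_strings #=> {"Now"=>"is the time", "for"=>"all good people"}
p as_arrays  #=> {"Now"=>["is", "the", "time"], "for"=>["all", "good", "people"]}
```

As with the string version, a later line that starts with the same first word replaces the earlier entry.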

lightyrs · Accepted Answer · 2019-05-06 00:36:06Z

7

I like to use this as a pretty generic method for handling newlines and returns:

lines = string.split(/\n+|\r+/).reject(&:empty?)
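To see why the reject(&:empty?) step matters here, consider this small sketch on a made-up string (`mixed` is not from the question). Because \n+ and \r+ are separate alternatives, a \r\n pair is matched as two delimiters with an empty string between them. A character class is one possible variant that avoids this.

```ruby
# Hypothetical input with \r\n, a blank line, and a lone \r.
mixed = "a\r\nb\n\nc\rd"

raw = mixed.split(/\n+|\r+/)
p raw                      #=> ["a", "", "b", "c", "d"]

lines = raw.reject(&:empty?)
p lines                    #=> ["a", "b", "c", "d"]

# Variant: treat any run of \r and \n characters as a single delimiter.
p mixed.split(/[\r\n]+/)   #=> ["a", "b", "c", "d"]
```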

answered May 6, 2019 at 0:36

lightyrs


Arup Rakshit · Accepted Answer · 2014-06-23 16:46:25Z

5

string = "P07091 MMCNEFFEG

P06870 IVGGWECEQHS

SP0A8M0 VVPVADVLQGR

P01019 VIHNESTCEQ"

Using CSV::parse

require 'csv'

CSV.parse(string).flatten
# => ["P07091 MMCNEFFEG", "P06870 IVGGWECEQHS", "SP0A8M0 VVPVADVLQGR", "P01019 VIHNESTCEQ"]

Another way, using String#each_line:

ar = []
string.each_line { |line| ar << line.strip unless line == "\n" }
ar # => ["P07091 MMCNEFFEG", "P06870 IVGGWECEQHS", "SP0A8M0 VVPVADVLQGR", "P01019 VIHNESTCEQ"]
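The comparison line == "\n" only skips lines that are exactly a newline. If blank lines might contain spaces or end in \r\n, a slightly more forgiving sketch strips first and then rejects empties. The sample `text` below is an assumption for illustration.

```ruby
# Hypothetical input: one whitespace-only line and one \r\n blank line.
text = "P07091 MMCNEFFEG\n   \nP06870 IVGGWECEQHS\r\n\r\nP01019 VIHNESTCEQ"

# each_line without a block returns an Enumerator, so it can be chained.
ar = text.each_line.map(&:strip).reject(&:empty?)
p ar
#=> ["P07091 MMCNEFFEG", "P06870 IVGGWECEQHS", "P01019 VIHNESTCEQ"]
```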

edited Jun 23, 2014 at 16:46

answered Jun 23, 2014 at 16:41

Arup Rakshit


Chris Bloom · Accepted Answer · 2014-06-23 17:00:49Z

4

Building off of @Martin's answer:

lines = string.split("\n").reject(&:blank?)

That will give you only the lines that are not blank.
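Since blank? is provided by ActiveSupport (Rails) rather than core Ruby, a rough plain-Ruby equivalent is to strip each line and test it with empty?. This is only a sketch; the sample `string` is abbreviated from the question, with a whitespace-only line added for illustration.

```ruby
string = "P07091 MMCNEFFEG\n\nP06870 IVGGWECEQHS\n   \nP01019 VIHNESTCEQ"

# strip removes surrounding whitespace, so whitespace-only lines become "".
lines = string.split("\n").reject { |line| line.strip.empty? }
p lines
#=> ["P07091 MMCNEFFEG", "P06870 IVGGWECEQHS", "P01019 VIHNESTCEQ"]
```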

answered Jun 23, 2014 at 17:00

Chris Bloom


3 Comments

Arup Rakshit Over a year ago

Plain Ruby doesn't have blank?; that comes from Rails.

Cary Swoveland Over a year ago

...but you can use empty?.

Arup Rakshit Over a year ago

@CarySwoveland #split with a regex will work fine, but not with a string. None of the #split calls without a regex argument is correct.

Martin · Accepted Answer · 2014-06-23 16:37:31Z

1

split accepts the delimiter to split on as an argument, so you can do:

lines = string.split("\n")
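For reference, here is roughly what that call returns on the question's data, assuming the blank lines shown in the question are really present in the string (they could also be a formatting artifact of the post).

```ruby
string = "P07091 MMCNEFFEG\n\nP06870 IVGGWECEQHS\n\nSP0A8M0 VVPVADVLQGR\n\nP01019 VIHNESTCEQ"

lines = string.split("\n")
p lines
#=> ["P07091 MMCNEFFEG", "", "P06870 IVGGWECEQHS", "",
#    "SP0A8M0 VVPVADVLQGR", "", "P01019 VIHNESTCEQ"]
```

Each blank line shows up as an empty string between the real entries, which is why the other answers either split on /\n+/ or filter the result.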

answered Jun 23, 2014 at 16:37

Martin


1 Comment

Cary Swoveland Over a year ago

Dealing with the blank lines is the central issue of the question.

score 0 · Accepted Answer · 2021-12-28 19:20:27Z

I think it should be noted that in some situations, line breaks can include not only newlines (\n) but also carriage returns (\r) and that there could potentially be any combination or quantity thereof. Let's take the following string for example:

str = "Useful Line 1  ....
  Useful Line 2
           
Useful Line 3
  Useful Line 4...                                           \n
Useful Line 5\r      \n
  Useful Line 6\n\r
Useful Line 7\n\r\n\r
  Useful Line 8       \r\n\r\n
Useful Line 9\r\r\r  Useful Line 10\n\n\n\n\nUseful Line 11        \r  Useful Line 12"

To deal with all instances of \n and \r, I would first replace every \r with \n using gsub, and then collapse each run of consecutive \n characters into a single one using squeeze(arg):

str.gsub("\r", "\n").squeeze("\n")

which would result in:

#=>
"Useful Line 1  ....
  Useful Line 2
           
Useful Line 3
  Useful Line 4...                                           
Useful Line 5
      
  Useful Line 6
Useful Line 7
  Useful Line 8       
Useful Line 9
  Useful Line 10
Useful Line 11        
  Useful Line 12"

...which brings us to the next issue. Sometimes the lines between those extra line breaks contain only whitespace, so they are not truly empty. To drop those whitespace-only lines as well, I would add each_line, reject, and strip like so:

str.gsub("\r", "\n").squeeze("\n").each_line.reject{|x| x.strip == ""}.join

which would result in the desired string:

#=>
Useful Line 1  ....
  Useful Line 2
Useful Line 3
  Useful Line 4...                                           
Useful Line 5
  Useful Line 6
Useful Line 7
  Useful Line 8       
Useful Line 9
  Useful Line 10
Useful Line 11        
  Useful Line 12

Now more specifically to the OP, we could then simply use split("\n") to finish it all off (as was already mentioned by others):

str.gsub("\r", "\n").squeeze("\n").each_line.reject{|x| x.strip == ""}.join.split("\n")

or we could skip straight to the desired array by replacing each_line with split("\n") and leaving off the now-unnecessary join like so:

str.gsub("\r", "\n").squeeze("\n").split("\n").reject{|x| x.strip == ""}

both of which would result in:

#=>    
["Useful Line 1  ....", "  Useful Line 2", "Useful Line 3", "  Useful Line 4...                                           ", "Useful Line 5", "  Useful Line 6", "Useful Line 7", "  Useful Line 8       ", "Useful Line 9", "  Useful Line 10", "Useful Line 11        ", "  Useful Line 12"]

NOTE: You may also want to strip off leading and trailing whitespace from each line in which case we could replace .join.split("\n") with .map(&:strip) like so:

str.gsub("\r", "\n").squeeze("\n").each_line.reject{|x| x.strip == ""}.map(&:strip)

or

str.gsub("\r", "\n").squeeze("\n").split("\n").reject{|x| x.strip == ""}.map(&:strip)

which would both result in:

#=>
["Useful Line 1  ....", "Useful Line 2", "Useful Line 3", "Useful Line 4...", "Useful Line 5", "Useful Line 6", "Useful Line 7", "Useful Line 8", "Useful Line 9", "Useful Line 10", "Useful Line 11", "Useful Line 12"]
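If the stripped array is all that is needed, the chain can arguably be shortened by splitting on any run of \r and \n characters directly. The helper name `clean_lines` and the shortened `sample` string below are hypothetical and not part of the answer above; this is a sketch of one alternative, not a replacement for the step-by-step version.

```ruby
# Hypothetical helper: split on runs of \r and \n, trim each piece,
# and drop pieces that were empty or whitespace-only.
def clean_lines(text)
  text.split(/[\r\n]+/).map(&:strip).reject(&:empty?)
end

# Shortened sample in the spirit of the string used above.
sample = "Useful Line 1  ....\n  Useful Line 2\n           \n" \
         "Useful Line 3\r      \n  Useful Line 4\n\r\n\r" \
         "Useful Line 5\r\r\r  Useful Line 6"

result = clean_lines(sample)
p result
#=> ["Useful Line 1  ....", "Useful Line 2", "Useful Line 3",
#    "Useful Line 4", "Useful Line 5", "Useful Line 6"]
```

Stripping before rejecting means whitespace-only lines are removed in the same pass, so no separate gsub or squeeze step is required.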
