19

I would like to turn this string

"P07091 MMCNEFFEG

P06870 IVGGWECEQHS

SP0A8M0 VVPVADVLQGR

P01019 VIHNESTCEQ"

into an array that looks like this in Ruby:

["P07091 MMCNEFFEG", "P06870 IVGGWECEQHS", "SP0A8M0 VVPVADVLQGR", "P01019 VIHNESTCEQ"]

Using split doesn't return what I would like because of the line breaks.

1
  • What does split return? This information may be important. Commented Jun 23, 2014 at 18:45

6 Answers

43

This is one way to deal with blank lines:

string.split(/\n+/)

For example,

string = "P07091 MMCNEFFEG

P06870 IVGGWECEQHS

SP0A8M0 VVPVADVLQGR




P01019 VIHNESTCEQ"

string.split(/\n+/)
  #=> ["P07091 MMCNEFFEG", "P06870 IVGGWECEQHS",
  #    "SP0A8M0 VVPVADVLQGR", "P01019 VIHNESTCEQ"]

To accommodate files created under Windows (having line terminators \r\n) replace the regular expression with /(?:\r?\n)+/.
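As a rough illustration of the difference, the sketch below runs both regular expressions on a made-up string with Windows line endings. The variable names and the shortened sample data are assumptions for the example, not part of the original question.

```ruby
# Hypothetical input: two entries separated by a blank line, with \r\n endings.
windows_string = "P07091 MMCNEFFEG\r\n\r\nP06870 IVGGWECEQHS\r\n"

# Splitting on \n+ alone leaves a stray \r on every piece.
unix_only = windows_string.split(/\n+/)
p unix_only
#=> ["P07091 MMCNEFFEG\r", "\r", "P06870 IVGGWECEQHS\r"]

# Treating an optional \r followed by \n as one terminator removes it.
both = windows_string.split(/(?:\r?\n)+/)
p both
#=> ["P07091 MMCNEFFEG", "P06870 IVGGWECEQHS"]
```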


2 Comments

What if we want to split in such a way that, for each line, the first word becomes the key and the remaining words (however many there are) become the value as an array?
Suppose string = "Now is the time\nfor all good people". You could do this: string.split(/\n+/).map { |line| line.split(/\s+/, 2) }.to_h #=> {"Now"=>"is the time", "for"=>"all good people"}. This uses the form of String#split that takes a "limit" argument. Note that if another line begins with (say) "for", the value already stored under "for" will be overwritten.
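The reply above yields string values. If the values really should be arrays of the remaining words, one possible sketch (the name words_by_first is made up for this example) splits each line into words and uses the first as the key:

```ruby
string = "Now is the time\nfor all good people"

# For each non-blank line, the first word is the key and the remaining
# words form the value array. As above, a repeated first word would
# overwrite the earlier entry.
words_by_first = string.split(/\n+/).map do |line|
  key, *rest = line.split   # split with no argument splits on whitespace
  [key, rest]
end.to_h

p words_by_first
#=> {"Now"=>["is", "the", "time"], "for"=>["all", "good", "people"]}
```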
7

I like to use this as a pretty generic method for handling newlines and returns:

lines = string.split(/\n+|\r+/).reject(&:empty?)
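A small sketch of why the reject step matters here, using a made-up input string: because \r and \n are matched by separate alternatives, a "\r\n" pair counts as two separators with an empty string between them.

```ruby
# Hypothetical input mixing \r\n, \n\n and a bare \r.
mixed = "first\r\nsecond\n\nthird\rfourth"

raw = mixed.split(/\n+|\r+/)
p raw
#=> ["first", "", "second", "third", "fourth"]

lines = raw.reject(&:empty?)
p lines
#=> ["first", "second", "third", "fourth"]

# A character class treats any run of \r and \n as one separator.
single_pass = mixed.split(/[\r\n]+/)
p single_pass
#=> ["first", "second", "third", "fourth"]
```

Even with the character class, keeping reject(&:empty?) is a reasonable precaution, since split preserves an empty first element when the string starts with a line break.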


5
string = "P07091 MMCNEFFEG

P06870 IVGGWECEQHS

SP0A8M0 VVPVADVLQGR

P01019 VIHNESTCEQ"

Using CSV::parse

require 'csv'

CSV.parse(string).flatten
# => ["P07091 MMCNEFFEG", "P06870 IVGGWECEQHS", "SP0A8M0 VVPVADVLQGR", "P01019 VIHNESTCEQ"]

Another way, using String#each_line:

ar = []
string.each_line { |line| ar << line.strip unless line == "\n" }
ar # => ["P07091 MMCNEFFEG", "P06870 IVGGWECEQHS", "SP0A8M0 VVPVADVLQGR", "P01019 VIHNESTCEQ"]
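A related variant, sketched here as an alternative rather than as part of the original answer, uses String#lines and filters after stripping. Unlike the line == "\n" check, it also drops lines that contain only whitespace.

```ruby
string = "P07091 MMCNEFFEG\n\nP06870 IVGGWECEQHS\n   \nSP0A8M0 VVPVADVLQGR\n\nP01019 VIHNESTCEQ"

# lines keeps the trailing "\n" on each piece; strip removes it along with
# any surrounding whitespace, and reject drops what is left empty.
entries = string.lines.map(&:strip).reject(&:empty?)

p entries
#=> ["P07091 MMCNEFFEG", "P06870 IVGGWECEQHS", "SP0A8M0 VVPVADVLQGR", "P01019 VIHNESTCEQ"]
```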


4

Building off of @Martin's answer:

lines = string.split("\n").reject(&:blank?)

That will give you only the non-blank lines.

3 Comments

Plain Ruby doesn't have blank?; that comes from Rails (ActiveSupport).
...but you can use empty?.
@CarySwoveland #split with a regex will work fine, but not with a string. None of the #split calls without a regex argument is correct.
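For plain Ruby without ActiveSupport, a rough stand-in for blank? is to strip each line and test it with empty?. The sketch below uses a shortened, made-up input that includes a whitespace-only line to show how this differs from empty? alone.

```ruby
# Hypothetical input: one truly empty line and one whitespace-only line.
string = "P07091 MMCNEFFEG\n\n   \nP06870 IVGGWECEQHS"

# empty? alone keeps the whitespace-only line.
not_empty = string.split("\n").reject(&:empty?)
p not_empty
#=> ["P07091 MMCNEFFEG", "   ", "P06870 IVGGWECEQHS"]

# Stripping first behaves much like Rails' blank? for strings.
not_blank = string.split("\n").reject { |line| line.strip.empty? }
p not_blank
#=> ["P07091 MMCNEFFEG", "P06870 IVGGWECEQHS"]
```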
1

split can take the separator to split on as an argument, so you can do:

lines = string.split("\n")

1 Comment

Dealing with the blank lines is the central issue of the question.
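To make the problem concrete, here is a short sketch of what split("\n") returns when the input contains a blank line. The input is a shortened version of the question's data.

```ruby
string = "P07091 MMCNEFFEG\n\nP06870 IVGGWECEQHS"

# Each blank line shows up as an empty string in the result.
with_blanks = string.split("\n")
p with_blanks
#=> ["P07091 MMCNEFFEG", "", "P06870 IVGGWECEQHS"]

# Filtering afterwards gives the array the question asks for.
cleaned = with_blanks.reject(&:empty?)
p cleaned
#=> ["P07091 MMCNEFFEG", "P06870 IVGGWECEQHS"]
```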
0

I think it should be noted that in some situations, line breaks can include not only newlines (\n) but also carriage returns (\r) and that there could potentially be any combination or quantity thereof. Let's take the following string for example:

str = "Useful Line 1  ....
  Useful Line 2
           
Useful Line 3
  Useful Line 4...                                           \n
Useful Line 5\r      \n
  Useful Line 6\n\r
Useful Line 7\n\r\n\r
  Useful Line 8       \r\n\r\n
Useful Line 9\r\r\r  Useful Line 10\n\n\n\n\nUseful Line 11        \r  Useful Line 12"

To deal with all instances of \n and \r, I would first replace every \r with \n using gsub, and then collapse each run of consecutive \n into a single one using squeeze(arg):

str.gsub("\r", "\n").squeeze("\n")

which would result in:

#=>
"Useful Line 1  ....
  Useful Line 2
           
Useful Line 3
  Useful Line 4...                                           
Useful Line 5
      
  Useful Line 6
Useful Line 7
  Useful Line 8       
Useful Line 9
  Useful Line 10
Useful Line 11        
  Useful Line 12"
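The two steps are easier to follow on a shorter input. The sketch below uses a made-up sample string and shows the intermediate value after each call. Note that the whitespace-only line survives both steps.

```ruby
# Hypothetical short input mixing \r\n, repeated \n, bare \r and a
# whitespace-only line.
sample = "a\r\n\r\nb\n   \n\nc\r\rd"

only_newlines = sample.gsub("\r", "\n")   # every \r becomes \n
p only_newlines
#=> "a\n\n\n\nb\n   \n\nc\n\nd"

collapsed = only_newlines.squeeze("\n")   # each run of \n becomes one \n
p collapsed
#=> "a\nb\n   \nc\nd"
```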

...which brings me to the next issue. Sometimes the lines between useful lines are not truly empty but contain only whitespace. To remove those whitespace-only lines as well as the extra line breaks, I would add the each_line, reject, and strip methods like so:

str.gsub("\r", "\n").squeeze("\n").each_line.reject{|x| x.strip == ""}.join 

which would result in the desired string:

#=>
Useful Line 1  ....
  Useful Line 2
Useful Line 3
  Useful Line 4...                                           
Useful Line 5
  Useful Line 6
Useful Line 7
  Useful Line 8       
Useful Line 9
  Useful Line 10
Useful Line 11        
  Useful Line 12

Now more specifically to the OP, we could then simply use split("\n") to finish it all off (as was already mentioned by others):

str.gsub("\r", "\n").squeeze("\n").each_line.reject{|x| x.strip == ""}.join.split("\n")

or we could skip straight to the desired array by replacing each_line with split("\n") and leaving off the unnecessary join like so:

str.gsub("\r", "\n").squeeze("\n").split("\n").reject{|x| x.strip == ""}

both of which would result in:

#=>    
["Useful Line 1  ....", "  Useful Line 2", "Useful Line 3", "  Useful Line 4...                                           ", "Useful Line 5", "  Useful Line 6", "Useful Line 7", "  Useful Line 8       ", "Useful Line 9", "  Useful Line 10", "Useful Line 11        ", "  Useful Line 12"]

NOTE: You may also want to strip off leading and trailing whitespace from each line in which case we could replace .join.split("\n") with .map(&:strip) like so:

str.gsub("\r", "\n").squeeze("\n").each_line.reject{|x| x.strip == ""}.map(&:strip)

or

str.gsub("\r", "\n").squeeze("\n").split("\n").reject{|x| x.strip == ""}.map(&:strip)

which would both result in:

#=>
["Useful Line 1  ....", "Useful Line 2", "Useful Line 3", "Useful Line 4...", "Useful Line 5", "Useful Line 6", "Useful Line 7", "Useful Line 8", "Useful Line 9", "Useful Line 10", "Useful Line 11", "Useful Line 12"]
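A more compact chain that should give the same kind of result is sketched below. It is an alternative under the assumption that stripped, non-blank lines are the goal, and it uses a shortened, made-up input rather than the full str above.

```ruby
# Hypothetical input with mixed \r and \n, padding, and a whitespace-only line.
str = "Useful Line 1  \r\n  Useful Line 2\n   \n\rUseful Line 3\r\r  Useful Line 4\n\n"

# Split on any run of \r and \n, strip each piece, then drop pieces that
# were empty or whitespace-only.
lines = str.split(/[\r\n]+/).map(&:strip).reject(&:empty?)

p lines
#=> ["Useful Line 1", "Useful Line 2", "Useful Line 3", "Useful Line 4"]
```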

