2

I'm using the IO.foreach loop to find a string using regular expressions. I want to append the next block (next line) to the file_names list. How can I do that?

file_names = [""]                                                                                                                                                                                                                           
IO.foreach("a.txt") { |block|                                                                                                                         
  if block =~ /^file_names*/                                                                                              
    dir = # get the next block                                                                                                                                                                                           
    file_names.append(dir)                                                                                      
  end                                                                                                                    

}     


Actually my input looks like this:

file_names[174]:                                                                                                         
           name: "vector"                                                                                                
      dir_index: 1                                                                                                       
       mod_time: 0x00000000                                                                                              
         length: 0x00000000                                                                                              
file_names[175]:                                                                                                         
           name: "stl_bvector.h"                                                                                         
      dir_index: 2                                                                                                       
       mod_time: 0x00000000                                                                                              
         length: 0x00000000    

I have a list of file_names entries, and I want to capture the name, dir_index, mod_time and length properties of each one and store them in the file_names array at the position given by the file_names index in the text.

  • Can you be a little more specific about your input format? Just the next line or until another delimiter? As a note /file_names*/ matches "file_name", "some kind of file_name" and "file_namessssssssssss". Commented Oct 23, 2019 at 17:22
  • You're right. It should be /^file_names*/ instead. Check out my edit for an elaboration. Commented Oct 23, 2019 at 17:27
  • In Ruby it's actually /\Afile_names/: \A anchors to the beginning of the string, while ^ anchors to the beginning of a line. * means "zero or more of the preceding character", which doesn't appear to be what you mean. You seem to be using it as "more stuff", which it isn't. Commented Oct 23, 2019 at 17:35
  • Another thing to note here is that file_names = [""] initializes your array with an empty string already in it. That's probably a mistake, as it should be an empty array: file_names = [ ]. You can append as necessary without the empty string. Commented Oct 23, 2019 at 17:42
  • Everything in Ruby, like most programming languages, indexes from zero. You should just accept that and either add or subtract one as necessary when manipulating that array. Sticking in a placeholder causes all kinds of trouble from an architectural perspective. Commented Oct 23, 2019 at 17:49
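
The regex points raised in these comments can be checked with a small sketch. The sample strings come from the comments; the stricter pattern with a trailing \[ is an assumption added for illustration and is not something the commenters proposed.

```ruby
samples = ["file_names[174]:", "file_name", "some kind of file_name", "file_namessss"]

loose    = /file_names*/      # "s*" is zero or more "s"; nothing anchors the match
anchored = /\Afile_names\[/   # assumed stricter form: must begin the string, then "["

samples.each do |s|
  puts format("%-26s loose=%-5s anchored=%s", s.inspect, loose.match?(s), anchored.match?(s))
end

# ^ anchors at the start of any line, \A only at the start of the whole string.
multi = "header\nfile_names[1]:"
puts multi.match?(/^file_names/)   # true
puts multi.match?(/\Afile_names/)  # false
```

With IO.foreach each block is a single line, so ^ and \A behave the same there; the difference only shows up once a string spans several lines.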

2 Answers

1

You can use #each_cons to get the value of the next 4 rows from the text file:

files = IO.foreach("text.txt").each_cons(5).with_object([]) do |block, o|
  if block[0] =~ /file_names.*/
    o << block[1..4].map{|e| e.split(':')[1]}
  end
end

puts files
#=> "vector"                                                                                                
#    1                                                                                                       
#    0x00000000                                                                                              
#    0x00000000                                                                                              
#    "stl_bvector.h"                                                                                         
#    2                                                                                                       
#    0x00000000                                                                                              
#    0x00000000 

Keep in mind that the files array contains subarrays of 4 elements. If a value may itself contain a : character, split(':')[1] would cut it off at the second colon, so you could replace the third line of my code with this:

o << block[1..4].map{ |e| e.partition(':').last.strip}

I also added #strip in case you want to remove the whitespace around the values. With this line changed, the actual array will look something like this:

p files
#=>[["\"vector\"", "1", "0x00000000", "0x00000000"], ["\"stl_bvector.h\"", "2", "0x00000000", "0x00000000"]]

(the values don't contain the \ escape character, that's just the way #p shows it).
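
As a rough illustration of the difference between the two variants, here is a sketch using a made-up line whose value contains a colon. The path in the sample is hypothetical and does not appear in the original input.

```ruby
# Hypothetical line: the value itself contains a ':' character.
line = "           name: \"C:/include/vector\"\n"

with_split     = line.split(':')[1]              # text between the first and second colon
with_partition = line.partition(':').last.strip  # everything after the first colon, trimmed

p with_split      # " \"C"
p with_partition  # "\"C:/include/vector\""
```

Passing a limit, as in line.split(':', 2).last.strip, should give the same result as the #partition version.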

Another option: if you know that every record consists of exactly 1 file_names line followed by 4 value lines, and that the text file always starts with a file_names line, you can replace #each_cons with #each_slice and remove the regex completely. This also speeds up the process, because #each_slice yields non-overlapping groups of 5 lines instead of a sliding window:

IO.foreach("text.txt").each_slice(5).with_object([]) do |block, o|
  o << block[1..4].map{ |e| e.partition(':').last.strip }
end

1

It's actually pretty easy to carve up a series of lines based on a pattern using slice_before:

File.readlines("data.txt").slice_before(/\Afile_names/).to_a

Now you have an array of arrays that looks like:

[
  [
    "file_names[174]:\n",
    "           name: \"vector\"\n",
    "      dir_index: 1\n",
    "       mod_time: 0x00000000\n",
    "         length: 0x00000000\n"
  ],
  [
    "file_names[175]:\n",
    "           name: \"stl_bvector.h\"\n",
    "      dir_index: 2\n",
    "       mod_time: 0x00000000\n",
    "         length: 0x00000000"
  ]
]

Each of these groups could be transformed further, for example into a Ruby Hash that uses the property names as keys.
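
One possible way to do that transformation is sketched below. The sample text is inlined so the snippet runs on its own, and the variable names (input, files) and the choice of a Hash keyed by the file_names index are assumptions for illustration, not part of the original answer.

```ruby
# Inlined sample; with a real file, File.foreach("data.txt") would supply the same lines.
input = <<~TEXT
  file_names[174]:
             name: "vector"
        dir_index: 1
         mod_time: 0x00000000
           length: 0x00000000
  file_names[175]:
             name: "stl_bvector.h"
        dir_index: 2
         mod_time: 0x00000000
           length: 0x00000000
TEXT

files = {}
input.each_line.slice_before(/\Afile_names/).each do |header, *fields|
  # Pull the numeric index out of a header such as "file_names[174]:".
  index = header[/\Afile_names\[(\d+)\]/, 1].to_i
  # Turn each "key: value" line into a [symbol, string] pair.
  files[index] = fields.map do |line|
    key, _, value = line.partition(':')
    [key.strip.to_sym, value.strip]
  end.to_h
end

p files[174]
```

A Hash keyed by the index avoids padding an Array with nil entries up to position 174. The values stay as strings, so the name still carries its surrounding double quotes.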

5 Comments

I suggest using IO::foreach, which, without a block, returns an enumerator, to avoid the creation of a temporary array.
@CarySwoveland There's also File.each_line which might read better.
I'm not aware of File.each_line. A Rails method?
@CarySwoveland It's actually IO#each_line, which File inherits from IO. It may in fact be related to foreach, but that method is just awkwardly named.
I had looked in IO, but for a class method, not an instance method. You'd have to write something like f = File.new("testfile"); f.each_line {...}; f.close.
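
The two approaches discussed in these comments can be compared with a short sketch. It writes a throwaway file with Tempfile so that it is self-contained; the file contents and variable names are assumptions for illustration.

```ruby
require "tempfile"

via_foreach = via_each_line = nil

Tempfile.create("demo") do |tmp|
  tmp.write("file_names[174]:\n           name: \"vector\"\n")
  tmp.flush

  # Class method: without a block it returns an Enumerator, so no
  # intermediate array of all lines is built before slicing.
  via_foreach = File.foreach(tmp.path).slice_before(/\Afile_names/).to_a

  # Instance method: File.open with a block closes the file when the block ends.
  via_each_line = File.open(tmp.path) do |f|
    f.each_line.slice_before(/\Afile_names/).to_a
  end
end

puts via_foreach == via_each_line
```

The block form of File.open takes care of closing the file, so the explicit File.new and close pair is not needed.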
