3

I have a big text file that contains, among others, lines like these:

"X" : "452345230"

I want to find all lines that contain "X", take just the number (without the quotation marks), and then write the numbers to another file, one per line, like this:

452349532

234523452

213412411

219456433

etc.

What I did so far is this:

myfile = File.open("myfile.txt")
x = [] 
myfile.grep(/"X"/) {|line|
   x << line.match( /"(\d{9})/ ).values_at( 1 )[0]
   puts x
   File.open("output.txt", 'w') {|f| f.write(x) }
}

It works, but the list it produces looks like this:

["23419230", "2349345234" , ... ]

How do I output it like I showed before, just numbers and each number in a line?

Thanks.

3 Answers

5

Here's a solution that doesn't leave files open:

File.open("output.txt", 'w') do |output|
    File.foreach("myfile.txt") do |line|
        output.puts line[/\d{9}/] if line[/"X"/]
    end
end

2 Comments

As usual, the least-buggy code is often very beautiful. Well written. Thanks.
The inner loop can be written a bit more succinctly: File.foreach("myfile.txt").select{ |l| l[/"X"/] }.each { |line| output.puts line[/\d+/] }
2

I couldn't reproduce what you saw:

$ cat myfile.txt 
"X" : "452345230"
"X" : "452345231"
"X" : "452345232"
"X" : "452345233"
$ ./scanner.rb 
452345230
452345230
452345231
452345230
452345231
452345232
452345230
452345231
452345232
452345233
$ cat output.txt 
452345230452345231452345232452345233$ 

However, I did notice that your application is very wasteful and probably not doing what you expect. Inside the loop you open output.txt, write some content to it, and close it again; the next time it is opened, the 'w' mode truncates it and the previous content is overwritten. If your input has 1,000 matching lines, this won't be so bad: you only rewrite the file 1,000 times. If it has 1,000,000 matching lines, you pay a pretty horrible performance penalty for opening, truncating, and rewriting the file one million times. Oops.
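To make the overwrite behaviour concrete, here is a minimal sketch that contrasts reopening a file in 'w' mode for every item with opening it once. The helper names write_by_reopening and write_with_single_open are hypothetical, chosen only for this illustration, and the sketch writes one item per call rather than the growing array used in the question.

```ruby
require "tmpdir"

# Hypothetical helper: reopens the file in 'w' mode for every item,
# mirroring the File.open call placed inside the loop in the question.
# Each open truncates the file, so only the last item survives.
def write_by_reopening(path, items)
  items.each do |item|
    File.open(path, "w") { |f| f.puts item }
  end
  File.read(path)
end

# Hypothetical helper: opens the file once and writes every item
# through the same handle, so all items end up in the file.
def write_with_single_open(path, items)
  File.open(path, "w") do |f|
    items.each { |item| f.puts item }
  end
  File.read(path)
end

numbers = %w[452345230 452345231 452345232]

Dir.mktmpdir do |dir|
  puts write_by_reopening(File.join(dir, "reopened.txt"), numbers)
  puts write_with_single_open(File.join(dir, "single.txt"), numbers)
end
```

The first call leaves only the last number in the file, while the second keeps all three, each on its own line.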

I re-wrote your tool a little bit:

$ cat scanner.rb 
#!/usr/bin/ruby -w

myfile = File.open("myfile.txt")
output = File.open("output.txt", 'w')
myfile.grep(/"X"/) {|line|
   x = line.match( /"(\d{9})/ ).values_at( 1 )[0]
   puts x
   output.write(x + "\n")
}

This opens each file exactly once, writes each new line one at a time, and then lets them both be closed when the application quits. Depending on whether this is a small portion of your application or the entire thing, this might be all right. (If this is a small portion of the program, then definitely close the files when you're done with them.)
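If the script is part of a larger program, one way to get the files closed without calling close by hand is the block form of File.open. The following is a rough sketch under that assumption; extract_numbers is a hypothetical helper name, and it returns the two handles only so that their state can be inspected afterwards.

```ruby
require "tmpdir"

# Hypothetical helper: copies the nine-digit number from every "X" line
# of in_path to out_path, one number per line. Both files are opened in
# block form, so Ruby closes them when the blocks finish.
def extract_numbers(in_path, out_path)
  handles = []
  File.open(out_path, "w") do |output|
    File.open(in_path) do |input|
      handles << input << output
      input.each_line do |line|
        number = line[/"(\d{9})"/, 1] # first capture group, or nil
        output.puts number if line.include?('"X"') && number
      end
    end
  end
  handles # returned only so callers can check closed?
end

Dir.mktmpdir do |dir|
  src = File.join(dir, "myfile.txt")
  dst = File.join(dir, "output.txt")
  File.write(src, %("X" : "452345230"\n"Y" : "111111111"\n"X" : "452345231"\n))
  handles = extract_numbers(src, dst)
  puts handles.map(&:closed?).inspect
  puts File.read(dst)
end
```

Lines without a nine-digit number are skipped here instead of raising an error, which is a design choice of this sketch and not something the code above does.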

This might still be wasteful for one million matched lines. Ruby buffers writes to a File internally unless sync is enabled, so not every call reaches the system call write(2), but each call still involves some overhead.

How many of these will you be running? Millions? Billions? If this needs more refinement feel free to ask...

6 Comments

It's ok but you've left both your files open :(
They'd close as soon as Ruby exited.
@pguardiario: indeed, but I don't know what else his script is doing. I just fixed the bits that are here that needed to be fixed. Perhaps he needs these file descriptors for something else later in the program? Perhaps this is it? Either way, leaving them open is fine. If the script continues but these files are finished, then indeed, they should be closed before continuing. But that portion of the script is left unstated in the question. Note my parenthetical statement: (If this is a small portion of the program, then definitely close the files when you're done with them.)
@sarnold and Tin Man - That's true of course, it just seems better to me they not be left open.
@pguardiario, I agree 100% that files shouldn't be left open, and it is recommended in Ruby to use the block forms so the files are closed automatically. But, realistically, in a small app we can get away with it. You just don't want to open a bunch of them and leave them open for the life of a long-running app.
2

Solution:

myfile = File.open("myfile.txt")

File.open("output.txt", 'w') do |output|
  content = myfile.each_line.map { |line| line.scan(/^"X".*(\d{9})/) }.flatten.join("\n")

  output.write(content)
end

Edited: I updated the code, reducing it a bit. If the example above seems complicated, you can also grab the data you want with the following statement (which may make it a little clearer what is happening):

content = myfile.each_line.select { |line| line =~ /"X"/ }.map { |line| line.scan(/\d{9}/) }.join("\n")

1 Comment

This approach is not so great, since he says it's a big file and this reads the entire contents into memory.
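For a big file, a streaming variant avoids building the whole result in memory. The sketch below is one possible way to do that, assuming a lazy enumerator over File.foreach is acceptable; stream_numbers is a hypothetical helper name, and the returned count is only there for illustration.

```ruby
require "tmpdir"

# Hypothetical helper: streams in_path one line at a time and writes the
# nine-digit number from each "X" line to out_path, one per line.
# Returns how many numbers were written.
def stream_numbers(in_path, out_path)
  count = 0
  File.open(out_path, "w") do |output|
    File.foreach(in_path)                      # enumerator over lines
        .lazy                                  # no intermediate arrays
        .select { |line| line.include?('"X"') }
        .map { |line| line[/\d{9}/] }
        .reject(&:nil?)                        # skip lines without a number
        .each do |number|
          output.puts number
          count += 1
        end
  end
  count
end

Dir.mktmpdir do |dir|
  src = File.join(dir, "myfile.txt")
  dst = File.join(dir, "output.txt")
  File.write(src, %("X" : "452345230"\n"Z" : "999999999"\n"X" : "452345231"\n))
  puts stream_numbers(src, dst)
  puts File.read(dst)
end
```

Only one line and one extracted number are held at a time, so memory use should stay roughly constant regardless of the input size.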
