file handling in python

Question

Thanks in advance. I have written a program that works for small files, but it doesn't work for files of 1 GB. Please tell me whether there is any way to handle big files. Here is the code.

fh=open('reg.fa','r')
c=fh.readlines()
fh.close() 
s=''  
for i in range(0,(len(c))):  
    s=s+c[i]  
    lines=s.split('\n')
    for line in s:
            s=s.replace('\n','')
s=s.replace('\n','')          
print s

You should probably add more explanation. If reg.fa is too big for memory, then I suspect s would also be too large. While it is easy enough to iterate in Python over some units, you are still going to be constrained by memory. I don't think you want to read a line at a time and write it back out; that would take a while. I think you will need to write to a new file, because as you append your string you will be messing with the file pointer.
– PyNEwbie, Commented May 6, 2009 at 19:44
You also don't need to specify range(0, len(c)). Until you get comfortable with the various iterators, you can always do something like: for i in range(len(c)):
– PyNEwbie, Commented May 6, 2009 at 19:55

user25148 · Accepted Answer · 2009-05-06 06:05:40Z

17

The readlines method reads in the entire file. You don't want to do that for a file that is large in relation to your physical memory size.

The fix is to read the file in small chunks, and process those individually. You can, for example, do something like this:

for line in f.xreadlines():
    pass  # ... do something with the line

The xreadlines method does not return a list of lines but an iterator, which yields one line at a time as the for loop asks for it. (It exists only in Python 2 and was removed in Python 3.) An even simpler way of doing that, which works in both versions, is:

for line in f:
    ... do something with the line

Depending on what you do, processing the file line by line may be easy or hard. I didn't really get what your sample code is trying to do, but it looks like it should be doable line by line.
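As a minimal sketch of the line-by-line approach, assuming the goal is to join all lines of the input into one long line (which is what the question's code appears to do): the function below streams from one file to another, so only one line is held in memory at a time. The name strip_newlines and the output path are hypothetical, and the code is written for Python 3, unlike the Python 2 snippets in this thread.

```python
def strip_newlines(src_path, dst_path):
    """Copy src_path to dst_path with every line terminator removed."""
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for line in src:  # the file object yields one line at a time
            dst.write(line.rstrip("\n"))  # drop only the newline
        dst.write("\n")  # end the output with a single newline
```

It could be called as strip_newlines('reg.fa', 'reg_joined.fa'). Writing to a separate file avoids building the 1 GB result as a string, and the with statement closes both files even if an error occurs.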


fforw · Accepted Answer · 2009-05-06 08:41:32Z

7

The script is not working because it reads all lines of the file in advance, making it necessary to keep the whole file in memory. The easiest way to iterate over all lines in a file is:

for line in open("test.txt", "r"):
    pass  # do something with the "line"


Michał Niklas · Accepted Answer · 2009-05-06 06:09:12Z

5

With readlines() you read the whole file at once, so you use 1 GB of memory. Instead, try:

f = open(...)
while True:
    line = f.readline()
    if not line:  # an empty string means end of file
        break
    line = line.rstrip()
    # ... do something with line
f.close()

If all you need is to remove \n then do not do it line by line, but do it with chunks of text:

import sys

f = open('query.txt', 'r')
while True:
    part = f.read(1024)  # read up to 1024 characters at a time
    if not part:  # an empty string means end of file
        break
    part = part.replace('\n', '')
    sys.stdout.write(part)
f.close()

edited May 6, 2009 at 6:09

answered May 6, 2009 at 5:59


2 Comments

Cheery Over a year ago

1024 is dumb low buffer size. You should increase it to at least 64KiB. Also it's stupid from python to not use generator in readlines-method.

user25148 Over a year ago

The readlines method was added before Python had generators, and changing it later would have caused existing programs to break. That's the curse of evolving languages.
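A possible variant of the chunked approach with the larger buffer suggested in the comment above, offered as a sketch only. The function name copy_without_newlines and the constant CHUNK_SIZE are hypothetical, the 64 KiB value comes from the comment rather than from any measurement, and the code targets Python 3.

```python
CHUNK_SIZE = 64 * 1024  # 64 KiB, the size suggested in the comment above


def copy_without_newlines(src, dst, chunk_size=CHUNK_SIZE):
    """Stream text from src to dst, dropping every newline character."""
    while True:
        part = src.read(chunk_size)
        if not part:  # an empty string means end of file
            break
        dst.write(part.replace("\n", ""))
```

With an open text file f, calling copy_without_newlines(f, sys.stdout) after import sys would print the joined content. Because each chunk is processed independently, memory use stays bounded by the chunk size regardless of file size.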

nosklo · Accepted Answer · 2009-05-06 11:23:06Z

2

Your program is very redundant. Looks like everything you do can be done using these lines:

import sys
for line in open('reg.fa'):
    sys.stdout.write(line.rstrip())

That is enough. This program gives the same result as your original code in the question, but is much simpler and clearer. It can also handle files of any size.


1 Comment

Miles Over a year ago

Doesn't give exactly the same result: This strips all trailing whitespace on lines (not just the line terminator), and doesn't print a final newline
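To illustrate the difference described in this comment, here is a small sketch in Python 3. The sample string is made up for the example and is not taken from the question's data.

```python
line = "ACGT  \t\n"  # hypothetical line with trailing spaces, a tab, and a newline

stripped_all = line.rstrip()          # removes all trailing whitespace
stripped_newline = line.rstrip("\n")  # removes only trailing newline characters

print(repr(stripped_all))      # 'ACGT'
print(repr(stripped_newline))  # 'ACGT  \t'
```

If trailing spaces or tabs in the data matter, rstrip("\n") is the safer choice.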

Charan · Accepted Answer · 2009-06-08 07:41:09Z

0

From your code it is clear that you want a single-line string buffer. From a coding point of view, it is bad to store the whole file content in one string buffer and only then process it. The code also contains too many local variables.

You could have used the following chunk of code:

f = open(file_name, mode)
for line in f:
    # Do the processing
    pass
f.close()


Palash Khaire (edited by Eric Aya) · Accepted Answer · 2017-07-07 09:10:52Z

0

import sys

# 'wb+' creates the file if it does not exist, but it also truncates an
# existing file, so nothing would be left to read. Open with 'r' to read.
f = open('f_name.txt', 'r')
while True:
    part = f.read(1024)
    if not part:  # an empty string means end of file
        break
    part = part.replace('\n', '')
    sys.stdout.write(part)
f.close()

answered Jul 7, 2017 at 6:03 by Palash Khaire
edited Jul 7, 2017 at 9:10 by Eric Aya
