
This is the code I have to count word frequencies:

import collections
import codecs
import io
from collections import Counter
with io.open('Combine.txt', 'r', encoding='utf8') as infh:
    words =infh.read().split()
    with open('Counts2.txt', 'wb') as f:
        for word, count in Counter(words).most_common(100000000):
            f.write(u'{} {}\n'.format(word, count).encode('utf-8')) 

When I try to read a big file (4 GB), I get this error:

Traceback (most recent call last):
  File "counter.py", line 7, in <module>
    words =infh.read().split()
  File "/usr/lib/python2.7/codecs.py", line 296, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
MemoryError

I am using Ubuntu 12.04 with 8 GB RAM and an Intel Core i7. How do I fix this error?


2 Answers


This is the pythonic way to process a file line-by-line:

with open(...) as fh:
    for line in fh:
        pass

This takes care of opening and closing the file, even if an exception is raised in the inner block. It also treats the file object fh as an iterable, which automatically uses buffered I/O and manages memory, so you don't have to worry about large files.
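Applied to the word count from the question, that looks roughly like the sketch below. This is a minimal sketch, not a drop-in fix: it assumes the words in Combine.txt are spread over many lines rather than sitting on one huge line.

# Count word frequencies line-by-line (Python 2), so the whole file is
# never loaded into memory at once.
import io
from collections import Counter

counts = Counter()
with io.open('Combine.txt', 'r', encoding='utf8') as infh:
    for line in infh:                  # one line in memory at a time
        counts.update(line.split())    # add this line's words to the running totals

with open('Counts2.txt', 'wb') as f:
    for word, count in counts.most_common():
        f.write(u'{} {}\n'.format(word, count).encode('utf-8'))

The 4 GB file is never held in memory; only the Counter of unique words is, which is normally far smaller.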


3 Comments

What if all the words are on a single line?
It should be trivial to either: a) convert it to one word per line via your shell, or b) read the file in chunks (i.e. manually manage memory) and process accordingly; see the sketch after these comments.
@MichaelFoukarakis the error is at usr/lib/python2.7/codecs.py, line 296, in decode: (result, consumed) = self._buffer_decode(data, self.errors, final) MemoryError
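For option (b) above, reading the file in fixed-size chunks could look roughly like this sketch. The 1 MB chunk size is an arbitrary choice and the file name is taken from the question; a word cut in half at a chunk boundary is carried over to the next chunk.

# Count word frequencies by reading decoded text in chunks (Python 2).
import io
from collections import Counter

counts = Counter()
leftover = u''
with io.open('Combine.txt', 'r', encoding='utf8') as infh:
    while True:
        chunk = infh.read(1024 * 1024)   # roughly 1 MB of decoded text at a time
        if not chunk:
            break
        chunk = leftover + chunk
        words = chunk.split()
        if not chunk[-1].isspace():
            # The chunk ends mid-word; keep the partial word for the next round.
            leftover = words.pop()
        else:
            leftover = u''
        counts.update(words)
if leftover:
    counts[leftover] += 1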

How about readline() instead of read()?

http://docs.python.org/2/tutorial/inputoutput.html
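A minimal sketch of that approach, assuming Python 2 and the file from the question; process() is just a hypothetical placeholder for the per-line work:

import io

with io.open('Combine.txt', 'r', encoding='utf8') as infh:
    while True:
        line = infh.readline()
        if not line:       # an empty string means end of file
            break
        process(line)      # hypothetical placeholder for whatever is done per line

Only one line is held in memory at a time, so this avoids the MemoryError as long as the file is not a single enormous line.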

