
Python (and Spyder) raises a MemoryError when I load a JSON file that is 500 MB large.

But my computer has 32 GB of RAM, and the "memory" displayed by Spyder goes from 15% to 19% when I try to load it! It seems that I should have much more space...

Is there something I didn't think of?

  • Which OS are you using? Commented Nov 3, 2016 at 11:10
  • Windows 10, and I use Spyder to code and execute. Commented Nov 3, 2016 at 11:11
  • If you are using 32-bit Python, you are limited to 4 GB of memory per process. You are probably hitting this limit. Commented Nov 3, 2016 at 11:13
  • Really?! I thought it was a question of RAM... so maybe I could get to 8 GB by using 64-bit Python? (Is there a way to get much more memory per process?) Commented Nov 3, 2016 at 11:14

2 Answers


500 MB of JSON data does not result in 500 MB of memory usage; it results in a multiple of that. Exactly what factor depends on the data, but a factor of 10-25 is not uncommon.

For example, the following simple JSON string of 14 characters (bytes on disk) results in a Python object that is almost 25 times larger (Python 3.6b3):

>>> import json
>>> from sys import getsizeof
>>> j = '{"foo": "bar"}'
>>> len(j)
14
>>> p = json.loads(j)
>>> getsizeof(p) + sum(getsizeof(k) + getsizeof(v) for k, v in p.items())
344
>>> 344 / 14
24.571428571428573

That's because Python objects require some overhead; instances track the number of references to them, what type they are, and their attributes (if the type supports attributes) or their contents (in the case of containers).
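
Note that getsizeof() is shallow, which is why the example above adds the sizes of the keys and values by hand. For nested JSON data you would have to recurse through the containers. A minimal sketch (the total_size helper is hypothetical, not part of the standard library):

from sys import getsizeof

def total_size(obj, seen=None):
    # Rough recursive memory footprint of a parsed-JSON structure
    # (dicts, lists, strings, numbers).
    if seen is None:
        seen = set()
    if id(obj) in seen:          # don't count shared objects twice
        return 0
    seen.add(id(obj))
    size = getsizeof(obj)        # shallow size of this object alone
    if isinstance(obj, dict):
        size += sum(total_size(k, seen) + total_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set)):
        size += sum(total_size(item, seen) for item in obj)
    return size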

If you are using the built-in json library to load that file, it has to build larger and larger objects from the contents as they are parsed, and at some point your OS will refuse to provide more memory. That won't happen at 32 GB, because there is a per-process limit on how much memory can be used, so it is more likely to happen at 4 GB. At that point all the objects already created are freed again, so in the end the actual memory use doesn't have to have changed that much.
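
You can check which build you are running from the interpreter itself; a quick standard-library check (output shown here for a 64-bit build):

>>> import struct, sys
>>> struct.calcsize('P') * 8   # pointer width: 32 on a 32-bit build
64
>>> sys.maxsize > 2 ** 32      # True only on 64-bit Python
True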

The solution is to either break up that large JSON file into smaller subsets, or to use an event-driven JSON parser like ijson.

An event-driven JSON parser doesn't create Python objects for the whole file, only for the currently parsed item, and notifies your code of each item with an event (an array is starting, here is a string, a mapping is starting, this is the end of the mapping, and so on). You can then decide what data you need to keep and what to ignore. Anything you ignore is discarded again, and memory use stays low.
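
A minimal sketch of what that looks like with ijson; the file name big.json and the layout (a top-level array of objects with a "name" key) are assumptions here, so adjust the 'item' prefix to your actual structure:

import ijson  # third-party: pip install ijson

names = []
with open('big.json', 'rb') as f:
    # ijson.items() builds a Python object for ONE array element at a
    # time; the prefix 'item' addresses each element of a top-level array.
    for record in ijson.items(f, 'item'):
        names.append(record.get('name'))  # keep only what you need
        # each `record` becomes garbage once the loop reassigns it, so
        # peak memory stays around one record, not the whole file

If you need the raw event stream instead, ijson.parse() yields (prefix, event, value) tuples such as ('', 'start_array', None) or ('item.name', 'string', 'foo'); that is the event flow described above.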


Comments

Perfect answer, as always from you ;) For the others, I added a link to an explanation of how ijson works: stackoverflow.com/questions/40330820/…
And is there a way to change the memory limit per process?
@AgapeGal'lo: I'm not sure you can; it looks to me that on Windows, you can only decrease the limits: Set Windows process (or user) memory limit. I strongly doubt that is going to work out for you.
But, as I have Windows 10 64-bit, I should be limited to 2 TB. I tested the same program under Ubuntu, and it works perfectly! I tried to uninstall and reinstall pythonxy, but nothing helps; it does not want to work...

So, I will explain how I finally solved this problem. The first answer works, but you have to know that loading elements one by one with ijson takes a very long time... and at the end, you do not have the whole file loaded.

So, the important information is that Windows limits the memory per process to 2 or 4 GB, depending on which Windows you use (32- or 64-bit). If you use pythonxy, it will be 2 GB (it only exists in 32-bit). Either way, that is very, very low!

I solved this problem by installing a virtual Linux inside my Windows, and it works. Here are the main steps to do so:

  1. Install VirtualBox
  2. Install Ubuntu (for example)
  3. Install a scientific Python distribution, like SciPy, on the virtual machine
  4. Create a shared folder between the two "computers" (you will find tutorials on Google)
  5. Execute your code on your Ubuntu "computer": it should work ;)

NB: Do not forget to allocate sufficient RAM and storage to your virtual computer.

This works for me; I no longer have this "memory error" problem.

I am posting this answer here from there.

