
As the transcript below shows, 50 000 000 records take only 404 MB of memory. Why? Since one record takes 83 bytes, 50 000 000 records should take about 3958 MB.

>>> import sys
>>> a=[]
>>> for it in range(5*10**7):a.append("miJ8ZNFG9iFqiQQohvyTWwqsij2rJCiZ7v"+str(it))
... 
>>> print(sys.getsizeof(a)/1024**2)
404.4306411743164
>>> print(sys.getsizeof("miJ8ZNFG9iFqiQQohvyTWwqsij2rJCiZ7v"))
83
>>> print(83*5*10**7/1024**2)
3957.7484130859375
>>> 
  • Someone else had a similar query as you did but went a bit further, so this is more of a related thread: Deep version of sys.getsizeof (commented Jan 17, 2019)

1 Answer


sys.getsizeof only reports the cost of the list itself, not its contents. So you're seeing the cost of storing the list object header, plus (a little over) 50M pointers; you're likely on a 64-bit (eight-byte-pointer) system, so storage for 50M pointers is ~400 MB. Getting the true size would require sys.getsizeof to be called for each object, each object's __dict__ (if applicable), etc., recursively, and it won't be 100% accurate since some of the objects (e.g. small ints) are likely shared; this is not a rabbit hole you want to go down.
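If you do want a recursive total along the lines of the linked "Deep version of sys.getsizeof" thread, a rough sketch looks like this. deep_getsizeof is a hypothetical helper, not a stdlib function; it only handles the common container types and skips things like __slots__ and closures:

```python
import sys

def deep_getsizeof(obj, seen=None):
    """Recursively sum sys.getsizeof over an object and its contents.

    Tracks ids to avoid double-counting shared objects. A rough
    sketch only: handles dict/list/tuple/set and instance __dict__,
    nothing more exotic.
    """
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    if hasattr(obj, "__dict__"):
        size += deep_getsizeof(vars(obj), seen)
    return size

# Small demo list (walking the full 50M-element list would be slow):
a = ["miJ8ZNFG9iFqiQQohvyTWwqsij2rJCiZ7v" + str(i) for i in range(1000)]
print(sys.getsizeof(a))   # list header + pointer array only
print(deep_getsizeof(a))  # list plus the strings it references
```

For a flat list of unique strings this reduces to sys.getsizeof of the list plus the sum over its elements, which is exactly the gap the question is about.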


2 Comments

Yes, 64-bit. I care about the actual storage (including the list contents). So does that mean a list with 50 000 000 records actually takes 3957 + 404 MB in total?
@purplecity: Well, your records are 83 bytes, plus the length of the stringified int you're adding on, so a bit larger than that, more like 4339 + 404 M, but yes, that's roughly correct.
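That ballpark can be checked without building the 50M-element list: on 64-bit CPython, a pure-ASCII str costs a fixed overhead plus one byte per character, so each record is the 83-byte base plus the digit count of the appended index. A sketch under that assumption (the digit-grouping loop is my own, not from the thread):

```python
import sys

prefix = "miJ8ZNFG9iFqiQQohvyTWwqsij2rJCiZ7v"
n = 5 * 10**7
base = sys.getsizeof(prefix)  # 83 on 64-bit CPython 3.x

# Group indices by digit count: every index with d digits costs base + d
# bytes, assuming one byte per appended ASCII digit.
total = 0
for digits in range(1, len(str(n - 1)) + 1):
    lo = 0 if digits == 1 else 10**(digits - 1)
    hi = min(10**digits, n)
    if lo < hi:
        total += (hi - lo) * (base + digits)

print(total / 1024**2)  # ≈ 4329 MB for the string payloads alone (if base is 83)
```

Add the ~404 MB for the list's pointer array and you land in the same 4.3 GB + 0.4 GB neighborhood the comment describes.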
