
I thought that NumPy arrays always consume less memory than Python lists. However, when I tested this with empty containers, the empty Python list was 56 bytes and the empty NumPy array was 112 bytes. Why?

[Screenshot: Jupyter Notebook example showing the storage sizes of a list and a NumPy array]

  • Welcome to SO! NumPy outperforms lists in both memory size and performance for large arrays. An empty list requires less memory than an empty NumPy array because the NumPy object has a higher "overhead" than the list object. Try comparing the memory size of objects with more elements and you'll see that the NumPy array soon becomes much smaller. Commented Jul 23, 2023 at 13:11
  • sys.getsizeof is not measuring all of the list's memory use. Commented Jul 23, 2023 at 14:13
  • Even without going into data storage models, a numpy array has more information to manage than a Python list. At the very least, numpy has to know the shape (the number of dimensions and the length of each) and the data type, in addition to the size and location of the data. A Python list does not know the type of its data because it can contain any combination of things, including other lists. Commented Jul 24, 2023 at 14:38
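A minimal sketch of the comparison suggested in the first comment, assuming CPython on a 64-bit build and an explicit `np.int64` dtype. The helper `list_total_bytes` is hypothetical: it adds the list's own `getsizeof` to the sizes of the distinct element objects, because `getsizeof` alone ignores them. Exact numbers will vary with the Python and NumPy versions.

```python
import sys
import numpy as np

def list_total_bytes(lst):
    """Hypothetical helper: list header + pointer array + each distinct element object."""
    unique = {id(x): x for x in lst}  # count shared objects (e.g. cached small ints) once
    return sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in unique.values())

def compare(n):
    lst = list(range(n))
    arr = np.arange(n, dtype=np.int64)  # owns its buffer, so getsizeof includes the data
    return list_total_bytes(lst), sys.getsizeof(arr)

for n in (0, 10, 1_000, 100_000):
    list_bytes, array_bytes = compare(n)
    print(f"n={n:>7}: list+ints={list_bytes:>9}  array={array_bytes:>9}")
```

With `int64`, the array's data buffer and the list's pointer array both cost 8 bytes per element, so the array only pulls ahead once the separate `int` objects are counted (or a smaller dtype is chosen).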

1 Answer


I reopened this because the duplicate focused on how np.reshape produces a view and changes what getsizeof sees. Here the issue is the size of a list versus an array.

Let me illustrate. (Posting an image of code is not good SO style; we prefer copy-and-paste code.)

Your list and array:

In [458]: alist = [1,2,3,4,5,7,'a','b','c','@']
In [459]: alist
Out[459]: [1, 2, 3, 4, 5, 7, 'a', 'b', 'c', '@']
In [460]: arr = np.array(alist)
In [461]: arr
Out[461]: array(['1', '2', '3', '4', '5', '7', 'a', 'b', 'c', '@'], dtype='<U21')

Note the dtype. The array contains strings, not numbers.

In [462]: arr.nbytes
Out[462]: 840
In [463]: import sys
In [464]: sys.getsizeof(arr)
Out[464]: 952

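getsizeof reports those 840 bytes plus 112 bytes of 'overhead'. For regular arrays (ones that own their data buffer), getsizeof gives a reasonable number, but it isn't really needed, since arr.nbytes already gives the data size.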
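The 'overhead' part can be checked directly. This is a sketch, assuming NumPy's `ndarray.__sizeof__` counts the object header, the shape and strides, and the data buffer only when the array owns it (the `OWNDATA` flag). That last point is also why a view, like the `np.reshape` result discussed in the duplicate, reports only the overhead.

```python
import sys
import numpy as np

arr = np.array(['1', '2', '3', '4', '5', '7', 'a', 'b', 'c', '@'], dtype='U21')
empty = np.empty(0, dtype=arr.dtype)  # same ndim, owns a zero-length buffer

overhead = sys.getsizeof(arr) - arr.nbytes
print(overhead, sys.getsizeof(empty))  # these two values match

view = arr[::2]  # a view: shares arr's buffer instead of owning one
print(view.flags['OWNDATA'], view.nbytes, sys.getsizeof(view))
```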

But for the list:

In [465]: sys.getsizeof(alist)
Out[465]: 136

We can get the 840 bytes from the length and dtype (each '<U21' element holds 21 characters at 4 bytes each):

In [466]: len(arr)
Out[466]: 10
In [467]: 4*21*10
Out[467]: 840
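The same arithmetic can be read off the dtype object: NumPy's Unicode dtype stores each character in 4 bytes (UCS-4), so a `'<U21'` element has an `itemsize` of 84, and `nbytes` is `size * itemsize`.

```python
import numpy as np

arr = np.array(['1', '2', '3', '4', '5', '7', 'a', 'b', 'c', '@'], dtype='U21')

chars_per_item = arr.dtype.itemsize // 4  # 4 bytes per UCS-4 character
print(arr.dtype.itemsize, chars_per_item)  # 84 21
print(arr.nbytes == arr.size * arr.itemsize)  # True
print(arr.nbytes)  # 840
```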

For the list, the 'overhead' is 56 bytes, and the rest is storage for pointers: 10 of them, at 8 bytes each on a 64-bit build.

In [468]: sys.getsizeof([])
Out[468]: 56
In [469]: 56+80
Out[469]: 136
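A sketch of the list side, assuming a CPython build where a pointer is `struct.calcsize('P')` bytes (8 on 64-bit). `getsizeof` is at least the empty-list size plus one pointer per element; it can be larger because `append` over-allocates capacity ahead of time, so the reported size grows in jumps rather than by one pointer per item.

```python
import struct
import sys

ptr = struct.calcsize('P')  # size of one object pointer (8 on 64-bit builds)
empty = sys.getsizeof([])

alist = [1, 2, 3, 4, 5, 7, 'a', 'b', 'c', '@']
print(sys.getsizeof(alist), empty + len(alist) * ptr)

# Grow a list one element at a time and record getsizeof after each append.
grown, sizes = [], []
for i in range(20):
    grown.append(i)
    sizes.append(sys.getsizeof(grown))
print(sizes)  # stays flat between reallocations, then jumps
```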

Lists can also reserve extra memory as 'growth space' for future appends. getsizeof does not measure the memory used by the objects the list points to. In this case, the small integers already exist and don't require any additional memory. The strings take up an extra 50 bytes each. Lists can store objects of various types, including other lists, dicts, arrays, etc., and getsizeof tells us nothing about those.
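To include the objects a list points to, you have to walk them yourself. A minimal recursive sketch: `deep_getsizeof` is a hypothetical helper, not a standard-library function, and it only descends into lists, tuples, sets and dicts. It counts each object once by `id`, so shared objects are not double-counted (though cached small integers exist whether or not the list does).

```python
import sys

def deep_getsizeof(obj, seen=None):
    """Hypothetical helper: getsizeof of obj plus everything reachable through
    lists, tuples, sets and dicts, counting each distinct object once."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for k, v in obj.items():
            size += deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_getsizeof(item, seen)
    return size

alist = [1, 2, 3, 4, 5, 7, 'a', 'b', 'c', '@']
print(sys.getsizeof(alist), deep_getsizeof(alist))
```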

The array could have been given a different dtype, with a reduction in memory:

In [470]: arr1 = np.array(alist,'U1')
In [471]: arr1
Out[471]: array(['1', '2', '3', '4', '5', '7', 'a', 'b', 'c', '@'], dtype='<U1')
In [472]: arr1.nbytes
Out[472]: 40
In [473]: sys.getsizeof(arr1)
Out[473]: 152
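Rather than hard-coding `'U1'`, the width can be computed from the data. This is a sketch, assuming the elements are meant to be stored as their `str()` text, as in the arrays above; `min_unicode_width` is a hypothetical name.

```python
import numpy as np

def min_unicode_width(items):
    """Hypothetical helper: the smallest 'U<n>' dtype that holds str(x) for every item."""
    width = max((len(str(x)) for x in items), default=0)
    return np.dtype(f'U{max(width, 1)}')

alist = [1, 2, 3, 4, 5, 7, 'a', 'b', 'c', '@']
dt = min_unicode_width(alist)
arr1 = np.array(alist, dtype=dt)
print(dt, arr1.nbytes)
```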

In sum, to get anything useful from getsizeof you have to understand how the object/class is stored and just what that function measures. Neither is a trivial topic for a Python beginner. Still, a beginner should learn, sooner or later, how lists and arrays are stored.


1 Comment

Because the size of a NumPy array also includes the contents of the array, an accurate comparison must take into account the size of all objects recursively contained in the list. Even ignoring arrays, [1] and [[1,2,3,4,5,6,7,8,9]] both have the same size according to getsizeof, because each is a list with a single reference to some other object.
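The comment's point about nested lists is easy to check: `getsizeof` counts only the outer list's header and its single pointer, not the list that pointer refers to.

```python
import sys

inner = [1, 2, 3, 4, 5, 6, 7, 8, 9]
a = [1]
b = [inner]  # one reference, to a 9-element list

print(sys.getsizeof(a) == sys.getsizeof(b))  # True
print(sys.getsizeof(b) + sys.getsizeof(inner))  # outer list plus the inner list object (ints not included)
```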
