I have a use case that I have simplified to the following question:
import numpy as np
def get_matrix(i):  # return an N * M matrix (here N = 4, M = 3)
    return (
        (i, i + 1, i + 1.2),
        (i + 1, i / 2, i * 3.2),
        (i / 3, i * 2, i / 4),
        (i / 5, i * 2.1, i + 2.2),
    )
K = 10000
# build a n-d array K * N * M
arr = np.array(
    tuple(get_matrix(i) for i in range(K)),
    np.float32,
)
However, to get the K*N*M numpy array, I first have to create a temporary tuple of shape K*N*M. The tuple can only be garbage collected after the numpy array has been built, so this construction needs O(K*N*M) extra space.
If I could create the numpy array from the iterator (get_matrix(i) for i in range(K)), each N*M matrix could be garbage collected as soon as it had been consumed, so the extra space would be only O(N*M).
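For comparison, here is a minimal sketch of the memory behaviour I am after, assuming it is acceptable to preallocate the result with np.empty and fill it one matrix at a time. The names N and M are introduced here only for illustration; only one N*M tuple is alive at any moment.

```python
import numpy as np

def get_matrix(i):  # same N * M matrix as above (N = 4, M = 3)
    return (
        (i, i + 1, i + 1.2),
        (i + 1, i / 2, i * 3.2),
        (i / 3, i * 2, i / 4),
        (i / 5, i * 2.1, i + 2.2),
    )

K = 10000
N, M = 4, 3  # illustrative names for the matrix shape

# Allocate the final K * N * M array once, then copy each matrix into its slot.
arr = np.empty((K, N, M), dtype=np.float32)
for i in range(K):
    arr[i] = get_matrix(i)  # the temporary tuple can be freed after this copy
```

This works, but it needs an explicit Python loop, which is why I am looking for an iterator-based constructor instead.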
I found the function numpy.fromiter(), but I don't know how to write the dtype for this case, even though its documentation ends with a similar example.
import numpy as np
K = 10000
# build a n-d array K * N * M
arr = np.fromiter(
    (get_matrix(i) for i in range(K)),
    dtype=np.float32,  # this raises an error
)
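For reference, a hedged sketch of what the subarray dtype example in the numpy.fromiter documentation seems to suggest. It assumes NumPy 1.23 or newer, where fromiter accepts subarray dtypes (older versions reject them). The (4, 3) shape is taken from get_matrix above, and passing count=K is optional but lets NumPy allocate the output once.

```python
import numpy as np

def get_matrix(i):  # same N * M matrix as above (N = 4, M = 3)
    return (
        (i, i + 1, i + 1.2),
        (i + 1, i / 2, i * 3.2),
        (i / 3, i * 2, i / 4),
        (i / 5, i * 2.1, i + 2.2),
    )

K = 10000

# A subarray dtype: each item yielded by the iterator is one 4 x 3 float32 block.
# Assumption: NumPy >= 1.23.
matrix_dtype = np.dtype((np.float32, (4, 3)))

arr = np.fromiter(
    (get_matrix(i) for i in range(K)),
    dtype=matrix_dtype,
    count=K,  # known length, so the result is preallocated
)
```

If this is the intended usage, arr should come out with shape (K, 4, 3) and dtype float32, without the K*N*M tuple ever being built.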