
I have a use case, which I have simplified to the following question:

import numpy as np

def get_matrix(i):  # returns one N x M matrix (here N = 4, M = 3)
    return (
        (i, i + 1, i + 1.2),
        (i + 1, i / 2, i * 3.2),
        (i / 3, i * 2, i / 4),
        (i / 5, i * 2.1, i + 2.2),
    )

K = 10000
# build a 3-D array of shape (K, N, M)
arr = np.array(
    tuple(get_matrix(i) for i in range(K)),
    np.float32,
)

However, to build the K x N x M NumPy array this way, I first have to create a temporary tuple of shape K x N x M, and that tuple can only be garbage-collected after the array has been built. The construction above therefore needs O(K*N*M) extra space.

If I could create the NumPy array directly from the generator (get_matrix(i) for i in range(K)), each N x M matrix could be garbage-collected as soon as it has been copied into the array, so the extra space would only be O(N*M).
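For comparison, one way to get the O(N*M) behaviour without np.fromiter at all is to preallocate the output with np.empty and copy each matrix into its slot as it is produced. This is a swapped-in technique rather than the fromiter approach being asked about, and it is only a sketch that assumes N and M (here 4 and 3) are known before the loop starts; build_preallocated is a hypothetical helper name.

```python
import numpy as np


def get_matrix(i):  # returns one N x M matrix (N = 4, M = 3) as nested tuples
    return (
        (i, i + 1, i + 1.2),
        (i + 1, i / 2, i * 3.2),
        (i / 3, i * 2, i / 4),
        (i / 5, i * 2.1, i + 2.2),
    )


def build_preallocated(k, n=4, m=3):
    # Hypothetical helper: allocate the final K x N x M array once,
    # so no K-sized temporary tuple is ever created.
    out = np.empty((k, n, m), dtype=np.float32)
    for i in range(k):
        # Each N x M tuple is converted and copied into out[i] and then becomes
        # garbage, so the extra space at any moment is O(N*M).
        out[i] = get_matrix(i)
    return out


K = 10000
arr = build_preallocated(K)
print(arr.shape)  # (10000, 4, 3)
```

The trade-off is an explicit Python loop, which is roughly what fromiter does internally anyway; the main requirement is that the output shape is known in advance.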

I found numpy.fromiter(), but I don't know how to specify the dtype. My attempt is below:

import numpy as np

K = 10000
# build a 3-D array of shape (K, N, M)
arr = np.fromiter(
    (get_matrix(i) for i in range(K)),
    dtype=np.float32,  # this raises an error
)

Answer


Support for subarray dtypes is a new feature of np.fromiter (added in NumPy 1.23). Following the example in the docs, this worked:

K = 10000
N = 4
M = 3

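# build a 3-D array of shape (K, N, M); each generator item fills one N x M block
arr = np.fromiter(
    (get_matrix(i) for i in range(K)),
    dtype=np.dtype((np.float32, (N, M))),  # subarray dtype: one N x M float32 block per item
    count=K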
)

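Note that I passed the count argument for good measure; it also works without it, but knowing the length up front lets fromiter allocate the output array once instead of resizing it as items arrive.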
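To check this, the sketch below builds the array with the subarray dtype and compares it with the memory-hungry np.array construction from the question. It assumes NumPy 1.23 or later. The dtype np.dtype((np.float32, (N, M))) describes one N x M float32 block, so fromiter adds a leading axis of length K. For older NumPy versions, one workaround (not the answer's method) is to stream individual scalars into fromiter with a plain float32 dtype and reshape afterwards. That also holds only one matrix in memory at a time.

```python
import numpy as np


def get_matrix(i):  # same N x M (4 x 3) matrix generator as in the question
    return (
        (i, i + 1, i + 1.2),
        (i + 1, i / 2, i * 3.2),
        (i / 3, i * 2, i / 4),
        (i / 5, i * 2.1, i + 2.2),
    )


K, N, M = 10000, 4, 3

# A subarray dtype: each element is one N x M block of float32 values.
block = np.dtype((np.float32, (N, M)))
print(block.shape, block.base, block.itemsize)  # (4, 3) float32 48

# NumPy >= 1.23: fromiter fills one N x M block per generator item.
# count=K lets fromiter allocate the output once instead of growing it.
arr = np.fromiter((get_matrix(i) for i in range(K)), dtype=block, count=K)

# Fallback for older NumPy (an alternative, not the answer's method):
# stream individual scalars, then reshape the flat result.
flat = np.fromiter(
    (x for i in range(K) for row in get_matrix(i) for x in row),
    dtype=np.float32,
    count=K * N * M,
)
arr_flat = flat.reshape(K, N, M)

# Reference built the memory-hungry way from the question.
ref = np.array(tuple(get_matrix(i) for i in range(K)), np.float32)
print(arr.shape, np.array_equal(arr, ref), np.array_equal(arr_flat, ref))
```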
