2

What is the most efficient way of computing mean and std of a Python list containing NumPy arrays of different sizes? For example:

l = [np.array([1,2,3]), np.array([4,5,6,7]), np.array([8])]

Using loop and manually adding it up is a valid solution but I am looking for something more sophisticated.

  • I am not sure what you mean by sophisticated, but personally I prefer l_mean = [i.mean() for i in l] and l_std = [i.std() for i in l]. Commented Sep 2, 2019 at 9:22
  • Sorry, maybe I wasn't perfectly clear but I want to compute mean and std of the whole list and not each NumPy array independently. Commented Sep 2, 2019 at 9:25
  • hstack can join them into one array. If you don't like that, show us how you'd do the loop. Commented Sep 2, 2019 at 10:58
  • The full requirements shouldn't come out in bits and dribbles. If you want the mean of the whole list, and the arrays are really 2d, then say so. The key to gaining numpy efficiency is to create a numeric numpy array. With a list of arrays that can be tricky depending on how the arrays vary in shape. Commented Sep 2, 2019 at 15:51

3 Answers

2

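This works in Python 2. In Python 3, map returns an iterator, so wrap it in list() first. Note that it averages the per-array means, which equals the mean over all elements only when the arrays have the same size:

np.mean(list(map(np.mean, a)))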

"Look, ma, no loops!!" =)

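Another way, which gives the mean over all elements, is to join the arrays first (np.array(a).flatten() only works when all arrays have the same shape):

np.mean(np.concatenate(a))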
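
To illustrate how averaging the per-array means differs from averaging all elements when the arrays have different sizes, here is a minimal sketch using the list from the question. The variable names are chosen for illustration only; weighting each per-array mean by its array size is one possible way to recover the overall mean.

```python
import numpy as np

l = [np.array([1, 2, 3]), np.array([4, 5, 6, 7]), np.array([8])]

# Mean of each array and the number of elements in each array.
per_array_means = np.array([a.mean() for a in l])
sizes = np.array([a.size for a in l])

# Plain mean of the per-array means: every array counts equally.
unweighted = per_array_means.mean()

# Size-weighted mean of the per-array means: every element counts equally.
weighted = np.average(per_array_means, weights=sizes)

# Reference value: join everything and take the mean of all elements.
overall = np.concatenate(l).mean()

print(unweighted, weighted, overall)
```

Only the weighted version agrees with the mean of the joined array here.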

8 Comments

Could you please explain this a little bit? My problem is a tad more complex. I have a list of 2D arrays where the first dimension varies and the second one is always 10. I want to compute the mean and std over all 2D arrays, for each index in the second dimension.
@carobnodrvo map() takes the second argument a (your list of arrays) and applies the first argument, np.mean, to every array, yielding the average of each one. The outer np.mean() then calculates the average of those results.
I see, but the same thing couldn't be applied for std, right?
@carobnodrvo For your more complex problem, please take a look at the documentation of np.mean(), because I could easily be mistaken. Basically, you have to apply np.mean(a, axis=0) to get an average over the first dimension. The rest is up to you; one simple loop won't hurt anyone.
@carobnodrvo And I'm pretty sure np.std() takes an axis argument as well.
1
In [208]: l = [np.array([1,2,3]), np.array([4,5,6,7]), np.array([8])]  

Making an array from l doesn't do much for us, since the arrays differ in shape:

In [209]: np.array(l)                                                                                        
Out[209]: array([array([1, 2, 3]), array([4, 5, 6, 7]), array([8])], dtype=object)

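Out[209] is a 1d array of object dtype. It can't be flattened any further. (Recent NumPy versions raise a ValueError for a ragged list like this unless dtype=object is passed explicitly.)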

hstack is useful, turning the list of arrays into one array:

In [210]: np.hstack(l)                                                                                       
Out[210]: array([1, 2, 3, 4, 5, 6, 7, 8])
In [211]: np.mean(_)                                                                                         
Out[211]: 4.5

If the list contains 2d arrays as revealed in a comment:

In [212]: ll = [np.ones((2,4)), np.zeros((3,4)), np.ones((1,4))*2]                                           
In [213]: ll                                                                                                 
Out[213]: 
[array([[1., 1., 1., 1.],
        [1., 1., 1., 1.]]), array([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]), array([[2., 2., 2., 2.]])]
In [214]: np.vstack(ll)                                                                                      
Out[214]: 
array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [2., 2., 2., 2.]])
In [215]: np.mean(_, axis=0)                                                                                 
Out[215]: array([0.66666667, 0.66666667, 0.66666667, 0.66666667])

np.concatenate(..., axis=0) would work for both cases.
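
As a rough sketch of that last point, the following applies np.concatenate to both the 1d list l and the 2d list ll from above and computes both mean and std. The variable names are illustrative. Note that np.std defaults to the population standard deviation (ddof=0); pass ddof=1 if the sample standard deviation is wanted.

```python
import numpy as np

l = [np.array([1, 2, 3]), np.array([4, 5, 6, 7]), np.array([8])]
ll = [np.ones((2, 4)), np.zeros((3, 4)), np.ones((1, 4)) * 2]

# 1d case: join along the only axis, then reduce over all elements.
flat = np.concatenate(l, axis=0)        # shape (8,)
flat_mean = flat.mean()
flat_std = flat.std()

# 2d case: join along the first axis, then reduce per column.
stacked = np.concatenate(ll, axis=0)    # shape (6, 4)
col_mean = stacked.mean(axis=0)         # one value per column
col_std = stacked.std(axis=0)           # one value per column

print(flat_mean, flat_std)
print(col_mean, col_std)
```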


0

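Method 1: you can use itertools:

import itertools
import numpy as np

l = [np.array([1,2,3]), np.array([4,5,6,7]), np.array([8])]
# chain.from_iterable walks every array in turn, yielding one flat sequence
new_l = list(itertools.chain.from_iterable(l))
print(new_l)
print(f"The mean is:\t{np.mean(new_l)} ") 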

Output

[1, 2, 3, 4, 5, 6, 7, 8]
The mean is:    4.5 

Method 2: I think a plain nested loop, written here as a list comprehension, is the better choice:

l = [np.array([1,2,3]), np.array([4,5,6,7]), np.array([8])]
new_l = [var for my_list in l for var in my_list]
np.mean(new_l)
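
If building an intermediate Python list is a concern, one alternative is to accumulate the statistics array by array and let NumPy do the per-array reductions. The sketch below is only an illustration: pooled_mean_std is a hypothetical helper name, it computes the population standard deviation (the np.std default) over all elements, and it uses two passes (mean first, then squared deviations) rather than a single sum-of-squares pass, on the assumption that this is less prone to rounding problems.

```python
import numpy as np

def pooled_mean_std(arrays):
    """Hypothetical helper: mean and population std over all elements."""
    # First pass: total element count and total sum across all arrays.
    n = sum(a.size for a in arrays)
    total = sum(a.sum() for a in arrays)
    mean = total / n

    # Second pass: sum of squared deviations from the pooled mean.
    squared_dev = sum(((a - mean) ** 2).sum() for a in arrays)
    return mean, np.sqrt(squared_dev / n)

l = [np.array([1, 2, 3]), np.array([4, 5, 6, 7]), np.array([8])]
m, s = pooled_mean_std(l)
print(m, s)
```

The results should match np.concatenate(l).mean() and np.concatenate(l).std() without ever allocating the joined array.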

3 Comments

OP asked to avoid loops and you put the double loop in method 2
@lenik Yes, I know. That's why it is listed as method 2 (an extra method).
itertools.chain flattens a list. But the result is still a list, which np.mean has to convert to an array.
