Compute mean from a list of NumPy arrays of different sizes

Question

What is the most efficient way of computing mean and std of a Python list containing NumPy arrays of different sizes? For example:

import numpy as np
l = [np.array([1,2,3]), np.array([4,5,6,7]), np.array([8])]

Using a loop and adding everything up manually is a valid solution, but I am looking for something more sophisticated.

I am not sure what you mean by sophisticated. But personally I prefer to do l_mean = [i.mean() for i in l] and l_std = [i.std() for i in l]
– Mari, Commented Sep 2, 2019 at 9:22
Sorry, maybe I wasn't perfectly clear, but I want to compute mean and std of the whole list and not each NumPy array independently.
– carobnodrvo, Commented Sep 2, 2019 at 9:25
hstack can join them into one array. If you don't like that, show us how you'd do the loop.
– hpaulj, Commented Sep 2, 2019 at 10:58
The full requirements shouldn't come out in bits and dribbles. If you want the mean of the whole list, and the arrays are really 2d, then say so. The key to gaining numpy efficiency is to create a numeric numpy array. With a list of arrays that can be tricky depending on how the arrays vary in shape.
– hpaulj, Commented Sep 2, 2019 at 15:51

lenik · Accepted Answer · 2019-09-02 09:39:59Z

2

This seems to work:

np.mean( list( map( np.mean, a ) ) )

"Look, ma, no loops!!" =)

Another way would be:

np.mean( np.array( a ).flatten() )
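
One caveat worth checking: averaging the per-array means gives every array the same weight, so with arrays of different sizes it need not equal the mean over all elements. The following is a minimal sketch using the question's data; the names per_array_means, sizes and pooled_mean are illustrative and not from the answer. On Python 3, map returns an iterator, so it is wrapped in list here.

```python
import numpy as np

a = [np.array([1, 2, 3]), np.array([4, 5, 6, 7]), np.array([8])]

# Mean of each array, then the plain average of those means.
per_array_means = np.array(list(map(np.mean, a)))
mean_of_means = per_array_means.mean()

# Weighting each per-array mean by its array's size recovers
# the mean over all elements.
sizes = np.array([arr.size for arr in a])
pooled_mean = np.average(per_array_means, weights=sizes)

print(mean_of_means)  # differs from the pooled mean for this data
print(pooled_mean)
```

If the goal is the mean of the whole list, the size-weighted value is the one that matches it.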

edited Sep 2, 2019 at 9:39

answered Sep 2, 2019 at 9:20


8 Comments

carobnodrvo Over a year ago

Could you please explain this a little bit? My problem is a tad more complex. I have a list of 2D arrays where the first dimension varies and the second one is always 10. I want to compute mean and std over all 2D arrays, for each index in the second dimension.

lenik Over a year ago

@carobnodrvo map() takes the second argument a (your list of arrays) and applies the first argument np.mean() to it, calculating the average of every array and putting the results in a list. The outer np.mean() then calculates the average of the resulting list.

carobnodrvo Over a year ago

I see, but the same thing couldn't be applied for std, right?

lenik Over a year ago

@carobnodrvo for your complex problem, please, take a look at the documentation of np.mean(), because I could be easily mistaken. basically, you have to apply np.mean( a, axis = 0 ) to get an average over the first dimension. the rest is up to you, one simple loop won't hurt anyone.

lenik Over a year ago

@carobnodrvo and I'm pretty sure np.std() takes axis argument as well.
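
For the 2D case described in this thread (a varying first dimension, 10 columns), the "one simple loop" could look like the sketch below. It is an illustration under assumptions, not the commenters' code: the random data and the names arrays, total and total_sq are hypothetical. It accumulates per-column counts, sums and sums of squares, so the arrays never have to be joined.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: three arrays with 4, 7 and 2 rows, 10 columns each.
arrays = [rng.normal(size=(n, 10)) for n in (4, 7, 2)]

count = 0
total = np.zeros(10)
total_sq = np.zeros(10)
for arr in arrays:
    count += arr.shape[0]               # rows seen so far
    total += arr.sum(axis=0)            # per-column sum
    total_sq += (arr ** 2).sum(axis=0)  # per-column sum of squares

col_mean = total / count
# Population variance (ddof=0, matching np.std's default): E[x^2] - E[x]^2.
col_var = total_sq / count - col_mean ** 2
col_std = np.sqrt(np.maximum(col_var, 0.0))  # clip tiny negative round-off

print(col_mean.shape, col_std.shape)
```

The sum-of-squares formula can lose precision when the mean is large compared with the spread; joining the arrays and calling np.std with axis=0 avoids that at the cost of one extra copy.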


hpaulj · Accepted Answer · 2019-09-03 03:10:26Z

In [208]: l = [np.array([1,2,3]), np.array([4,5,6,7]), np.array([8])]

Making an array from l doesn't do much for us, since the arrays differ in shape:

In [209]: np.array(l)                                                                                        
Out[209]: array([array([1, 2, 3]), array([4, 5, 6, 7]), array([8])], dtype=object)

Out[209] is 1d object dtype. It can't be flattened any further.

hstack is useful, turning the list of arrays into one array:

In [210]: np.hstack(l)                                                                                       
Out[210]: array([1, 2, 3, 4, 5, 6, 7, 8])
In [211]: np.mean(_)                                                                                         
Out[211]: 4.5

If the list contains 2d arrays as revealed in a comment:

In [212]: ll = [np.ones((2,4)), np.zeros((3,4)), np.ones((1,4))*2]                                           
In [213]: ll                                                                                                 
Out[213]: 
[array([[1., 1., 1., 1.],
        [1., 1., 1., 1.]]), array([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]), array([[2., 2., 2., 2.]])]
In [214]: np.vstack(ll)                                                                                      
Out[214]: 
array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [2., 2., 2., 2.]])
In [215]: np.mean(_, axis=0)                                                                                 
Out[215]: array([0.66666667, 0.66666667, 0.66666667, 0.66666667])

np.concatenate(..., axis=0) would work for both cases.
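
As a quick sketch of that last remark, reusing the l and ll defined above (the names flat and stacked are illustrative), the same np.concatenate call handles the 1d and the 2d lists, and np.std accepts the same axis argument as np.mean:

```python
import numpy as np

l = [np.array([1, 2, 3]), np.array([4, 5, 6, 7]), np.array([8])]
ll = [np.ones((2, 4)), np.zeros((3, 4)), np.ones((1, 4)) * 2]

flat = np.concatenate(l, axis=0)      # 1d case: shape (8,)
stacked = np.concatenate(ll, axis=0)  # 2d case: shape (6, 4)

# Statistics over every element of the 1d list.
print(flat.mean(), flat.std())

# Per-column statistics over all rows of the 2d list.
print(stacked.mean(axis=0), stacked.std(axis=0))
```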

Rahul charan · Accepted Answer · 2019-09-02 10:42:01Z

0

Method 1: You can use itertools:

import itertools
import numpy as np
l = [np.array([1,2,3]), np.array([4,5,6,7]), np.array([8])]
new_l = list(itertools.chain(*l))
print(new_l)
print(f"The mean is:\t{np.mean(new_l)} ")

Output

[1, 2, 3, 4, 5, 6, 7, 8]
The mean is:    4.5

Method 2: But I think you should use a basic for loop:

l = [np.array([1,2,3]), np.array([4,5,6,7]), np.array([8])]
new_l = [var for my_list in l for var in my_list]
np.mean(new_l)


3 Comments

lenik Over a year ago

OP asked to avoid loops and you put the double loop in method 2

Rahul charan Over a year ago

@lenik yes, I knew it. That's why it is in method-2(extra method).

hpaulj Over a year ago

itertools.chain flattens a list. But the result is still a list, which np.mean has to convert to an array.
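
If the itertools route is preferred anyway, one way to skip the intermediate Python list is to feed the chained iterator to np.fromiter. This is a sketch, not something proposed in the thread; the names total_size and flat are illustrative, and dtype=float is an assumption about the desired result type.

```python
import itertools
import numpy as np

l = [np.array([1, 2, 3]), np.array([4, 5, 6, 7]), np.array([8])]

# chain.from_iterable walks the arrays lazily; np.fromiter fills a 1-D
# array directly, so no intermediate Python list is built.
total_size = sum(arr.size for arr in l)
flat = np.fromiter(itertools.chain.from_iterable(l), dtype=float, count=total_size)

print(flat.mean())
```

This still visits the elements one at a time in Python, so for large inputs it is likely slower than np.concatenate or np.hstack.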
