Calculate mean of lists in a DataFrame ignoring empty ones

Question

I have a dataframe that looks like this:

               A                    
    1  [67.0, 51.0, 23.0, 49.0, 3.0]    
    2  0
    3  [595.0]
    4  0
    5  [446.0, 564.0, 402.0]
    6  0 
    7  0

I would like to find the mean for each list ignoring the zeros. I want to get something like:

               A                     Mean
1  [67.0, 51.0, 23.0, 49.0, 3.0]     38.6
2  0                                    0
3  [595.0]                          595.0
4  0                                    0
5  [446.0, 564.0, 402.0]            470.7
6  0                                    0 
7  0                                    0

I tried many possible solutions listed here and none of them worked. This is what I tried so far:

df['Mean'] = df.A.apply(lambda x: mean(x))

which gives me this error

TypeError: 'int' object is not iterable

Also this

df['Mean'] = df['A'].mean(axis=1)

ValueError: No axis named 1 for object type

Tried these as well with no luck:

a = np.array( df['A'].tolist())
a.mean(axis=1)

mean(d for d in a if d)

Is there something else I can try that would give me the expected outcome? Thanks for your help.

What is the dtype of A?

– anky, Commented May 1, 2019 at 11:17

iamklaus · Accepted Answer · 2019-05-01 11:21:35Z

1

This works for me, given the following data:

                A                    
1   [67.0, 51.0, 23.0, 49.0, 3.0]    
2                               0
3                         [595.0]
4                               0
5           [446.0, 564.0, 402.0]
6                               0 
7                               0

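Using np.mean. Note that eval only works if the values in A are strings (for example after reading the data from a CSV file); if A already holds real lists, drop the eval:

data['Mean'] = data['A'].apply(lambda x: np.mean(eval(x)))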

Output

                                A        Mean
1   [67.0, 51.0, 23.0, 49.0, 3.0]   38.600000
2                               0    0.000000
3                         [595.0]  595.000000
4                               0    0.000000
5           [446.0, 564.0, 402.0]  470.666667
6                               0    0.000000
7                               0    0.000000
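If the column really does hold strings, ast.literal_eval is a safer drop-in for eval, since it only parses Python literals and cannot run arbitrary code. The following is a minimal sketch under that assumption; the sample strings are made up for illustration and are not the asker's actual data.

```python
import ast

import numpy as np
import pandas as pd

# Hypothetical sample: column A holds *strings*, as it might after a CSV round trip.
data = pd.DataFrame({"A": ["[67.0, 51.0, 23.0, 49.0, 3.0]", "0", "[595.0]", "0"]})

# literal_eval turns "[595.0]" into a list and "0" into an int;
# np.mean accepts both, so no type check is needed.
data["Mean"] = data["A"].apply(lambda s: float(np.mean(ast.literal_eval(s))))
```

If A already contains real lists and ints, literal_eval raises an error on them, so this step should be skipped in that case.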

answered May 1, 2019 at 11:21


yatu · Accepted Answer · 2019-05-01 11:34:07Z

One way is to use a list comprehension that computes the mean only where a row is a list, which you can check with isinstance. The check is necessary because otherwise you get:

TypeError: 'int' object is not iterable

statistics.mean expects an iterable, so it fails on the bare 0 entries. You can do:

from statistics import mean
df['mean'] = [mean(i) if isinstance(i, list) else i for i in df.A]

              A                      mean
0  [67.0, 51.0, 23.0, 49.0, 3.0]   38.600000
1                              0    0.000000
2                        [595.0]  595.000000
3                              0    0.000000
4          [446.0, 564.0, 402.0]  470.666667
5                              0    0.000000
6                              0    0.000000
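For anyone who wants to reproduce this, here is a self-contained sketch of the same isinstance approach. The DataFrame is rebuilt from the values shown in the question, and the helper name mean_or_scalar is only an illustrative choice.

```python
from statistics import mean

import pandas as pd

# Column A mixes real Python lists with the scalar 0, as in the question.
df = pd.DataFrame(
    {"A": [[67.0, 51.0, 23.0, 49.0, 3.0], 0, [595.0], 0, [446.0, 564.0, 402.0], 0, 0]}
)


def mean_or_scalar(value):
    """Return the mean of a list, or the value itself when it is not a list."""
    return mean(value) if isinstance(value, list) else value


df["mean"] = [mean_or_scalar(v) for v in df["A"]]
```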

Alternatively, use np.mean, which handles both ints and iterables:

import numpy as np
df['mean'] = df.A.map(np.mean)

               A                      mean
0  [67.0, 51.0, 23.0, 49.0, 3.0]   38.600000
1                              0    0.000000
2                        [595.0]  595.000000
3                              0    0.000000
4          [446.0, 564.0, 402.0]  470.666667
5                              0    0.000000
6                              0    0.000000
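One caveat, given the "ignoring empty ones" in the title: np.mean of an empty list returns nan and emits a RuntimeWarning. If the column could contain genuinely empty lists, a small wrapper can cover that case. This is only a sketch; safe_mean is a hypothetical helper, and mapping an empty list to 0.0 is an assumed convention, not something stated in the question.

```python
import numpy as np
import pandas as pd


def safe_mean(value):
    """Mean of a list or scalar; an empty list yields 0.0 (assumed convention)."""
    # atleast_1d turns a scalar such as 0 into a one-element array.
    arr = np.atleast_1d(np.asarray(value, dtype=float))
    return float(arr.mean()) if arr.size else 0.0


# Illustrative data including an empty list.
df = pd.DataFrame({"A": [[67.0, 51.0], 0, [], [595.0]]})
df["mean"] = df["A"].map(safe_mean)
```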

Tom Ron · Accepted Answer · 2019-05-01 11:47:48Z

0

from collections.abc import Iterable
import numpy as np

def calculate_mean(x):
    if isinstance(x["A"], Iterable):
        x["mean"] = np.mean(x["A"])
    else:
        x["mean"] = x["A"]
    return x

df = df.apply(lambda x: calculate_mean(x), axis=1)

Edit: np.mean also accepts scalars, so the type check can be dropped:

df["mean"] = df.apply(lambda x: np.mean(x["A"]), axis=1)
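The row-wise apply above and an element-wise map over column A should give the same values; the map version simply avoids building a row object for each record. A quick sketch to check this, using a few illustrative rows rather than the full data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [[446.0, 564.0, 402.0], 0, [595.0]]})

# Row-wise: each row is passed as a Series, and column A is looked up inside the lambda.
row_wise = df.apply(lambda row: np.mean(row["A"]), axis=1)

# Element-wise: np.mean is called directly on each cell of column A.
element_wise = df["A"].map(np.mean)
```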

edited May 1, 2019 at 11:47

answered May 1, 2019 at 11:19


Actually, np.mean handles both iterables and ints, so there is no need to check whether each row is an iterable. df.A.map(np.mean) works.

– yatu
