0

I have a dataframe that looks like this:

               A                    
    1  [67.0, 51.0, 23.0, 49.0, 3.0]    
    2  0
    3  [595.0]
    4  0
    5  [446.0, 564.0, 402.0]
    6  0 
    7  0

I would like to find the mean for each list ignoring the zeros. I want to get something like:

               A                     Mean
1  [67.0, 51.0, 23.0, 49.0, 3.0]     38.6
2  0                                    0
3  [595.0]                          595.0
4  0                                    0
5  [446.0, 564.0, 402.0]            470.7
6  0                                    0 
7  0                                    0

I tried many possible solutions listed here and none of them worked. This is what I tried so far:

df['Mean'] = df.A.apply(lambda x: mean(x)) 

which gives me this error

TypeError: 'int' object is not iterable

I also tried this, which raises the error shown below it:

df['Mean'] = df['A'].mean(axis=1)

ValueError: No axis named 1 for object type

Tried these as well with no luck:

a = np.array( df['A'].tolist())
a.mean(axis=1)

mean(d for d in a if d)

Is there something else I can try that would give me the expected outcome? Thanks for your help.

1
  • What is the dtype of A? Commented May 1, 2019 at 11:17

3 Answers

1

Okay, this works for me with the following data:

                A                    
1   [67.0, 51.0, 23.0, 49.0, 3.0]    
2                               0
3                         [595.0]
4                               0
5           [446.0, 564.0, 402.0]
6                               0 
7                               0

Using np.mean (eval assumes each cell of A is a string such as '[595.0]'):

data['Mean'] = data['A'].apply(lambda x: np.mean(eval(x)))

Output

                A                            Mean
1   [67.0, 51.0, 23.0, 49.0, 3.0]       38.600000
2                               0        0.000000
3                         [595.0]      595.000000
4                               0        0.000000
5           [446.0, 564.0, 402.0]      470.666667
6                               0        0.000000
7                               0        0.000000
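
If the column really does hold strings, as the eval call above implies, a more cautious variant is to swap eval for ast.literal_eval, which only parses Python literals. This is a minimal sketch under that assumption; the four-row frame is made up for illustration and uses the values from the question.

```python
import ast

import numpy as np
import pandas as pd

# Assumption: every cell of A is a string, e.g. because the data came from a CSV file.
data = pd.DataFrame({"A": ["[67.0, 51.0, 23.0, 49.0, 3.0]", "0", "[595.0]", "0"]})

# literal_eval turns "[595.0]" into a list and "0" into an int without running arbitrary code.
data["Mean"] = data["A"].apply(lambda s: np.mean(ast.literal_eval(s)))
print(data)
```

If the cells are already real lists and ints, neither eval nor literal_eval is needed, and both would raise an error on non-string input.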

1

One way is to use a list comprehension that computes the mean only where the row holds a list, which can be checked with isinstance. The check is needed because statistics.mean expects an iterable; without it, the rows holding a plain 0 raise:

TypeError: 'int' object is not iterable

So you can do:

from statistics import mean
df['mean'] = [mean(i) if isinstance(i, list) else i for i in df.A]

              A                      mean
0  [67.0, 51.0, 23.0, 49.0, 3.0]   38.600000
1                              0    0.000000
2                        [595.0]  595.000000
3                              0    0.000000
4          [446.0, 564.0, 402.0]  470.666667
5                              0    0.000000
6                              0    0.000000

Or you can use np.mean, which handles both ints and iterables:

import numpy as np
df['mean'] = df.A.map(np.mean)

               A                      mean
0  [67.0, 51.0, 23.0, 49.0, 3.0]   38.600000
1                              0    0.000000
2                        [595.0]  595.000000
3                              0    0.000000
4          [446.0, 564.0, 402.0]  470.666667
5                              0    0.000000
6                              0    0.000000
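
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It assumes the cells of A are real Python lists and ints (not strings) and rebuilds the frame from the question with its 1-based index.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question: lists in some rows, a plain 0 in the others.
df = pd.DataFrame(
    {"A": [[67.0, 51.0, 23.0, 49.0, 3.0], 0, [595.0], 0, [446.0, 564.0, 402.0], 0, 0]},
    index=range(1, 8),
)

# np.mean accepts a list or a scalar, so one call covers both kinds of row.
df["Mean"] = df["A"].map(np.mean)
print(df)
```

The resulting Mean column is a float column, so the zero rows come back as 0.0 rather than the integer 0.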


0
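from collections.abc import Iterable
import numpy as np

def calculate_mean(x):
    # x is one row of the frame; average the cell only when it holds an iterable.
    if isinstance(x["A"], Iterable):
        x["mean"] = np.mean(x["A"])
    else:
        # Scalar cells (the plain 0 rows) are copied over unchanged.
        x["mean"] = x["A"]
    return x

# axis=1 passes each row to calculate_mean.
df = df.apply(calculate_mean, axis=1)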

Edit: since np.mean also accepts a scalar, the type check can be dropped:

df["mean"] = df.apply(lambda x: np.mean(x["A"]), axis=1)

1 Comment

Actually, np.mean handles both iterables and ints, so there is no need to check whether each row is an iterable. df.A.map(np.mean) works.
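
The difference between the two mean functions mentioned in this thread can be checked directly. This is a small sketch; the variable names are only for illustration.

```python
import statistics

import numpy as np

# statistics.mean needs an iterable, so a bare 0 raises TypeError.
try:
    statistics.mean(0)
    stats_failed = False
except TypeError:
    stats_failed = True

# np.mean converts its input to an array first, so a scalar is accepted.
scalar_mean = np.mean(0)
list_mean = np.mean([446.0, 564.0, 402.0])
print(stats_failed, scalar_mean, round(list_mean, 1))
```

This is why the statistics.mean version needs the isinstance guard while the np.mean version does not.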
