I have a dataframe called teams. Each column is an NFL team, and each row is how much a given fan would pay to attend that team's game. It looks like:

team1 team2 team3
40 NaN 50
NaN NaN 80
75 30 NaN

I want to compare the standard deviations of each column, so obviously I need to remove the NaNs. I want to do this column-wise, though: if I drop every row that contains a NaN in any column, I'll lose a lot of data. What's the best way to do this? I have a lot of columns; otherwise I would just make a numpy array for each column.

2 Answers

Your assumption is incorrect.

I want to compare the standard deviations of each column, so obviously I need to remove the NaNs

By default, std ignores NaN values (skipna=True), so just use:

df.std()

Output:

team1    24.748737
team2          NaN
team3    21.213203
dtype: float64
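
In case it helps to see why team2 comes out as NaN: here is a minimal sketch that rebuilds the sample frame from the question (the variable names sample_std, population_std, and counts are mine, not from the question). It assumes the default ddof=1, i.e. the sample standard deviation, which is undefined for a column with only one non-NaN value.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question.
teams = pd.DataFrame({
    "team1": [40, np.nan, 75],
    "team2": [np.nan, np.nan, 30],
    "team3": [50, 80, np.nan],
})

# Defaults: skipna=True and ddof=1 (sample standard deviation).
# team2 has a single non-NaN value, so its sample std is NaN.
sample_std = teams.std()

# ddof=0 gives the population standard deviation instead;
# a column with a single value then yields 0.0 rather than NaN.
population_std = teams.std(ddof=0)

# Number of non-NaN values behind each result.
counts = teams.count()

print(sample_std)
print(population_std)
print(counts)
```

Checking count() next to std() is a cheap way to spot columns whose standard deviation rests on very few values before comparing them.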

pandas' .describe() should already account for any NaNs:

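import pandas as pd
import numpy as np

# Sample data from the question; np.nan marks a missing value.
columns = ['team1', 'team2', 'team3']
data = [
    [40, np.nan, 50],
    [np.nan, np.nan, 80],
    [75, 30, np.nan],
]

df = pd.DataFrame(data=data, columns=columns)

# describe() skips NaNs; pull out just the 'std' row.
std = df.describe().loc['std']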

Output:

print(std)
team1    24.748737
team2          NaN
team3    21.213203
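
If you do need the NaN-free values themselves (say, to pass each column to something that doesn't skip NaNs), one possible approach is to drop them per column and keep the results in a dict, since the cleaned columns will have different lengths. This is only a sketch; clean and manual_std are names I made up for illustration.

```python
import numpy as np
import pandas as pd

teams = pd.DataFrame({
    "team1": [40, np.nan, 75],
    "team2": [np.nan, np.nan, 30],
    "team3": [50, 80, np.nan],
})

# Drop NaNs column by column; a dict holds Series of unequal length.
clean = {name: col.dropna() for name, col in teams.items()}

# Sample std (ddof=1) computed by hand on the cleaned values.
# A column with fewer than two values has no sample std, so use NaN.
manual_std = {
    name: np.std(vals.to_numpy(), ddof=1) if len(vals) > 1 else np.nan
    for name, vals in clean.items()
}

print(manual_std)
```

The hand-computed values should agree with teams.std(), so for the standard deviation alone the dict is unnecessary.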
