pandas question: Remove missing values by column

Question

I have a dataframe called teams. Each column is a team in the NFL, each row is how much a given fan would pay to attend a team's game. Looks like:

team1	team2	team3
40	NaN	50
NaN	NaN	80
75	30	NaN

I want to compare the standard deviations of each column, so obviously I need to remove the NaNs. I want to do this column-wise, though, so that I don't remove every row that contains a NaN, because I'd lose a lot of data. What's the best way to do this? I have a lot of columns; otherwise I would just make a NumPy array for each column.
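For reference, the NumPy route mentioned above does not require one array per column: `np.nanstd` skips NaNs along an axis. This is a minimal sketch that assumes the sample values from the question. NumPy defaults to `ddof=0` (population std), so `ddof=1` is passed here to match pandas' default sample std. For team2, which has a single value, NumPy returns `nan` and may emit a RuntimeWarning about degrees of freedom.

```python
import numpy as np

# Sample data from the question: rows are fans, columns are team1..team3
values = np.array([
    [40, np.nan, 50],
    [np.nan, np.nan, 80],
    [75, 30, np.nan],
])

# nanstd ignores NaNs within each column (axis=0);
# ddof=1 gives the sample std, matching pandas' default
col_std = np.nanstd(values, axis=0, ddof=1)
print(col_std)
```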

mozway · Accepted Answer · 2022-11-14 22:46:36Z

1

Your assumption is incorrect.

I want to compare the standard deviations of each column, so obviously I need to remove the NaNs

By default, std skips NaN values (skipna=True), so you can just use:

df.std()

Output:

team1    24.748737
team2          NaN
team3    21.213203
dtype: float64
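To see that `skipna=True` is equivalent to dropping NaNs one column at a time, here is a small sketch that rebuilds the question's data as `df` (the variable names are illustrative). It compares the default `df.std()` with an explicit per-column `dropna()`, and shows that `skipna=False` turns any column containing a NaN into NaN.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "team1": [40, np.nan, 75],
    "team2": [np.nan, np.nan, 30],
    "team3": [50, 80, np.nan],
})

# Default behaviour: NaNs are skipped within each column
default_std = df.std()

# Explicitly dropping NaNs column by column gives the same numbers
manual_std = df.apply(lambda col: col.dropna().std())

# With skipna=False, any NaN in a column makes that column's result NaN
strict_std = df.std(skipna=False)
print(default_std)
print(strict_std)
```

Note that team2 is NaN in both cases for a different reason: it has only one non-missing value, and the sample std (ddof=1) is undefined for a single observation.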


chitown88 · Accepted Answer · 2022-11-20 09:17:23Z

0

pandas' .describe() already accounts for any NaNs:

import pandas as pd
import numpy as np

columns = ['team1', 'team2', 'team3']
data = [
    [40, np.nan, 50],
    [np.nan, np.nan, 80],
    [75, 30, np.nan],
]

df = pd.DataFrame(data=data, columns=columns)
std = df.describe().loc['std']
print(std)

Output:

team1    24.748737
team2          NaN
team3    21.213203
Name: std, dtype: float64
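The `count` row of `describe()` explains the NaN for team2: that column has only one non-missing value, and the sample std (ddof=1) needs at least two. A short sketch, again rebuilding the question's data as `df`, prints both rows side by side and, for contrast, the population std (`ddof=0`), which is defined (0.0) for a single value.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "team1": [40, np.nan, 75],
    "team2": [np.nan, np.nan, 30],
    "team3": [50, 80, np.nan],
})

# Non-missing count next to the sample std for each column
summary = df.describe().loc[["count", "std"]]
print(summary)

# Population std (ddof=0) divides by n instead of n - 1
pop_std = df.std(ddof=0)
print(pop_std)
```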
