Pandas DataFrame standard deviation issue

Question

I have the following code:

import pandas as pd
import numpy as np

# Sample dataset
data = {
    "date": pd.date_range("2025-10-01", periods=7),
    "sales": [200, 220, 250, np.nan, 300, 310, 305],
    "region": ["East", "West", "East", "West", "East", "West", "East"]
}
df = pd.DataFrame(data)

# 1. Handling missing data
df['sales'].fillna(df['sales'].mean(), inplace=True)

# 2. Filtering rows
east_sales = df[df['region'] == 'East']

# 3. Creating new columns
df['prev_sales'] = df['sales'].shift(1)
df['increase'] = df['sales'] > df['prev_sales']

# 4. Aggregations
agg = df.groupby('region')['sales'].mean()

Why are the outputs of df["sales"].std() and df.describe()["sales"].std() different?

s1 = df['sales'].std()
s2 = df.describe()['sales'].std()
print(s1, s2)

Here is the output from the above snippet: 43.43737509974049 115.72168519628995

df.describe()["sales"].std() is the std of the descriptive statistics (count, mean, std, min, 25%, 50%, 75%, max) of df['sales']. Its value can differ from df['sales'].std().
– Adeva1, Commented Oct 1 at 19:08
Note that pandas 2.3.3 will give you a FutureWarning for your use of inplace.
– jackal, Commented Oct 2 at 7:55
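
A minimal sketch of what both comments describe, assuming the same seven sales values as in the question. The fill is written as an assignment rather than with inplace=True, and the names summary and s3 are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"sales": [200, 220, 250, np.nan, 300, 310, 305]})

# Assign the filled column back instead of calling fillna(..., inplace=True)
df["sales"] = df["sales"].fillna(df["sales"].mean())

# describe() gives eight summary statistics for the column
summary = df.describe()["sales"]
print(summary)

# .std() on the raw column: spread of the seven sales values
s1 = df["sales"].std()

# .std() on the summary: spread of the eight statistics themselves
# (count, mean, std, min, 25%, 50%, 75%, max), which is rarely meaningful
s2 = summary.std()

# The value the question was after is a lookup, not a new calculation
s3 = summary["std"]

print(s1, s2, s3)
```

Here s1 and s3 agree, while s2 is the unrelated number reported in the question.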

pixel-process · Accepted Answer · 2025-10-01 22:24:48Z

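Calling .std() on a single column returns a single float. By contrast, describe() returns a DataFrame of summary statistics, one of which is std. Calling .std() on a column of that result therefore computes the standard deviation of the statistics themselves, not of the original data.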

import pandas as pd

data = {
    "sales": [200, 220, 250, 300, 310, 305],
    "sales2": [100, 20, 20, 30, 10, 35],
}
df = pd.DataFrame(data)

# Getting std of a single column
s1 = df['sales'].std()
print(s1)
# 47.58326036188217
type(s1)
# float

# Calling std on multiple columns/whole df
all_std = df.std()
# Result (a Series) shown below
type(all_std)
# pandas.core.series.Series

# Describe for Summary Info on df
desc = df.describe()
# Result df shown below
type(desc)
# pandas.core.frame.DataFrame

desc['sales']['std'] == s1
# True

all_std df

	0
sales	47.5833
sales2	32.6216

desc df

	sales	sales2
count	6	6
mean	264.167	35.8333
std	47.5833	32.6216
min	200	10
25%	227.5	20
50%	275	25
75%	303.75	33.75
max	310	100

Summary

To get the std of a single column, call std on that column. Calling std on a whole df returns the value for every numeric column as a Series. describe generates a fuller set of summary statistics that includes the std, which can be read by indexing, e.g. desc['sales']['std']. As shown above, that value equals std called on the column.
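
The relationship described in the summary can be checked directly. This is a sketch using the same two-column frame as above; pulling the std row out with .loc is one way to do the lookup, not the only one, and std_row is a name chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "sales": [200, 220, 250, 300, 310, 305],
    "sales2": [100, 20, 20, 30, 10, 35],
})

all_std = df.std()           # Series: one std per numeric column
desc = df.describe()         # DataFrame: eight statistics per column
std_row = desc.loc["std"]    # the "std" row of describe(), as a Series

# Both Series are indexed by column name and hold the same numbers;
# only the Series name differs, so it is excluded from the comparison
pd.testing.assert_series_equal(all_std, std_row, check_names=False)
print(std_row)
```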

furas · Accepted Answer · 2025-10-01 22:55:06Z

df.describe()["sales"] is not the same data as df["sales"],
so .std() is calculated on different values: the summary statistics rather than the sales figures.

But you don't have to calculate .std() on df.describe()["sales"] at all.
You only have to read the value that already exists in that dataframe,
which needs ["std"] instead of .std()

print( df['sales'].std() )              # calculate value
print( df.describe()['sales']['std'] )  # get already existing value

The same applies to the other values in describe(): count, mean, min, max

print( df["sales"].count() )
print( df.describe()["sales"]["count"] )

print( df["sales"].mean() )
print( df.describe()["sales"]["mean"] )

print( df["sales"].min() )
print( df.describe()["sales"]["min"] )

print( df["sales"].max() )
print( df.describe()["sales"]["max"] )
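
The same comparison can be written as a loop. A sketch, assuming a seven-value sales column like the one in the question; getattr is used only to call each Series method by name, and the matches dict is a hypothetical helper for collecting the results.

```python
import math
import pandas as pd

df = pd.DataFrame({"sales": [200, 220, 250, 264, 300, 310, 305]})
desc = df.describe()["sales"]    # call describe() once and reuse the result

matches = {}
for name in ["count", "mean", "std", "min", "max"]:
    computed = getattr(df["sales"], name)()   # e.g. df["sales"].mean()
    looked_up = desc[name]                    # value already stored by describe()
    matches[name] = math.isclose(computed, looked_up)
    print(name, computed, looked_up, matches[name])
```

Every entry in matches should come out True, since describe() stores the same statistics that the individual methods compute.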

The whole problem can be reduced to this short code:

import pandas as pd

df = pd.DataFrame({"sales": [200, 220, 250, 264, 300, 310, 305]})

print( df["sales"].std() )
print( df.describe()["sales"]["std"] )
