
I have the following code.

import pandas as pd
import numpy as np

# Sample dataset
data = {
    "date": pd.date_range("2025-10-01", periods=7),
    "sales": [200, 220, 250, np.nan, 300, 310, 305],
    "region": ["East", "West", "East", "West", "East", "West", "East"]
}
df = pd.DataFrame(data)

# 1. Handling missing data
df['sales'].fillna(df['sales'].mean(), inplace=True)

# 2. Filtering rows
east_sales = df[df['region'] == 'East']

# 3. Creating new columns
df['prev_sales'] = df['sales'].shift(1)
df['increase'] = df['sales'] > df['prev_sales']

# 4. Aggregations
agg = df.groupby('region')['sales'].mean()

Why are the outputs of df["sales"].std() and df.describe()["sales"].std() different?

s1 = df['sales'].std()
s2 = df.describe()['sales'].std()
print(s1, s2)

Here is the output from the above snippet: 43.43737509974049 115.72168519628995

  • I cannot reproduce this with your sample dataset. Commented Oct 1 at 18:55
  • df.describe()["sales"].std() is the std of the descriptive statistics (count, mean, std, min, 25%, 50%, 75%, max) of df['sales']. Its value can be different from df['sales'].std(). Commented Oct 1 at 19:08
  • Note that pandas 2.3.3 will give you a FutureWarning for your use of inplace Commented Oct 2 at 7:55
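
The FutureWarning mentioned in the last comment comes from calling fillna(..., inplace=True) on a column selected from the DataFrame. A minimal sketch of the assignment form, which fills the same value without inplace (the data is the sales column from the question's sample):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"sales": [200, 220, 250, np.nan, 300, 310, 305]})

# Assign the filled column back instead of using inplace=True on a selection
df["sales"] = df["sales"].fillna(df["sales"].mean())

print(df["sales"].isna().sum())  # 0 missing values remain
```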

2 Answers

1

When calling .std() on a single column, a float value is returned. By contrast, describe() returns summary information on the dataframe that includes std.

import pandas as pd

data = {
    "sales": [200, 220, 250, 300, 310, 305],
    "sales2": [100, 20, 20, 30, 10, 35],
}
df = pd.DataFrame(data)

# Getting std of a single column
s1 = df['sales'].std()
print(s1)
# 47.58326036188217
type(s1)
# float

# Calling std on multiple columns/whole df
all_std = df.std()
# Resulting Series shown below
type(all_std)
# pandas.core.series.Series

# Describe for Summary Info on df
desc = df.describe()
# Result df shown below
type(desc)
# pandas.core.frame.DataFrame

desc['sales']['std'] == s1
# True

all_std (Series)

sales     47.5833
sales2    32.6216

desc df

         sales     sales2
count    6         6
mean     264.167   35.8333
std      47.5833   32.6216
min      200       10
25%      227.5     20
50%      275       25
75%      303.75    33.75
max      310       100

Summary

To get the standard deviation of a single column, call std on that column. Calling std on a whole df returns the value for every numeric column as a Series. describe() generates a fuller summary that includes the std, which can be read by label with indexing like desc['sales']['std']; as shown above, that value is equal to the std called on the column. Calling .std() on desc['sales'] instead computes the standard deviation of the summary statistics themselves, which is why the two numbers in the question differ.
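
To make the difference concrete, here is a small sketch using the sales column from this answer. It shows that df.describe()["sales"] is a Series of eight statistics, and that calling .std() on it measures the spread of those eight numbers. The names summary, std_of_summary and by_hand are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"sales": [200, 220, 250, 300, 310, 305]})

summary = df.describe()["sales"]  # Series of 8 statistics, indexed by name
print(summary.index.tolist())
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']

# .std() here is the spread of those 8 numbers, not of the sales data
std_of_summary = summary.std()
by_hand = np.std(summary.to_numpy(), ddof=1)  # pandas uses ddof=1 by default

print(std_of_summary, by_hand)              # the same number twice
print(summary["std"], df["sales"].std())    # the std of the actual data
```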


1

df.describe()["sales"] is not the same as df["sales"],
so .std() is computed on different values in each case.

But you don't need to call .std() on df.describe()["sales"] at all.
You only need to read the value that already exists in that dataframe,
which means using ["std"] instead of .std().

print( df['sales'].std() )              # calculate value
print( df.describe()['sales']['std'] )  # get already existing value

The same applies to the other values in describe(): count, mean, min, max.

print( df["sales"].count() )
print( df.describe()["sales"]["count"] )

print( df["sales"].mean() )
print( df.describe()["sales"]["mean"] )

print( df["sales"].min() )
print( df.describe()["sales"]["min"] )

print( df["sales"].max() )
print( df.describe()["sales"]["max"] )
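
The pairs above can also be checked in one loop. This is only a sketch on a small frame with already-filled sales values (an assumption for illustration); it looks up each method by name with getattr and compares it to the value stored by describe().

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"sales": [200, 220, 250, 264, 300, 310, 305]})
desc = df.describe()["sales"]

matches = {}
for name in ["count", "mean", "std", "min", "max"]:
    direct = getattr(df["sales"], name)()  # e.g. df["sales"].mean()
    looked_up = desc[name]                 # value already stored by describe()
    matches[name] = bool(np.isclose(direct, looked_up))

print(matches)  # every entry is True
```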

The whole problem can be reduced to this short example:

import pandas as pd

df = pd.DataFrame({"sales": [200, 220, 250, 264, 300, 310, 305]})

print( df["sales"].std() )
print( df.describe()["sales"]["std"] )
