I have this dataset:
```python
import pandas as pd
import numpy as np

# Sample dataset
data = {
    "date": pd.date_range("2025-10-01", periods=7),
    "sales": [200, 220, 250, np.nan, 300, 310, 305],
    "region": ["East", "West", "East", "West", "East", "West", "East"],
}
df = pd.DataFrame(data)

# 1. Handling missing data: replace the NaN with the column mean
#    (assign the result back; fillna(..., inplace=True) on df['sales'] is chained assignment)
df['sales'] = df['sales'].fillna(df['sales'].mean())

# 2. Filtering rows
east_sales = df[df['region'] == 'East']

# 3. Creating new columns
df['prev_sales'] = df['sales'].shift(1)
df['increase'] = df['sales'] > df['prev_sales']

# 4. Aggregations
agg = df.groupby('region')['sales'].mean()
```
Why are the outputs of `df["sales"].std()` and `df.describe()["sales"].std()` different?
```python
s1 = df['sales'].std()
s2 = df.describe()['sales'].std()
print(s1, s2)
```
Here is the output of the above snippet: 43.43737509974049 115.72168519628995
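For reference, the first number is the sample standard deviation (denominator n − 1) of the seven sales values after the NaN has been replaced by the mean of the other six values, 1585/6 ≈ 264.17. Filling with the mean leaves the overall mean unchanged and adds a zero deviation, so the arithmetic works out as follows:

```latex
\begin{aligned}
\bar{x} &= \frac{200 + 220 + 250 + 300 + 310 + 305}{6} = \frac{1585}{6} \approx 264.17 \\
\sum_{i=1}^{7} (x_i - \bar{x})^2 &= \Big(\sum_{x \in \{200,220,250,300,310,305\}} x^2\Big) - 6\bar{x}^2 + 0
  = 430025 - \frac{1585^2}{6} \approx 11320.83 \\
s &= \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2} = \sqrt{\frac{11320.83}{6}} \approx 43.44
\end{aligned}
```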
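`df.describe()["sales"]` is a Series of summary statistics for `df['sales']` (count, mean, min, 25%, 50%, 75%, max, std), so `df.describe()["sales"].std()` is the standard deviation of those eight numbers, not of the sales values. Because that set mixes the count (7), location statistics and the std itself, its spread has no meaningful relationship to `df['sales'].std()`, which is the sample standard deviation (`ddof=1`) of the seven sales values. The value you are probably after is the `std` row of the summary, `df.describe().loc['std', 'sales']`, which equals `df['sales'].std()`.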
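A minimal sketch to check this, rebuilding only the columns from the question that matter here; the names `summary` and `s3` are introduced purely for illustration. It compares the std of the raw values, the std of the summary numbers, and the `std` row read from the summary:

```python
import numpy as np
import pandas as pd

# Rebuild the dataset from the question and fill the NaN with the column mean
df = pd.DataFrame({
    "date": pd.date_range("2025-10-01", periods=7),
    "sales": [200, 220, 250, np.nan, 300, 310, 305],
    "region": ["East", "West", "East", "West", "East", "West", "East"],
})
df["sales"] = df["sales"].fillna(df["sales"].mean())

# The 'sales' column of describe(): 8 numbers (count, mean, std, min, quartiles, max)
summary = df.describe()["sales"]
print(summary)

s1 = df["sales"].std()                   # std of the 7 sales values (ddof=1)
s2 = summary.std()                       # std of the 8 summary numbers, count included
s3 = df.describe().loc["std", "sales"]   # the 'std' row of the summary

# s3 is the same statistic as s1; s2 is computed over a different set of numbers
print(s1, s3, s2)
```

Indexing the `std` row with `.loc` is the usual way to pull a single statistic out of `describe()`; calling a reduction such as `.std()` on the summary column aggregates the statistics themselves, which is rarely what is wanted.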