I'm trying to sum the total monthly amount with the code below:

month_sum = df.groupby([df['Year'], df['Month']])['amount'].sum()

But I need to drop those months, or set their sum to NaN, if they do not contain enough days of data (e.g. only 10 rows of data for January).

The only way I know to drop data is df.drop(), which drops rows or columns by their labels, so I cannot use it in this situation. Can anyone show me how to do that?

3 Answers

Consider this sample DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'year': ['2017']*20, 'month': ['1']*12 + ['2']*8, 'amount': np.random.randint(0, 50, 20)})

You can sum conditionally with a lambda that returns NaN for groups with 10 or fewer rows:

df.groupby(['year', 'month']).amount.apply(lambda x: x.sum() if x.count() > 10 else np.nan).reset_index()

You get

    year    month   amount
0   2017    1       249.0
1   2017    2       NaN
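A vectorized alternative, sketched here with deterministic amounts in place of the random ones: compute each group's sum and row count with agg(['sum', 'count']), then use Series.where to replace the sums of groups with 10 or fewer rows by NaN. This avoids calling a Python lambda once per group, which may matter when there are many groups; the 10-row threshold is taken from the example above.

```python
import numpy as np
import pandas as pd

# Deterministic version of the sample data above: 12 rows for month 1, 8 for month 2
df = pd.DataFrame({
    'year': ['2017'] * 20,
    'month': ['1'] * 12 + ['2'] * 8,
    'amount': range(20),
})

# Per-group sum and row count in one pass, no Python lambda per group
stats = df.groupby(['year', 'month'])['amount'].agg(['sum', 'count'])

# Keep the sum only where the group has more than 10 rows; the rest become NaN
result = stats['sum'].where(stats['count'] > 10).reset_index(name='amount')
print(result)
```

With this data, month 1 sums to 66.0 and month 2 becomes NaN.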

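Edit: to base the condition on another column of the same group, apply the function to each group's rows:

df = pd.DataFrame({'year': ['2017']*20, 'month': ['1']*12 + ['2']*8,
                   'amount': np.random.randint(0, 50, 20), 'other': np.random.randint(0, 30, 20)})

df.groupby(['year', 'month'])[['amount', 'other']].apply(
    lambda x: x['amount'].sum() if x['other'].sum() > 150 else np.nan).reset_index(name='amount')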

1 Comment

What if I want the condition to depend on the sum of another column? df = (raw_data.groupby(['Year', 'Month'])['amount'].apply(lambda x: x.sum() if raw_data['othercolumn'].sum() >= n else np.nan).reset_index()) If the othercolumn sum is < n, it should return NaN, but it turns into 0. Why is that?
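One likely issue with the code in this comment: inside the lambda, raw_data['othercolumn'] is the whole column of the original DataFrame, not the current group's rows, so the condition gives the same answer for every group. Below is a minimal sketch of a per-group check, using made-up data and a hypothetical threshold n. It relies on x keeping the original row labels, so raw_data.loc[x.index, 'othercolumn'] picks out only that group's rows.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the column names follow the comment above
raw_data = pd.DataFrame({
    'Year': [2017] * 6,
    'Month': [1, 1, 1, 2, 2, 2],
    'amount': [10, 20, 30, 40, 50, 60],
    'othercolumn': [50, 60, 70, 1, 2, 3],
})
n = 150  # hypothetical threshold


def conditional_sum(x):
    # x holds the 'amount' values of one (Year, Month) group; x.index holds
    # that group's row labels, so this sums 'othercolumn' for the same rows only
    if raw_data.loc[x.index, 'othercolumn'].sum() >= n:
        return x.sum()
    return np.nan


df = (raw_data.groupby(['Year', 'Month'])['amount']
      .apply(conditional_sum)
      .reset_index(name='amount'))
print(df)
```

Here month 1 (othercolumn sum 180) keeps its amount sum of 60, while month 2 (othercolumn sum 6) becomes NaN.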

You can always create a custom aggregation function.
For your example:

import pandas as pd

df = pd.DataFrame(index=pd.date_range('2017-01-01', '2017-02-05'))
df['amount'] = range(len(df))


def custom_sum(s):
    if len(s) > 10:
        return s.sum()
    else:
        return None

g = df.groupby([df.index.year, df.index.month])['amount'].agg(custom_sum)
print(g)

output:

2017  1    465.0
      2      NaN
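The question also asks about dropping the short months instead of reporting NaN. One way to sketch that is DataFrameGroupBy.filter, which keeps only the rows of groups for which the function returns True. The test below reuses the len(g) > 10 rule from custom_sum and the daily data from this answer.

```python
import pandas as pd

# Same daily data as above: 31 rows for January, 5 for February
df = pd.DataFrame(index=pd.date_range('2017-01-01', '2017-02-05'))
df['amount'] = range(len(df))

# filter() keeps only the rows of months with more than 10 rows,
# so all of February's rows are dropped
kept = df.groupby([df.index.year, df.index.month]).filter(lambda g: len(g) > 10)

# Sum the surviving rows per (year, month)
month_sum = kept.groupby([kept.index.year, kept.index.month])['amount'].sum()
print(month_sum)
```

Unlike custom_sum, February is missing from the result altogether rather than appearing as NaN.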

Borrowing @vaishali's data set:

In [24]: df.groupby(['year', 'month']).amount \
           .agg(lambda x: x.sum() if x.count() > 10 else np.nan)
Out[24]:
year  month
2017  1        216.0
      2          NaN
Name: amount, dtype: float64
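For a plain sum, the conditional lambda can likely be replaced by the min_count parameter of GroupBy.sum: if a group has fewer than min_count non-NA values, its sum is NaN. Since x.count() also counts non-NA values, min_count=11 should match the x.count() > 10 condition used above. Sketched with deterministic amounts instead of the random ones:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'year': ['2017'] * 20,
    'month': ['1'] * 12 + ['2'] * 8,
    'amount': range(20),
})

# A group needs at least 11 non-NA values (i.e. more than 10) for its sum
# to be computed; otherwise the result for that group is NaN
result = df.groupby(['year', 'month'])['amount'].sum(min_count=11)
print(result)
```

Month 1 (12 rows) sums to 66, while month 2 (8 rows) becomes NaN.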
