0

I have a DataFrame like this:

df = pd.DataFrame(data= {'month' : [2,7,4,8], 'sales' : [10,40,70,50]})

I would like to get the sum of sales aggregated by the month. However, I want to have two groups of month combined, the first for months 1-6 (resulting in sales of 80) and the second for the months 7-12 (resulting in 90).

What's the best way to do this?

1
  • 1
    I won't post this as an answer since it uses extra technology but using duckdb (pip install duckdb) you could answer your query using SQL directly on the dataframe as such: duckdb.query("SELECT CASE WHEN month <= 6 THEN 1 ELSE 2 END as halfyear, sum(sales) FROM df GROUP BY halfyear").to_df(). Commented Nov 30, 2021 at 15:06

2 Answers 2

1

One way to do this is to create a column that acts as a grouping key. This can be done like so:

import numpy as np
import pandas as pd

df = pd.DataFrame(data= {'month': [2, 7, 4, 8], 'sales' : [10, 40, 70, 50]})
df["foo"] = np.where(df['month'] < 7, 0, 1)
bar = df.groupby(['foo']).sum()

Here, a foo column is being created which will assign a group to each column depending on the condition you defined. i.e df['month'] < 7. Then using this created column you can perform a classic groupby() and obtain the sum.


Note you can also use df.groupby(['foo'])['sales'].agg('sum') if you only want to keep the sales column.

Sign up to request clarification or add additional context in comments.

Comments

1

You can use pd.cut to assign labels to the months and use these in a groupby:

>>> df.groupby(pd.cut(df["month"], bins=[0, 6, 12], labels=["1-6", "7-12"]))["sales"].sum()

month
1-6     80
7-12    90
Name: sales, dtype: int64

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.