1

I have a data frame as follows:


data1 month day

20    1     1

10    1     1

15    1     2

12    1     2

16    1     3

10    1     3

20    2     1

10    2     1

15    2     2

10    2     2

12    2     3

10    2     3

I want to find the sum of data for each day of each month and display the result as a dataframe similar to the following:


date sum_data1

1.1. 30

2.1. 27

3.1. 26

1.2. 30

2.2. 25

3.2. 22

 

The data set is quite big > 200,000 rows.

3
  • You can add two columns: day, month and then to group by them and to sum, I can elaborate more in an answer if you would like Commented Oct 14, 2022 at 8:18
  • how do you mean? Commented Oct 14, 2022 at 8:20
  • Just as @jezrael Answered Commented Oct 14, 2022 at 8:26

3 Answers 3

2

Because no column year first add it to month and day, pass to to_datetime and aggregate sum:

date = pd.to_datetime(df[['month','day']].assign(year=2022))

df = df.groupby(date.rename('date'))['data1'].sum().reset_index(name='sum_data1')
print (df)
        date  sum_data1
0 2022-01-01         30
1 2022-01-02         27
2 2022-01-03         26
3 2022-02-01         30
4 2022-02-02         25
5 2022-02-03         22
Sign up to request clarification or add additional context in comments.

2 Comments

could I expand this, e.g. if I originally had columns data1 & data2? could I generate two new columns besides the date, named sum_data1 and sum_data2?
@licm - sure, use df1 = df.groupby(date.rename('date'))[['data1', 'data2']].sum().reset_index()
0
df = df.groupby(['month', 'day']).sum().reset_index()
df['New'] = df.apply(lambda row:  float(f'{row.day}.{row.month}'), axis=1)
df

Output:

    month   day data1   New
0   1       1   30     1.1
1   1       2   27     2.1
2   1       3   26     3.1
3   2       1   30     1.2
4   2       2   25     2.2
5   2       3   22     3.2

Comments

0

You can do something like this:

df = pd.DataFrame({
    'data1': [20, 10, 15, 12, 16, 10, 20, 10, 15, 10, 12, 10],
    'month': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    'day': [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3]
})
df['date'] = df['month'].astype('string') + '.' + df['day'].astype('string')

df.groupby(['date']).sum()[['data1']].rename(columns={'data1':'sum_data1'})
    sum_data1
date    
1.1 30
1.2 27
1.3 26
2.1 30
2.2 25
2.3 22

After that, you can use reset_index() if needed...

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.