Create Datetime index after groupby

Question

I would like to get a DatetimeIndex back after using groupby.

The question is how to create a DatetimeIndex from a MultiIndex that holds year, month, day and hour in separate levels.

Given a DataFrame as an example:

import pandas as pd
import numpy as np

index=pd.date_range('2011-1-1 00:00:00', '2011-1-31 23:50:00', freq='10min')
df=pd.DataFrame(np.random.randn(len(index),2).cumsum(axis=0),columns=['A','B'],index=index)

Then get the mean over each hour using groupby:

day_h = df.groupby([lambda x: x.year, lambda x: x.month, lambda x: x.day,lambda x: x.hour]).mean()

This creates a MultiIndex in which year, month, day and hour are separate levels.

                      A         B
2011    1   1   0    0.209908  1.196164
2011    1   1   1    0.692531  0.518185
2011    1   1   2    1.674748  0.013136
2011    1   1   3    1.674748  0.013136 
2011    1   1   4    1.674748  0.013136
2011    1   1   5    1.674748  0.013136

The desired output would have a DatetimeIndex instead:

                 A         B
2011-1-1 00:00  0.209908  1.196164
2011-1-1 01:00  0.692531  0.518185
2011-1-1 03:00  1.674748  0.013136
2011-1-1 04:00  1.674748  0.013136
2011-1-1 05:00  1.674748  0.013136

In my files there are some missing rows, so I can't create a new index with 1h timestep.

What's wrong with the current df? The index shows the relative level values; as you have multiple hours for a given day, the output is correct.
– EdChum, Commented Sep 11, 2015 at 12:15
Also, what are you trying to achieve here? Your groupby object is no different from your sample df, as the mean here is the same.
– EdChum, Commented Sep 11, 2015 at 12:17
Yes, it is correct, but I would like to get rid of the MultiIndex and have a DatetimeIndex.
– Michal, Commented Sep 11, 2015 at 12:17
@EdChum this would work if there were no missing values (hours, days). resample creates empty rows that I don't want. I know I can drop them, but I'm looking for a solution that takes the date from multiple columns.
– Michal, Commented Sep 11, 2015 at 12:24

Community · Accepted Answer · 2017-05-23 11:44:44Z

1

Someone else on SO had a similar question, but their solution was to use resample. You can avoid resampling by mapping the tuples in the multi-index to create a new index. This will handle missing rows just fine.

import datetime

day_h['new_index'] = day_h.index.map(lambda x: datetime.datetime(x[0], x[1], x[2], x[3]))
day_h = day_h.set_index('new_index')

Output:

                        A          B
new_index                                
2011-01-01 00:00:00  -1.095114   1.995776
2011-01-01 01:00:00  -2.411459   4.508794
2011-01-01 02:00:00  -1.261747   4.953709
2011-01-01 03:00:00  -0.311934   5.454112
2011-01-01 04:00:00   2.095718   6.854375
2011-01-01 05:00:00   1.696756   3.518919
2011-01-01 06:00:00   0.623589   1.740478
2011-01-01 07:00:00   0.544426   0.916016
2011-01-01 08:00:00   2.331326   0.891177
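
For anyone who wants to check the missing-rows claim, here is a minimal self-contained sketch. It is not part of the original answer: the two-day range, the seeded generator, the removed 02:00 hour and the names `new_index` and `result` are all assumptions made for illustration. It builds the timestamps with `pd.Timestamp` rather than `datetime.datetime`, which is equivalent for this purpose.

```python
import numpy as np
import pandas as pd

# Ten-minute data for two days (illustrative range, not the asker's data).
index = pd.date_range('2011-01-01 00:00', '2011-01-02 23:50', freq='10min')
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((len(index), 2)).cumsum(axis=0),
                  columns=['A', 'B'], index=index)

# Drop the 02:00 hour of the first day to mimic missing rows.
df = df[~((df.index.day == 1) & (df.index.hour == 2))]

# Hourly means, grouped the same way as in the question.
day_h = df.groupby([lambda x: x.year, lambda x: x.month,
                    lambda x: x.day, lambda x: x.hour]).mean()

# Each MultiIndex entry is a (year, month, day, hour) tuple;
# build one timestamp per tuple.
new_index = pd.DatetimeIndex(
    [pd.Timestamp(year=int(y), month=int(m), day=int(d), hour=int(h))
     for y, m, d, h in day_h.index]
)

# Replace the MultiIndex on a copy, leaving day_h untouched.
result = day_h.copy()
result.index = new_index
```

Because the new index is derived only from the groups that exist, the missing hour is not reintroduced as an empty row, which is the difference from a resample-based approach.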

edited May 23, 2017 at 11:44

CommunityBot

answered Sep 11, 2015 at 14:05

thecircus

1 Comment

Michal Over a year ago

The only thing I was missing with @EdChum's answer was the magic x[0] and so on... Thank you @thecircus!
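
As an alternative to indexing into each tuple by position, the index levels can be turned into columns and passed to `pd.to_datetime`, which accepts a DataFrame with `year`, `month`, `day` and optional `hour` columns. This is a sketch and not something suggested in the thread: the three-row frame and the names `parts` and `result` are assumptions for illustration.

```python
import pandas as pd

# A small stand-in for the grouped frame: the levels are year, month, day,
# hour, and the 02:00 row is absent.
idx = pd.MultiIndex.from_tuples(
    [(2011, 1, 1, 0), (2011, 1, 1, 1), (2011, 1, 1, 3)]
)
day_h = pd.DataFrame({'A': [0.1, 0.2, 0.3], 'B': [1.0, 2.0, 3.0]}, index=idx)

# Turn the index levels into ordinary columns and give them the names
# that pd.to_datetime expects.
parts = day_h.index.to_frame(index=False)
parts.columns = ['year', 'month', 'day', 'hour']

# Assemble one timestamp per row and use the result as the new index.
result = day_h.copy()
result.index = pd.DatetimeIndex(pd.to_datetime(parts))
```

This keeps the row order of `day_h` and, like the tuple-mapping approach, only produces timestamps for rows that are actually present.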
