I would like to convert the index back to a DatetimeIndex after a groupby.

The question is how to create a DatetimeIndex from a MultiIndex that holds year, month, day and hour in separate levels.

Given a DataFrame as an example:

import pandas as pd
import numpy as np

index=pd.date_range('2011-1-1 00:00:00', '2011-1-31 23:50:00', freq='10min')
df=pd.DataFrame(np.random.randn(len(index),2).cumsum(axis=0),columns=['A','B'],index=index)

Then, get the mean over each hour using groupby:

day_h = df.groupby([lambda x: x.year, lambda x: x.month, lambda x: x.day,lambda x: x.hour]).mean()

This creates a MultiIndex in which year, month, day and hour are separate levels:

                      A         B
2011    1   1   0    0.209908  1.196164
2011    1   1   1    0.692531  0.518185
2011    1   1   2    1.674748  0.013136
2011    1   1   3    1.674748  0.013136 
2011    1   1   4    1.674748  0.013136
2011    1   1   5    1.674748  0.013136

The desired output would have a DatetimeIndex instead:

                 A         B
2011-1-1 00:00  0.209908  1.196164
2011-1-1 01:00  0.692531  0.518185
2011-1-1 03:00  1.674748  0.013136
2011-1-1 04:00  1.674748  0.013136
2011-1-1 05:00  1.674748  0.013136

My files have some missing rows, so I can't simply create a new index with a regular 1h time step.

  • What's wrong with the current df? The index shows the relative level values; since you have multiple hours for a given day, the output is correct. Commented Sep 11, 2015 at 12:15
  • Also, what are you trying to achieve here? Your groupby object is no different from your sample df, as the mean here is the same. Commented Sep 11, 2015 at 12:17
  • Yes, it is correct, but I would like to get rid of the MultiIndex and have a DatetimeIndex. Commented Sep 11, 2015 at 12:17
  • Are you really after df.resample('h', how='mean')? Commented Sep 11, 2015 at 12:18
  • @EdChum this would work if there were no missing values (hours, days). resample creates empty rows that I don't want. I know I can drop them, but I'm looking for a solution that builds the date from multiple columns. Commented Sep 11, 2015 at 12:24

1 Answer

Someone else on SO had a similar question, but their solution was to use resample. You can avoid resampling by mapping the tuples in the multi-index to create a new index. This will handle missing rows just fine.

import datetime

day_h['new_index'] = day_h.index.map(lambda x: datetime.datetime(x[0], x[1], x[2], x[3]))
day_h = day_h.set_index('new_index')  # set_index returns a new DataFrame, so keep the result

Output:

                        A          B
new_index                                
2011-01-01 00:00:00  -1.095114   1.995776
2011-01-01 01:00:00  -2.411459   4.508794
2011-01-01 02:00:00  -1.261747   4.953709
2011-01-01 03:00:00  -0.311934   5.454112
2011-01-01 04:00:00   2.095718   6.854375
2011-01-01 05:00:00   1.696756   3.518919
2011-01-01 06:00:00   0.623589   1.740478
2011-01-01 07:00:00   0.544426   0.916016
2011-01-01 08:00:00   2.331326   0.891177
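
As a possible alternative to mapping each tuple through datetime.datetime, the index levels can be turned into columns and passed to pd.to_datetime in a single call. This is only a sketch under a few assumptions: the names parts and flat are made up for illustration, the hour-2 rows are dropped purely to imitate the missing rows mentioned in the question, and MultiIndex.to_frame(index=False) needs a reasonably recent pandas.

```python
import numpy as np
import pandas as pd

# Same kind of data as in the question.
index = pd.date_range('2011-01-01 00:00:00', '2011-01-31 23:50:00', freq='10min')
df = pd.DataFrame(np.random.randn(len(index), 2).cumsum(axis=0),
                  columns=['A', 'B'], index=index)

# Assumption for illustration: drop every row in hour 2 to imitate gaps in the data.
df = df[df.index.hour != 2]

# Hourly mean, grouped exactly as in the question (gives a 4-level MultiIndex).
day_h = df.groupby([lambda x: x.year, lambda x: x.month,
                    lambda x: x.day, lambda x: x.hour]).mean()

# Turn the index levels into columns with the names pd.to_datetime looks for.
parts = day_h.index.to_frame(index=False)
parts.columns = ['year', 'month', 'day', 'hour']

# Assemble all timestamps at once and use them as the new index.
flat = day_h.copy()
flat.index = pd.DatetimeIndex(pd.to_datetime(parts))

print(flat.head())
```

Only the hours that exist in day_h end up in flat, so no empty rows are introduced for the gaps.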

1 Comment

The only thing I was missing with @Edchum's answer was the magic x[0] and so on... Thank you @thecircus!
