
I have a pandas DataFrame with a MultiIndex that looks like this:

# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd

# multi-indexed dataframe
df = pd.DataFrame(np.random.randn(8760 * 3, 3))
df['concept'] = "some_value"
df['datetime'] = pd.date_range(start='2016', periods=len(df), freq='60Min')
df.set_index(['concept', 'datetime'], inplace=True)
df.sort_index(inplace=True)

Console output:

df.head()
Out[23]: 
                 0         1         2
datetime                              
2016      0.458802  0.413004  0.091056
2016     -0.051840 -1.780310 -0.304122
2016     -1.119973  0.954591  0.279049
2016     -0.691850 -0.489335  0.554272
2016     -1.278834 -1.292012 -0.637931

df.tail()

Out[24]: 
                 0         1         2
datetime                              
2018     -1.872155  0.434520 -0.526520
2018      0.345213  0.989475 -0.892028
2018     -0.162491  0.908121 -0.993499
2018     -1.094727  0.307312  0.515041
2018     -0.880608 -1.065203 -1.438645

Now I want to create annual sums along the level 'datetime'.

My first try was the following but this doesn't work:

# sum along years
years = df.index.get_level_values('datetime').year.tolist()
df.index.set_levels([years], level=['datetime'], inplace=True)
df = df.groupby(level=['datetime']).sum()

It also seems heavy-handed to me, since this task is probably easy to accomplish.

So here's my question: how can I get annual sums for the level 'datetime'? Is there a simple way to do this by applying a function to the datetime level values?

2 Answers


You can group by the year of the second level of the MultiIndex:

# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd

# multi-indexed dataframe
df = pd.DataFrame(np.random.randn(8760  * 3, 3))
df['concept'] = "some_value"
df['datetime'] = pd.date_range(start='2016', periods=len(df), freq='60Min')
df.set_index(['concept', 'datetime'], inplace=True)
df.sort_index(inplace=True)
print(df.head())
                                       0         1         2
concept    datetime                                         
some_value 2016-01-01 00:00:00  1.973437  0.101535 -0.693360
           2016-01-01 01:00:00  1.221657 -1.983806 -0.075609
           2016-01-01 02:00:00 -0.208122 -2.203801  1.254084
           2016-01-01 03:00:00  0.694332 -0.235864  0.538468
           2016-01-01 04:00:00 -0.928815 -1.417445  1.534218

# sum along years
#years = df.index.get_level_values('datetime').year.tolist()
#df.index.set_levels([years], level=['datetime'], inplace=True)

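# levels[1] holds the unique datetime values; it lines up with the rows only
# because every timestamp occurs exactly once in this sorted frame
print(df.index.levels[1].year)
[2016 2016 2016 ..., 2018 2018 2018]
annual = df.groupby(df.index.levels[1].year).sum()
print(annual.head())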
               0           1          2
2016  -93.901914  -32.205514 -22.460965
2017  205.681817   67.701669 -33.960801
2018   67.438355  150.954614 -21.381809

Or you can use get_level_values and year:

annual = df.groupby(df.index.get_level_values('datetime').year).sum()
print(annual.head())
               0           1          2
2016  -93.901914  -32.205514 -22.460965
2017  205.681817   67.701669 -33.960801
2018   67.438355  150.954614 -21.381809
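
As a minimal sketch of how the get_level_values approach extends to a frame with more than one concept: the two concept labels "a" and "b", the column name "value", and the constant data are assumptions made up for illustration, not part of the question. It also shows why index.levels is not a per-row array in general.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two concepts, hourly values of 1.0 covering 2016 and 2017.
hours = pd.date_range(start="2016-01-01", end="2017-12-31 23:00", freq="60min")
index = pd.MultiIndex.from_product([["a", "b"], hours], names=["concept", "datetime"])
df = pd.DataFrame({"value": np.ones(len(index))}, index=index)

# One year label per row, so its length equals len(df).
years = df.index.get_level_values("datetime").year

# index.levels stores only the unique level values, so it is shorter than df here.
n_unique = len(df.index.levels[1])

# Group by the per-row concept labels and the per-row years together.
annual = df.groupby([df.index.get_level_values("concept"), years]).sum()
print(annual)
```

Because each value is 1.0, every annual sum is simply the number of hours in that year, which makes the result easy to check by eye.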

2 Comments

Thanks! I already tried the option using get_level_values(), but obviously something went wrong. Anyhow, your answer saved my day! :)
Glad I could help! Good luck!

Starting with your sample data:

df = pd.DataFrame(np.random.randn(8760 * 3, 3))
df['concept'] = "some_value"
df['datetime'] = pd.date_range(start='2016', periods=len(df), freq='60Min')
df.set_index(['concept', 'datetime'], inplace=True)

you can apply groupby to a level of your MultiIndex:

df.groupby(pd.TimeGrouper(level='datetime', freq='A')).sum()

to get:

                     0          1          2
datetime                                    
2016-12-31  100.346135 -71.673222  42.816675
2017-12-31 -132.880909 -66.017010 -73.449358
2018-12-31  -71.449710 -15.774929  97.634349

pd.TimeGrouper is deprecated (as of pandas 0.23) and was removed in pandas 1.0; use pd.Grouper(level='datetime', freq='A') instead.
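
A minimal sketch of the pd.Grouper form, under a few assumptions: the column name "value" and the constant data are made up for illustration, and the frequency is 'YS' (year start) rather than the year-end alias, because the year-end spelling differs between pandas versions ('A' in older releases, 'YE' from 2.2). With 'YS' the result is labelled 2016-01-01 instead of 2016-12-31; the sums are the same.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: hourly values of 1.0 covering exactly 2016 and 2017.
hours = pd.date_range(start="2016-01-01", periods=8784 + 8760, freq="60min")
df = pd.DataFrame({"value": np.ones(len(hours))})
df["concept"] = "some_value"
df["datetime"] = hours
df = df.set_index(["concept", "datetime"])

# Keep the 'concept' level and bin the 'datetime' level by calendar year.
annual = df.groupby(
    [pd.Grouper(level="concept"), pd.Grouper(level="datetime", freq="YS")]
).sum()
print(annual)
```

Passing a list of Grouper objects keeps 'concept' in the result; dropping the first Grouper collapses all concepts into one sum per year.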

4 Comments

Thanks for your answer! I didn't know the TimeGrouper option and it seems to be a really helpful feature. Nevertheless, I accepted jezrael's answer since it's also applicable to non-datetime levels without using the year attribute.
Wasn't your question about a 'datetime level'?
Yes, but if I am not mistaken, jezrael's answer is also applicable to non-datetime levels, whereas yours is limited to datetime levels (but nevertheless right!). Or is this not the case? Anyway, thanks again!
Non-datetime levels don't have a 'year' attribute, for instance. You also cannot use the numerous other frequencies that are available with a DatetimeIndex but are not a datetime attribute. Also, TimeGrouper was the originally intended solution for a DatetimeIndex until Grouper became the generic solution, including for datetime indexes and columns.
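
To illustrate the point about frequencies that are not a datetime attribute, here is a rough sketch that bins the 'datetime' level into 10-day windows with pd.Grouper. The 10-day frequency, the column name "value", and the constant data are assumptions chosen for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: hourly values of 1.0 covering the leap year 2016.
hours = pd.date_range(start="2016-01-01", periods=8784, freq="60min")
index = pd.MultiIndex.from_product(
    [["some_value"], hours], names=["concept", "datetime"]
)
df = pd.DataFrame({"value": np.ones(len(index))}, index=index)

# There is no '.ten_day' attribute on a DatetimeIndex, but a frequency string
# can still describe the bins: each full bin spans 10 days, i.e. 240 hours.
ten_day = df.groupby(pd.Grouper(level="datetime", freq="10D")).sum()
print(ten_day.head())
```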
