0

I'm trying to code as efficiently and pythonicly as possible (that is avoiding loops whenever possible) and I've found this and this but they seem to be irrelevant to my question.

Consider a pd.DataFrame called df with 30 years of hourly data for 28 countries:

  • index: datetime64[ns] (e.g. '1980-01-01 00:00:00') and in
  • columns: a 2-digit countrycode of each of the 28 countries.

e.g.

countrycode                AT        BE        [...]
time                                   
1980-01-01 00:00:00  0.016947  0.059076
1980-01-01 01:00:00  0.019461  0.068575
[...]
2009-31-12 23:00:00  0.119735  0.287495

In order to create yearly boxplots for every country separately, e.g. via ds[country].plot(kind='box'), I intend to transform this DataFrame into the following form:

  • columns: pd.MultiIndex such as (countrycode, year) e.g. ('AT', 1980), ('AT', 1981), ...
  • index: range(8760) e.g. (0, 1, ... 8759)

The piece of code I came up with works well, but its damn slow due to a nested loop:

years = sorted(set(df.index.year))
ds = pd.DataFrame(columns=pd.MultiIndex
                            .from_product([df.columns, years]))
for reg, col in df.iteritems():
    for year, group in col.groupby(by=df.index.year):
        ds.loc[:, (reg, year)] = group.reset_index(drop=True)

Any ideas to code this part more efficiently are highly welcome :)

1 Answer 1

0
df.groupby(['countrycode','year']).size().unstack().plot(kind='box')
Sign up to request clarification or add additional context in comments.

1 Comment

Thanks, but this does not work, as neither 'year' (as part of datetime64[ns]-index) nor 'countrycode' (as name of the columns) are valid keys.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.