I have a dataset that contains a DateTime field. I need to group it by hour and store each group in a nested dictionary with the following structure:

{year_1:
    {month_1:
        {week_1:
            {day_1:
                {hour_1: df_1, hour_2: df_2}
            },
         week_2:
            {day_1:
                {hour_1: df_1}
            }
        },
     month_3:
        {week_1:
            {day_1:
                {hour_1: df_1, hour_2: df_2}
            }
        }
    },
 year_2:
    {month_5:
        {week_1:
            {day_1:
                {hour_2: df_2}
            }
        }
    }
}

To do that I am using the following code:

import pandas as pd

# pd.Timestamp replaces pd.datetime, which was deprecated in pandas 1.0 and later removed
df = pd.DataFrame({'date': [pd.Timestamp(2015, 3, 17, 2), pd.Timestamp(2014, 3, 24, 3), pd.Timestamp(2014, 3, 17, 4)],
                   'hdg_id': [4041, 4041, 4041],
                   'stock': [1.0, 1.0, 1.0]})
# Vectorized extraction of each date component via the .dt accessor
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['week'] = df['date'].dt.isocalendar().week.astype(int)  # ISO week number
df['day'] = df['date'].dt.day
df['hour'] = df['date'].dt.hour

result = {}
# The groupby keys must be listed in the same order as the unpacking below
for to_unpack, df_hour in df.groupby(['year', 'month', 'week', 'day', 'hour']):
    year, month, week, day, hour = to_unpack
    try:
        result[year]
    except KeyError:
        result[year] = {}
    try:
        result[year][month]
    except KeyError:
        result[year][month] = {}
    try:
        result[year][month][week]
    except KeyError:
        result[year][month][week] = {}
    try:
        result[year][month][week][day]
    except KeyError:
        result[year][month][week][day] = {}

    result[year][month][week][day][hour] = df_hour

As you can see, this is pretty much a brute-force solution, and I am looking for something cleaner and easier to understand. It is also extremely slow. I tried different ways of grouping (Python Pandas Group by date using datetime data), and I also tried a MultiIndex with each component of the datetime (Pandas DataFrame with MultiIndex: Group by year of DateTime level values). However, the problem is always how to create the dict. Ideally, I would like to just write something like:

result[year][month][week][day][hour] = df_hour

but to the best of my knowledge, I first need to initialize each intermediate dict.
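For comparison, here is a minimal sketch of a different technique (not one proposed in this thread): group by one column at a time and recurse, so every level is built by a dict comprehension and no level ever needs explicit initialization. It rebuilds the sample df with the date parts extracted through the .dt accessor; nest_groups is a hypothetical helper name. Because it calls groupby once per level and per branch, it is cleaner but not necessarily faster than a single groupby over all five columns.

```python
import pandas as pd


def nest_groups(frame, keys):
    """Recursively split `frame` into nested dicts keyed by the columns in `keys`.

    nest_groups is a hypothetical helper, not a pandas function.
    """
    if not keys:
        # No levels left: the remaining rows are the leaf DataFrame.
        return frame
    first, rest = keys[0], keys[1:]
    # Grouping by a single column name yields one scalar key per distinct value.
    return {key: nest_groups(group, rest) for key, group in frame.groupby(first)}


df = pd.DataFrame({
    'date': [pd.Timestamp(2015, 3, 17, 2), pd.Timestamp(2014, 3, 24, 3), pd.Timestamp(2014, 3, 17, 4)],
    'hdg_id': [4041, 4041, 4041],
    'stock': [1.0, 1.0, 1.0],
})
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['week'] = df['date'].dt.isocalendar().week.astype(int)  # ISO week number
df['day'] = df['date'].dt.day
df['hour'] = df['date'].dt.hour

result = nest_groups(df, ['year', 'month', 'week', 'day', 'hour'])
print(result[2014][3][13][24][3])
```

The leaves are the same hourly DataFrames the loop produces, and the key order is simply the order of the list passed in.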


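You need dict.setdefault, which returns the value stored under a key and, if the key is missing, first inserts the given default: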

result = {}
for to_unpack, df_hour in df.groupby(['year', 'month', 'week', 'day', 'hour']):
    year, month, week, day, hour = to_unpack

    # Each setdefault returns the existing sub-dict or inserts and returns a new empty one
    (result.setdefault(year, {})
           .setdefault(month, {})
           .setdefault(week, {})
           .setdefault(day, {})
           .setdefault(hour, df_hour))
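The setdefault chain can also be written as a loop over the key tuple, which works for any nesting depth. This is a minimal sketch; nested_set is a hypothetical helper name, and plain strings stand in for the hourly DataFrames.

```python
def nested_set(mapping, keys, value):
    """Store `value` at mapping[k0][k1]...[kn], creating intermediate dicts as needed.

    nested_set is a hypothetical helper, not a library function.
    """
    *parents, last = keys
    node = mapping
    for key in parents:
        # setdefault returns the existing sub-dict, or inserts and returns a new one.
        node = node.setdefault(key, {})
    node[last] = value


result = {}
nested_set(result, (2014, 3, 12, 17, 4), 'frame A')
nested_set(result, (2014, 3, 13, 24, 3), 'frame B')
print(result)
# → {2014: {3: {12: {17: {4: 'frame A'}}, 13: {24: {3: 'frame B'}}}}}
```

Inside the groupby loop the body reduces to nested_set(result, to_unpack, df_hour), provided the columns are listed in year, month, week, day, hour order. The last step assigns rather than calling setdefault, so a repeated key would be overwritten; groupby keys are unique, so the result is the same here.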

Alternatively, you can subclass dict so that looking up a missing key creates the next level automatically:

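class Fict(dict):
    """dict that inserts a new, empty Fict when a missing key is looked up."""
    def __getitem__(self, item):
        return super().setdefault(item, type(self)())

result = Fict()

for to_unpack, df_hour in df.groupby(['year', 'month', 'week', 'day', 'hour']):
    year, month, week, day, hour = to_unpack

    # Each lookup creates the next level if needed; the final [hour] is a plain assignment
    result[year][month][week][day][hour] = df_hour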
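The standard library's collections.defaultdict gives the same auto-creating behavior without a custom class: a factory that returns a defaultdict of itself produces an arbitrarily deep tree. This is a minimal sketch with placeholder strings standing in for the hourly DataFrames; tree and to_dict are hypothetical helper names. One difference from Fict: Fict builds a throwaway Fict on every lookup (the setdefault default is evaluated even when the key exists), whereas defaultdict only calls its factory for missing keys.

```python
from collections import defaultdict


def tree():
    """Return a defaultdict whose missing values are themselves trees (hypothetical helper)."""
    return defaultdict(tree)


def to_dict(node):
    """Recursively convert a defaultdict tree into plain dicts (hypothetical helper)."""
    if isinstance(node, defaultdict):
        return {key: to_dict(value) for key, value in node.items()}
    return node


result = tree()
# Missing intermediate levels are created on first access by the defaultdict factory.
result[2014][3][12][17][4] = 'frame A'
result[2014][3][13][24][3] = 'frame B'
print(to_dict(result))
```

As with Fict, merely reading a missing key (for example result[2016]) inserts an empty level, so use `in` or .get() when you only want to check for a key.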