2

I have a pandas DataFrameGroupBy (df_groups) that I have created by grouping another dataframe (df_pub) containing a list of publications by their day/month/year index.

df_groups = df_pub.groupby(by=df_pub.index.day, df_pub.index.month,df_pub.index.year],sort=False)

I then want to check how many unique publications exist in each group, so I use:

n_unique_pub = df_groups.Title.nunique()

This is a pandas series with a MutiIndex that looks like the following:

MultiIndex([( 1,  7, 2020),
            ( 2,  7, 2020),
            ( 3,  7, 2020),
            ( 4,  7, 2020),
            ( 5,  7, 2020),
            ( 6,  7, 2020),
            ( 7,  7, 2020),
            ( 8,  7, 2020),
            ( 9,  7, 2020),
            (10,  7, 2020),
            ...
            ( 8, 11, 2021),
            ( 9, 11, 2021),
            (10, 11, 2021),
            (11, 11, 2021),
            (12, 11, 2021),
            (13, 11, 2021),
            (14, 11, 2021),
            (15, 11, 2021),
            (16, 11, 2021),
            (17, 11, 2021)],
           names=['Date', 'Date', 'Date'], length=497)

I would like to convert this MultiIndex into a DatetimeIndex so that it would look like:

DatetimeIndex(['2020-07-01', '2020-07-02', '2020-07-03', '2020-07-04',
               '2020-07-05', '2020-07-06', '2020-07-07', '2020-07-08',
               '2020-07-09', '2020-07-10',
               ...
               '2021-11-08', '2021-11-09', '2021-11-10', '2021-11-11',
               '2021-11-12', '2021-11-13', '2021-11-14', '2021-11-15',
               '2021-11-16', '2021-11-17'],
              dtype='datetime64[ns]', name='Date', length=505, freq='D')

Is there a simple way to do it? So far I've tried several approaches but nothing has worked. For example if I do pd.to_datetime(n_unique_pub.index) I have an error: TypeError: <class 'tuple'> is not convertible to datetime.

3 Answers 3

2

Use pd.to_datetime:

# mi is your MultiIndex instance, like mi = df.index
>>> pd.DatetimeIndex(pd.to_datetime(mi.rename(['day', 'month', 'year']).to_frame()))

DatetimeIndex(['2020-07-01', '2020-07-02', '2020-07-03', '2020-07-04',
               '2020-07-05', '2020-07-06', '2020-07-07', '2020-07-08',
               '2020-07-09', '2020-07-10', '2021-11-08', '2021-11-09',
               '2021-11-10', '2021-11-11', '2021-11-12', '2021-11-13',
               '2021-11-14', '2021-11-15', '2021-11-16', '2021-11-17'],
              dtype='datetime64[ns]', freq=None)

How to replace a MultiIndex to DatetimeIndex:

idx = pd.to_datetime(df.index.rename(['day', 'month', 'year']).to_frame())
df = df.set_index(idx)
print(df)

# Output:
                   A
2020-07-01  0.961038
2020-07-02  0.098132
2020-07-03  0.406996
2020-07-04  0.008376
2020-07-05  0.568059
2020-07-06  0.576610
2020-07-07  0.137144
2020-07-08  0.672219
2020-07-09  0.142874
2020-07-10  0.509231
2021-11-08  0.368762
2021-11-09  0.249107
2021-11-10  0.136282
2021-11-11  0.119291
2021-11-12  0.052388
2021-11-13  0.434899
2021-11-14  0.770705
2021-11-15  0.850914
2021-11-16  0.621283
2021-11-17  0.379888
Sign up to request clarification or add additional context in comments.

Comments

1

You can convert to 'YYYY-MM-DD' format first:

idx = pd.MultiIndex.from_tuples(
    [( 1,  7, 2020),
     ( 2,  7, 2020),]
    )

pd.to_datetime(idx.map(lambda x: '-'.join(map(str, reversed(x)))))

Output:

DatetimeIndex(['2020-07-01', '2020-07-02'], dtype='datetime64[ns]', freq=None)

Comments

0

something like should work

dates = pd.to_datetime(df_groups.reset_index()[['year','month','day']]) 

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.