
I have a DataFrame with a (year, month) MultiIndex that I want to convert to a date-based index (a DatetimeIndex).

Here is an example that emulates the kind of DataFrame I have:

import numpy as np
import pandas as pd

i = pd.date_range('01-01-2016', '01-01-2020')
x = pd.DataFrame(index=i, data=np.random.randint(0, 10, len(i)))
x = x.groupby(by=[x.index.year, x.index.month]).sum()
print(x)

I tried to convert it to a date index like this:

def to_date(ind):
    return pd.to_datetime(str(ind[0]) + '/' + str(ind[1]), format="%Y/%m").date()

# Flatten the MultiIndex to tuples so the index can be replaced afterwards
x = x.set_axis(x.index.to_flat_index(), axis=0)

x = x.rename(index=to_date)

x = x.set_axis(pd.DatetimeIndex(x.index), axis=0)

But it is very slow. I suspect the bottleneck is the pd.to_datetime(str(ind[0]) + '/' + str(ind[1]), format="%Y/%m").date() line, which runs once per index label. I would greatly appreciate any ideas to make this faster.

1 Answer


You can just use:

x.index = pd.to_datetime([f"{a}-{b}" for a, b in x.index], format='%Y-%m')
print(x)

            0
2016-01-01  162
2016-02-01  119
2016-03-01  148
2016-04-01  125
2016-05-01  132
2016-06-01  144
2016-07-01  157
2016-08-01  141
2016-09-01  138
2016-10-01  168
2016-11-01  140
2016-12-01  137
2017-01-01  113
2017-02-01  113
2017-03-01  155
...
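
As a possible variation on the same idea, the string formatting step can be skipped altogether. The sketch below is not part of the original answer: it assumes the two index levels are year and month, as in the question's example, and relies on pd.to_datetime accepting a DataFrame with year, month and day columns. The names years, months and dates are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question
i = pd.date_range('01-01-2016', '01-01-2020')
x = pd.DataFrame(index=i, data=np.random.randint(0, 10, len(i)))
x = x.groupby(by=[x.index.year, x.index.month]).sum()

# Pull each MultiIndex level out as a plain array (assumed order: year, month)
years = x.index.get_level_values(0).to_numpy()
months = x.index.get_level_values(1).to_numpy()

# pd.to_datetime assembles datetimes from year/month/day columns in one call;
# day=1 is an assumption matching the first-of-month dates shown above
dates = pd.to_datetime(pd.DataFrame({'year': years, 'month': months, 'day': 1}))

x.index = pd.DatetimeIndex(dates)
print(x.head())
```

Whether this is faster than the list comprehension depends on the size of the index, so it is worth timing both on real data.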

2 Comments

You are right, it is 10 times faster. But why? The code seems to follow virtually the same idea.
@guyguyguy12345 Yes, but there are too many operations there. Also, some functions, e.g. set_axis, operate on the whole DataFrame rather than on the index alone, whereas this method works only on the list of index values.
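
One way to check the difference on your own machine is a rough timing sketch like the one below. It is an illustration rather than a benchmark from the thread: the helper names per_label and single_call are hypothetical, the month is zero-padded here for safety, and the measured ratio will vary with the pandas version and index size.

```python
import timeit

import numpy as np
import pandas as pd

# Rebuild the example frame from the question
i = pd.date_range('01-01-2016', '01-01-2020')
x = pd.DataFrame(index=i, data=np.random.randint(0, 10, len(i)))
x = x.groupby(by=[x.index.year, x.index.month]).sum()

# (year, month) tuples taken from the MultiIndex
pairs = [(int(y), int(m)) for y, m in x.index]


def per_label():
    # One pd.to_datetime call per label, similar to the question's to_date()
    return pd.DatetimeIndex(
        [pd.to_datetime(f"{y}/{m:02d}", format="%Y/%m").date() for y, m in pairs]
    )


def single_call():
    # One pd.to_datetime call for the whole list, as in the answer
    return pd.to_datetime([f"{y}-{m:02d}" for y, m in pairs], format="%Y-%m")


t_per_label = timeit.timeit(per_label, number=20)
t_single = timeit.timeit(single_call, number=20)
print(f"per label:   {t_per_label:.4f} s")
print(f"single call: {t_single:.4f} s")
```

Both helpers should produce the same dates, so any gap in the timings comes from how the conversion is called rather than from what it computes.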
