Converting Index into MultiIndex (hierarchical index) in Pandas

Question

In the data I am working with, the index is compound, i.e. each label combines an item name and a timestamp, e.g. [email protected]|2013-05-07 05:52:51 +0200.

I want to use hierarchical indexing so that rows with the same e-mail are grouped together. To do that, I need to convert the DataFrame's Index into a MultiIndex (e.g. for the entry above: ([email protected], 2013-05-07 05:52:51 +0200)).

What is the most convenient method to do so?

Piotr Migdal · Accepted Answer · 2017-07-10 10:34:55Z

28

Once we have a DataFrame

import pandas as pd
df = pd.read_csv("input.csv", index_col=0)  # or from another source

and a function that maps each index label to a tuple (the one below handles the example from this question)

def process_index(k):
    return tuple(k.split("|"))

we can create a hierarchical index in the following way:

# Iterating over df.index is enough; iterrows() would also build every row, which is not needed here.
df.index = pd.MultiIndex.from_tuples([process_index(k) for k in df.index])
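
For a self-contained illustration, here is a minimal sketch of the same idea. The toy frame, the example.com addresses, the "value" column and the level names are all assumptions made up for the demo; they are not part of the original data.

```python
import pandas as pd

# Hypothetical stand-in for the real data: compound "e-mail|timestamp" labels.
df = pd.DataFrame(
    {"value": [1, 2, 3]},
    index=[
        "alice@example.com|2013-05-07 05:52:51 +0200",
        "alice@example.com|2013-05-08 10:00:00 +0200",
        "bob@example.com|2013-05-07 06:00:00 +0200",
    ],
)

def process_index(k):
    return tuple(k.split("|"))

# Build the MultiIndex from the labels; naming the levels is optional.
df.index = pd.MultiIndex.from_tuples(
    [process_index(k) for k in df.index], names=["e-mail", "date"]
)

# Rows for one address can now be selected by the first level.
alice = df.loc["alice@example.com"]
```

Selecting by the first level returns the matching rows indexed by the remaining level, which is the grouping the question asks for.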

An alternative approach is to create two columns then set them as the index (the original index will be dropped):

df['e-mail'] = [x.split("|")[0] for x in df.index] 
df['date'] = [x.split("|")[1] for x in df.index]
df = df.set_index(['e-mail', 'date'])

or even shorter

df['e-mail'], df['date'] = zip(*map(process_index, df.index))
df = df.set_index(['e-mail', 'date'])
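
Note that all of these variants leave the date level as plain strings. If real timestamps are wanted, one option is to convert that column before setting the index. The following is only a sketch under assumptions: the toy data is made up, and passing utc=True to pd.to_datetime (which normalizes the offsets to UTC) is my choice, not something the original answer prescribes.

```python
import pandas as pd

# Hypothetical data with compound "e-mail|timestamp" labels.
df = pd.DataFrame(
    {"value": [1, 2]},
    index=[
        "alice@example.com|2013-05-07 05:52:51 +0200",
        "bob@example.com|2013-05-07 06:00:00 +0200",
    ],
)

# Split each label once into two columns; n=1 splits only at the first "|",
# so any later "|" stays in the second piece.
parts = df.index.to_series().str.split("|", n=1, expand=True)
df["e-mail"] = parts[0]
# Parse the second piece; utc=True converts the +0200 offsets to UTC.
df["date"] = pd.to_datetime(parts[1], utc=True)
df = df.set_index(["e-mail", "date"])
```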

edited Jul 10, 2017 at 10:34

answered Jul 23, 2013 at 19:16

Piotr Migdal


3 Comments

Moot Over a year ago

This was very helpful. But, as far as I can see, when calling set_index() the default is inplace=False, so one has to use inplace=True or else assign df back to itself.

Piotr Migdal Over a year ago

@Moot Thanks, updated. Either it was a typo, or back then (4 years ago) it was inplace by default.

Piotr Migdal Over a year ago

Thanks! I was too fast and careless.

Def_Os · Accepted Answer · 2015-12-02 00:08:20Z

14

In pandas>=0.16.0, we can use the .str accessor on indices. This makes the following possible:

df.index = pd.MultiIndex.from_tuples(df.index.str.split('|').tolist())

(Note: I tried the more intuitive: pd.MultiIndex.from_arrays(df.index.str.split('|')) but for some reason that gives me errors.)
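
The error is most likely because from_arrays expects one array per level, whereas the split produces one list per row. Below is a sketch of two ways around that, assuming a pandas version in which Index.str.split accepts expand=True. The labels are made up for the demo.

```python
import pandas as pd

# Hypothetical compound labels.
idx = pd.Index(["alice@example.com|2013-05-07", "bob@example.com|2013-05-08"])

# With expand=True, splitting an Index returns a MultiIndex directly.
mi = idx.str.split("|", expand=True)

# Alternatively, transpose the per-row lists into per-level sequences
# before handing them to from_arrays.
mi2 = pd.MultiIndex.from_arrays(list(zip(*idx.str.split("|"))))
```

Either result can then be assigned to df.index in the same way as above.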

answered Dec 2, 2015 at 0:08

Def_Os


Andy Hayden · Accepted Answer · 2013-07-23 20:28:47Z

My preference would be to read this in as a column first (i.e. not as an index); then you can use the str.split method:

from io import StringIO  # Python 3 location of StringIO
import pandas as pd

csv = '\n'.join(['[email protected]|2013-05-07 05:52:51 +0200, 42'] * 3)
df = pd.read_csv(StringIO(csv), header=None)

In [13]: df[0].str.split('|')
Out[13]:
0    [[email protected], 2013-05-07 05:52:51 +0200]
1    [[email protected], 2013-05-07 05:52:51 +0200]
2    [[email protected], 2013-05-07 05:52:51 +0200]
Name: 0, dtype: object

And then feed this into a MultiIndex (perhaps this can be done cleaner?):

# list(...) materializes the zip iterator, which is lazy on Python 3
m = pd.MultiIndex.from_arrays(list(zip(*df[0].str.split('|'))))

Delete the 0th column and set the index to the new MultiIndex:

del df[0]
df.index = m

In [17]: df
Out[17]:
                                            1
[email protected] 2013-05-07 05:52:51 +0200  42
                2013-05-07 05:52:51 +0200  42
                2013-05-07 05:52:51 +0200  42
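
As for whether this can be done more cleanly: one possible variant is to split with expand=True and build the index with MultiIndex.from_frame. This is a sketch only; the example.com address and the level names are assumptions for the demo, and from_frame requires a reasonably recent pandas.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV with the compound key in the first column.
csv = "\n".join(["alice@example.com|2013-05-07 05:52:51 +0200,42"] * 3)
df = pd.read_csv(StringIO(csv), header=None)

# expand=True yields a DataFrame with one column per piece of the key.
parts = df[0].str.split("|", expand=True)

# Drop the compound column and use the pieces as the index levels.
df = df.drop(columns=0)
df.index = pd.MultiIndex.from_frame(parts, names=["e-mail", "date"])
```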
