Use column with duplicated values as data frame index in Pandas

Question

I would like to set index for a data frame using a column with duplicated values. Is there any way that Pandas can automatically add a second index so that when the first index is duplicated then the second index will be increased?

For example:

   ID              name  company           position
   ------------------------------------------------
0  23      Alex Monoson   Coobit      Sales manager
1  12    Johnny Johnson   Coobit  Marketing manager
2  62         Hans Dupa    Pesik  Marketing manager
3  31    Jessica Heiler  Montino           Engineer
4  92  Dominic Alvorine  Montino                CFO
5  16           Hei Lee   Coobit                CEO

I would like to use company as index and there will be another integer index column

My expected output:

                    ID    name    position
company
------------------------------------------
Coobit      0       blah  blah        blah
Coobit      1       blah  blah        blah
Coobit      2       blah  blah        blah
Pesik       0       blah  blah        blah
Montino     0       blah  blah        blah
Montino     1       blah  blah        blah

BENY · Accepted Answer · 2019-11-08 03:39:57Z

1

We can use cumcount

df['index2']=df.groupby('company').cumcount()
df=df.set_index(['company','index2']).sort_index()

answered Nov 8, 2019 at 3:39

BENY

324k22 gold badges176 silver badges250 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

Use column with duplicated values as data frame index in Pandas

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related