
Having a dataframe like this:

>>> df = pd.DataFrame({'name': ['foo', 'foo', 'bar', 'bar'],
                   'colx': [1, 2, 3, 4],
                   'coly': [5, 6, 7, 8]})
>>> df.set_index('name', inplace=True)
>>> df
      colx  coly
name            
foo      1     5
foo      2     6
bar      3     7
bar      4     8
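For reference, here is the setup above as a self-contained sketch. A plain Index is allowed to hold repeated labels, and the standard Index members is_unique and duplicated() show which labels repeat.

```python
import pandas as pd

# Rebuild the frame from the question, indexed by the repeating 'name' column
df = pd.DataFrame({'name': ['foo', 'foo', 'bar', 'bar'],
                   'colx': [1, 2, 3, 4],
                   'coly': [5, 6, 7, 8]}).set_index('name')

# A plain Index may contain repeated labels without raising any error
print(df.index.is_unique)              # False
# duplicated() marks each repeat after the first occurrence of a label
print(df.index.duplicated().tolist())  # [False, True, False, True]
```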

How can I get a properly formatted index like this:

      colx  coly
name            
foo      1     5
         2     6
bar      3     7
         4     8

so that pandas doesn't complain about duplicate indices?

Answer

One option (among many) is to add a new index level that numbers the rows within each name group:

In [49]: df = df.set_index(df.groupby(level=0).cumcount().add(1).rename('num'),
                           append=True)

In [50]: df
Out[50]:
          colx  coly
name num
foo  1       1     5
     2       2     6
bar  1       3     7
     2       4     8
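For anyone who wants to run this outside IPython, here is a self-contained sketch of the same approach. The variable num is just an illustrative name: rename('num') names the counter Series, and set_index(..., append=True) uses that name for the new index level.

```python
import pandas as pd

df = pd.DataFrame({'name': ['foo', 'foo', 'bar', 'bar'],
                   'colx': [1, 2, 3, 4],
                   'coly': [5, 6, 7, 8]}).set_index('name')

# cumcount() numbers rows 0, 1, ... within each 'name' group; add(1) makes it 1-based
num = df.groupby(level=0).cumcount().add(1).rename('num')  # Series name -> level name

# append=True keeps 'name' as the first level and adds 'num' as the second
df = df.set_index(num, append=True)

print(df)
print(list(df.index.names))                       # ['name', 'num']
print(df.index.get_level_values('num').tolist())  # [1, 2, 1, 2]
```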

UPDATE: don't be confused by the way pandas displays duplicates in a MultiIndex.

If we select all values of the name level of the MultiIndex, we still see the duplicates:

In [51]: df.index.get_level_values(0)
Out[51]: Index(['foo', 'foo', 'bar', 'bar'], dtype='object', name='name')
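A small sketch to make this concrete (the frame is rebuilt so the snippet runs on its own; names is an illustrative variable): the name level on its own still repeats, but the index as a whole is unique, because uniqueness is judged on the full (name, num) tuples.

```python
import pandas as pd

df = pd.DataFrame({'name': ['foo', 'foo', 'bar', 'bar'],
                   'colx': [1, 2, 3, 4],
                   'coly': [5, 6, 7, 8]}).set_index('name')
df = df.set_index(df.groupby(level=0).cumcount().add(1).rename('num'),
                  append=True)

names = df.index.get_level_values('name')
print(names.tolist())      # ['foo', 'foo', 'bar', 'bar']
print(names.is_unique)     # False: the 'name' level alone still repeats
print(df.index.is_unique)  # True: every (name, num) pair occurs exactly once

# Because the pairs are unique, a full (name, num) key selects a single row
print(df.loc[('foo', 2)].tolist())  # [2, 6]
```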

The blank cells are just how pandas displays repeated labels in a MultiIndex (a "sparsified" display). We can switch this display option off:

In [53]: pd.options.display.multi_sparse = False

In [54]: df
Out[54]:
          colx  coly
name num
foo  1       1     5
foo  2       2     6
bar  1       3     7
bar  2       4     8

In [55]: pd.options.display.multi_sparse = True

In [56]: df
Out[56]:
          colx  coly
name num
foo  1       1     5
     2       2     6
bar  1       3     7
     2       4     8

PS: this option doesn't change the index values; it only affects how a MultiIndex is displayed.
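If you only want the dense display temporarily, pd.option_context (part of the pandas options API) sets display.multi_sparse for the duration of a with-block and restores it afterwards. A minimal sketch; dense and sparse are illustrative names, and counting 'foo' in the rendered text is just a quick way to show the difference:

```python
import pandas as pd

df = pd.DataFrame({'name': ['foo', 'foo', 'bar', 'bar'],
                   'colx': [1, 2, 3, 4],
                   'coly': [5, 6, 7, 8]}).set_index('name')
df = df.set_index(df.groupby(level=0).cumcount().add(1).rename('num'),
                  append=True)

# option_context() changes the option only inside the with-block
with pd.option_context('display.multi_sparse', False):
    dense = df.to_string()   # every row repeats its 'name' label

sparse = df.to_string()      # back to the default (sparsified) display

print(dense.count('foo'))    # 2
print(sparse.count('foo'))   # 1
print(pd.get_option('display.multi_sparse'))  # True again after the block
```

DataFrame.to_string(sparsify=False) gives the same dense rendering for a one-off call without touching the global option.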


3 Comments

This works, but shouldn't pandas have a less convoluted way of achieving the same thing? Besides, it creates a MultiIndex.
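One arguably less convoluted spelling (a sketch, not the only option): put the counter into an ordinary column first, then move that column into the index. It still ends up as a MultiIndex, since a second level is what makes the labels unique while leaving the name values untouched.

```python
import pandas as pd

df = pd.DataFrame({'name': ['foo', 'foo', 'bar', 'bar'],
                   'colx': [1, 2, 3, 4],
                   'coly': [5, 6, 7, 8]}).set_index('name')

# Step 1: a 1-based counter within each name, stored as a regular column.
# The cumcount() result has exactly the same index as df, so no reindexing happens.
df['num'] = df.groupby(level=0).cumcount() + 1

# Step 2: move that column into the index as a second level
df = df.set_index('num', append=True)

print(df)
```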
@PedroA, can you explain a bit more what you are trying to achieve? How are you going to use the index? Is it important to preserve the existing index values? Could we append a number to the index values, so that they become ['foo1', 'foo2', 'bar1', 'bar2', ...]? Would that be an option for you? As you can see, there are many possible solutions, but we need to know what you are trying to achieve...
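If a single flat index is preferred, the suffix idea from this comment could be sketched as follows. counter is an illustrative name; note that once labels are merged into strings like 'foo1', the original name values are no longer stored separately.

```python
import pandas as pd

df = pd.DataFrame({'name': ['foo', 'foo', 'bar', 'bar'],
                   'colx': [1, 2, 3, 4],
                   'coly': [5, 6, 7, 8]}).set_index('name')

# 1-based position of each row within its name group
counter = df.groupby(level=0).cumcount().add(1)

# Glue the counter onto each label (foo -> foo1, foo2, ...) and keep the index name
df.index = pd.Index([f"{name}{n}" for name, n in zip(df.index, counter)],
                    name=df.index.name)

print(df.index.tolist())  # ['foo1', 'foo2', 'bar1', 'bar2']
```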
Sorry, I'm still learning pandas, but I thought the resulting DF would just have the name column as its index. You have now added a new index level, num. I believe this must be it, but could you expand a little in your answer on why that is?
