Pandas recalculate index after a concatenation

Question

I have a problem where I produce a pandas dataframe by concatenating along the row axis (stacking vertically).

Each of the constituent dataframes has an autogenerated index (ascending numbers).

After concatenation, my index is messed up: it counts up to n - 1 (where n is the shape[0] of the corresponding dataframe), then restarts at zero for the next dataframe.

I am trying to "re-calculate the index, given the current order", or "re-index" (or so I thought). Turns out that isn't exactly what DataFrame.reindex seems to be doing.

Here is what I tried to do:

train_df = pd.concat(train_class_df_list)
train_df = train_df.reindex(index=[i for i in range(train_df.shape[0])])

It failed with "cannot reindex from a duplicate axis." I don't want to change the order of my data... just need to delete the old index and set up a new one, with the order of rows preserved.
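To make the failure concrete, here is a minimal sketch that reproduces it. The two small frames are made up for illustration and stand in for train_class_df_list. The exact error text appears to differ between pandas versions, so the sketch only relies on a ValueError being raised.

```python
import pandas as pd

# Two hypothetical frames, each with its own default 0..n-1 index.
parts = [pd.DataFrame({"a": [1, 2]}), pd.DataFrame({"a": [3, 4, 5]})]
stacked = pd.concat(parts)

# The labels restart at zero for the second frame, so they are not unique.
print(stacked.index.tolist())   # [0, 1, 0, 1, 2]
print(stacked.index.is_unique)  # False

# reindex looks rows up by label, so duplicate labels make the lookup ambiguous.
try:
    stacked.reindex(index=range(len(stacked)))
    reindex_failed = False
except ValueError as exc:
    reindex_failed = True
    print(f"reindex raised ValueError: {exc}")
```

In other words, reindex selects rows by their existing labels rather than renumbering them, which is why it cannot be used to throw away the old index.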

ilmarinen · Accepted Answer · 2016-02-20 19:51:11Z

151

If your index is autogenerated and you don't want to keep it, you can use the ignore_index option:

train_df = pd.concat(train_class_df_list, ignore_index=True)

This will autogenerate a new index for you, and my guess is that this is exactly what you are after.
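A short sketch of what ignore_index does, using two made-up frames in place of train_class_df_list. It also shows that the flag applies to whichever axis is being concatenated: along rows (the default) the column names are kept, while along axis=1 the column labels are the ones replaced.

```python
import pandas as pd

# Hypothetical stand-ins for the frames in train_class_df_list.
frames = [pd.DataFrame({"a": [1, 2]}), pd.DataFrame({"a": [3, 4, 5]})]

# Stacking rows: the row index is renumbered, column names are untouched.
rows = pd.concat(frames, ignore_index=True)
print(rows.index.tolist())    # [0, 1, 2, 3, 4]
print(rows.columns.tolist())  # ['a']

# Placing frames side by side: here ignore_index relabels the columns instead.
side = pd.concat([frames[0], frames[0]], axis=1, ignore_index=True)
print(side.columns.tolist())  # [0, 1]
```

If the index still restarts at zero after using ignore_index=True, it is worth checking that the result of pd.concat is the object being inspected, since the input frames themselves are not modified.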

answered Feb 20, 2016 at 19:51


4 Comments

Dmitri Over a year ago

This is more direct than .reset_index(drop=True) and thus IMO preferable, but the naming is somewhat less clear

NeStack Over a year ago

Strangely, this doesn't work for me. It runs without creating an error, but for each concatenated file the indexing starts from 0

veg2020 Over a year ago

Works great for me. And I agree with Dmitri's comment that the ignore_index=True option is more intuitive to explain to others.

Tito Sanz Over a year ago

This option will also remove the column names, so it's up to you! stackoverflow.com/a/43406062/7654451

Ami Tavory · Accepted Answer · 2016-02-20 19:53:27Z

80

After vertical concatenation, if you get an index of [0, n) followed by [0, m), all you need to do is call reset_index:

train_df = train_df.reset_index(drop=True)

(reset_index returns a new dataframe, so assign the result back; alternatively, you can do this in place using inplace=True.)

>>> import pandas as pd
>>> pd.concat([
...     pd.DataFrame({'a': [1, 2]}),
...     pd.DataFrame({'a': [1, 2]})]).reset_index(drop=True)
   a
0  1
1  2
2  1
3  2

edited Feb 20, 2016 at 19:53

answered Feb 20, 2016 at 19:46


Mike Müller · Accepted Answer · 2016-02-20 19:46:19Z

11

This should work:

train_df.reset_index(inplace=True, drop=True)

Set drop to True to avoid an additional column in your dataframe.
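The effect of drop can be seen in a small sketch with made-up data: without drop=True, the old labels are moved into a new column (named "index" when the index itself has no name) instead of being discarded.

```python
import pandas as pd

# Hypothetical concatenated frame with a repeating 0..n-1 index.
df = pd.concat([pd.DataFrame({"a": [1, 2]}), pd.DataFrame({"a": [3, 4]})])

# Default behaviour: the old labels survive as an extra column.
kept = df.reset_index()
print(kept.columns.tolist())   # ['index', 'a']
print(kept["index"].tolist())  # [0, 1, 0, 1]

# drop=True: the old labels are discarded, only the data columns remain.
dropped = df.reset_index(drop=True)
print(dropped.columns.tolist())  # ['a']
print(dropped.index.tolist())    # [0, 1, 2, 3]
```

Keeping the old labels can occasionally be useful, for example to trace a row back to its position in the original frame, but for the question as asked drop=True is the relevant choice.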

answered Feb 20, 2016 at 19:46
