96

I have a problem where I produce a pandas dataframe by concatenating along the row axis (stacking vertically).

Each of the constituent dataframes has an autogenerated index (ascending numbers).

After concatenation, my index is screwed up: it counts up to n - 1 (where n is the shape[0] of the corresponding dataframe), and restarts at zero at the next dataframe.

I am trying to "re-calculate the index, given the current order", or "re-index" (or so I thought). Turns out that isn't exactly what DataFrame.reindex seems to be doing.


Here is what I tried to do:

train_df = pd.concat(train_class_df_list)
train_df = train_df.reindex(index=[i for i in range(train_df.shape[0])])

It failed with "cannot reindex from a duplicate axis." I don't want to change the order of my data... just need to delete the old index and set up a new one, with the order of rows preserved.
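
A minimal sketch of the failure, assuming two small made-up frames in place of the real data (`df` and `unique` are hypothetical names). It illustrates that `reindex` looks rows up by their existing labels instead of renumbering them, which is why duplicated labels make it fail. The exact wording of the error message may vary between pandas versions.

```python
import pandas as pd

# Two hypothetical frames, each with its own autogenerated 0..n-1 index.
df = pd.concat([pd.DataFrame({"a": [1, 2]}), pd.DataFrame({"a": [3, 4]})])
print(df.index.tolist())  # [0, 1, 0, 1]

# reindex selects rows by label, so duplicate labels are ambiguous.
try:
    df.reindex(index=range(len(df)))
    error = None
except ValueError as exc:
    error = exc  # message wording differs between pandas versions
print(type(error).__name__)  # ValueError

# On a unique index, reindex reorders by label and fills missing labels with NaN.
unique = pd.DataFrame({"a": [10, 20]})
vals = unique.reindex([1, 0, 2])["a"].tolist()
print(vals)  # [20.0, 10.0, nan]
```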

3 Answers

151

If your index is autogenerated and you don't want to keep it, you can use the ignore_index option.

train_df = pd.concat(train_class_df_list, ignore_index=True)

This will autogenerate a new index for you, and my guess is that this is exactly what you are after.
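
As a rough illustration, assuming two small made-up frames stand in for the contents of train_class_df_list, the difference looks like this:

```python
import pandas as pd

# Hypothetical stand-ins for the per-class frames in train_class_df_list.
train_class_df_list = [
    pd.DataFrame({"a": [1, 2]}),
    pd.DataFrame({"a": [3, 4, 5]}),
]

stacked = pd.concat(train_class_df_list)                   # keeps each frame's labels
fresh = pd.concat(train_class_df_list, ignore_index=True)  # new 0..n-1 index

print(stacked.index.tolist())  # [0, 1, 0, 1, 2]
print(fresh.index.tolist())    # [0, 1, 2, 3, 4]
print(fresh["a"].tolist())     # [1, 2, 3, 4, 5]
```

The row order is unchanged; only the index labels are regenerated.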

4 Comments

This is more direct than .reset_index(drop=True) and thus IMO preferable, but the naming is somewhat less clear
Strangely, this doesn't work for me. It runs without creating an error, but for each concatenated file the indexing starts from 0
Works great for me. And I agree with Dmitri's comment that the ignore_index=True option is more intuitive to explain to others.
Note that if you concatenate along the columns (axis=1), this option will also remove the column names, so it's up to you! stackoverflow.com/a/43406062/7654451
80

After vertical concatenation, if you get an index of [0, n) followed by [0, m), all you need to do is call reset_index:

train_df = train_df.reset_index(drop=True)

(reset_index returns a new dataframe, so assign the result back; alternatively, you can do this in place using inplace=True).


>>> import pandas as pd
>>> pd.concat([
...     pd.DataFrame({'a': [1, 2]}),
...     pd.DataFrame({'a': [1, 2]})]).reset_index(drop=True)
   a
0  1
1  2
2  1
3  2

11

This should work:

train_df.reset_index(inplace=True, drop=True) 

Set drop to True to avoid having the old index added as an additional column in your dataframe.
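
A small sketch of what drop changes, using a made-up two-frame example (`df`, `kept`, and `dropped` are hypothetical names):

```python
import pandas as pd

# Hypothetical concatenated frame with a repeating index.
df = pd.concat([pd.DataFrame({"a": [1, 2]}), pd.DataFrame({"a": [3, 4]})])

kept = df.reset_index()              # old labels become a column named "index"
dropped = df.reset_index(drop=True)  # old labels are discarded

print(kept.columns.tolist())     # ['index', 'a']
print(kept["index"].tolist())    # [0, 1, 0, 1]
print(dropped.columns.tolist())  # ['a']
print(dropped.index.tolist())    # [0, 1, 2, 3]
```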
