85

The index that I have in the dataframe (with 30 rows) is of the form:

Int64Index([171, 174, 173, 172, 199, …, 175, 200])

The index is not strictly increasing because the data frame is the output of a sort().

I want to add a column which is the series:

[1, 2, 3, 4, 5, …, 30]

How should I go about doing that?

0

4 Answers

178

How about:

df['new_col'] = range(1, len(df) + 1)

Alternatively, if you want the index to hold the (0-based) positions and keep the original index as a column:

df = df.reset_index()
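
A minimal sketch of both options, assuming a small made-up frame (the column name `value` and the data are hypothetical, not from the question). Assigning a `range` to a column is positional, so the non-monotonic index should not matter:

```python
import pandas as pd

# Hypothetical data with a non-monotonic index, similar to the question.
df = pd.DataFrame({"value": [10, 20, 30, 40]}, index=[171, 174, 173, 172])

# Option 1: add a 1-based position column; the range is assigned by position,
# not aligned on the index labels.
df["new_col"] = range(1, len(df) + 1)

# Option 2: move the original (unnamed) index into a column called "index"
# and get a fresh 0-based RangeIndex.
df_reset = df.reset_index()

print(df)
print(df_reset)
```

Option 1 leaves the original index untouched, which is usually what you want if other code still looks rows up by those labels.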

2 Comments

This answer got me halfway to where I wanted since I already had an index that I wanted replaced. In such a case you can complement with: df = df.reset_index(drop=True)
Using np.arange instead of native range, like df['new_col'] = np.arange(1, df.shape[0] + 1) should speed up the runtime, especially when dealing with large datasets.
113

I stumbled on this question while trying to do the same thing (I think). Here is how I did it:

df['index_col'] = df.index

You can then sort on the new index column, if you like.
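
For illustration, a short sketch of copying the index into a column and then sorting on it. The frame and the `value` column are made up for the example:

```python
import pandas as pd

# Hypothetical frame whose index is out of order.
df = pd.DataFrame({"value": [3, 1, 2]}, index=[173, 171, 172])

# Copy the index labels into a regular column.
df["index_col"] = df.index

# Sort rows by the new column; the original df is left unchanged.
sorted_df = df.sort_values(by="index_col")
print(sorted_df)
```

Note that this stores the index labels themselves, not a 1..N position, so it only matches the question if the labels are what you want in the column.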

2 Comments

No, that would be unsorted.
A more dynamic version (this assumes the index has a name): df[df.index.name] = df.index
23

How about this:

import pandas as pd

idx = pd.Index([171, 174, 173])  # Int64Index was removed in pandas 2.0
df = pd.DataFrame(index=idx, data=[1, 2, 3])
print(df)

It gives me:

     0
171  1
174  2
173  3

Is this what you are looking for?

2 Comments

Almost. So, in sum, I need to create another data frame which contains the rank/position of the row. And then, I need to join these.
Yes, you can combine this df with your existing dataframe by using df.combine_first(df2)
9

The way to do that would be this:

Resetting the index:

df.reset_index(drop=True, inplace=True)

Sorting an index:

df.sort_index(inplace=True)

Setting a new index from a column:

df.set_index('column_name', inplace=True)

Setting a new index from a range:

df.index = range(1, 31, 1) #a range starting at one ending at 30 with a stepsize of 1.

Sorting a dataframe based on column value:

df.sort_values(by='column_name', inplace=True)

Reassigning variables works as well:

df=df.reset_index(drop=True)
df=df.sort_index()
df=df.set_index('column_name')
df.index = range(1, 31, 1) #a range starting at one ending at 30 with a stepsize of 1.
df=df.sort_values(by='column_name')
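
As a rough sketch of how these pieces could be combined to get a sort-order column, here is one way that avoids hard-coding 31 by using `len(df)`. The data and the `position` column name are assumptions made for the example:

```python
import pandas as pd

# Hypothetical frame; 'column_name' stands in for whatever column you sort by.
df = pd.DataFrame({"column_name": ["b", "c", "a"]}, index=[174, 171, 173])

# Sort by the column; the index labels travel with their rows.
df = df.sort_values(by="column_name")

# Add a 1-based position column sized from the frame itself.
df["position"] = range(1, len(df) + 1)
print(df)
```

Using `len(df) + 1` as the upper bound keeps the range in step with the frame if rows are added or removed later.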

1 Comment

I don't think you answered the question: "I want to add a column which is the [sort order] series", i.e. set a column to the index.
