I know this question may seem trivial, but I can't find the solution anywhere. I have a really large pandas dataframe df that looks something like this:

                                            conference     IF2013  AR2013
0                                            HOTMOBILE  16.333333   31.50
1                                                 FOGA  13.772727   60.00
2                                              IEA/AIE  10.433735   28.20
3    IEEE Real-Time and Embedded Technology and App...  10.250000   29.00
4                  Symposium on Computational Geometry   9.880342   35.00
5                                                 WISA   9.693878   43.60
6                                                 ICMT   8.750000   22.00
7                                              Haskell   8.703704   39.00

I would like to add an extra column at the end that numbers the rows 1, 2, 3, 4, and so on, so that it looks like this:

                                            conference     IF2013  AR2013  Ranking
0                                            HOTMOBILE  16.333333   31.50        1
1                                                 FOGA  13.772727   60.00        2
2                                              IEA/AIE  10.433735   28.20        3
3    IEEE Real-Time and Embedded Technology and App...  10.250000   29.00        4

I can't figure out how to add an extra column that is simply filled with a series of consecutive numbers.

3 Answers

6

If the DataFrame still has its default integer index (0, 1, 2, ...), you can try:

df['rank'] = df.index + 1

print(df)
#                                          conference     IF2013  AR2013  rank
#0                                          HOTMOBILE  16.333333    31.5     1
#1                                               FOGA  13.772727    60.0     2
#2                                            IEA/AIE  10.433735    28.2     3
#3  IEEE Real-Time and Embedded Technology and App...  10.250000    29.0     4
#4                Symposium on Computational Geometry   9.880342    35.0     5
#5                                               WISA   9.693878    43.6     6
#6                                               ICMT   8.750000    22.0     7
#7                                            Haskell   8.703704    39.0     8
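
One caveat with df.index + 1: it only yields 1, 2, 3, ... while the index is the default 0-based range. The sketch below uses a small made-up subset of the question's data (the variable name subset is just for illustration) to show what happens after filtering, and numbers the rows by position with np.arange instead.

```python
import numpy as np
import pandas as pd

# A few rows from the question, enough to show the effect
df = pd.DataFrame(
    {
        "conference": ["HOTMOBILE", "FOGA", "IEA/AIE", "WISA"],
        "IF2013": [16.333333, 13.772727, 10.433735, 9.693878],
    }
)

# Filtering keeps the original labels, so the index now has a gap (0, 2, 3)
subset = df[df["conference"] != "FOGA"].copy()

# subset.index + 1 would produce 1, 3, 4 here.
# Numbering by position does not depend on the index labels.
subset["rank"] = np.arange(1, len(subset) + 1)

print(subset)
```

Calling reset_index(drop=True) before using df.index + 1 should work as well; which one reads better is mostly a matter of taste.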

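Or use rank on the numeric column IF2013 with parameter ascending=False, so that the highest value gets rank 1. rank returns floats, so cast to int to get whole numbers:

df['rank'] = df['IF2013'].rank(ascending=False).astype(int)
print(df)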
#                                          conference     IF2013  AR2013  rank
#0                                          HOTMOBILE  16.333333    31.5     1
#1                                               FOGA  13.772727    60.0     2
#2                                            IEA/AIE  10.433735    28.2     3
#3  IEEE Real-Time and Embedded Technology and App...  10.250000    29.0     4
#4                Symposium on Computational Geometry   9.880342    35.0     5
#5                                               WISA   9.693878    43.6     6
#6                                               ICMT   8.750000    22.0     7
#7                                            Haskell   8.703704    39.0     8

2

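I guess you are looking for the rank function:

df['rank'] = df['IF2013'].rank(ascending=False)

This way your result will be independent of the index. Passing ascending=False gives the highest IF2013 value rank 1; with the default (ascending=True) the lowest value would be ranked first. Note that rank returns floats by default.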
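
If the column contains ties, the default method='average' gives tied rows a fractional rank such as 3.5. Here is a rough sketch of two alternatives, using made-up conference names and values (not the question's data): method='min' lets tied rows share the lowest rank of their group, and method='first' breaks ties by order of appearance.

```python
import pandas as pd

# Hypothetical data with a tie on 8.75, deliberately not sorted
df = pd.DataFrame(
    {
        "conference": ["A", "B", "C", "D"],
        "IF2013": [8.75, 16.33, 8.75, 10.25],
    }
)

# Tied rows share the lowest rank of their group
df["rank_min"] = df["IF2013"].rank(ascending=False, method="min").astype(int)

# Ties are broken by order of appearance, so every rank is unique
df["rank_first"] = df["IF2013"].rank(ascending=False, method="first").astype(int)

print(df)
```

Because neither method produces fractional ranks, the astype(int) cast is safe here; with method='average' and ties it would silently truncate.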

1

You could add the column with range, assuming the rows are already in the order you want:

df['Ranking'] = range(1, len(df) + 1)

Example:

import pandas as pd
from io import StringIO

data = """
                                        conference     IF2013  AR2013
                                        HOTMOBILE  16.333333   31.50
                                             FOGA  13.772727   60.00
                                          IEA/AIE  10.433735   28.20
IEEE Real-Time and Embedded Technology and App...  10.250000   29.00
              Symposium on Computational Geometry   9.880342   35.00
                                             WISA   9.693878   43.60
                                             ICMT   8.750000   22.00
                                          Haskell   8.703704   39.00

"""

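# Raw string avoids an invalid-escape warning; regex separators need the python engine
df = pd.read_csv(StringIO(data), sep=r' \s+', engine='python')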

df['Ranking'] = range(1, len(df) + 1)

In [183]: df
Out[183]:
                                          conference     IF2013  AR2013        Ranking  
0                                          HOTMOBILE  16.333333    31.5            1
1                                               FOGA  13.772727    60.0            2
2                                            IEA/AIE  10.433735    28.2            3
3  IEEE Real-Time and Embedded Technology and App...  10.250000    29.0            4
4                Symposium on Computational Geometry   9.880342    35.0            5
5                                               WISA   9.693878    43.6            6
6                                               ICMT   8.750000    22.0            7
7                                            Haskell   8.703704    39.0            8

EDIT

Benchmarking:

In [202]: %timeit df['rank'] = range(1, len(df) + 1)
10000 loops, best of 3: 127 us per loop

In [203]: %timeit df['rank'] = df.AR2013.rank(ascending=False)
1000 loops, best of 3: 248 us per loop
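
The range approach is faster because it does no comparison at all, which also means it only produces a real ranking when the rows are already sorted. If they are not, something along these lines should work: sort first, then number the rows. This is a minimal sketch on a shuffled subset of the question's data.

```python
import pandas as pd

# A few rows from the question, deliberately out of order
df = pd.DataFrame(
    {
        "conference": ["ICMT", "HOTMOBILE", "WISA", "FOGA"],
        "IF2013": [8.750000, 16.333333, 9.693878, 13.772727],
    }
)

# Sort by the ranking criterion, highest first, and rebuild a clean 0..n-1 index
df = df.sort_values("IF2013", ascending=False).reset_index(drop=True)

# Now consecutive numbers match the sort order
df["Ranking"] = range(1, len(df) + 1)

print(df)
```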
