Index pandas DataFrame by column numbers, when column names are integers

Question

I am trying to keep just certain columns of a DataFrame, and it works fine when column names are strings:

In [2]: import numpy as np

In [3]: import pandas as pd

In [4]: a = np.arange(35).reshape(5,7)

In [5]: df = pd.DataFrame(a, ['x', 'y', 'u', 'z', 'w'], ['a', 'b', 'c', 'd', 'e', 'f', 'g'])

In [6]: df
Out[6]: 
    a   b   c   d   e   f   g
x   0   1   2   3   4   5   6
y   7   8   9  10  11  12  13
u  14  15  16  17  18  19  20
z  21  22  23  24  25  26  27
w  28  29  30  31  32  33  34

[5 rows x 7 columns]

In [7]: df[[1,3]] #No problem
Out[7]: 
    b   d
x   1   3
y   8  10
u  15  17
z  22  24
w  29  31

However, when column names are integers, I am getting a key error:

In [8]: df = pd.DataFrame(a, ['x', 'y', 'u', 'z', 'w'], range(10, 17))

In [9]: df
Out[9]: 
   10  11  12  13  14  15  16
x   0   1   2   3   4   5   6
y   7   8   9  10  11  12  13
u  14  15  16  17  18  19  20
z  21  22  23  24  25  26  27
w  28  29  30  31  32  33  34

[5 rows x 7 columns]

In [10]: df[[1,3]]

Results in:

KeyError: '[1 3] not in index'

I can see why pandas does not allow that: it avoids a mix-up between indexing by column name and indexing by column position. However, is there a way to tell pandas that I want to index by column position? Of course, one solution is to convert the column names to strings, but I am wondering if there is a better solution.

Jeff · Accepted Answer · 2014-11-26 21:13:32Z

23

This is exactly the purpose of iloc, which selects by integer position:

In [37]: df
Out[37]: 
   10  11  12  13  14  15  16
x   0   1   2   3   4   5   6
y   7   8   9  10  11  12  13
u  14  15  16  17  18  19  20
z  21  22  23  24  25  26  27
w  28  29  30  31  32  33  34

In [38]: df.iloc[:,[1,3]]
Out[38]: 
   11  13
x   1   3
y   8  10
u  15  17
z  22  24
w  29  31
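
To make the positional behaviour concrete, here is a minimal, self-contained sketch that rebuilds the frame from the question (the variable names `by_position` and `first_three` are only illustrative). It assumes nothing beyond `DataFrame.iloc` itself: the first indexer picks rows, the second picks columns, and both are positions, whatever the labels happen to be.

```python
import numpy as np
import pandas as pd

# Same frame as in the question: integer column labels 10..16.
a = np.arange(35).reshape(5, 7)
df = pd.DataFrame(a, index=['x', 'y', 'u', 'z', 'w'], columns=range(10, 17))

# iloc is purely positional: all rows, columns at positions 1 and 3.
by_position = df.iloc[:, [1, 3]]
print(by_position.columns.tolist())  # [11, 13]

# Positional slices work too, with the usual half-open semantics.
first_three = df.iloc[:, :3]
print(first_three.columns.tolist())  # [10, 11, 12]
```

The result keeps the original labels (11 and 13), so nothing has to be renamed.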


1 Comment

Marc Over a year ago

Totally missed the question. It is not about how to index by integer position, but about how to index by name when the name is an integer (his example is a bit confusing, but still...).
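
For the reading this comment describes, selecting by name when the names are integers, plain brackets and `loc` both treat the integers as labels. A short sketch, reusing the question's frame (the names `by_label`, `same_with_loc` and `raised` are only illustrative):

```python
import numpy as np
import pandas as pd

a = np.arange(35).reshape(5, 7)
df = pd.DataFrame(a, index=['x', 'y', 'u', 'z', 'w'], columns=range(10, 17))

# The integers are interpreted as column labels, not positions.
by_label = df[[11, 13]]
same_with_loc = df.loc[:, [11, 13]]
print(by_label.columns.tolist())       # [11, 13]
print(same_with_loc.columns.tolist())  # [11, 13]

# Labels that do not exist raise KeyError, as in the question.
raised = False
try:
    df[[1, 3]]
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```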

Anurag Agarwal · Accepted Answer · 2020-07-15 02:19:40Z

11

Just convert the headers from integers to strings. This should almost always be done as a best practice when working with pandas datasets, to avoid surprises:

df.columns = df.columns.map(str)
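
A slightly fuller sketch of the same idea, assuming the frame from the question. Working on a copy (here called `df_str`, a name chosen only for illustration) keeps the original integer labels available:

```python
import numpy as np
import pandas as pd

a = np.arange(35).reshape(5, 7)
df = pd.DataFrame(a, index=['x', 'y', 'u', 'z', 'w'], columns=range(10, 17))

# Work on a copy so the original integer labels stay untouched.
df_str = df.copy()
df_str.columns = df_str.columns.map(str)
print(df_str.columns.tolist())  # ['10', '11', '12', '13', '14', '15', '16']

# The labels are now strings, so selection by name uses strings.
subset = df_str[['11', '13']]
print(subset.loc['x'].tolist())  # [1, 3]
```

Note that this changes how the columns are named rather than how they are counted; selecting by position is still a job for `iloc`.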


JD Long · Accepted Answer · 2014-11-26 18:29:56Z

3

This is certainly one of those things that feels like a bug but is really a design decision (I think).

A few workaround options:

Rename the columns so that each column's position is its name:

 df.columns = np.arange(len(df.columns))

Another way is to get the names from df.columns:

print(df[df.columns[[1, 3]]])
   11  13
x   1   3
y   8  10
u  15  17
z  22  24
w  29  31

I suspect this is the most appealing as it just requires adding a wee bit of code and not changing any column names.
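
One caveat worth sketching: looking up labels by position and then selecting by label matches purely positional selection only when the labels are unique. The frame `dup` below is a hypothetical example with a repeated label, not something from the question.

```python
import numpy as np
import pandas as pd

a = np.arange(35).reshape(5, 7)
df = pd.DataFrame(a, index=['x', 'y', 'u', 'z', 'w'], columns=range(10, 17))

# Translate positions into labels, then select by label.
labels = df.columns[[1, 3]]
via_labels = df[labels]
print(via_labels.columns.tolist())  # [11, 13]

# Hypothetical frame with a repeated label, to show where the two differ.
dup = pd.DataFrame([[1, 2, 3]], columns=[10, 10, 11])
print(dup[dup.columns[[0]]].shape)  # (1, 2): both columns labelled 10
print(dup.iloc[:, [0]].shape)       # (1, 1): only the column at position 0
```

With unique labels, as in the question, either route gives the same columns.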


1 Comment

jimiclapton Over a year ago

In this example you may need to use ...= np.arange(0,len(df.columns))

Athii · Accepted Answer · 2022-11-16 21:17:34Z

0

import numpy as np
import pandas as pd

a = np.arange(35).reshape(5, 7)
df = pd.DataFrame(a, ['x', 'y', 'u', 'z', 'w'], range(10, 17))

# Say you want to keep only the columns at positions 1 and 2 (these are positions, not names)
needed_columns = [1, 2]

df = df[df.columns[needed_columns]]

print(df)

   11  12
x   1   2
y   8   9
u  15  16
z  22  23
w  29  30
