Subsetting index from Pandas DataFrame

Question

I have a DataFrame with columns [A, B, C, D, E, F, G, H].

An index has been made with columns [D, G, H]:

>>> print(dgh_columns)
Index(['D', 'G', 'H'], dtype='object')

How can I retrieve the original DataFrame without the columns D, G, H?

Is there an index subset operation?

Ideally, this would be:

df[df.index - dgh_columns]

But this doesn't seem to work.

jezrael · Accepted Answer · 2016-11-07 14:26:38Z

5

I think you can use Index.difference:

df[df.columns.difference(dgh_columns)]

Sample:

import pandas as pd

df = pd.DataFrame({'A':[1,2,3],
                   'B':[4,5,6],
                   'C':[7,8,9],
                   'D':[1,3,5],
                   'E':[7,8,9],
                   'F':[1,3,5],
                   'G':[5,3,6],
                   'H':[7,4,3]})

print (df)
   A  B  C  D  E  F  G  H
0  1  4  7  1  7  1  5  7
1  2  5  8  3  8  3  3  4
2  3  6  9  5  9  5  6  3

dgh_columns = pd.Index(['D', 'G', 'H'])
print (df[df.columns.difference(dgh_columns)])
   A  B  C  E  F
0  1  4  7  7  1
1  2  5  8  8  3
2  3  6  9  9  5
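
One caveat worth noting: Index.difference returns its result sorted, so the original column order is lost whenever the columns are not already in alphabetical order. Below is a minimal sketch of an order-preserving alternative using a boolean mask built with Index.isin. The frame `df2` and its column order are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical frame whose columns are deliberately not in alphabetical order.
df2 = pd.DataFrame({'H': [7, 4, 3],
                    'B': [4, 5, 6],
                    'D': [1, 3, 5],
                    'A': [1, 2, 3]})
dgh_columns = pd.Index(['D', 'G', 'H'])

# Index.difference sorts the remaining labels.
sorted_cols = df2.columns.difference(dgh_columns)
print(list(sorted_cols))   # ['A', 'B']

# A boolean mask keeps the remaining columns in their original order.
kept = df2.loc[:, ~df2.columns.isin(dgh_columns)]
print(list(kept.columns))  # ['B', 'A']
```

Both approaches silently ignore labels in dgh_columns that are not columns of the frame ('G' here).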

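NumPy solution with numpy.setxor1d or numpy.setdiff1d. Note that setxor1d computes a symmetric difference, so it gives the same result as setdiff1d only when every label in dgh_columns is a column of df:

import numpy as np

dgh_columns = pd.Index(['D', 'G', 'H'])
print (df[np.setxor1d(df.columns, dgh_columns)])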
   A  B  C  E  F
0  1  4  7  7  1
1  2  5  8  8  3
2  3  6  9  9  5

dgh_columns = pd.Index(['D', 'G', 'H'])
print (df[np.setdiff1d(df.columns, dgh_columns)])
   A  B  C  E  F
0  1  4  7  7  1
1  2  5  8  8  3
2  3  6  9  9  5
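
The two NumPy calls agree in the sample only because D, G and H are all columns of df. A small sketch of where they diverge, assuming a hypothetical index `extra` that contains a label ('Z') missing from the frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({c: [1, 2, 3] for c in 'ABCDEFGH'})
extra = pd.Index(['D', 'G', 'H', 'Z'])  # hypothetical: 'Z' is not a column of df

cols = df.columns.to_numpy()
xor_labels = np.setxor1d(cols, extra.to_numpy())    # symmetric difference
diff_labels = np.setdiff1d(cols, extra.to_numpy())  # labels only in df.columns

print(list(xor_labels))   # ['A', 'B', 'C', 'E', 'F', 'Z']
print(list(diff_labels))  # ['A', 'B', 'C', 'E', 'F']

# setdiff1d yields labels that all exist, so the selection works.
print(list(df[diff_labels].columns))

# setxor1d leaks 'Z' into the label list, so the selection fails.
try:
    df[xor_labels]
    xor_failed = False
except KeyError as exc:
    xor_failed = True
    print('KeyError:', exc)
```

If dgh_columns might contain labels that are no longer in the frame, setdiff1d (or Index.difference) looks like the safer choice.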

edited Nov 7, 2016 at 14:26

answered Nov 7, 2016 at 14:18

jezrael


5 Comments

Jivan Over a year ago

Hum. Makes sense, however I get IndexError: indices are out-of-bounds

jezrael Over a year ago

Please check my sample; I think I made a typo earlier.

Jivan Over a year ago

Okay, that's because I did perform column insertion/removal after having initialized dgh_columns

Jivan Over a year ago

Anyway I'll accept your answer because it's definitely the answer :) I just have to figure out the details with my code

jezrael Over a year ago

Thank you, glad I could help.

piRSquared · Answer · 2016-11-07 14:27:20Z

2

Use drop:

df.drop(list('DGH'), axis=1)

df = pd.DataFrame({'A':[1,2,3],
                   'B':[4,5,6],
                   'C':[7,8,9],
                   'D':[1,3,5],
                   'E':[7,8,9],
                   'F':[1,3,5],
                   'G':[5,3,6],
                   'H':[7,4,3]})

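# Pass axis by keyword: the positional form df.drop(labels, 1) no longer works in pandas 2.0+.
df.drop(list('DGH'), axis=1)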
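
As the comments on the other answer point out, a saved column index can go stale if columns are inserted or removed after it was built. By default drop raises a KeyError for labels that are no longer present; passing errors='ignore' skips them instead. A minimal sketch, where the removal of column G is a hypothetical stand-in for that kind of later change:

```python
import pandas as pd

df = pd.DataFrame({c: [1, 2, 3] for c in 'ABCDEFGH'})
dgh_columns = pd.Index(['D', 'G', 'H'])

# Hypothetical later change: column G is removed after dgh_columns was built.
df = df.drop(columns='G')

# With the default errors='raise', the missing label 'G' causes a KeyError.
try:
    df.drop(columns=dgh_columns)
    raised = False
except KeyError:
    raised = True
print(raised)  # True

# errors='ignore' drops only the labels that still exist.
result = df.drop(columns=dgh_columns, errors='ignore')
print(list(result.columns))  # ['A', 'B', 'C', 'E', 'F']
```

Unlike Index.difference, drop keeps the remaining columns in their original order.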

answered Nov 7, 2016 at 14:27

piRSquared
