
I have a DataFrame with columns [A, B, C, D, E, F, G, H].

An index has been made with columns [D, G, H]:

>>> print(dgh_columns)
Index(['D', 'G', 'H'], dtype='object')

How can I retrieve the original DataFrame without the columns D, G and H?

Is there an index subset operation?

Ideally, this would be:

df[df.index - dgh_columns]

But this doesn't seem to work.

2 Answers


I think you can use Index.difference:

df[df.columns.difference(dgh_columns)]

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A':[1,2,3],
                   'B':[4,5,6],
                   'C':[7,8,9],
                   'D':[1,3,5],
                   'E':[7,8,9],
                   'F':[1,3,5],
                   'G':[5,3,6],
                   'H':[7,4,3]})

print(df)
   A  B  C  D  E  F  G  H
0  1  4  7  1  7  1  5  7
1  2  5  8  3  8  3  3  4
2  3  6  9  5  9  5  6  3

dgh_columns = pd.Index(['D', 'G', 'H'])
print(df[df.columns.difference(dgh_columns)])
   A  B  C  E  F
0  1  4  7  7  1
1  2  5  8  8  3
2  3  6  9  9  5
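One detail worth knowing: Index.difference returns its labels sorted by default, which does not show in the sample above only because the columns A to H are already in alphabetical order. The sketch below uses a hypothetical frame with deliberately unsorted columns to show the reordering, and a boolean mask built with Index.isin as one way to keep the original column order.

```python
import pandas as pd

# Hypothetical frame whose columns are deliberately NOT in alphabetical order
df = pd.DataFrame([[1, 2, 3, 4, 5]], columns=['E', 'D', 'A', 'H', 'B'])
dgh_columns = pd.Index(['D', 'G', 'H'])  # 'G' is not a column of this df

# Index.difference sorts its result by default, so the column order changes
sorted_result = df[df.columns.difference(dgh_columns)]
print(list(sorted_result.columns))  # ['A', 'B', 'E']

# A boolean mask keeps the original left-to-right column order
mask = ~df.columns.isin(dgh_columns)
ordered_result = df.loc[:, mask]
print(list(ordered_result.columns))  # ['E', 'A', 'B']
```

Like Index.difference, the isin mask simply ignores labels (such as 'G' here) that are not columns of df.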

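NumPy solution with numpy.setxor1d or numpy.setdiff1d. Both return sorted arrays; note that setxor1d computes the symmetric difference, so it only gives the same result as setdiff1d when every label in dgh_columns is actually a column of df: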

dgh_columns = pd.Index(['D', 'G', 'H'])
print(df[np.setxor1d(df.columns, dgh_columns)])
   A  B  C  E  F
0  1  4  7  7  1
1  2  5  8  8  3
2  3  6  9  9  5

dgh_columns = pd.Index(['D', 'G', 'H'])
print(df[np.setdiff1d(df.columns, dgh_columns)])
   A  B  C  E  F
0  1  4  7  7  1
1  2  5  8  8  3
2  3  6  9  9  5
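To see where the two NumPy functions diverge, here is a small sketch with a hypothetical frame in which 'G' is not a column. setdiff1d keeps only labels from df.columns, while setxor1d also picks up labels that exist only in dgh_columns, and selecting with that result then raises a KeyError.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'G' from dgh_columns is deliberately NOT a column here
df = pd.DataFrame({'A': [1], 'B': [2], 'D': [3], 'H': [4]})
dgh_columns = pd.Index(['D', 'G', 'H'])

# setdiff1d keeps the labels of df.columns that are absent from dgh_columns
diff = np.setdiff1d(df.columns, dgh_columns)
print(diff)  # ['A' 'B']

# setxor1d keeps labels found in exactly one input, so the stray 'G' is included
xor = np.setxor1d(df.columns, dgh_columns)
print(xor)  # ['A' 'B' 'G']

# Selecting with the symmetric difference then fails on the missing label
try:
    df[xor]
except KeyError as exc:
    print('KeyError:', exc)
```

For removing columns, setdiff1d is therefore the safer of the two.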

5 Comments

Hmm. That makes sense, however I get IndexError: indices are out-of-bounds.
Please check my sample; I think I made a typo before.
Okay, that's because I performed column insertion/removal after initializing dgh_columns.
Anyway, I'll accept your answer because it's definitely the answer :) I just have to figure out the details in my code.
Thank you, glad I could help.
Use DataFrame.drop:

df.drop(list('DGH'), axis=1)
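The same call can also be written with the columns= keyword, which is equivalent to axis=1. A minimal sketch with a hypothetical small frame: drop returns a new DataFrame and leaves the original untouched, and by default it raises KeyError for a label that is not present, which errors='ignore' suppresses.

```python
import pandas as pd

# Hypothetical frame: 'G' is deliberately missing
df = pd.DataFrame({'A': [1, 2], 'D': [3, 4], 'E': [5, 6], 'H': [7, 8]})

# drop returns a new DataFrame by default; df itself is not modified
trimmed = df.drop(columns=list('DGH'), errors='ignore')
print(list(trimmed.columns))  # ['A', 'E']
print(list(df.columns))       # ['A', 'D', 'E', 'H']

# Without errors='ignore', the missing label 'G' raises KeyError
try:
    df.drop(columns=list('DGH'))
except KeyError as exc:
    print('KeyError:', exc)
```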

import pandas as pd

df = pd.DataFrame({'A':[1,2,3],
                   'B':[4,5,6],
                   'C':[7,8,9],
                   'D':[1,3,5],
                   'E':[7,8,9],
                   'F':[1,3,5],
                   'G':[5,3,6],
                   'H':[7,4,3]})

print(df.drop(list('DGH'), axis=1))
   A  B  C  E  F
0  1  4  7  7  1
1  2  5  8  8  3
2  3  6  9  9  5
