41

I have a pandas DataFrame with several columns. I want to drop every column that is not present in a given list.

pandas dataframe columns:

list(pandas_df.columns.values)

Result:

['id', 'name', 'region', 'city']

And my expected column names:

final_table_columns = ['id', 'name', 'year']

After the operation, the result should be:

list(pandas_df.columns.values)

['id', 'name']

4 Answers

47

Use Index.intersection to find the intersection of an index and a list of (column) labels:

pandas_df = pandas_df[pandas_df.columns.intersection(final_table_columns)]
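As a minimal runnable sketch of the line above: the sample rows below are made up for illustration, and only the column names come from the question. Labels in the list that the frame does not have (here 'year') are simply ignored.

```python
import pandas as pd

# Hypothetical sample data; only the column names mirror the question.
pandas_df = pd.DataFrame(
    {"id": [1, 2], "name": ["a", "b"], "region": ["x", "y"], "city": ["p", "q"]}
)
final_table_columns = ["id", "name", "year"]

# Labels present in both the frame's columns and the list.
# 'year' is skipped because the frame has no such column.
keep = pandas_df.columns.intersection(final_table_columns)
pandas_df = pandas_df[keep]

print(list(pandas_df.columns))
```

The row data is untouched; only the column selection changes.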

2 Comments

Hi @unutbu, how would you go about doing the same thing but with case-insensitive string matching? Thank you!
Use Index.intersection() if the number of columns to keep is small compared to the number to exclude. In the opposite case (excluding only a few of a large number), Index.difference() may be the better choice. White-listing vs. black-listing.
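For the case-insensitive variant asked about in the first comment, one possible sketch is to compare lowercased labels. This is not from the answer above. The mixed-case frame is a hypothetical example, and the approach assumes the labels are strings (non-string labels are converted with str()).

```python
import pandas as pd

# Hypothetical frame whose column names differ in case from the wanted list.
df = pd.DataFrame({"ID": [1], "Name": ["a"], "Region": ["x"]})
final_table_columns = ["id", "name", "year"]

# Lowercase the wanted labels once, then compare each column name
# in lowercase; the original spelling of the column is kept.
wanted = {label.lower() for label in final_table_columns}
keep = [col for col in df.columns if str(col).lower() in wanted]
df_ci = df[keep]

print(list(df_ci.columns))
```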
35

You could use a list comprehension to build the list of column names to pass to drop():

final_table_columns = ['id', 'name', 'year']
df = df.drop(columns=[col for col in df if col not in final_table_columns])

To do it in-place:

df.drop(columns=[col for col in df if col not in final_table_columns], inplace=True)
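A small sketch of why this works, using made-up sample data: iterating over a DataFrame yields its column labels, so the comprehension collects exactly the labels missing from the list. The set named wanted is an optional tweak of mine for faster membership tests, not part of the answer. Note that with inplace=True, drop() returns None, so its result should not be assigned back to df.

```python
import pandas as pd

# Hypothetical sample data; only the column names mirror the question.
df = pd.DataFrame({"id": [1], "name": ["a"], "region": ["x"], "city": ["p"]})
final_table_columns = ["id", "name", "year"]

# Iterating over a DataFrame yields column labels, not rows.
wanted = set(final_table_columns)
to_drop = [col for col in df if col not in wanted]

# With inplace=True the frame is modified and the call returns None.
result = df.drop(columns=to_drop, inplace=True)

print(to_drop, list(df.columns), result)
```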

3 Comments

Why not simply df = df[final_table_columns]?
By the way, df = df[final_table_columns] is faster than the drop approach.
I think this is the best answer. @Rexovas's solution will fail if any of final_table_columns is not in df.
18

To do it in-place, consider Index.difference. This was not documented in any prior answer.

df.drop(columns=df.columns.difference(final_table_columns), inplace=True)

To create a new dataframe, Index.intersection also works.

df_final = df.drop(columns=df.columns.difference(final_table_columns))

df_final = df[df.columns.intersection(final_table_columns)]  # credited to unutbu
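To check that the two forms agree, here is a short sketch on made-up sample data. One detail worth knowing: Index.difference returns its labels sorted by default, which does not matter when the result is only passed to drop().

```python
import pandas as pd

# Hypothetical sample data; only the column names mirror the question.
df = pd.DataFrame({"id": [1], "name": ["a"], "region": ["x"], "city": ["p"]})
final_table_columns = ["id", "name", "year"]

# Black-list: labels in the frame that are NOT in the list.
unwanted = df.columns.difference(final_table_columns)
via_difference = df.drop(columns=unwanted)

# White-list: labels in the frame that ARE in the list.
via_intersection = df[df.columns.intersection(final_table_columns)]

print(list(unwanted))
print(via_difference.equals(via_intersection))
```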

1 Comment

Can you make this a soft drop, as in, if it contains one of the words in the column then you should keep the columns?
1

You could also accomplish this much more simply:

df = df[final_table_columns]

2 Comments

This will raise a KeyError, because 'year' is not in the DataFrame's columns.
Yes I see you’re correct now - but the question remains. Why then would he include that in his final columns? He can simply change the list to remove it. Much simpler.
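The failure described in the comments can be reproduced with a small sketch on made-up sample data. Filtering the list down to labels that actually exist (my addition, one of several ways to do it) avoids the error and keeps the order of the list.

```python
import pandas as pd

# Hypothetical sample data; only the column names mirror the question.
df = pd.DataFrame({"id": [1], "name": ["a"], "region": ["x"], "city": ["p"]})
final_table_columns = ["id", "name", "year"]

# Plain list indexing raises KeyError when any label is missing.
try:
    df[final_table_columns]
    raised = False
except KeyError:
    raised = True

# Keep only the labels the frame actually has, then index as before.
present = [col for col in final_table_columns if col in df.columns]
subset = df[present]

print(raised, list(subset.columns))
```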
