Drop columns from Pandas dataframe if they are not in specific list

Question

I have a pandas DataFrame with several columns. I want to drop every column whose name is not present in a given list.

pandas dataframe columns:

list(pandas_df.columns.values)

Result:

['id', 'name', 'region', 'city']

And my expected column names:

final_table_columns = ['id', 'name', 'year']

After the operation, the result should be:

list(pandas_df.columns.values)

['id', 'name']

unutbu · Accepted Answer · 2019-07-04 16:16:10Z

47

Use Index.intersection to find the intersection of an index and a list of (column) labels:

pandas_df = pandas_df[pandas_df.columns.intersection(final_table_columns)]
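As a minimal sketch of the line above applied to the columns from the question (the row values are made up for illustration), note that 'year' is in the list but not in the frame, and is simply skipped:

```python
import pandas as pd

# Sample frame with the question's column names; the row values are invented.
pandas_df = pd.DataFrame(
    {"id": [1, 2], "name": ["a", "b"], "region": ["x", "y"], "city": ["p", "q"]}
)
final_table_columns = ["id", "name", "year"]

# Labels present in both the frame and the list. 'year' is not a column,
# so it is left out without raising an error.
keep = pandas_df.columns.intersection(final_table_columns)
pandas_df = pandas_df[keep]

print(list(pandas_df.columns))
```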

answered Jul 4, 2019 at 16:16

unutbu


2 Comments

Alroc Over a year ago

Hi @unutbu, how would you go about doing the same thing but with case-insensitive string matching? Thank you!
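One possible way to do the case-insensitive variant, offered as a sketch rather than as the answerer's method: lower-case both sides before comparing and use the resulting boolean mask to select columns. The frame and its mixed-case labels are made up for illustration, and all labels are assumed to be strings.

```python
import pandas as pd

# Hypothetical frame whose labels differ in case from the wanted names.
df = pd.DataFrame({"ID": [1], "Name": ["a"], "Region": ["x"], "city": ["p"]})
final_table_columns = ["id", "name", "year"]

# Compare lower-cased labels; the original labels are kept in the result.
wanted = {col.lower() for col in final_table_columns}
mask = df.columns.str.lower().isin(wanted)
df_ci = df.loc[:, mask]

print(list(df_ci.columns))
```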

twil Over a year ago

Use Index.intersection() if the number of columns to keep is small compared to the number to exclude. In the opposite case (excluding only a few out of a large number of columns), Index.difference() may be the better choice. White-listing vs. black-listing.
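The black-listing case from this comment could look roughly like the sketch below. The exclusion list, including the absent label 'zip', is a hypothetical example. Index.difference sorts its result by default where possible, so sort=False is passed here to keep the frame's own column order.

```python
import pandas as pd

df = pd.DataFrame({"id": [1], "name": ["a"], "region": ["x"], "city": ["p"]})

# Hypothetical exclusion list; 'zip' is not a column of df and is ignored.
columns_to_exclude = ["region", "city", "zip"]

# Labels of df that are not in the exclusion list, in their original order.
remaining = df.columns.difference(columns_to_exclude, sort=False)
df_small = df[remaining]

print(list(df_small.columns))
```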

Asclepius · Accepted Answer · 2021-01-11 22:56:28Z

35

You could use a list comprehension to build the list of column names to pass to drop():

final_table_columns = ['id', 'name', 'year']
df = df.drop(columns=[col for col in df if col not in final_table_columns])

To do it in-place:

df.drop(columns=[col for col in df if col not in final_table_columns], inplace=True)
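The same idea can be wrapped in a small helper. This is only a sketch: the function name drop_unlisted is hypothetical, and converting the list to a set assumes the column labels are hashable (true for ordinary string labels).

```python
import pandas as pd


def drop_unlisted(df, allowed):
    """Return a new frame without the columns whose labels are not in allowed."""
    allowed = set(allowed)
    # Iterating over a DataFrame yields its column labels.
    to_drop = [col for col in df if col not in allowed]
    return df.drop(columns=to_drop)


df = pd.DataFrame({"id": [1], "name": ["a"], "region": ["x"], "city": ["p"]})
trimmed = drop_unlisted(df, ["id", "name", "year"])

print(list(trimmed.columns))
```

Because drop() is called without inplace=True, the original df keeps all of its columns.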

edited Jan 11, 2021 at 22:56

Asclepius


answered Jul 4, 2019 at 16:16

ilja


3 Comments

Rexovas Over a year ago

Why not simply df = df[final_table_columns]?

e-ruiz Over a year ago

Btw, df = df[final_table_columns] is faster than the drop approach.

r4bb1t Feb 6 at 22:28

I think this is the best answer. @Rexovas's solution will fail if any name in final_table_columns is not a column of df.

Asclepius · Accepted Answer · 2022-02-06 19:08:55Z

18

To do it in-place, consider Index.difference. This was not documented in any prior answer.

df.drop(columns=df.columns.difference(final_table_columns), inplace=True)

To create a new dataframe, Index.intersection also works.

df_final = df.drop(columns=df.columns.difference(final_table_columns))

df_final = df[df.columns.intersection(final_table_columns)]  # credited to unutbu
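To make the difference between the two forms concrete, here is a short sketch on a made-up frame with the question's column names. It relies on drop() returning a new frame by default and returning None when inplace=True.

```python
import pandas as pd

final_table_columns = ["id", "name", "year"]
df = pd.DataFrame({"id": [1], "name": ["a"], "region": ["x"], "city": ["p"]})

# Labels of df that are NOT in the list: these are the columns to drop.
extra = df.columns.difference(final_table_columns)

# New frame; df itself still has all four columns at this point.
df_final = df.drop(columns=extra)

# In-place variant: df is modified and the call returns None.
result = df.drop(columns=extra, inplace=True)

print(list(df_final.columns), list(df.columns), result)
```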

edited Feb 6, 2022 at 19:08

answered Jan 11, 2021 at 23:14

Asclepius


1 Comment

Max Over a year ago

Can you make this a soft drop, i.e. keep a column if its name contains one of the words in the list?
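One way to sketch such a "soft" match, assuming it means keeping any column whose name contains one of the words as a substring, is DataFrame.filter with a regular expression. The frame and its labels below are invented for illustration.

```python
import re

import pandas as pd

# Hypothetical frame whose labels contain the wanted words as substrings.
df = pd.DataFrame(
    {"user_id": [1], "full_name": ["a"], "region": ["x"], "birth_year": [1990]}
)
words = ["id", "name", "year"]

# Alternation pattern; re.escape guards against regex metacharacters in the words.
pattern = "|".join(re.escape(word) for word in words)

# filter(regex=...) keeps the columns whose label contains a match for the pattern.
df_soft = df.filter(regex=pattern)

print(list(df_soft.columns))
```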

Rexovas · Accepted Answer · 2023-05-12 20:35:35Z

1

You could also accomplish this much more simply:

df = df[final_table_columns]

answered May 12, 2023 at 20:35

Rexovas


2 Comments

twil Over a year ago

This will raise a KeyError, because 'year' is not among the DataFrame's columns.

Rexovas Over a year ago

Yes I see you’re correct now - but the question remains. Why then would he include that in his final columns? He can simply change the list to remove it. Much simpler.
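The failure discussed in these comments can be reproduced with a short sketch on a made-up frame using the question's column names. The second half shows one possible workaround, filtering the list down to labels that exist before indexing; it is an illustration, not something proposed in the thread.

```python
import pandas as pd

df = pd.DataFrame({"id": [1], "name": ["a"], "region": ["x"], "city": ["p"]})
final_table_columns = ["id", "name", "year"]

# Plain list indexing fails because 'year' is not a column of df.
try:
    df[final_table_columns]
    raised = False
except KeyError:
    raised = True

# Possible workaround: keep only the names that are actually present.
present = [col for col in final_table_columns if col in df.columns]
df_ok = df[present]

print(raised, list(df_ok.columns))
```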
