2

In the dataframe I have there are so many columns of which I only need a few. For instance

Col_A      Col_B      Col_C      Col_D      Col_E      Col_F
...        ...        ...      ...      ...      ...      ...        

I only need columns Col_A, Col_C and Col_E so currently what I do is df = df[['Col_A', 'Col_C', 'Col_E']] but the issue here is that not always there will columns A, C and E maybe all these wont be present. So I need if Col_A is in df.columns add to the df and so on. Is there any simple method to do this ? rather than so many if? Now if a Column is missing I get a KeyError: "['Col_C'] not in index

4 Answers 4

1

Use Index.intersection:

df[df.columns.intersection(['Col_A','Col_A','Col_E'], sort=False)]
Sign up to request clarification or add additional context in comments.

4 Comments

This is way faster than other answers. Can this be used in groupby like df.groupby(by=[df.columns.intersection(['Col_A','Col_A','Col_E'], as_index=False) ?
@Derik81 - Very similar, like df.groupby(by=df.columns.intersection(['Col_A','Col_A','Col_E']), as_index=False)
I think there is some issue by using this. it gives a ValueError: Grouper and axis must be same length. When I checked the result of df.columns.intersection(['Col_A','Col_A','Col_E']) it prints as Index(['Col_A','Col_A','Col_E'], dtype='object'). It has no issue when I do .groupby(by=['Col_A','Col_A','Col_E'] but when I input as you gave it throws an error.
@Derik81 - I forget, need convert to list df.groupby(by=df.columns.intersection(['Col_A','Col_A','Col_E']).tolist(), as_index=False)
1

You can use loc and isin

df.loc[:, df.columns.isin(['a','b','c'])]

Comments

0

You could use a list comprehension. For example:

test_columns = ['Col_A', 'Col_C', 'Col_E']
df = df[[c for c in test_columns if c in df.columns]]

Comments

0

From what I have interpreted, you could create a copy of df, store it in another variable and then 'drop' the columns you don't require,

df_copy = df.copy()
df = df.drop(['Col_A', 'Col_C', 'Col_E'], axis = 1)

# If you want to add other columns to the df
df['Col_B'] = df_copy['Col_B']

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.