0

I have a dataframe with multiple columns as t_orno,t_pono, t_sqnb ,t_pric,....and so on(it's a table with multiple columns). The 2nd dataframe contains certain name of the columns from 1st dataframe. Eg.

columnname
t_pono
t_pric
:
:

I need to select only those columns from the 1st dataframe whose name is present in the 2nd. In above example t_pono,t_pric.

How can this be done?

1
  • seems fairly easy. What have you tried? Commented Jan 31, 2023 at 17:35

1 Answer 1

1

Let's say you have the following columns (which can be obtained using df.columns, which returns a list):

df1_cols = ["t_orno", "t_pono", "t_sqnb", "t_pric"]
df2_cols = ["columnname", "t_pono", "t_pric"]

To get only those columns from the first dataframe that are present in the second one, you can do set intersection (and I cast it to a list, so it can be used to select data):

list(set(df1_cols).intersection(df2_cols))

And we get the result:

["t_pono", "t_pric"]

To put it all together and select only those columns:

select_columns = list(set(df1_cols).intersection(df2_cols))

new_df = df1.select(*select_columns)
Sign up to request clarification or add additional context in comments.

1 Comment

Little twisting in the file and it worked...thanks :)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.