1

I want to run query on multiple columns (Data1, Data2) to find the values in other columns (Column1, Column2, Column3) if it exists in any of these columns (considering these columns have empty cells also)

My data looks like this:

Column1  Column2  Column3  Data1    Data2
sunday   monday            sunday   friday
sunday   friday   tuesday  friday   sunday
sunday            monday   thursday tuesday

I want the output like:

Column1  Column2  Column3  Data1    Exist_Data1   Data2      Exist_Data2
sunday   monday            sunday   Yes           friday     Yes
sunday   friday   tuesday  friday   Yes           wednesday  No
sunday            monday   thursday  No           tuesday    Yes

I have tried the command below:

pd.Series(['A', 'B']).isin(df.columns).all() (as mentioned in the Check if multiple columns exist in a df

but it check in all the columns. I have modified the df.columns but doesn't seem to work either.

*******UPDATE ********

Editing after trying the code:

As can be seen in the image below, there are some entries in Data1 and Data2 that exists in the other columns but still says "No". (There are no whitespace characters - all are removed using strip()" Double checked after writing the CSV and checking for spaces.

enter image description here

2
  • Answer was edited, Last Exist_Data1 is not correctly True ? Commented Jun 17, 2021 at 9:35
  • No, in the image Data1 has "A1BG-AS1" that exists in Column1 twice but still it says "No" and even the "NCRNA" says "No" but it do exists in Column2 Commented Jun 17, 2021 at 9:49

1 Answer 1

1

You need Series.isin with flatten all values of columns to one long list:

cols = ['Column1','Column2','Column3']
v = np.ravel(df[cols])

df['Exist_Data1'] = np.where(df['Data1'].isin(v),'Yes', 'No')
df['Exist_Data2'] = np.where(df['Data2'].isin(v),'Yes', 'No')

#alternative
df[['Exist_Data1','Exist_Data2']] = np.where(df[['Data1', 'Data2']].isin(v),'Yes', 'No')


print (df)
  Column1 Column2  Column3     Data1      Data2 Exist_Data1 Exist_Data2
0  sunday  monday      NaN    sunday     friday         Yes         Yes
1  sunday  friday  tuesday    friday  wednesday         Yes          No
2  sunday     NaN   monday  thursday    tuesday          No         Yes
Sign up to request clarification or add additional context in comments.

3 Comments

This one also has an error as in Data1 "thursday" does not exists in any of the columns but it says "Yes" in Exist_Data1. I am not sure why it's happening
Thank you so much.... The first one worked perfectly as needed. Can you please elaborate why the errors were happening earlier?
@aso - Problem was I compare df[cols] with df['Data1'] instead df['Data1'] with df[cols].

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.