Pandas Match Column Value to Multiple Columns in Same Dataframe

Question

I have a Pandas DataFrame with thousands of columns. A snippet of it looks like this:

    import numpy as np
    import pandas as pd
    DataFrame1=pd.DataFrame([ ['A1X1' , 'J1', 'Q4', 'ND', 'J1'],
                              ['A1X2' , 'X1', '01', 'Q2', 'JK'],
                              ['A1X3' , 'R6', 'R6', '01', 'A5'],
                              ['A1X4' , 'J1', 'R6', 'A5', 'B6']],
                            columns=['ID', 'SearchValue', 'Check 1', 'Check 2', 'Check 60000'])
    DataFrame1.head(4)


I am trying to concisely determine whether 'SearchValue' appears in 'Check 1', 'Check 2', or any of the other columns up through 'Check 60000'. If it does, I want to return the 'SearchValue' in a new 'FinalResult' column; if it does not, the column should default to 'XX'.

I know I can use something like the code below to accomplish this, but I would need to write one comparison per column, 60,000 times, to cover all of the columns. That is not practical, especially since the actual DataFrame contains many other 'SearchValue' columns, which could push the program into millions of lines of code very quickly. Is there a better way to do this?

    Condition = [
        DataFrame1['SearchValue'].eq(DataFrame1['Check 1'])
        | DataFrame1['SearchValue'].eq(DataFrame1['Check 2'])
        # ... one comparison per column, through 'Check 60000'
        | DataFrame1['SearchValue'].eq(DataFrame1['Check 60000'])
    ]
    Choice = [
        DataFrame1['SearchValue']
    ]
    DataFrame1['FinalResult'] = np.select(Condition, Choice, default='XX')

    DataFrame1.head(4)
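For reference, the chain of | comparisons above does not have to be typed out by hand. The minimal sketch below assumes the columns to search are exactly those whose names start with 'Check'. It builds the same condition in a loop with functools.reduce and then calls np.select as the question does. It still performs one comparison per column, so the vectorized answers below are likely faster on 60,000 columns.

```python
from functools import reduce
import operator

import numpy as np
import pandas as pd

DataFrame1 = pd.DataFrame([['A1X1', 'J1', 'Q4', 'ND', 'J1'],
                           ['A1X2', 'X1', '01', 'Q2', 'JK'],
                           ['A1X3', 'R6', 'R6', '01', 'A5'],
                           ['A1X4', 'J1', 'R6', 'A5', 'B6']],
                          columns=['ID', 'SearchValue', 'Check 1', 'Check 2', 'Check 60000'])

# Assumption: every column to search has a name starting with 'Check'.
check_cols = [c for c in DataFrame1.columns if c.startswith('Check')]

# OR together one equality test per column instead of writing each one out.
condition = reduce(
    operator.or_,
    (DataFrame1['SearchValue'].eq(DataFrame1[c]) for c in check_cols),
)

DataFrame1['FinalResult'] = np.select([condition], [DataFrame1['SearchValue']], default='XX')
print(DataFrame1['FinalResult'].tolist())  # ['J1', 'XX', 'R6', 'XX']
```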


Thanks in advance!

It_is_Chris · Accepted Answer · 2020-10-07 19:53:01Z

3

    # create a mask using isin and any
    mask = DataFrame1[DataFrame1.columns[2:]].isin(DataFrame1['SearchValue']).any(axis=1)
    # use numpy.where to assign values
    DataFrame1['FinalResult'] = np.where(mask, DataFrame1['SearchValue'], 'XX')

         ID SearchValue Check 1 Check 2 Check 60000 FinalResult
    0  A1X1          J1      Q4      ND          J1          J1
    1  A1X2          X1      01      Q2          JK          XX
    2  A1X3          R6      R6      01          A5          R6
    3  A1X4          J1      R6      A5          B6          XX
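Why isin works row by row here: when DataFrame.isin is given a Series, pandas compares each cell with the Series value that has the same index label, not with the whole set of values. The sketch below contrasts that with passing a plain list, which tests membership against every search value at once. With a list, row 3 comes out wrong: J1 is not in that row, but R6 (row 2's search value) is.

```python
import pandas as pd

df = pd.DataFrame([['A1X1', 'J1', 'Q4', 'ND', 'J1'],
                   ['A1X2', 'X1', '01', 'Q2', 'JK'],
                   ['A1X3', 'R6', 'R6', '01', 'A5'],
                   ['A1X4', 'J1', 'R6', 'A5', 'B6']],
                  columns=['ID', 'SearchValue', 'Check 1', 'Check 2', 'Check 60000'])
checks = df[df.columns[2:]]

# Series argument: each cell is compared with the SearchValue of the same row label.
row_wise = checks.isin(df['SearchValue']).any(axis=1)

# List argument: each cell is tested against every SearchValue in the column.
any_row = checks.isin(df['SearchValue'].tolist()).any(axis=1)

print(row_wise.tolist())  # [True, False, True, False]
print(any_row.tolist())   # [True, False, True, True]
```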

answered Oct 7, 2020 at 19:53

It_is_Chris



6 Comments

Quang Hoang Over a year ago

I'm not sure if isin is a valid choice here.

PythonPandaBear Over a year ago

Unbelievable. I spent so many hours trying to figure this out and you just immediately figured it out in a concise manner that worked perfectly. Thank you @It_is_Chris!!

It_is_Chris Over a year ago

@QuangHoang "The result will only be true at a location if all the labels match. If values is a Series, that’s the index," which should work for this case since the OP wants to compare row-wise. Is there something I am missing or not thinking of?

Quang Hoang Over a year ago

I also understand the OP wants to search row-wise. What happens if you change the first row's search value to J2?

It_is_Chris Over a year ago

@QuangHoang When I change the SearchValue of index 0 to J2, I get the correct result: array(['XX', 'XX', 'R6', 'XX'], dtype=object)
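The J2 check discussed in these comments can be reproduced directly. This sketch rebuilds the question's frame, sets the first SearchValue to J2, and applies the isin / any / np.where answer; the result should match the array reported above.

```python
import numpy as np
import pandas as pd

DataFrame1 = pd.DataFrame([['A1X1', 'J1', 'Q4', 'ND', 'J1'],
                           ['A1X2', 'X1', '01', 'Q2', 'JK'],
                           ['A1X3', 'R6', 'R6', '01', 'A5'],
                           ['A1X4', 'J1', 'R6', 'A5', 'B6']],
                          columns=['ID', 'SearchValue', 'Check 1', 'Check 2', 'Check 60000'])

# Row 0's search value no longer appears in any of its Check columns.
DataFrame1.loc[0, 'SearchValue'] = 'J2'

mask = DataFrame1[DataFrame1.columns[2:]].isin(DataFrame1['SearchValue']).any(axis=1)
result = np.where(mask, DataFrame1['SearchValue'], 'XX')
print(result.tolist())  # ['XX', 'XX', 'R6', 'XX']
```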


sammywemmy · Accepted Answer · 2020-10-07 19:55:56Z

0

I would move into NumPy and do the comparison there:

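    df = DataFrame1  # the DataFrame from the question

    box = df.to_numpy()

    box
    # array([['A1X1', 'J1', 'Q4', 'ND', 'J1'],
    #        ['A1X2', 'X1', '01', 'Q2', 'JK'],
    #        ['A1X3', 'R6', 'R6', '01', 'A5'],
    #        ['A1X4', 'J1', 'R6', 'A5', 'B6']], dtype=object)

    # compare each row's SearchValue (column 1) with that row's Check columns
    compare = box[:, 1, np.newaxis] == box[:, 2:]

Now assign with numpy.where (a nonzero count of matches counts as true):

    df['Final Result'] = np.where(np.sum(compare, axis=1), box[:, 1], "XX")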

         ID SearchValue Check 1 Check 2 Check 60000 Final Result
    0  A1X1          J1      Q4      ND          J1           J1
    1  A1X2          X1      01      Q2          JK           XX
    2  A1X3          R6      R6      01          A5           R6
    3  A1X4          J1      R6      A5          B6           XX

This link was helpful.
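The key step in this answer is the broadcast comparison. Indexing with np.newaxis turns the SearchValue column into a (rows, 1) array, which NumPy stretches across the (rows, checks) block so that each row's value is compared with every Check cell in the same row. This sketch checks the shapes on the sample data and uses compare.any(axis=1) in place of np.sum; both treat a row with at least one match as true.

```python
import numpy as np

box = np.array([['A1X1', 'J1', 'Q4', 'ND', 'J1'],
                ['A1X2', 'X1', '01', 'Q2', 'JK'],
                ['A1X3', 'R6', 'R6', '01', 'A5'],
                ['A1X4', 'J1', 'R6', 'A5', 'B6']], dtype=object)

search = box[:, 1, np.newaxis]  # shape (4, 1): one search value per row
checks = box[:, 2:]             # shape (4, 3): the Check columns
compare = search == checks      # broadcasts to (4, 3)

print(search.shape, checks.shape, compare.shape)  # (4, 1) (4, 3) (4, 3)

# a row gets its SearchValue if any of its Check cells matched
result = np.where(compare.any(axis=1), box[:, 1], 'XX')
print(result.tolist())  # ['J1', 'XX', 'R6', 'XX']
```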

answered Oct 7, 2020 at 19:55

sammywemmy



Quang Hoang (edited by halfer) · Accepted Answer · 2020-10-07 21:49:57Z

0

You want to search for the values in each row:

    mask = DataFrame1.filter(like='Check').eq(DataFrame1['SearchValue'], axis=0).any(axis=1)

    DataFrame1['FinalResult'] = DataFrame1['SearchValue'].where(mask, 'XX')

Output:

         ID SearchValue Check 1 Check 2 Check 60000 FinalResult
    0  A1X1          J1      Q4      ND          J1          J1
    1  A1X2          X1      01      Q2          JK          XX
    2  A1X3          R6      R6      01          A5          R6
    3  A1X4          J1      R6      A5          B6          XX
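One practical difference from the positional slice DataFrame1.columns[2:] is that filter(like='Check') selects columns by name. In this sketch a hypothetical 'Notes' column (not part of the question) is inserted after SearchValue: the positional slice then searches it too, while the name-based filter does not. Keep in mind that like= keeps any column whose name contains the substring, so filter(regex=...) may be safer if other column names also contain 'Check'.

```python
import pandas as pd

df = pd.DataFrame([['A1X1', 'J1', 'Q4', 'ND', 'J1'],
                   ['A1X2', 'X1', '01', 'Q2', 'JK'],
                   ['A1X3', 'R6', 'R6', '01', 'A5'],
                   ['A1X4', 'J1', 'R6', 'A5', 'B6']],
                  columns=['ID', 'SearchValue', 'Check 1', 'Check 2', 'Check 60000'])

# Hypothetical extra column that is not a Check column; row 3 happens to hold 'J1'.
df.insert(2, 'Notes', ['n/a', 'n/a', 'n/a', 'J1'])

# Positional selection now includes 'Notes'; name-based selection does not.
by_position = df[df.columns[2:]].eq(df['SearchValue'], axis=0).any(axis=1)
by_name = df.filter(like='Check').eq(df['SearchValue'], axis=0).any(axis=1)

print(df['SearchValue'].where(by_position, 'XX').tolist())  # ['J1', 'XX', 'R6', 'J1']
print(df['SearchValue'].where(by_name, 'XX').tolist())      # ['J1', 'XX', 'R6', 'XX']
```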

edited Oct 7, 2020 at 21:49

halfer


answered Oct 7, 2020 at 19:57

Quang Hoang


4 Comments

PythonPandaBear Over a year ago

This worked perfectly as well! Really like the .filter addition with the .eq.

PythonPandaBear Over a year ago

I played with the DataFrame1 creation using different values and achieved the same results. I will keep on testing it as I read both of your comments and if I find that your answer technically is more correct I will pin it as such. Really appreciate your help!

Quang Hoang Over a year ago

@PythonPandaBear forget what I said. Chris' answer is certainly valid.

PythonPandaBear Over a year ago

Awesome! You and @It_is_Chris just taught me more about the application of .where and .isin than any documentation or other resource has. Can't thank you all enough for the prompt and knowledgeable replies :)
