
I have a pandas DataFrame with thousands of columns. The following snippet reproduces a small part of it:

    import numpy as np
    import pandas as pd
    DataFrame1=pd.DataFrame([ ['A1X1' , 'J1', 'Q4', 'ND', 'J1'],
                              ['A1X2' , 'X1', '01', 'Q2', 'JK'],
                              ['A1X3' , 'R6', 'R6', '01', 'A5'],
                              ['A1X4' , 'J1', 'R6', 'A5', 'B6']],
                            columns=['ID', 'SearchValue', 'Check 1', 'Check 2', 'Check 60000'])
    DataFrame1.head(4)


I am trying to concisely determine whether the 'SearchValue' in each row appears in 'Check 1', 'Check 2', or any of the other columns up through 'Check 60000'. If it does, I want 'SearchValue' returned in a new 'FinalResult' column; otherwise the column should default to 'XX'.

I know I can use something like the code below, but I would need to write out 60,000 comparisons to cover every column. That is unworkable, especially since the actual DataFrame contains other 'SearchValue' columns that would quickly push the program into millions of lines of code. Is there a better way to do this?

    # OR together one equality test per 'Check' column
    Condition=[
               DataFrame1['SearchValue'].eq(DataFrame1['Check 1'])
               | DataFrame1['SearchValue'].eq(DataFrame1['Check 2'])
               | DataFrame1['SearchValue'].eq(DataFrame1['Check 60000'])
              ]
    Choice=[
            DataFrame1['SearchValue']
           ]
    # numpy is imported as np above
    DataFrame1['FinalResult']=np.select(Condition,Choice,default='XX')

    DataFrame1.head(4)


Thanks in advance!

3 Answers

# create a row-wise mask using isin and any (axis must be passed by keyword in recent pandas)
mask = DataFrame1[DataFrame1.columns[2:]].isin(DataFrame1['SearchValue']).any(axis=1)
# use numpy.where to assign values
DataFrame1['FinalResult'] = np.where(mask, DataFrame1['SearchValue'], 'XX')

     ID SearchValue Check 1 Check 2 Check 60000 FinalResult
0  A1X1          J1      Q4      ND          J1          J1
1  A1X2          X1      01      Q2          JK          XX
2  A1X3          R6      R6      01          A5          R6
3  A1X4          J1      R6      A5          B6          XX
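
Whether `isin` really compares row by row may not be obvious. As far as I understand the documented behaviour, passing a Series to `DataFrame.isin` aligns it on the index, while passing a plain list tests membership anywhere in the list. The sketch below contrasts the two on the question's sample data; the variable names `checks`, `row_wise` and `anywhere` are only illustrative, and it assumes the index is unique.

```python
import pandas as pd

DataFrame1 = pd.DataFrame(
    [['A1X1', 'J1', 'Q4', 'ND', 'J1'],
     ['A1X2', 'X1', '01', 'Q2', 'JK'],
     ['A1X3', 'R6', 'R6', '01', 'A5'],
     ['A1X4', 'J1', 'R6', 'A5', 'B6']],
    columns=['ID', 'SearchValue', 'Check 1', 'Check 2', 'Check 60000'])

checks = DataFrame1[DataFrame1.columns[2:]]

# Series argument: each row is compared only with its own SearchValue
row_wise = checks.isin(DataFrame1['SearchValue']).any(axis=1)

# List argument: a cell matches if it equals ANY SearchValue in the column
anywhere = checks.isin(DataFrame1['SearchValue'].tolist()).any(axis=1)

print(row_wise.tolist())   # [True, False, True, False]
print(anywhere.tolist())   # [True, False, True, True]
```

The last row differs: 'R6' is the search value of another row, so the list form flags it while the Series form does not.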

6 Comments

I'm not sure if isin is a valid choice here.
Unbelievable. I spent so many hours trying to figure this out and you just immediately figured it out in a concise manner that worked perfectly. Thank you @It_is_Chris!!
@QuangHoang "The result will only be true at a location if all the labels match. If values is a Series, that’s the index," which should work for this case since the OP wants to compare row wise. Is there something I am missing or not thinking of?
I also understand op wants to search row-wise. What happens if you change the first row search value to J2?
@QuangHoang When I change the SearchValue of index 0 to J2, I get the correct result: array(['XX', 'XX', 'R6', 'XX'], dtype=object)

I would move to NumPy and do the comparison there (here `df` is the question's `DataFrame1`):

box = df.to_numpy()

box


array([['A1X1', 'J1', 'Q4', 'ND', 'J1'],
       ['A1X2', 'X1', '01', 'Q2', 'JK'],
       ['A1X3', 'R6', 'R6', '01', 'A5'],
       ['A1X4', 'J1', 'R6', 'A5', 'B6']], dtype=object)

compare = box[:, 1, np.newaxis] == box[:, 2:]

Now assign with np.where, treating a row as a match if any of its comparisons is True:

df['Final Result'] = np.where(compare.any(axis=1), box[:, 1], "XX")


     ID SearchValue Check 1 Check 2 Check 60000 Final Result
0  A1X1          J1      Q4      ND          J1           J1
1  A1X2          X1      01      Q2          JK           XX
2  A1X3          R6      R6      01          A5           R6
3  A1X4          J1      R6      A5          B6           XX

This link was helpful.
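
To make the broadcasting step concrete, here is a minimal, self-contained sketch of the same idea. It rebuilds the array by hand instead of calling `to_numpy()`, and the names `search` and `result` are only illustrative. Adding `np.newaxis` turns the search column into shape `(4, 1)`, which NumPy broadcasts against the `(4, 3)` block of check columns.

```python
import numpy as np

box = np.array([['A1X1', 'J1', 'Q4', 'ND', 'J1'],
                ['A1X2', 'X1', '01', 'Q2', 'JK'],
                ['A1X3', 'R6', 'R6', '01', 'A5'],
                ['A1X4', 'J1', 'R6', 'A5', 'B6']], dtype=object)

search = box[:, 1]                       # shape (4,)
# (4, 1) == (4, 3) broadcasts to a (4, 3) boolean array
compare = search[:, np.newaxis] == box[:, 2:]
print(compare.shape)                     # (4, 3)

# A row matches if any of its check columns equals its search value
result = np.where(compare.any(axis=1), search, 'XX')
print(result.tolist())                   # ['J1', 'XX', 'R6', 'XX']
```

Note that the boolean array has one entry per check cell, so with tens of thousands of check columns its memory use grows with rows times columns.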


You want to search for the values in each row:

mask = DataFrame1.filter(like='Check').eq(DataFrame1['SearchValue'], axis=0).any(axis=1)

DataFrame1['FinalResult'] = DataFrame1['SearchValue'].where(mask, 'XX')

Output:

     ID SearchValue Check 1 Check 2 Check 60000 FinalResult
0  A1X1          J1      Q4      ND          J1          J1
1  A1X2          X1      01      Q2          JK          XX
2  A1X3          R6      R6      01          A5          R6
3  A1X4          J1      R6      A5          B6          XX
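
Since the question mentions several 'SearchValue' columns in the real data, the same pattern could be wrapped in a small helper. This is only a sketch: `flag_row_match` and its parameters are hypothetical names, not part of pandas, and it assumes that every column to be checked has a label containing the given prefix.

```python
import pandas as pd

def flag_row_match(df, search_col, check_prefix='Check', default='XX'):
    """Return df[search_col] where it equals any check column in the same row."""
    # filter(like=...) keeps every column whose label contains check_prefix
    checks = df.filter(like=check_prefix)
    # axis=0 aligns the search Series with the rows of the check columns
    mask = checks.eq(df[search_col], axis=0).any(axis=1)
    return df[search_col].where(mask, default)

DataFrame1 = pd.DataFrame(
    [['A1X1', 'J1', 'Q4', 'ND', 'J1'],
     ['A1X2', 'X1', '01', 'Q2', 'JK'],
     ['A1X3', 'R6', 'R6', '01', 'A5'],
     ['A1X4', 'J1', 'R6', 'A5', 'B6']],
    columns=['ID', 'SearchValue', 'Check 1', 'Check 2', 'Check 60000'])

DataFrame1['FinalResult'] = flag_row_match(DataFrame1, 'SearchValue')
print(DataFrame1['FinalResult'].tolist())   # ['J1', 'XX', 'R6', 'XX']
```

Calling the helper once per search column avoids repeating the mask logic; a different prefix can be passed if the check columns are named differently.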

4 Comments

This worked perfectly as well! Really like the .filter addition with the .eq.
I played with the DataFrame1 creation using different values and achieved the same results. I will keep on testing it as I read both of your comments and if I find that your answer technically is more correct I will pin it as such. Really appreciate your help!
@PythonPandaBear forget what I said. Chris' answer is certainly valid.
Awesome! You and @It_is_Chris just taught me more about the application of .where and .isin than any documentation or other resource has. Can't thank you all enough for the prompt and knowledgeable replies :)
