I want to match certain criteria across multiple columns and, where a match is found, return the name of the matching column.

My demo DataFrame is:

import pandas as pd

df = pd.DataFrame({"mtc": ["A", "B", "C", "D"],
                   "C1": ["A", "A", "A", "C"],
                   "C2": ["X", "B", "A", "C"],
                   "C3": ["Y", "D", "A", "D"],
                   "C4": ["Z", "D", "C", "C"]})


    mtc C1  C2  C3  C4
0   A   A   X   Y   Z
1   B   A   B   D   D
2   C   A   A   A   C
3   D   C   C   D   C

Here I want to match the values from the mtc column against the columns ['C1', 'C2', 'C3', 'C4'].

My expected output, in the Result column, is:

   mtc  Result  C1  C2  C3  C4
0   A     C1    A   X   Y   Z
1   B     C2    A   B   D   D
2   C     C4    A   A   A   C
3   D     C3    C   C   D   C

2 Answers

Solution

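# Boolean mask: True where a C-like column equals the mtc value in the same row
m = df.filter(like='C').eq(df['mtc'], axis=0)
# First matching column name per row; rows without any match become NaN
df['Result'] = m.idxmax(axis=1).mask(~m.any(axis=1))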

Explanations

Filter the C-like columns, then compare these columns with the mtc column along axis=0 to create a boolean mask.

>>> m
      C1     C2     C3     C4
0   True  False  False  False
1  False   True  False  False
2  False  False  False   True
3  False  False   True  False

Now we can use idxmax along axis=1 to get the name of the column containing the first True value in the boolean mask above. We can additionally mask the column name for rows where no match is found.

>>> m.idxmax(1)

0    C1
1    C2
2    C4
3    C3
dtype: object

>>> df

  mtc C1 C2 C3 C4 Result
0   A  A  X  Y  Z     C1
1   B  A  B  D  D     C2
2   C  A  A  A  C     C4
3   D  C  C  D  C     C3
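
To see why the mask step matters, here is a minimal sketch on a hypothetical variant of the demo frame in which the last row has no match (the changed C3 value is an assumption made for illustration only). On an all-False row, idxmax falls back to the first column, so the mask is what turns that misleading label into NaN.

```python
import pandas as pd

# Hypothetical variant of the demo frame: C3 in row 3 is "X" instead of "D",
# so the value "D" in mtc has no match in that row.
df = pd.DataFrame({"mtc": ["A", "B", "C", "D"],
                   "C1": ["A", "A", "A", "C"],
                   "C2": ["X", "B", "A", "C"],
                   "C3": ["Y", "D", "A", "X"],
                   "C4": ["Z", "D", "C", "C"]})

m = df.filter(like="C").eq(df["mtc"], axis=0)

# Without the mask, an all-False row reports the first column ("C1").
unmasked = m.idxmax(axis=1)

# With the mask, rows that have no True value become NaN instead.
df["Result"] = unmasked.mask(~m.any(axis=1))
print(df)
```

If a match is guaranteed in every row, the mask can be dropped, as noted in the comments.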

8 Comments

Your answer works but is slow: 5.08 ms ± 186 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
@SurajS What data size did you test the answer on? Try increasing the data size and testing again; I'm sure you will see the performance gains from vectorized methods. You can also skip the mask part if a match is always found between the mtc and C-like columns.
Hey, how can I replace .eq(df['mtc'], axis=0) with something like <=, i.e. not an exact match, or with >= and <= together?
@SurajS Not sure what you mean by no exact match, but I guess you can try .le for <= and, similarly, .ge for >=.
Yes, that's what I want, but what if both are required, i.e. >= and <= in the same comparison?
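
One possible way to combine both bounds, sketched on hypothetical numeric data (the lo and hi columns and all values are assumptions, not part of the original question): build one mask with .ge and one with .le, then combine them with &.

```python
import pandas as pd

# Hypothetical numeric frame: 'lo' and 'hi' are assumed lower/upper bounds per row.
df = pd.DataFrame({"lo": [1, 5, 10],
                   "hi": [3, 8, 12],
                   "C1": [0, 6, 20],
                   "C2": [2, 9, 30],
                   "C3": [4, 7, 40]})

cols = df.filter(like="C")

# True where lo <= value <= hi, compared row by row.
m = cols.ge(df["lo"], axis=0) & cols.le(df["hi"], axis=0)

# First column inside the range; NaN where no column qualifies.
df["Result"] = m.idxmax(axis=1).mask(~m.any(axis=1))
print(df)
```

The rest of the approach stays the same: only the construction of the boolean mask changes.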
import numpy as np
import pandas as pd

df = pd.DataFrame({"mtc": ["A", "B", "C", "D"],
                   "C1": ["A", "A", "A", "C"],
                   "C2": ["X", "B", "A", "C"],
                   "C3": ["Y", "D", "A", "D"],
                   "C4": ["Z", "D", "C", "C"]})

# Placeholder column so that 'result' is the last label in each row
df['result'] = np.nan

def find_col(x):
    # Skip the first label ('mtc') and the last ('result'); returns None if nothing matches
    for col in x.index[1:-1]:
        if x['mtc'] == x[col]:
            return col

df['result'] = df.apply(find_col, axis=1)

This gives the following output:

mtc  C1  C2  C3  C4  result
A    A   X   Y   Z   C1
B    A   B   D   D   C2
C    A   A   A   C   C4
D    C   C   D   C   C3

1 Comment

Your answer works and is also faster than the other one: 2.96 ms ± 205 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
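
The timings quoted in the comments come from the four-row demo frame. A rough sketch for repeating the comparison on a larger frame follows; the row count, the random data, and the helper names vectorized and row_wise are assumptions for illustration, and actual timings will depend on the machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 5_000  # assumed row count; raise it to test larger inputs
letters = list("ABCD")

# Random letters in every column, same layout as the demo frame.
big = pd.DataFrame({name: rng.choice(letters, size=n)
                    for name in ["mtc", "C1", "C2", "C3", "C4"]})


def vectorized(df):
    """Boolean mask plus idxmax, as in the first answer."""
    m = df.filter(like="C").eq(df["mtc"], axis=0)
    return m.idxmax(axis=1).mask(~m.any(axis=1))


def row_wise(df):
    """Row-by-row search with apply, as in the second answer."""
    cols = ["C1", "C2", "C3", "C4"]

    def find_col(row):
        for col in cols:
            if row["mtc"] == row[col]:
                return col
        return None

    return df.apply(find_col, axis=1)


t_vec = timeit.timeit(lambda: vectorized(big), number=3)
t_row = timeit.timeit(lambda: row_wise(big), number=3)
print(f"vectorized: {t_vec:.4f} s, row-wise apply: {t_row:.4f} s (3 runs each)")
```

Both helpers are intended to return the same column names, so the comparison measures speed only.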
