Looping a lambda function across multiple pandas columns

Question

I am struggling to loop a lambda function across multiple columns.

samp = pd.DataFrame({'ID':['1','2','3'], 'A':['1C22', '3X35', '2C77'],
                     'B': ['1C35', '2C88', '3X99'], 'C':['3X56', '2C73', '1X91']})

Essentially, I am trying to add three columns to this dataframe with a 1 if there is a 'C' in the string and a 0 if not (i.e. an 'X').

This function works fine when I apply it as a lambda function to each column individually, but I'm doing so for 40 different columns and the code is (I'm assuming) unnecessarily clunky:

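import re

def is_correct(s):
    # Count the occurrences of 'C' in the string (0 or 1 for this data)
    correct = len(re.findall('C', s))
    return correct

# Bracket assignment creates the column; samp.A_correct = ... only sets an attribute
samp['A_correct'] = samp['A'].apply(lambda x: is_correct(x))
samp['B_correct'] = samp['B'].apply(lambda x: is_correct(x))
samp['C_correct'] = samp['C'].apply(lambda x: is_correct(x))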

I'm confident there is a way to loop this, but I have been unsuccessful thus far.

Related, you may want to look into this: pandas.pydata.org/pandas-docs/stable/reference/api/…. It should be much faster than using lambdas.
– C_Z_, Commented Oct 5, 2020 at 14:52
Thank you for this. My initial aversion to this approach is that the next thing I'm trying to loop is extracting the number following the 'C' or 'X', which I'm doing similarly using a lambda function. My goal is a broader conceptual understanding of how to loop the same function across multiple columns.
– Nathan Silver, Commented Oct 5, 2020 at 16:31

gtomer · Accepted Answer · 2020-10-05 16:37:50Z

1

You can iterate over the columns:

import pandas as pd
import re

df = pd.DataFrame({'ID':['1','2','3'], 'A':['1C22', '3X35', '2C77'],
                   'B': ['1C35', '2C88', '3X99'], 'C':['3X56', '2C73', '1X91']})

def is_correct(s):
    # Count the occurrences of 'C' in the string (0 or 1 for this data)
    correct = len(re.findall('C', s))
    return correct

# Skip 'ID' so that no 'ID_correct' column is created
for col in df.columns.drop('ID'):
    df[col + '_correct'] = df[col].apply(lambda x: is_correct(x))
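
The same loop pattern carries over to the follow-up mentioned in the comments (extracting the number after the 'C' or 'X'). Here is a minimal sketch, assuming every value looks like the sample data (a digit, then 'C' or 'X', then digits). The helper name `score_after_letter` and the `_score` suffix are made up for illustration and are not from the question:

```python
import re

import pandas as pd

samp = pd.DataFrame({'ID': ['1', '2', '3'], 'A': ['1C22', '3X35', '2C77'],
                     'B': ['1C35', '2C88', '3X99'], 'C': ['3X56', '2C73', '1X91']})

def score_after_letter(value):
    # Hypothetical helper: return the digits that follow 'C' or 'X' as an int,
    # or None when the value does not match the assumed pattern.
    match = re.search(r'[CX](\d+)', value)
    return int(match.group(1)) if match else None

# Every column except the identifier, so 40 columns need not be listed by hand.
value_cols = [col for col in samp.columns if col != 'ID']

for col in value_cols:
    # A plain function can be passed to apply directly; no lambda wrapper is needed.
    samp[col + '_score'] = samp[col].apply(score_after_letter)
```

Building `value_cols` before the loop keeps the newly added columns out of the iteration, and swapping in a different function only changes the one line inside the loop.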

edited Oct 5, 2020 at 16:37

answered Oct 5, 2020 at 14:50

gtomer


2 Comments

Nathan Silver Over a year ago

This produces a TypeError: 'Index' object is not callable.

gtomer Over a year ago

I have added the answer above

Quang Hoang · Accepted Answer · 2020-10-05 14:52:28Z

0

Let's try apply and join:

samp.join(samp[['A','B','C']].add_suffix('_correct')
                .apply(lambda x: x.str.contains('C'))
                .astype(int)
        )

Output:

  ID     A     B     C  A_correct  B_correct  C_correct
0  1  1C22  1C35  3X56          1          1          0
1  2  3X35  2C88  2C73          0          1          1
2  3  2C77  3X99  1X91          1          0          0
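
With 40 columns, listing each name is impractical, and `join` returns a new DataFrame rather than modifying `samp`. The sketch below is one possible variant of the approach above, assuming 'ID' is the only column that should be skipped. The names `value_cols`, `flags`, and `result` are illustrative:

```python
import pandas as pd

samp = pd.DataFrame({'ID': ['1', '2', '3'], 'A': ['1C22', '3X35', '2C77'],
                     'B': ['1C35', '2C88', '3X99'], 'C': ['3X56', '2C73', '1X91']})

# All columns except the identifier (assumption: 'ID' is the only one to skip).
value_cols = samp.columns.drop('ID')

# Column-wise vectorized test for a literal 'C', converted from bool to 0/1.
flags = (samp[value_cols]
         .apply(lambda s: s.str.contains('C', regex=False))
         .astype(int)
         .add_suffix('_correct'))

# join does not modify samp in place, so keep the returned frame.
result = samp.join(flags)
```

Here the lambda receives a whole column (a Series) at a time, not a single string, which is why the `.str` accessor is available inside it.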

answered Oct 5, 2020 at 14:52

Quang Hoang
