I have a dataframe:

      var1  var2  var3  var4
Id#                         
1001     Y     Y     Y     Y
1002     N     N     N     N
1003     N     N     Y     N
1003     Y     Y     Y     N

I want to create a new column called Small that is 'N' if any of the var columns in a row is 'Y', and 'Y' otherwise:

      var1  var2  var3  var4  Small
Id#
1001     Y     Y     Y     Y      N
1002     N     N     N     N      Y
1003     N     N     Y     N      N
1003     Y     Y     Y     N      N

My attempted solution: I wrote a function called is_small that flips to 'N' whenever any column in a row is 'Y':

def is_small(row, *cols):
    _small = 'Y'
    for col in cols:
        if col == 'Y':
            _small = 'N'
    return _small

and apply it to my dataset:

all_data['Small'] = all_data.apply(lambda row: is_small(row,
                                                        'var1',
                                                        'var2',
                                                        'var3',
                                                        'var4'),
                                   axis=1)

However, every value of Small comes back as 'Y', and I'm not sure why.

2 Answers

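You are almost there, but you are passing the literal strings 'var1', 'var2', ... into is_small. Inside the function, col == 'Y' then compares a column name with 'Y', which is never true, so the function always returns 'Y'. Pass the cell values row['var1'], row['var2'], ... instead: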

all_data['Small'] = all_data.apply(
    lambda row: is_small(row,
                         row['var1'],
                         row['var2'],
                         row['var3'],
                         row['var4']),
    axis=1)
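
For anyone who wants to see the difference side by side, here is a minimal sketch that rebuilds the sample frame from the question and runs both calls. The variable names buggy and fixed are only for illustration and do not appear in the original post.

```python
import pandas as pd

# Sample data copied from the question (Id# 1003 appears twice there as well).
all_data = pd.DataFrame(
    {
        "var1": ["Y", "N", "N", "Y"],
        "var2": ["Y", "N", "N", "Y"],
        "var3": ["Y", "N", "Y", "Y"],
        "var4": ["Y", "N", "N", "N"],
    },
    index=pd.Index([1001, 1002, 1003, 1003], name="Id#"),
)


def is_small(row, *cols):
    """Return 'N' if any value passed in cols equals 'Y', else 'Y'."""
    _small = "Y"
    for col in cols:
        if col == "Y":
            _small = "N"
    return _small


# Original call: the strings 'var1', ... are compared with 'Y', not the cell values.
buggy = all_data.apply(
    lambda row: is_small(row, "var1", "var2", "var3", "var4"), axis=1
)

# Corrected call: pass the cell values themselves.
fixed = all_data.apply(
    lambda row: is_small(row, row["var1"], row["var2"], row["var3"], row["var4"]),
    axis=1,
)

print(buggy.tolist())  # ['Y', 'Y', 'Y', 'Y']
print(fixed.tolist())  # ['N', 'Y', 'N', 'N']
```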

1 Comment

So what I was doing was passing my function the list of *args ['var1', 'var2', 'var3', 'var4']?
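
Roughly, yes, with one detail: *cols collects the extra positional arguments into a tuple rather than a list. A small sketch of what the function received in the original call (show_args is a hypothetical helper used only to illustrate this, not part of the question's code):

```python
def show_args(row, *cols):
    # Hypothetical helper: hand back whatever *cols collected.
    return cols


# Same arguments as the original call; row is irrelevant here, so pass None.
received = show_args(None, "var1", "var2", "var3", "var4")

print(received)                             # ('var1', 'var2', 'var3', 'var4')
print(any(col == "Y" for col in received))  # False
```

Since none of those strings equals 'Y', the loop in is_small never changes _small.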

You can use numpy.where (vectorized if/else):

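import numpy as np

# pd.np is not available in pandas 2.x, so import numpy directly;
# axis is passed by keyword.
df['small'] = np.where(df.eq('Y').any(axis=1), 'N', 'Y')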

df
#      var1 var2 var3 var4 small
# Id
# 1001    Y    Y    Y    Y     N
# 1002    N    N    N    N     Y
# 1003    N    N    Y    N     N
# 1003    Y    Y    Y    N     N
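
To make the one-liner easier to follow, here is a sketch that splits it into steps on the sample data from the question. The intermediate name has_yes is an assumption added for readability, and numpy is imported directly.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "var1": ["Y", "N", "N", "Y"],
        "var2": ["Y", "N", "N", "Y"],
        "var3": ["Y", "N", "Y", "Y"],
        "var4": ["Y", "N", "N", "N"],
    },
    index=pd.Index([1001, 1002, 1003, 1003], name="Id#"),
)

# Step 1: element-wise comparison gives a boolean frame of the same shape.
is_yes = df.eq("Y")

# Step 2: reduce across columns, True where a row has at least one 'Y'.
has_yes = is_yes.any(axis=1)

# Step 3: vectorized if/else, 'N' where has_yes is True and 'Y' elsewhere.
df["small"] = np.where(has_yes, "N", "Y")

print(has_yes.tolist())      # [True, False, True, True]
print(df["small"].tolist())  # ['N', 'Y', 'N', 'N']
```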

2 Comments

I have other columns besides var1, var2, var3, and var4 that may have Y/N responses.
If that is the case, you can select these four columns explicitly before building the boolean condition: np.where(df[['var1', 'var2', 'var3', 'var4']].eq('Y').any(axis=1), 'N', 'Y')
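
A short sketch of that column selection, assuming a frame with one extra Y/N column. The column name other and its values are made up for illustration; they are not from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "var1": ["Y", "N", "N", "Y"],
        "var2": ["Y", "N", "N", "Y"],
        "var3": ["Y", "N", "Y", "Y"],
        "var4": ["Y", "N", "N", "N"],
        "other": ["N", "Y", "N", "N"],  # hypothetical extra column to ignore
    },
    index=pd.Index([1001, 1002, 1003, 1003], name="Id#"),
)

# Only these columns take part in the check.
var_cols = ["var1", "var2", "var3", "var4"]

df["small"] = np.where(df[var_cols].eq("Y").any(axis=1), "N", "Y")

# Row 1002 has 'Y' in "other" but not in any var column, so it stays 'Y'.
print(df["small"].tolist())  # ['N', 'Y', 'N', 'N']
```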
