1

I found a similar question here, but it did not help me because my case is different.

I have a huge dataframe that looks more or less like the example below:

        x   y   st  mt  ast sr  c7  z   w
    0   mt      2   1   4   2   2   a   yes
    1       b   3   3   3   3   3       yes
    2           1   1   2   4   3       yes
    3       d   3   3   1   2   4   d   
    4       e   2   3   2   1   4       
    5   nlp     5   5   5   5   5   f   yes

my goal is to create another column named "other_flagged" based on the following condition:

'if columns x, y and z are NOT filled with a string, but w has a "yes", fill column "other_flagged" with the string 'flagged'. So, my desired output would be:

        x   y   st  mt  ast sr  c7  z   w   other flagged
    0   mt      2   1   4   2   2   a   yes     
    1       b   3   3   3   3   3       yes    
    2           1   1   2   4   3       yes     flagged
    3       d   3   3   1   2   4   d   
    4       e   2   3   2   1   4       
    5   nlp     5   5   5   5   5   f   yes

in this case, only row 2 receives the string "flagged" because x, y and z are empty but column w has the string 'yes'. I tried the code below but the new column created is being completely populated with the string 'flagged' but not based on the conditions:

 survey.loc[(survey['x'] != '') & (survey['y'] !='') & (survey['z'] != '') & 
(survey['w'] == 'yes'), 'other_flagged'] ='flagged'

also tried:

survey.loc[(survey['x'] != '[a-zA-Z0-9-]+') & (survey['y'] != '[a-zA-Z0-9-]+') & 
(survey['z'] != '[a-zA-Z0-9-]+') & (survey['w'] == 'yes'), 'other_flagged']='flagged'

How can I achieve this goal?

2 Answers 2

1

Use if not exist values are empty strings is possible test all columns together use DataFrame.all:

m = (survey[['x', 'y', 'z']] == '').all(axis=1) & (survey['w'] == 'yes')
survey.loc[m, 'other_flagged'] ='flagged'

print (survey)
     x  y  st  mt  ast  sr  c7  z    w other_flagged
0    m  t   2   1    4   2   2  a  yes           NaN
1       b   3   3    3   3   3     yes           NaN
2           1   1    2   4   3     yes       flagged
3       d   3   3    1   2   4  d                NaN
4       e   2   3    2   1   4                   NaN
5  nlp      5   5    5   5   5  f  yes           NaN

Your solution - change != to ==:

m = (survey['x'] == '') & (survey['y'] =='') & (survey['z'] == '') & (survey['w'] == 'yes')
survey.loc[m, 'other_flagged'] ='flagged'

If values are missing values:

m = survey[['x', 'y', 'z']].isna().all(1) & (survey['w'] == 'yes')
survey.loc[m, 'other_flagged'] ='flagged'

print (survey)
     x    y  st  mt  ast  sr  c7    z    w other_flagged
0    m    t   2   1    4   2   2    a  yes           NaN
1  NaN    b   3   3    3   3   3  NaN  yes           NaN
2  NaN  NaN   1   1    2   4   3  NaN  yes       flagged
3  NaN    d   3   3    1   2   4    d  NaN           NaN
4  NaN    e   2   3    2   1   4  NaN  NaN           NaN
5  nlp  NaN   5   5    5   5   5    f  yes           NaN
Sign up to request clarification or add additional context in comments.

1 Comment

I'll try this. What does the .all(1) do?
0

Use numpy.where:

import numpy as np

survey['other flagged'] = np.where((survey['x'].eq('') & survey['y'].eq('') & survey['z'].eq('') & survey['w'].eq('yes')) , 'flagged', 'not flagged')

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.