How to apply list comprehension on multiple columns in a dataframe?

Question

I'm setting up a new column of 0s and 1s in my dataframe based on the values of other columns in that row. The value should equal 1 if any of the following conditions are true, and 0 otherwise:

y_train['SEPSISPATOS']=='Yes' OR
y_train['SEPSHOCKPATOS'] == 'Yes' OR 
y_train['OTHSYSEP'] == 'Sepsis' OR
y_train['OTHSESHOCK'] == 'Septic Shock'

I've tried using a list comprehension and np.select (code below):

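import numpy as np
import pandas as pd

NSQIPdf_train = pd.read_csv("acs_nsqip_puf13_2.csv", sep=',', encoding='utf-8')
# Select the four outcome columns; .copy() avoids a SettingWithCopyWarning on later assignment
y_train = NSQIPdf_train.loc[:, ['SEPSISPATOS', 'SEPSHOCKPATOS', 'OTHSYSEP', 'OTHSESHOCK']].copy()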

### trying list comprehension
y_train['SEPSIS_STATUS'] = [1 if (x['SEPSISPATOS'] == 'Yes') or (x['SEPSHOCKPATOS'] == 'Yes') or (x['OTHSYSEP'] == 'Sepsis') or (x['OTHSESHOCK'] == 'Septic Shock') else 0 for x in y_train]

### trying np.select
conditions=[
    (y_train['SEPSISPATOS'] == 'Yes'),
    (y_train['SEPSHOCKPATOS'] == 'Yes'),
    (y_train['OTHSYSEP'] == 'Sepsis'),
    (y_train['OTHSESHOCK'] == 'Septic Shock')]
choices=[1,1,1,1]
y_train['SEPSIS_STATUS'] = np.select(conditions,choices,default=0)

print(y_train)
print(y_train.dtypes)

With np.select, row 3 has OTHSESHOCK = 'Septic Shock', yet SEPSIS_STATUS is still 0 where I expected 1. The string comparison does not seem to work (sample output below). I wonder whether this is because the columns have dtype 'object', which is how pandas reads them from the CSV file, rather than a string dtype.

       SEPSISPATOS SEPSHOCKPATOS  ...          OTHSESHOCK SEPSIS_STATUS
0            b'No'         b'No'  ...  b'No Complication'             0
1            b'No'         b'No'  ...  b'No Complication'             0
2            b'No'         b'No'  ...  b'No Complication'             0
3            b'No'         b'No'  ...     b'Septic Shock'             0
4            b'No'         b'No'  ...  b'No Complication'             0
5            b'No'         b'No'  ...  b'No Complication'             0
6            b'No'         b'No'  ...  b'No Complication'             0
7            b'No'         b'No'  ...  b'No Complication'             0
8            b'No'         b'No'  ...  b'No Complication'             0
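
The b'...' prefixes in the printed output suggest that the cells may hold Python bytes objects rather than str; an 'object' column can store either. A minimal sketch of how such a comparison behaves, using a small made-up frame (the name demo and its values are assumptions for illustration, not the NSQIP data):

```python
import pandas as pd

# Hypothetical stand-in for one column of y_train; the real data is not reproduced here.
demo = pd.DataFrame({"OTHSESHOCK": [b"No Complication", b"Septic Shock"]})

# In Python 3, a bytes value never compares equal to a str value.
str_matches = (demo["OTHSESHOCK"] == "Septic Shock").tolist()

# Comparing against a bytes literal matches the second row.
bytes_matches = (demo["OTHSESHOCK"] == b"Septic Shock").tolist()

# The Python type of a single cell shows what the object column actually holds.
cell_type = type(demo["OTHSESHOCK"].iloc[0])

print(str_matches, bytes_matches, cell_type)
```

If type(...) on a real cell reports bytes, the dtype 'object' is not the problem by itself; the mismatch between bytes and str in the comparison is.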

When using list comprehension, I get the following error:

AttributeError: 'DataFrame' object has no attribute 'str'.
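
Separately from the error message, iterating over a DataFrame directly yields its column labels, so x in the comprehension above is a label such as 'SEPSISPATOS' and not a row. A minimal sketch of a row-wise comprehension that zips the four columns together, assuming plain str values (y_demo and its rows are made up for illustration):

```python
import pandas as pd

# Hypothetical sample rows with plain str values; column names follow the question.
y_demo = pd.DataFrame({
    "SEPSISPATOS": ["No", "Yes", "No", "No"],
    "SEPSHOCKPATOS": ["No", "No", "No", "No"],
    "OTHSYSEP": ["No Complication", "No Complication", "Sepsis", "No Complication"],
    "OTHSESHOCK": ["No Complication", "No Complication", "No Complication", "Septic Shock"],
})

# Iterating over the DataFrame gives column labels, not rows.
labels = [x for x in y_demo]

# zip over the four columns produces one tuple of values per row.
y_demo["SEPSIS_STATUS"] = [
    1 if (a == "Yes" or b == "Yes" or c == "Sepsis" or d == "Septic Shock") else 0
    for a, b, c, d in zip(
        y_demo["SEPSISPATOS"],
        y_demo["SEPSHOCKPATOS"],
        y_demo["OTHSYSEP"],
        y_demo["OTHSESHOCK"],
    )
]

print(labels)
print(y_demo["SEPSIS_STATUS"].tolist())
```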

Finally, here are the dtypes of my variables from print(y_train.dtypes):

SEPSISPATOS      object
SEPSHOCKPATOS    object
OTHSYSEP         object
OTHSESHOCK       object
SEPSIS_STATUS     int32
dtype: object

Help much appreciated. I've tried multiple ways, but am open to other suggestions or fixes. Thank you!

codeman51 · Accepted Answer · 2019-07-15 14:16:00Z

Try casting your columns to be strings. Not sure what the name of your dataframe is, but something like the below should work.

df['OTHSESHOCK'] = df['OTHSESHOCK'].astype(str)

1 Comment

michellemabelle Over a year ago

I'd also tried that without success using y_train=y_train.astype(str). Printing dtypes still yields 'object', and comparison does not seem to work.
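
One possible direction, given the b'...' values in the printed output: if the cells are bytes, decoding them to str before comparing may be what is needed. The sketch below is only an illustration under that assumption; y_demo, cols, and the sample rows are made up, and the conditions are combined with the | operator rather than np.select.

```python
import pandas as pd

# Hypothetical sample mirroring the printed output; values are bytes on purpose.
y_demo = pd.DataFrame({
    "SEPSISPATOS": [b"No", b"Yes", b"No", b"No"],
    "SEPSHOCKPATOS": [b"No", b"No", b"No", b"No"],
    "OTHSYSEP": [b"No Complication", b"No Complication", b"Sepsis", b"No Complication"],
    "OTHSESHOCK": [b"No Complication", b"No Complication", b"No Complication", b"Septic Shock"],
})

cols = ["SEPSISPATOS", "SEPSHOCKPATOS", "OTHSYSEP", "OTHSESHOCK"]

# Decode each bytes column to str (UTF-8 is an assumption, matching the read_csv call).
for col in cols:
    y_demo[col] = y_demo[col].str.decode("utf-8")

# Element-wise OR of the four conditions gives a boolean Series, one value per row.
is_sepsis = (
    (y_demo["SEPSISPATOS"] == "Yes")
    | (y_demo["SEPSHOCKPATOS"] == "Yes")
    | (y_demo["OTHSYSEP"] == "Sepsis")
    | (y_demo["OTHSESHOCK"] == "Septic Shock")
)

# Convert True/False to 1/0 for the new column.
y_demo["SEPSIS_STATUS"] = is_sepsis.astype(int)

print(y_demo["SEPSIS_STATUS"].tolist())
```

After decoding, the same boolean masks could also be passed to np.select as in the question; the | form is just shorter when every choice is 1.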
