I'm creating a new column of 0s and 1s in my DataFrame based on the values of other columns in the same row. The value should be 1 if any of the following conditions is true, and 0 otherwise:
y_train['SEPSISPATOS'] == 'Yes' OR
y_train['SEPSHOCKPATOS'] == 'Yes' OR
y_train['OTHSYSEP'] == 'Sepsis' OR
y_train['OTHSESHOCK'] == 'Septic Shock'
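The rule above is an element-wise OR of four comparisons, which pandas can evaluate as a single boolean mask. This is a minimal sketch that assumes the cells hold plain Python `str` values. The four-row frame is made-up data, and only the column names come from the question.

```python
import pandas as pd

# Hypothetical toy rows standing in for y_train; cell values are plain str.
y_train = pd.DataFrame({
    "SEPSISPATOS": ["No", "Yes", "No", "No"],
    "SEPSHOCKPATOS": ["No", "No", "No", "No"],
    "OTHSYSEP": ["No Complication", "No Complication", "Sepsis", "No Complication"],
    "OTHSESHOCK": ["No Complication", "No Complication", "No Complication", "Septic Shock"],
})

# Element-wise OR of the four conditions. Each comparison is parenthesised
# because | binds more tightly than ==.
mask = (
    (y_train["SEPSISPATOS"] == "Yes")
    | (y_train["SEPSHOCKPATOS"] == "Yes")
    | (y_train["OTHSYSEP"] == "Sepsis")
    | (y_train["OTHSESHOCK"] == "Septic Shock")
)

# Convert True/False to 1/0.
y_train["SEPSIS_STATUS"] = mask.astype(int)
print(y_train["SEPSIS_STATUS"].tolist())  # [0, 1, 1, 1]
```

The sketch uses `|` instead of Python's `or` because `or` asks for the truth value of a whole Series, and pandas raises a `ValueError` in that case.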
I've tried both a list comprehension and np.select (code below):
```python
import numpy as np
import pandas as pd

NSQIPdf_train = pd.read_csv("acs_nsqip_puf13_2.csv", sep=',', encoding='utf-8')
# Select the four columns with a list; .copy() avoids SettingWithCopyWarning when adding a column
y_train = NSQIPdf_train.loc[:, ['SEPSISPATOS', 'SEPSHOCKPATOS', 'OTHSYSEP', 'OTHSESHOCK']].copy()

### trying list comprehension
y_train['SEPSIS_STATUS'] = [1 if (x['SEPSISPATOS'] == 'Yes') or (x['SEPSHOCKPATOS'] == 'Yes') or (x['OTHSYSEP'] == 'Sepsis') or (x['OTHSESHOCK'] == 'Septic Shock') else 0 for x in y_train]

### trying np.select
conditions = [
    (y_train['SEPSISPATOS'] == 'Yes'),
    (y_train['SEPSHOCKPATOS'] == 'Yes'),
    (y_train['OTHSYSEP'] == 'Sepsis'),
    (y_train['OTHSESHOCK'] == 'Septic Shock')]
choices = [1, 1, 1, 1]  # any matching condition maps to 1
y_train['SEPSIS_STATUS'] = np.select(conditions, choices, default=0)
print(y_train)
print(y_train.dtypes)
```
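In the list-comprehension attempt, `for x in y_train` iterates over the DataFrame's column labels, so each `x` is a string such as `'SEPSISPATOS'` and not a row. A row-wise version could zip the four columns instead. This is a minimal sketch that assumes plain `str` cell values and uses made-up rows. For large frames, the vectorized np.select version is usually faster.

```python
import pandas as pd

# Hypothetical three-row sample; only the column names come from the question.
y_train = pd.DataFrame({
    "SEPSISPATOS": ["No", "No", "No"],
    "SEPSHOCKPATOS": ["No", "Yes", "No"],
    "OTHSYSEP": ["No Complication", "No Complication", "No Complication"],
    "OTHSESHOCK": ["Septic Shock", "No Complication", "No Complication"],
})

# Iterating a DataFrame yields its column labels, not its rows.
print(list(y_train))  # ['SEPSISPATOS', 'SEPSHOCKPATOS', 'OTHSYSEP', 'OTHSESHOCK']

# zip() walks the four columns in parallel and yields one tuple of cell values per row.
y_train["SEPSIS_STATUS"] = [
    1 if (a == "Yes" or b == "Yes" or c == "Sepsis" or d == "Septic Shock") else 0
    for a, b, c, d in zip(
        y_train["SEPSISPATOS"],
        y_train["SEPSHOCKPATOS"],
        y_train["OTHSYSEP"],
        y_train["OTHSESHOCK"],
    )
]
print(y_train["SEPSIS_STATUS"].tolist())  # [1, 1, 0]
```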
With np.select, row 3 has OTHSESHOCK equal to 'Septic Shock', yet SEPSIS_STATUS is still 0 when I expect 1. The string comparison does not seem to work (sample output below). I wonder whether this happens because the columns have dtype 'object', which is how pandas reads text columns from a CSV, instead of a string dtype.
```
  SEPSISPATOS SEPSHOCKPATOS ...        OTHSESHOCK  SEPSIS_STATUS
0       b'No'         b'No' ... b'No Complication'              0
1       b'No'         b'No' ... b'No Complication'              0
2       b'No'         b'No' ... b'No Complication'              0
3       b'No'         b'No' ...     b'Septic Shock'              0
4       b'No'         b'No' ... b'No Complication'              0
5       b'No'         b'No' ... b'No Complication'              0
6       b'No'         b'No' ... b'No Complication'              0
7       b'No'         b'No' ... b'No Complication'              0
8       b'No'         b'No' ... b'No Complication'              0
```
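Every cell in the output above prints with a `b'...'` prefix. That display can come from real `bytes` objects or from the literal characters `b'...'` stored as text in the CSV. Since `pd.read_csv` normally yields `str` for text columns, the second case seems more likely, but that is an assumption. In either case, the cell does not equal the `str` `'Septic Shock'`. The sketch below uses hypothetical one-cell Series to show both cases and one way to normalise each.

```python
import pandas as pd

# Two hypothetical cells that both print like b'Septic Shock':
as_bytes = pd.Series([b"Septic Shock"])   # a bytes object
as_text = pd.Series(["b'Septic Shock'"])  # the literal characters b'...' as text

# Neither equals the str 'Septic Shock', so every condition evaluates to False.
print((as_bytes == "Septic Shock").tolist())  # [False]
print((as_text == "Septic Shock").tolist())   # [False]

# Decode real bytes to str ...
print((as_bytes.str.decode("utf-8") == "Septic Shock").tolist())  # [True]

# ... or strip a literal b'...' wrapper from text with a regex.
stripped = as_text.str.replace(r"^b'(.*)'$", r"\1", regex=True)
print((stripped == "Septic Shock").tolist())  # [True]
```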
With the list comprehension, I get the following error:

```
AttributeError: 'DataFrame' object has no attribute 'str'
```
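The question doesn't show which line raised this error. However, pandas produces this exact message when `.str` is used on a DataFrame, because the `.str` accessor exists only on Series. To apply a string method to every column, `DataFrame.apply` can pass each column in as a Series. This is a minimal sketch with a hypothetical frame. It assumes the cells are literal `b'...'` text, and the regex is just one possible cleanup.

```python
import pandas as pd

# Hypothetical frame whose cells are literal "b'...'" text.
df = pd.DataFrame({
    "SEPSISPATOS": ["b'No'", "b'Yes'"],
    "OTHSESHOCK": ["b'Septic Shock'", "b'No Complication'"],
})

# .str is a Series accessor; a DataFrame has none, hence the AttributeError.
print(hasattr(df, "str"))  # False

# DataFrame.apply hands each column to the function as a Series,
# so .str is available column by column.
df = df.apply(lambda col: col.str.replace(r"^b'(.*)'$", r"\1", regex=True))
print(df.to_dict("list"))
# {'SEPSISPATOS': ['No', 'Yes'], 'OTHSESHOCK': ['Septic Shock', 'No Complication']}
```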
Finally, here are the column dtypes from print(y_train.dtypes):

```
SEPSISPATOS      object
SEPSHOCKPATOS    object
OTHSYSEP         object
OTHSESHOCK       object
SEPSIS_STATUS     int32
dtype: object
```
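The `object` dtype only says that a column stores arbitrary Python objects. It does not say whether those objects are `str` or `bytes`, and `==` against a `str` works fine for an object column of `str`. One way to check what a column actually holds is to count the element types. The two Series below are hypothetical stand-ins that print alike. On the real data, the same check would be `y_train['OTHSESHOCK'].map(type).value_counts()`.

```python
import pandas as pd

# Hypothetical stand-ins: same printed form, different element types.
as_text = pd.Series(["b'No'", "b'Septic Shock'"], dtype=object)
as_bytes = pd.Series([b"No", b"Septic Shock"], dtype=object)

for name, col in [("as_text", as_text), ("as_bytes", as_bytes)]:
    # Both report dtype object; map(type) shows what is actually stored.
    print(name, col.dtype, col.map(type).value_counts().to_dict())
```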
Any help is much appreciated. I've tried several approaches but am open to other suggestions or fixes. Thank you!