I'm setting up a new column of 0s and 1s in my dataframe based on the values of other columns in that row. The value should equal 1 if any of the following conditions are true, and 0 otherwise:

y_train['SEPSISPATOS']=='Yes' OR
y_train['SEPSHOCKPATOS'] == 'Yes' OR 
y_train['OTHSYSEP'] == 'Sepsis' OR
y_train['OTHSESHOCK'] == 'Septic Shock' 

I've tried a list comprehension and np.select (code below):

import numpy as np
import pandas as pd

NSQIPdf_train = pd.read_csv("acs_nsqip_puf13_2.csv", sep=',', encoding='utf-8')
# select the four outcome columns with a list of labels
y_train = NSQIPdf_train.loc[:, ['SEPSISPATOS', 'SEPSHOCKPATOS', 'OTHSYSEP', 'OTHSESHOCK']]

### trying list comprehension
y_train['SEPSIS_STATUS'] = [1 if (x['SEPSISPATOS'] == 'Yes') or (x['SEPSHOCKPATOS'] == 'Yes') or (x['OTHSYSEP'] == 'Sepsis') or (x['OTHSESHOCK'] == 'Septic Shock') else 0 for x in y_train]

### trying np.select
conditions=[
    (y_train['SEPSISPATOS'] == 'Yes'),
    (y_train['SEPSHOCKPATOS'] == 'Yes'),
    (y_train['OTHSYSEP'] == 'Sepsis'),
    (y_train['OTHSESHOCK'] == 'Septic Shock')]
choices=[1,1,1,1]
y_train['SEPSIS_STATUS'] = np.select(conditions,choices,default=0)

print(y_train)
print(y_train.dtypes)

With np.select, SEPSIS_STATUS is still 0 in row 3, where OTHSESHOCK is 'Septic Shock' and I expect 1. The string comparison does not seem to work (sample output below). I wonder whether this is because pandas reads the CSV columns in as dtype 'object' rather than as strings.

       SEPSISPATOS SEPSHOCKPATOS  ...          OTHSESHOCK SEPSIS_STATUS
0            b'No'         b'No'  ...  b'No Complication'             0
1            b'No'         b'No'  ...  b'No Complication'             0
2            b'No'         b'No'  ...  b'No Complication'             0
3            b'No'         b'No'  ...     b'Septic Shock'             0
4            b'No'         b'No'  ...  b'No Complication'             0
5            b'No'         b'No'  ...  b'No Complication'             0
6            b'No'         b'No'  ...  b'No Complication'             0
7            b'No'         b'No'  ...  b'No Complication'             0
8            b'No'         b'No'  ...  b'No Complication'             0

When using list comprehension, I get the following error:

AttributeError: 'DataFrame' object has no attribute 'str'.

Finally, here are the dtypes of my variables, from print(y_train.dtypes):

SEPSISPATOS      object
SEPSHOCKPATOS    object
OTHSYSEP         object
OTHSESHOCK       object
SEPSIS_STATUS     int32
dtype: object

Help much appreciated. I've tried multiple ways, but am open to other suggestions or fixes. Thank you!
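
A note on the list comprehension in the question: iterating over a DataFrame directly, as in `for x in y_train`, yields the column labels rather than the rows, so `x` is a string and `x['SEPSISPATOS']` cannot look up a cell. Below is a minimal sketch of a row-wise alternative. The two-row frame `df` is hypothetical and uses plain-string values, which may not match the real file.

```python
import pandas as pd

# Hypothetical two-row frame with plain-string values (not the real data)
df = pd.DataFrame({
    "SEPSISPATOS": ["No", "No"],
    "SEPSHOCKPATOS": ["No", "No"],
    "OTHSYSEP": ["No Complication", "No Complication"],
    "OTHSESHOCK": ["No Complication", "Septic Shock"],
})

# Iterating the frame itself yields column labels, not rows
labels = [x for x in df]
print(labels)

# to_dict("records") yields one dict per row, so x["..."] looks up a cell
status = [
    1 if (x["SEPSISPATOS"] == "Yes"
          or x["SEPSHOCKPATOS"] == "Yes"
          or x["OTHSYSEP"] == "Sepsis"
          or x["OTHSESHOCK"] == "Septic Shock") else 0
    for x in df.to_dict("records")
]
df["SEPSIS_STATUS"] = status
print(df["SEPSIS_STATUS"].tolist())
```

This only addresses the iteration problem. On a large file, a vectorized comparison is likely to be faster than a Python-level loop over rows.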

1 Answer

Try casting your columns to be strings. Not sure what the name of your dataframe is, but something like the below should work.

df['OTHSESHOCK'] = df['OTHSESHOCK'].astype(str)

1 Comment

I had also tried that, using y_train = y_train.astype(str), without success. Printing dtypes still yields 'object', and the comparison still does not work.
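
The sample output in the question prints each value as b'No' or b'Septic Shock'. That suggests the cells hold bytes objects, or text that literally includes the b'...' wrapper, rather than plain strings such as 'No'. In either case a comparison with 'Septic Shock' is False, and astype(str) would not change that. Below is a minimal sketch, assuming that is the cause. The helpers `to_text` and `add_sepsis_status` and the small `sample` frame are hypothetical, made up for illustration.

```python
import pandas as pd

COLS = ["SEPSISPATOS", "SEPSHOCKPATOS", "OTHSYSEP", "OTHSESHOCK"]

def to_text(value):
    """Return a plain str for bytes, or for text that looks like b'...'."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    text = str(value)
    # Strip a literal b'...' or b"..." wrapper if one is present
    if len(text) >= 3 and text[:2] in ("b'", 'b"') and text[-1] == text[1]:
        return text[2:-1]
    return text

def add_sepsis_status(df):
    """Return a copy with SEPSIS_STATUS = 1 if any condition holds, else 0."""
    # Normalize every cell of the four columns to plain text
    clean = df[COLS].apply(lambda col: col.map(to_text))
    flagged = (
        (clean["SEPSISPATOS"] == "Yes")
        | (clean["SEPSHOCKPATOS"] == "Yes")
        | (clean["OTHSYSEP"] == "Sepsis")
        | (clean["OTHSESHOCK"] == "Septic Shock")
    )
    out = df.copy()
    out["SEPSIS_STATUS"] = flagged.astype(int)
    return out

# Hypothetical sample mixing real bytes, b'...' text, and plain strings
sample = pd.DataFrame({
    "SEPSISPATOS": [b"No", "b'No'", b"Yes", "No"],
    "SEPSHOCKPATOS": [b"No", "b'No'", b"No", "No"],
    "OTHSYSEP": [b"No Complication", "b'No Complication'",
                 b"No Complication", "Sepsis"],
    "OTHSESHOCK": [b"No Complication", "b'Septic Shock'",
                   b"No Complication", "No Complication"],
})
result = add_sepsis_status(sample)
print(result["SEPSIS_STATUS"].tolist())
```

Printing `repr(y_train['OTHSESHOCK'].iloc[3])` on the real data should show which of the two cases applies. Combining the boolean columns with `|` replaces the np.select call, since all four choices are 1.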
