1

I have a bit of a puzzle which I similar to other questions but there is a slight twist.

I have a dataframe - see below. Each record is unique and some records have multiple admit locations in the Concat column. The CONCAT columns reflects the progression of a patients admissions location status.

I want to know where patients ended.

I know that if the text within the CONCAT column is '3E PICU' or '6EN' or '3MN' or '6E' or '3MC' regardless of any other text that is in the column, they ended in the ICU.

I know that if a patient had any of the following admit locations with the CONCAT column, WITHOUT any of the ICU locations they can be considered "ACUTE": '4E' or '5E NSU' or '3E HKU'(see code below for full list of locations).

I know that if a patient had APU or CPU or PSU regardless of any other location that is in the CONCAT column, they can be considered "Psych".

I know that if patient is not considered ICU or ACUTE or PSYCH, they were not admitted.

Current Data

 ID              Concat 
  1              MAIN, 3E HKU, 6EN 
  2              ED Eval and Treatment Unit
  3              ED Main, 3E PICU
  4              ED Main, APU

Desired Data

 ID              Concat                              Admit Status 
  1              MAIN, 3E HKU, 6EN                       ICU
  2              ED Eval and Treatment Unit            Non-Admit
  3              ED Main, PICU                           ICU
  4              ED Main, APU                           Psych
  5              ED Main, 5E NSU, 3E HKU                Acute

I am familiar with the str.contains code but I need some help in illogically mapping out the code, especially if if else conditions are required.

 condition_one=new_ADM1["concat"].str.contains("3E PICU|6EN|3MN|6E|3MC", case=False)
 condition_two=new_ADM1["concat"].str.contains("4E|5E NSU|3E HKU|3E|4MN|5E SCU|4MA|7E|7E IRU", case=False)
 condition_three=new_ADM1["concat"].str.contains("APU|CPU|PSU", case=False)
4
  • 1
    np.select([condition_one,condition_two,condition_three],[value1,value2,value3]) should do Commented Jun 24, 2020 at 14:21
  • the issue with that is that ACUTE designation may still grab ICU patients because some ICU patients also have ACUTE admit locations. Commented Jun 24, 2020 at 14:26
  • 1
    '3E PICU'?? ID 3 shows ICU with only PICU in its string. Commented Jun 24, 2020 at 15:08
  • my mistake - I have changed it to say "3E PICU" Commented Jun 24, 2020 at 15:12

1 Answer 1

1

Use, Series.str.contains along with the given regex patterns, then use np.select to select the items from choices based on the conditions m1, m2 & m3:

m1 = df["Concat"].str.contains("(?i)(?:3E PICU|6EN|3MN|6E|3MC)$")
m2 = df["Concat"].str.contains("(?i)(?:4E|5E NSU|3E HKU|3E|4MN|5E SCU|4MA|7E|7E IRU)$")
m3 = df["Concat"].str.contains("(?i)(?:APU|CPU|PSU)$")

df['Admit Status'] = np.select([m1, m2, m3], ['ICU', 'Acute', 'Psych'], 'Non-Admit')

Result:

# print(df)
   ID                      Concat Admit Status
0   1           MAIN, 3E HKU, 6EN          ICU
1   2  ED Eval and Treatment Unit    Non-Admit
2   3            ED Main, 3E PICU          ICU
3   4                ED Main, APU        Psych
Sign up to request clarification or add additional context in comments.

5 Comments

Thanks this runs but every record in the Admit Status column says "Non-Admit" which is incorrect.
@Raven As per the data you have provided this should produce the correct results.
Thanks, can you help me understand what the ("(?i)(?: .......$") is instructing the code to do?
(?i) is case insensitive match flag which is equivalent to setting case=False. (?:) is a non capturing group and $ is used to indicate a position at the end of line.
would your code change if there were no commas separating the locations within the Concat column?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.