I have a dataframe like below having patients stay in ICU (in hours) that is shown by ICULOS.
df # Main dataframe
dfy = df.copy()
dfy
| P_ID | ICULOS | Count |
|---|---|---|
| 1 | 1 | 5 |
| 1 | 2 | 5 |
| 1 | 3 | 5 |
| 1 | 4 | 5 |
| 1 | 5 | 5 |
| 2 | 1 | 9 |
| 2 | 2 | 9 |
| 2 | 3 | 9 |
| 2 | 4 | 9 |
| 2 | 5 | 9 |
| 2 | 6 | 9 |
| 2 | 7 | 9 |
| 2 | 8 | 9 |
| 2 | 9 | 9 |
| 3 | 1 | 3 |
| 3 | 2 | 3 |
| 3 | 3 | 3 |
| 4 | 1 | 7 |
| 4 | 2 | 7 |
| 4 | 3 | 7 |
| 4 | 4 | 7 |
| 4 | 5 | 7 |
| 4 | 6 | 7 |
| 4 | 7 | 7 |
I calculated their ICULOS Count and placed in the new column named Count using the code:
dfy['Count'] = dfy.groupby(['P_ID'])['ICULOS'].transform('count')
Now, I want to remove those patients based on P_ID whose Count is less than 8. (Note, I want to remove whole patient record). So, after removing the patients with Count < 8, Only the P_ID = 2 will remain as the count is 9.
The desired output:
| P_ID | ICULOS | Count |
|---|---|---|
| 2 | 1 | 9 |
| 2 | 2 | 9 |
| 2 | 3 | 9 |
| 2 | 4 | 9 |
| 2 | 5 | 9 |
| 2 | 6 | 9 |
| 2 | 7 | 9 |
| 2 | 8 | 9 |
| 2 | 9 | 9 |
I tried the following code, but for some reason, it is not working for me. It did worked for me but when I re-run the code after few days, it is giving me 0 result. Can someone suggest a better code? Thanks.
dfy = dfy.drop_duplicates(subset=['P_ID'],keep='first')
lis1 = dfy['P_ID'].tolist()
Icu_less_8 = dfy.loc[dfy['Count'] < 8]
lis2 = Icu_less_8.P_ID.to_list()
lis_3 = [k for k in tqdm_notebook(lis1) if k not in lis2]
# removing those patients who have ICULOS of less than 8 hours
df_1 = pd.DataFrame()
for l in tqdm_notebook(lis_3, desc = 'Progress'):
df_1 = df_1.append(df.loc[df['P_ID']==l])
dfyis the dataframe after the transform operation, all you have to do is simple boolean masking.dfy = dfy[dfy['count'] >= 8]