Calculate count of a column based on other column in python dataframe

Question

I have a dataframe like below having patients stay in ICU (in hours) that is shown by ICULOS.

df # Main dataframe
dfy = df.copy()

dfy

P_ID	ICULOS	Count
1	1	5
1	2	5
1	3	5
1	4	5
1	5	5
2	1	9
2	2	9
2	3	9
2	4	9
2	5	9
2	6	9
2	7	9
2	8	9
2	9	9
3	1	3
3	2	3
3	3	3
4	1	7
4	2	7
4	3	7
4	4	7
4	5	7
4	6	7
4	7	7

I calculated their ICULOS Count and placed in the new column named Count using the code:

dfy['Count'] = dfy.groupby(['P_ID'])['ICULOS'].transform('count')

Now, I want to remove those patients based on P_ID whose Count is less than 8. (Note, I want to remove whole patient record). So, after removing the patients with Count < 8, Only the P_ID = 2 will remain as the count is 9.

The desired output:

P_ID	ICULOS	Count
2	1	9
2	2	9
2	3	9
2	4	9
2	5	9
2	6	9
2	7	9
2	8	9
2	9	9

I tried the following code, but for some reason, it is not working for me. It did worked for me but when I re-run the code after few days, it is giving me 0 result. Can someone suggest a better code? Thanks.

dfy = dfy.drop_duplicates(subset=['P_ID'],keep='first')
lis1 = dfy['P_ID'].tolist()
Icu_less_8 = dfy.loc[dfy['Count'] < 8]
lis2 = Icu_less_8.P_ID.to_list()
lis_3  = [k for k in tqdm_notebook(lis1) if k not in lis2]
# removing those patients who have ICULOS of less than 8 hours
df_1 = pd.DataFrame()
for l in tqdm_notebook(lis_3, desc = 'Progress'):
    df_1 = df_1.append(df.loc[df['P_ID']==l])

if dfy is the dataframe after the transform operation, all you have to do is simple boolean masking. dfy = dfy[dfy['count'] >= 8] — timgeb
– timgeb, Commented Feb 21, 2022 at 11:39

Mayank Porwal · Accepted Answer · 2022-02-21 14:42:25Z

3

You can directly filter rows in transform using Series.ge:

In [1521]: dfy[dfy.groupby(['P_ID'])['ICULOS'].transform('count').ge(8)]
Out[1521]: 
    P_ID  ICULOS  Count
5      2       1      9
6      2       2      9
7      2       3      9
8      2       4      9
9      2       5      9
10     2       6      9
11     2       7      9
12     2       8      9
13     2       9      9

EDIT after OP's comment: For multiple conditions, do:

In [1533]: x = dfy.groupby(['P_ID'])['ICULOS'].transform('count')
In [1539]: dfy.loc[x[x.ge(8) & x.le(72)].index]
Out[1539]: 
    P_ID  ICULOS  Count
5      2       1      9
6      2       2      9
7      2       3      9
8      2       4      9
9      2       5      9
10     2       6      9
11     2       7      9
12     2       8      9
13     2       9      9

edited Feb 21, 2022 at 14:42

answered Feb 21, 2022 at 11:38

Mayank Porwal

34.2k9 gold badges45 silver badges65 bronze badges

Sign up to request clarification or add additional context in comments.

8 Comments

Mayank Porwal Over a year ago

gt: greater than

timgeb Over a year ago

should be .ge(8) according to description in question I think

Mayank Porwal Over a year ago

@timgeb Makes sense. Updated the answer.

Mayank Porwal Over a year ago

@Optimus Please check the 2nd part of my answer.

user17978261 Over a year ago

tysm for the answer :)

|

Collectives™ on Stack Overflow

Calculate count of a column based on other column in python dataframe

dfy

1 Answer 1

8 Comments

Your Answer

Hot Network Questions

P_ID	ICULOS	Count
1	1	5
1	2	5
1	3	5
1	4	5
1	5	5
2	1	9
2	2	9
2	3	9
2	4	9
2	5	9
2	6	9
2	7	9
2	8	9
2	9	9
3	1	3
3	2	3
3	3	3
4	1	7
4	2	7
4	3	7
4	4	7
4	5	7
4	6	7
4	7	7

P_ID	ICULOS	Count
1	1	5
1	2	5
1	3	5
1	4	5
1	5	5
2	1	9
2	2	9
2	3	9
2	4	9
2	5	9
2	6	9
2	7	9
2	8	9
2	9	9
3	1	3
3	2	3
3	3	3
4	1	7
4	2	7
4	3	7
4	4	7
4	5	7
4	6	7
4	7	7

Collectives™ on Stack Overflow

dfy

1 Answer 1

8 Comments

Your Answer

Sign up or log in

Post as a guest

Related

P_ID	ICULOS	Count
1	1	5
1	2	5
1	3	5
1	4	5
1	5	5
2	1	9
2	2	9
2	3	9
2	4	9
2	5	9
2	6	9
2	7	9
2	8	9
2	9	9
3	1	3
3	2	3
3	3	3
4	1	7
4	2	7
4	3	7
4	4	7
4	5	7
4	6	7
4	7	7