Pandas groupby NaN/None values in non-key columns

Question

When columns that aren't groupby keys contain NaN/None values, calling last() on the groupby seems to fill those values in from earlier rows of the group:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 1, 2], 'b': [23, 43, np.nan, 12], 'c': ['x', 'y', 'z', None]})
   a     b     c
0  1  23.0     x
1  2  43.0     y
2  1   NaN     z
3  2  12.0  None

df.groupby(by='a', as_index=False, dropna=False).last()
   a     b  c
0  1  23.0  z
1  2  12.0  y

whereas the expected output is:

   a     b     c
0  1   NaN     z
1  2  12.0  None

dropna=False doesn't help because it only applies to the groupby key column 'a'. Is there a way to make pandas keep the NaN/None values in the other columns without a hack?

mozway · Accepted Answer · 2022-06-24 02:59:13Z

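last is designed to return the last non-NA value, independently in each column. The values in one result row can therefore come from different rows of the original group, which is why it looks like filling.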

What you want (last row per group) is tail:

df.groupby(by='a', as_index=False).tail(1)

Output:

   a     b     c
2  1   NaN     z
3  2  12.0  None
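To make the difference concrete, here is a minimal sketch that runs last() and tail(1) side by side on the question's data. The drop_duplicates variant and the reset_index step are additions for illustration, not part of the original answer, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 1, 2],
    'b': [23, 43, np.nan, 12],
    'c': ['x', 'y', 'z', None],
})

# last(): last non-NA value per column, so one result row can mix source rows
last_non_na = df.groupby('a', as_index=False).last()

# tail(1): the last actual row of each group; NaN/None and the original index are kept
last_rows = df.groupby('a').tail(1)

# Assumed alternative without groupby: keep the final occurrence of each key
last_rows_alt = df.drop_duplicates(subset='a', keep='last')

# Optional: renumber the index so it matches the expected output in the question
last_rows_reset = last_rows.reset_index(drop=True)

print(last_non_na)
print(last_rows_reset)
```

Note that tail(1) returns rows in their original order with their original index labels (2 and 3 here), so reset_index(drop=True) is only needed if a fresh 0..n-1 index matters.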
