Pandas fillna using groupby

Question

I am trying to impute (fill) missing values using rows that have the same values in other columns.

For example, I have this dataframe:

one | two | three
1      1     10
1      1     nan
1      1     nan
1      2     nan
1      2     20
1      2     nan
1      3     nan
1      3     nan

I want to use columns one and two as keys: within each group of rows that share the same keys, if column three is not entirely NaN, fill its NaNs with the existing value from a row in that group.

Here is my desired result:

one | two | three
1      1     10
1      1     10
1      1     10
1      2     20
1      2     20
1      2     20
1      3     nan
1      3     nan

You can see that the rows with keys (1, 3) stay NaN, because that group has no existing value to fill from.
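The rule "column three is not entirely NaN" can be checked per group by counting non-NaN values. This is a sketch that assumes the sample frame is rebuilt from the table above; `counts` and `fillable` are illustrative names.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the table in the question
df = pd.DataFrame({
    'one':   [1, 1, 1, 1, 1, 1, 1, 1],
    'two':   [1, 1, 1, 2, 2, 2, 3, 3],
    'three': [10, np.nan, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})

# count() skips NaN, so this is the number of usable values per (one, two) group
counts = df.groupby(['one', 'two'])['three'].count()
print(counts)

# Groups that have at least one value to fill from
fillable = counts[counts > 0].index.tolist()
print(fillable)
```

Group (1, 3) has a count of 0, which is why it stays NaN in the desired result.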

I have tried using groupby+fillna():

df['three'] = df.groupby(['one','two'])['three'].fillna()

which gave me an error.
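For reference, `fillna` needs to be told what to fill with. One alternative (not taken from the answers below) is to broadcast each group's first non-NaN value back onto every row with `transform('first')` and pass that Series to `fillna`. This sketch assumes the sample frame from the question; if a group holds several different values, the NaNs get the first one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'one':   [1, 1, 1, 1, 1, 1, 1, 1],
    'two':   [1, 1, 1, 2, 2, 2, 3, 3],
    'three': [10, np.nan, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})

# first() skips NaN, so each row receives its group's first non-NaN value
# (or NaN when the whole group is NaN); transform keeps the original row index
first_valid = df.groupby(['one', 'two'])['three'].transform('first')

# Only the NaNs are replaced; existing values are left alone
df['three'] = df['three'].fillna(first_valid)
print(df)
```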

I have also tried forward filling, which gave me a rather strange result: it seemed to forward-fill column two instead. This is the code I used for the forward fill:

df['three'] = df.groupby(['one','two'], sort=False)['three'].ffill()
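Even when a grouped forward fill runs as intended, it only copies values downward inside each group, so a NaN that comes before the group's value is not filled. A small sketch on the question's sample data (frame construction assumed from the table):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'one':   [1, 1, 1, 1, 1, 1, 1, 1],
    'two':   [1, 1, 1, 2, 2, 2, 3, 3],
    'three': [10, np.nan, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})

# Grouped ffill: values only propagate to later rows of the same group
filled = df.groupby(['one', 'two'], sort=False)['three'].ffill()
print(filled.tolist())
```

Row 3 (the first row of group (1, 2)) stays NaN because the 20 appears after it, which is why a backward fill is also needed.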

jezrael · Accepted Answer · 2017-09-24 14:37:59Z


If there is only one non-NaN value per group, use ffill (forward filling) and then bfill (backward filling) within each group, which needs apply with a lambda:

# group_keys=False keeps the original index, so the result aligns with df
df['three'] = (df.groupby(['one','two'], sort=False, group_keys=False)['three']
                 .apply(lambda x: x.ffill().bfill()))
print (df)
   one  two  three
0    1    1   10.0
1    1    1   10.0
2    1    1   10.0
3    1    2   20.0
4    1    2   20.0
5    1    2   20.0
6    1    3    NaN
7    1    3    NaN
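A variant using `transform` instead of `apply`: `transform` always returns a result aligned to the original row index, so the assignment does not depend on the `group_keys` setting (whose effect on `apply` changed in pandas 2.0). This is a sketch on the sample data, not the answer's original code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'one':   [1, 1, 1, 1, 1, 1, 1, 1],
    'two':   [1, 1, 1, 2, 2, 2, 3, 3],
    'three': [10, np.nan, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})

df['three'] = (
    df.groupby(['one', 'two'], sort=False)['three']
      .transform(lambda s: s.ffill().bfill())  # forward, then backward, inside each group
)
print(df)
```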

But if a group has multiple values and you need to replace NaN with some constant, e.g. the mean of the group:

print (df)
   one  two  three
0    1    1   10.0
1    1    1   40.0
2    1    1    NaN
3    1    2    NaN
4    1    2   20.0
5    1    2    NaN
6    1    3    NaN
7    1    3    NaN

# group_keys=False keeps the original index, so the result aligns with df
df['three'] = (df.groupby(['one','two'], sort=False, group_keys=False)['three']
                 .apply(lambda x: x.fillna(x.mean())))
print (df)
   one  two  three
0    1    1   10.0
1    1    1   40.0
2    1    1   25.0
3    1    2   20.0
4    1    2   20.0
5    1    2   20.0
6    1    3    NaN
7    1    3    NaN
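The same group-mean fill can be written without a lambda by computing the mean per row with `transform('mean')` and passing it to `fillna`. A sketch using the second sample frame above; an all-NaN group has a NaN mean, so its rows stay NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'one':   [1, 1, 1, 1, 1, 1, 1, 1],
    'two':   [1, 1, 1, 2, 2, 2, 3, 3],
    'three': [10, 40, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})

# Mean of 'three' per (one, two) group, repeated on every row of that group
group_mean = df.groupby(['one', 'two'])['three'].transform('mean')

# Replace only the missing values with their group's mean
df['three'] = df['three'].fillna(group_mean)
print(df)
```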

edited Sep 24, 2017 at 14:37

answered Sep 24, 2017 at 14:32


Comments

Andy L. Over a year ago

@jezrael: is there any reason you have to use apply in your answer? I ask because I tried calling ffill and bfill directly and it returned the correct result: df['three'] = df.groupby(['one', 'two'])['three'].ffill().bfill()

jezrael Over a year ago

@Andy L. It works correctly here only because the last group is the only all-NaN group. If you change the sample data so that the first group is all NaN (10 to NaN), your solution fails. The reason is that the final bfill does not work per group, but on the whole Series returned by groupby + ffill.

Andy L. Over a year ago

Ah, I forgot that bfill back-fills the Series output by ffill, not the groupby. Thanks for the answers.
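The pitfall described in this exchange can be reproduced by making the first group all NaN (10 changed to NaN, as suggested above) and comparing the chained call with a per-group fill. A sketch; the names `chained` and `per_group` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data with the first group changed to all NaN
df = pd.DataFrame({
    'one':   [1, 1, 1, 1, 1, 1, 1, 1],
    'two':   [1, 1, 1, 2, 2, 2, 3, 3],
    'three': [np.nan, np.nan, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})
g = df.groupby(['one', 'two'])['three']

# ffill is grouped, but .bfill() then runs on the plain Series and crosses group borders
chained = g.ffill().bfill()

# Both fills run inside each group
per_group = g.transform(lambda s: s.ffill().bfill())

print(pd.DataFrame({'chained': chained, 'per_group': per_group}))
```

In `chained`, group (1, 1) wrongly receives 20 from group (1, 2); in `per_group` it stays NaN.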

ah bon Over a year ago

May I ask how I can apply df['three'] = df.groupby(['one','two'], sort=False)['three'].apply(lambda x: x.ffill().bfill()) to multiple columns (three, four, five, etc.) instead of only three, still grouping by one and two and filling the NaNs?

jezrael Over a year ago

@ahbon - Use cols = ['three','four','five'] and df[cols] = df.groupby(['one','two'], sort=False, group_keys=False)[cols].apply(lambda x: x.ffill().bfill())
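A sketch of the multi-column case with `transform`, which returns a frame aligned to the original index. The column `four` and the small frame are hypothetical, made up only to show two columns being filled at once.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two columns to fill
df = pd.DataFrame({
    'one':   [1, 1, 1, 1],
    'two':   [1, 1, 2, 2],
    'three': [10, np.nan, np.nan, 20],
    'four':  [np.nan, 5, np.nan, np.nan],
})
cols = ['three', 'four']

# Each selected column is forward- then back-filled within its (one, two) group
df[cols] = (
    df.groupby(['one', 'two'], sort=False)[cols]
      .transform(lambda s: s.ffill().bfill())
)
print(df)
```

Column `four` stays NaN in group (1, 2) because that group has no value for it.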


Mykola Zotko · Accepted Answer · 2021-09-15 14:52:15Z


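You can sort the data by the column with missing values, then group by and forward-fill. Because sort_values places NaN last by default, each group's existing values come before its NaNs: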

df.sort_values('three', inplace=True)
df['three'] = df.groupby(['one','two'])['three'].ffill()
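The sort reorders the rows of `df`; a sketch of the same approach that restores the original order afterwards with `sort_index()` (assuming the original index was in ascending order, as with the default RangeIndex). With an ascending sort, a group that has several different values fills its NaNs with its largest value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'one':   [1, 1, 1, 1, 1, 1, 1, 1],
    'two':   [1, 1, 1, 2, 2, 2, 3, 3],
    'three': [10, np.nan, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})

df = df.sort_values('three')   # NaN rows go last (na_position='last' by default)
df['three'] = df.groupby(['one', 'two'])['three'].ffill()
df = df.sort_index()           # restore the original row order
print(df)
```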

edited Sep 15, 2021 at 14:52

answered Sep 15, 2021 at 14:44

