Filling NaN in one column with the values of another column in Python (pandas)

Question

I have a dataset that was produced by merging two sources so that each could fill in the other's missing values.

The problem is that the paired columns still contain missing data, and I now want to fill each gap with the value from its counterpart column.

The merged dataset looks like this as input:

Name         State       ID       Number_x      Number_y       Op_x       Op_y
Johnson      AL          1        1             nan            1956       nan
Johnson      AL          1        nan           nan            1956       nan
Johnson      AL          2        1             nan            1999       nan
Johnson      AL          2        0             nan            1999       nan
Debra        AK          1A       0             nan            2000       nan
Debra        AK          1B       nan           20             nan        1997
Debra        AK          2        nan           10             nan        2009
Debra        AK          3        nan           1              nan        2008
.
.

The output I want is this:

Name         State       ID       Number_x      Number_y     Op_x       Op_y
Johnson      AL          1        1             1            1956       1956
Johnson      AL          2        1             1            1999       1999
Johnson      AL          2        0             0            1999       1999
Debra        AK          1A       0             0            2000       2000
Debra        AK          1B       20            20           1997       1997
Debra        AK          2        10            10           2009       2009
Debra        AK          3        1             1            2008       2008
.
.

So I want every nan value to be replaced by the value from its paired column: Number_x matched with Number_y, and Op_x matched with Op_y.

One thing to note is that two rows with the same ID will sometimes have different values, like Johnson with ID = 2, which has different numbers but the same Op values. I want to keep these rows because I need to investigate them further.

Also, if a row is missing both Number_x and Number_y, I want to drop that row, like the Johnson row where both Number_x and Number_y are nan.

Also, your expected output has duplicated columns: are _x and _y supposed to be the same?
– BENY, Commented Jan 21, 2019 at 17:04
What about df.loc[df.isnull().any(axis=1), :] = df.ffill()?
– Karn Kumar, Commented Jan 21, 2019 at 17:04

BENY · Accepted Answer · 2019-01-21 17:05:33Z

2

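Group the columns by the part of their name before the underscore, using groupby with axis=1. Then take the first non-null value in each group and drop the rows where Number or Op is still missing. Note that axis=1 in groupby is deprecated as of pandas 2.1, so this one-liner targets older versions.

df.groupby(df.columns.str.split('_').str[0], axis=1).first().dropna(subset=['Number','Op'])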
   ID     Name  Number      Op State
0   1  Johnson     1.0  1956.0    AL
2   2  Johnson     1.0  1999.0    AL
3   2  Johnson     0.0  1999.0    AL
4  1A    Debra     0.0  2000.0    AK
5  1B    Debra    20.0  1997.0    AK
6   2    Debra    10.0  2009.0    AK
7   3    Debra     1.0  2008.0    AK
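
For recent pandas releases, the same coalescing can be done one column pair at a time, without grouping along the column axis. The following is a minimal sketch, not the answerer's method. The sample frame is retyped from the question, the names `out` and `base` are illustrative, and it assumes every suffixed column comes as an `_x`/`_y` pair.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Sample data retyped from the question
df = pd.DataFrame({
    "Name": ["Johnson"] * 4 + ["Debra"] * 4,
    "State": ["AL"] * 4 + ["AK"] * 4,
    "ID": ["1", "1", "2", "2", "1A", "1B", "2", "3"],
    "Number_x": [1, nan, 1, 0, 0, nan, nan, nan],
    "Number_y": [nan, nan, nan, nan, nan, 20, 10, 1],
    "Op_x": [1956, 1956, 1999, 1999, 2000, nan, nan, nan],
    "Op_y": [nan, nan, nan, nan, nan, 1997, 2009, 2008],
})

# Start from the columns that have no _x/_y suffix
out = df[["Name", "State", "ID"]].copy()

for base in ["Number", "Op"]:
    # combine_first keeps the _x value and falls back to _y where _x is NaN
    out[base] = df[f"{base}_x"].combine_first(df[f"{base}_y"])

# Drop rows where neither side of a pair had a value
out = out.dropna(subset=["Number", "Op"])
print(out)
```

Rows that share an ID but carry different values, such as Johnson with ID 2, are kept, because nothing here deduplicates on ID.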


3 Comments

Karn Kumar Over a year ago

You could apply astype(int) to ['Number','Op'].

BENY Over a year ago

@pygo the original df should be float, which is why I kept it as float here. And this is just a partial df.

Karn Kumar Over a year ago

@W-B, OK, nice solution, +1
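
The integer cast suggested in the comments could be sketched as follows. This is only an illustration: `result` is a hypothetical stand-in for a few rows of the coalesced output, and the cast assumes the rows containing NaN have already been dropped, since NaN cannot be converted to a plain int.

```python
import pandas as pd

# Hypothetical stand-in for part of the coalesced result shown in the answer
result = pd.DataFrame({
    "Number": [1.0, 0.0, 20.0],
    "Op": [1956.0, 1999.0, 1997.0],
})

# Cast both columns to integers; this raises if any NaN is still present
result = result.astype({"Number": int, "Op": int})
print(result.dtypes)
```

If missing values may remain, pandas' nullable `"Int64"` dtype is an option in place of `int`.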
