I have a dataset that was merged so that the sources could fill in each other's missing values.
The problem is that the paired columns (Number_x/Number_y and Op_x/Op_y) still contain missing data, and I now want to fill each missing value with the non-missing value from its partner column.
The merged dataset (my input) looks like this:
Name     State  ID  Number_x  Number_y  Op_x  Op_y
Johnson  AL     1   1         nan       1956  nan
Johnson  AL     1   nan       nan       1956  nan
Johnson  AL     2   1         nan       1999  nan
Johnson  AL     2   0         nan       1999  nan
Debra    AK     1A  0         nan       2000  nan
Debra    AK     1B  nan       20        nan   1997
Debra    AK     2   nan       10        nan   2009
Debra    AK     3   nan       1         nan   2008
.
.
The output I want is this:
Name     State  ID  Number_x  Number_y  Op_x  Op_y
Johnson  AL     1   1         1         1956  1956
Johnson  AL     2   1         1         1999  1999
Johnson  AL     2   0         0         1999  1999
Debra    AK     1A  0         0         2000  2000
Debra    AK     1B  20        20        1997  1997
Debra    AK     2   10        10        2009  2009
Debra    AK     3   1         1         2008  2008
.
.
So I want every nan replaced by the corresponding value from the paired column in the same row: Number_x matched with Number_y, and Op_x matched with Op_y.
One thing to note: rows that share the same ID sometimes have different values, like Johnson with ID = 2, which has different Number values but the same Op values. I want to keep these rows because I need to investigate them further.
Also, if a row is missing both Number_x and Number_y, I want to drop that row, like the Johnson row with ID = 1 where both are nan.
This is what I have tried so far:
df.loc[df.isnull().any(axis=1), :] = df.ffill()
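For reference, here is a minimal sketch of the behavior described above, assuming the data is in a pandas DataFrame with exactly the columns shown. The names `out` and `base` are only illustrative, and this is one possible approach rather than the definitive one: drop the rows where both Number columns are missing, then fill each column of a pair from its partner.

```python
import numpy as np
import pandas as pd

# Sample input rebuilt from the table in the question.
df = pd.DataFrame({
    "Name": ["Johnson"] * 4 + ["Debra"] * 4,
    "State": ["AL"] * 4 + ["AK"] * 4,
    "ID": ["1", "1", "2", "2", "1A", "1B", "2", "3"],
    "Number_x": [1, np.nan, 1, 0, 0, np.nan, np.nan, np.nan],
    "Number_y": [np.nan, np.nan, np.nan, np.nan, np.nan, 20, 10, 1],
    "Op_x": [1956, 1956, 1999, 1999, 2000, np.nan, np.nan, np.nan],
    "Op_y": [np.nan, np.nan, np.nan, np.nan, np.nan, 1997, 2009, 2008],
})

# Drop rows where Number_x and Number_y are BOTH missing.
out = df.dropna(subset=["Number_x", "Number_y"], how="all").copy()

# Fill each column of a pair from its partner, row by row.
# Both filled Series are computed before assigning, so neither
# fill sees the other's result.
for base in ["Number", "Op"]:
    x, y = f"{base}_x", f"{base}_y"
    x_filled = out[x].fillna(out[y])
    y_filled = out[y].fillna(out[x])
    out[x], out[y] = x_filled, y_filled

out = out.reset_index(drop=True)
print(out)
```

Rows that share an ID but differ in value (Johnson, ID = 2) are left untouched, because nothing here deduplicates. Note also that `df.ffill()` with its default axis fills down each column from the previous row, so it does not take values from the paired column.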