Merge Only When Value is Empty/Null in Pandas

Question

I have two dataframes in Pandas which are being merged together df.A and df.B, df.A is the original, and df.B has the new data I want to bring over. The merge works fine and as expected I get two columns col_x and col_y in the merged df.

However, in some rows, the original df.A has values where the other df.B does not. My question is, how can I selectively take the values from col_x and col_y and place them into a new col such as col_z ?

Here's what I mean, how can I merge df.A:

date   impressions    spend    col
1/1/15 100000         3.00     ABC123456
1/2/15 145000         5.00     ABCD00000
1/3/15 300000         15.00    (null)

with df.B

date    col
1/1/15  (null)
1/2/15  (null)
1/3/15  DEF123456

To get:

date   impressions    spend    col_z
1/1/15 100000         3.00     ABC123456
1/2/15 145000         5.00     ABCD00000
1/3/15 300000         15.00    DEF123456

Any help or point in the right direction would be really appreciated!

Thanks

Show your code which isn't working.

martineau
– martineau

2015-05-18 09:01:05 +00:00
Commented May 18, 2015 at 9:01 — martineau
– martineau, Commented May 18, 2015 at 9:01

EdChum · Accepted Answer · 2015-05-18 09:12:37Z

30

OK assuming that your (null) values are in fact NaN values and not that string then the following works:

In [10]:
# create the merged df
merged = dfA.merge(dfB, on='date')
merged

Out[10]:
        date  impressions  spend      col_x      col_y
0 2015-01-01       100000      3  ABC123456        NaN
1 2015-01-02       145000      5  ABCD00000        NaN
2 2015-01-03       300000     15        NaN  DEF123456

You can use where to conditionally assign a value from the _x and _y columns:

In [11]:
# now create col_z using where
merged['col_z'] = merged['col_x'].where(merged['col_x'].notnull(), merged['col_y'])
merged

Out[11]:
        date  impressions  spend      col_x      col_y      col_z
0 2015-01-01       100000      3  ABC123456        NaN  ABC123456
1 2015-01-02       145000      5  ABCD00000        NaN  ABCD00000
2 2015-01-03       300000     15        NaN  DEF123456  DEF123456

You can then drop the extraneous columns:

In [13]:

merged = merged.drop(['col_x','col_y'],axis=1)
merged

Out[13]:
        date  impressions  spend      col_z
0 2015-01-01       100000      3  ABC123456
1 2015-01-02       145000      5  ABCD00000
2 2015-01-03       300000     15  DEF123456

answered May 18, 2015 at 9:12

EdChum

397k204 gold badges836 silver badges583 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

user3404455 Over a year ago

How can you merge multiple columns I am doing repeated tasks to get the final output but wonder if there is a cleaner way ``` obs_table['value'] = obs_table['value_coded'].where(obs_table['value_coded'].notnull(),obs_table['value_numeric']) obs_table['value'] = obs_table['value'].where(obs_table['value'].notnull(),obs_table['value_text']) obs_table['value'] = obs_table['value'].where(obs_table['value'].notnull(),obs_table['value_datetime']) obs_table['value'] = obs_table['value'].where(obs_table['value'].notnull(),obs_table['value_datetime']) ```

Oskar_U · Accepted Answer · 2019-11-26 00:06:29Z

7

IMO the shortest and yet readable solution is something like that:

df.A.loc[df.A['col'].isna(), 'col'] = df.A.merge(df.B, how='left', on='date')['col_y']

What it basically does is assigning values from merged table column col_y to primary df.A table, for those rows in col column, which are empty (.isna() condition).

answered Nov 26, 2019 at 0:06

Oskar_U

4824 silver badges14 bronze badges

2 Comments

leo Over a year ago

very clean... thx! to merge on multiple columns change the on=' ' to a list.

mihow Over a year ago

I don't think df.A.merge() will work because a Series doesn't have .merge(), only a DataFrame.

רועי שמש · Accepted Answer · 2021-10-14 14:30:41Z

2

If you have got data that contains 'nans' and you want to fill the 'nans' from other dataframe
(that matching the index and columns names) you can do the following:

df_A : target DataFrame that contain nans element

df_B : the source DataFrame thatcomplete the missing elements

df_A = df_A.where(df_A.notnull(),df_B)

answered Oct 14, 2021 at 14:30

רועי שמש

1512 silver badges3 bronze badges

Collectives™ on Stack Overflow

Merge Only When Value is Empty/Null in Pandas

3 Answers 3

1 Comment

2 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

1 Comment

2 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related