25

I have two dataframes in Pandas which are being merged together df.A and df.B, df.A is the original, and df.B has the new data I want to bring over. The merge works fine and as expected I get two columns col_x and col_y in the merged df.

However, in some rows, the original df.A has values where the other df.B does not. My question is, how can I selectively take the values from col_x and col_y and place them into a new col such as col_z ?

Here's what I mean, how can I merge df.A:

date   impressions    spend    col
1/1/15 100000         3.00     ABC123456
1/2/15 145000         5.00     ABCD00000
1/3/15 300000         15.00    (null)

with df.B

date    col
1/1/15  (null)
1/2/15  (null)
1/3/15  DEF123456

To get:

date   impressions    spend    col_z
1/1/15 100000         3.00     ABC123456
1/2/15 145000         5.00     ABCD00000
1/3/15 300000         15.00    DEF123456

Any help or point in the right direction would be really appreciated!

Thanks

1
  • 1
    Show your code which isn't working. Commented May 18, 2015 at 9:01

3 Answers 3

30

OK assuming that your (null) values are in fact NaN values and not that string then the following works:

In [10]:
# create the merged df
merged = dfA.merge(dfB, on='date')
merged

Out[10]:
        date  impressions  spend      col_x      col_y
0 2015-01-01       100000      3  ABC123456        NaN
1 2015-01-02       145000      5  ABCD00000        NaN
2 2015-01-03       300000     15        NaN  DEF123456

You can use where to conditionally assign a value from the _x and _y columns:

In [11]:
# now create col_z using where
merged['col_z'] = merged['col_x'].where(merged['col_x'].notnull(), merged['col_y'])
merged

Out[11]:
        date  impressions  spend      col_x      col_y      col_z
0 2015-01-01       100000      3  ABC123456        NaN  ABC123456
1 2015-01-02       145000      5  ABCD00000        NaN  ABCD00000
2 2015-01-03       300000     15        NaN  DEF123456  DEF123456

You can then drop the extraneous columns:

In [13]:

merged = merged.drop(['col_x','col_y'],axis=1)
merged

Out[13]:
        date  impressions  spend      col_z
0 2015-01-01       100000      3  ABC123456
1 2015-01-02       145000      5  ABCD00000
2 2015-01-03       300000     15  DEF123456
Sign up to request clarification or add additional context in comments.

1 Comment

How can you merge multiple columns I am doing repeated tasks to get the final output but wonder if there is a cleaner way ``` obs_table['value'] = obs_table['value_coded'].where(obs_table['value_coded'].notnull(),obs_table['value_numeric']) obs_table['value'] = obs_table['value'].where(obs_table['value'].notnull(),obs_table['value_text']) obs_table['value'] = obs_table['value'].where(obs_table['value'].notnull(),obs_table['value_datetime']) obs_table['value'] = obs_table['value'].where(obs_table['value'].notnull(),obs_table['value_datetime']) ```
7

IMO the shortest and yet readable solution is something like that:

df.A.loc[df.A['col'].isna(), 'col'] = df.A.merge(df.B, how='left', on='date')['col_y']

What it basically does is assigning values from merged table column col_y to primary df.A table, for those rows in col column, which are empty (.isna() condition).

2 Comments

very clean... thx! to merge on multiple columns change the on=' ' to a list.
I don't think df.A.merge() will work because a Series doesn't have .merge(), only a DataFrame.
2

If you have got data that contains 'nans' and you want to fill the 'nans' from other dataframe
(that matching the index and columns names) you can do the following:

df_A : target DataFrame that contain nans element

df_B : the source DataFrame thatcomplete the missing elements

df_A = df_A.where(df_A.notnull(),df_B)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.