Get subset of Pandas df where multiple columns match values from another df

Question

I have two dataframes with multi-indexes looking like this:

df1

pd.DataFrame({'observation': {('foo', '2017-04-16'): 'green',
  ('bar', '2017-04-25'): 'red',
  ('zap', '2017-04-16'): 'red',
  ('zip', '2017-04-25'): 'blue',
  ('zip', '2017-04-16'): 'white'},
 'observation': {('zap', '2017-04-16'): np.nan,
  ('bar', '2017-04-27'): 'white',
  ('foo', '2017-05-16'): np.nan,
  ('foo', '2017-04-25'): 'red',
  ('zip', '2017-08-16'): 'red'}})

df2

pd.DataFrame({'foo': {('00', '08'): '0.0',
  ('01', '08'): '0.0',
  ('01', '08'): '0.0',
  ('00', '08'): '1.0',
  ('03', '08'): '1.0',
  ('06', '08'): '0.0',
  ('00', '08'): '1.0',
  ('00', '08'): '1.0',
  ('00', '08'): '0.0',
  ('02', '08'): '0.0'},
 'client_id': {('00', '08'): '1.0',
  ('01', '08'): '1.0',
  ('01', '08'): '1.0',
  ('00', '08'): '1.0',
  ('03', '08'): '1.0',
  ('06', '08'): '1.0',
  ('00', '08'): '1.0',
  ('00', '08'): '1.0',
  ('00', '08'): '1.0',
  ('02', '08'): '1.0'},
 'execution_date': {('00', '08'): '2019-01-09',
  ('01', '08'): '2019-01-09',
  ('01', '08'): '2019-01-09',
  ('00', '08'): '2019-01-09',
  ('03', '08'): '2019-01-09',
  ('06', '08'): '2019-01-09',
  ('00', '08'): '2019-01-09',
  ('00', '08'): '2019-01-09',
  ('00', '08'): '2019-01-09',
  ('02', '08'): '2019-01-09'},
 'del': {('00', '08'): '0.0',
  ('01', '08'): '0.0',
  ('01', '08'): '0.0',
  ('00', '08'): '0.0',
  ('03', '08'): '0.0',
  ('06', '08'): '0.0',
  ('00', '08'): '0.0',
  ('00', '08'): '0.0',
  ('00', '08'): '0.0',
  ('02', '08'): '0.0'},
 'act': {('00', '08'): '11',
  ('01', '08'): '03',
  ('01', '08'): '06',
  ('00', '08'): '07',
  ('03', '08'): '07',
  ('06', '08'): '11',
  ('00', '08'): '28',
  ('00', '08'): '08',
  ('00', '08'): '14',
  ('02', '08'): '26'},
 'obs': {('00', '08'): '02',
  ('01', '08'): '02',
  ('01', '08'): '02',
  ('00', '08'): '02',
  ('03', '08'): '02',
  ('06', '08'): '02',
  ('00', '08'): '02',
  ('00', '08'): '02',
  ('00', '08'): '02',
  ('02', '08'): '02'}})

The two are not the same size, and the values don't always overlap, but every index pair found in df1 is in df2. What I would like to do is update the observation col in the df1 with the values of observation in df2, wherever it matches.

In other words, I would like to do the equivalent of an inner join based on the multi-index, and then overwrite the values in observation in df1 with those from df2. But is there a way to do this in one step, using loc/indexing? (This is structured as an index problem, but if there is a way to solve it using reset_index() that would be fine too.)

Desired output:

Do you mean update the observation column?

Dani Mesejo
– Dani Mesejo

2019-01-16 14:12:35 +00:00
Commented Jan 16, 2019 at 14:12 — Dani Mesejo
– Dani Mesejo, Commented Jan 16, 2019 at 14:12
Yes - overwrite it

Josh Friedlander
– Josh Friedlander

2019-01-16 14:13:18 +00:00
Commented Jan 16, 2019 at 14:13 — Josh Friedlander
– Josh Friedlander, Commented Jan 16, 2019 at 14:13

Dani Mesejo · Accepted Answer · 2019-01-16 14:15:55Z

1

If I understood correctly, you could do:

df2 = pd.DataFrame({'observation': {('foo', '2017-04-16'): 'green',
  ('bar', '2017-04-25'): 'red',
  ('zap', '2017-04-16'): 'red',
  ('zip', '2017-04-25'): 'blue',
  ('zip', '2017-04-16'): 'white'},
 'observation': {('zap', '2017-04-16'): 'yellow',
  ('bar', '2017-04-27'): 'white',
  ('foo', '2017-05-16'): 'black',
  ('foo', '2017-04-25'): 'red',
  ('zip', '2017-08-16'): 'red'}})

df['observation'] = df.index.map(dict(zip(df2.index, df2.observation)))

Output

               observation
bar 2017-04-27       white
foo 2017-04-25         red
    2017-05-16       black
zap 2017-04-16      yellow
zip 2017-08-16         red

answered Jan 16, 2019 at 14:15

Dani Mesejo

62.2k6 gold badges57 silver badges86 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Josh Friedlander Over a year ago

This is awesome! If it is not an index but just 2 cols a and b, could I use something similar?

Dani Mesejo Over a year ago

I believe you could, just need to set the key of the dictionary properly.

Collectives™ on Stack Overflow

Get subset of Pandas df where multiple columns match values from another df

df1

df2

1 Answer 1

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

df1

df2

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related