0

I have two dataframes with multi-indexes looking like this:

df1

pd.DataFrame({'observation': {('foo', '2017-04-16'): 'green',
  ('bar', '2017-04-25'): 'red',
  ('zap', '2017-04-16'): 'red',
  ('zip', '2017-04-25'): 'blue',
  ('zip', '2017-04-16'): 'white'},
 'observation': {('zap', '2017-04-16'): np.nan,
  ('bar', '2017-04-27'): 'white',
  ('foo', '2017-05-16'): np.nan,
  ('foo', '2017-04-25'): 'red',
  ('zip', '2017-08-16'): 'red'}})

df2

pd.DataFrame({'foo': {('00', '08'): '0.0',
  ('01', '08'): '0.0',
  ('01', '08'): '0.0',
  ('00', '08'): '1.0',
  ('03', '08'): '1.0',
  ('06', '08'): '0.0',
  ('00', '08'): '1.0',
  ('00', '08'): '1.0',
  ('00', '08'): '0.0',
  ('02', '08'): '0.0'},
 'client_id': {('00', '08'): '1.0',
  ('01', '08'): '1.0',
  ('01', '08'): '1.0',
  ('00', '08'): '1.0',
  ('03', '08'): '1.0',
  ('06', '08'): '1.0',
  ('00', '08'): '1.0',
  ('00', '08'): '1.0',
  ('00', '08'): '1.0',
  ('02', '08'): '1.0'},
 'execution_date': {('00', '08'): '2019-01-09',
  ('01', '08'): '2019-01-09',
  ('01', '08'): '2019-01-09',
  ('00', '08'): '2019-01-09',
  ('03', '08'): '2019-01-09',
  ('06', '08'): '2019-01-09',
  ('00', '08'): '2019-01-09',
  ('00', '08'): '2019-01-09',
  ('00', '08'): '2019-01-09',
  ('02', '08'): '2019-01-09'},
 'del': {('00', '08'): '0.0',
  ('01', '08'): '0.0',
  ('01', '08'): '0.0',
  ('00', '08'): '0.0',
  ('03', '08'): '0.0',
  ('06', '08'): '0.0',
  ('00', '08'): '0.0',
  ('00', '08'): '0.0',
  ('00', '08'): '0.0',
  ('02', '08'): '0.0'},
 'act': {('00', '08'): '11',
  ('01', '08'): '03',
  ('01', '08'): '06',
  ('00', '08'): '07',
  ('03', '08'): '07',
  ('06', '08'): '11',
  ('00', '08'): '28',
  ('00', '08'): '08',
  ('00', '08'): '14',
  ('02', '08'): '26'},
 'obs': {('00', '08'): '02',
  ('01', '08'): '02',
  ('01', '08'): '02',
  ('00', '08'): '02',
  ('03', '08'): '02',
  ('06', '08'): '02',
  ('00', '08'): '02',
  ('00', '08'): '02',
  ('00', '08'): '02',
  ('02', '08'): '02'}})

The two are not the same size, and the values don't always overlap, but every index pair found in df1 is in df2. What I would like to do is update the observation col in the df1 with the values of observation in df2, wherever it matches.

In other words, I would like to do the equivalent of an inner join based on the multi-index, and then overwrite the values in observation in df1 with those from df2. But is there a way to do this in one step, using loc/indexing? (This is structured as an index problem, but if there is a way to solve it using reset_index() that would be fine too.)

Desired output:

        obs
00  04  30
    08  02
    09  16
    10  26
    16  26
01  01  30
    07  16
02  08  02
03  13  26
07  15  26
2
  • Do you mean update the observation column? Commented Jan 16, 2019 at 14:12
  • Yes - overwrite it Commented Jan 16, 2019 at 14:13

1 Answer 1

1

If I understood correctly, you could do:

df2 = pd.DataFrame({'observation': {('foo', '2017-04-16'): 'green',
  ('bar', '2017-04-25'): 'red',
  ('zap', '2017-04-16'): 'red',
  ('zip', '2017-04-25'): 'blue',
  ('zip', '2017-04-16'): 'white'},
 'observation': {('zap', '2017-04-16'): 'yellow',
  ('bar', '2017-04-27'): 'white',
  ('foo', '2017-05-16'): 'black',
  ('foo', '2017-04-25'): 'red',
  ('zip', '2017-08-16'): 'red'}})

df['observation'] = df.index.map(dict(zip(df2.index, df2.observation)))

Output

               observation
bar 2017-04-27       white
foo 2017-04-25         red
    2017-05-16       black
zap 2017-04-16      yellow
zip 2017-08-16         red
Sign up to request clarification or add additional context in comments.

2 Comments

This is awesome! If it is not an index but just 2 cols a and b, could I use something similar?
I believe you could, just need to set the key of the dictionary properly.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.