
I would like to merge (or concatenate) two dataframes so that I get the 3rd dataframe below: the 1st dataframe plus var2 from the 2nd dataframe for each ticker/date combination in the 1st one, where the ticker is the index.

1st dataframe:

 import pandas as pd

 dict1 = [{'date': '2016-11-29','var1': 'x1'},
 { 'date': '2016-11-29','var1': 'x2'},
 { 'date': '2016-11-29','var1': 'x3'},
 {'date': '2016-11-29','var1': 'x4'},
 {'date': '2016-11-30','var1': 'x5'},
 {'date': '2016-11-30','var1': 'x6'}]
 df1 = pd.DataFrame(dict1, index=['ge','jpm','fb', 'msft','ge','jpm'])

2nd dataframe:

 dict2 = [{'date': '2016-11-29','var2': 'y1'},
 { 'date': '2016-11-29','var2': 'y2'},
 { 'date': '2016-11-29','var2': 'y3'},
 {'date': '2016-11-29','var2': 'y4'},
 {'date': '2016-11-30','var2': 'y5'},
 {'date': '2016-11-30','var2': 'y6'},
 {'date': '2016-11-30','var2': 'y7'},
 {'date': '2016-11-30','var2': 'y8'}]
 df2 = pd.DataFrame(dict2, index=['aapl', 'msft','ge','jpm','aapl', 'msft','ge','jpm'])

3rd (target) dataframe:

 dict3 = [{'date': '2016-11-29','var1': 'x1','var2': 'y3'},
 { 'date': '2016-11-29','var1': 'x2','var2': 'y4'},
 { 'date': '2016-11-29','var1': 'x3','var2': 'NaN'},
 {'date': '2016-11-29','var1': 'x4','var2': 'y2'},
 {'date': '2016-11-30','var1': 'x5','var2': 'y7'},
 {'date': '2016-11-30','var1': 'x6','var2': 'y8'}]
 df3 = pd.DataFrame(dict3, index=['ge','jpm','fb', 'msft','ge','jpm'])

Note that the dataframes are not aligned, so the merge has to match on both the index and the date; together, index and date form the unique identifier. For instance, the 1st row of the 3rd dataframe needs the var2 value for ticker 'ge' on date '2016-11-29'. Also, as mentioned, I only need the rows that are in df1; anything else in df2 (additional dates or tickers) is not relevant.

  • What did you try that didn't work? Commented Dec 2, 2018 at 3:26
  • I am actually not even sure where to start. My go-to approach would have been merge, but to my knowledge that only works with a single unique identifier. Commented Dec 2, 2018 at 3:54
  • Another approach I thought of was brute force: running a loop that searches the 2nd dataframe for the correct value. But that does not sound very pythonic. Commented Dec 2, 2018 at 3:56

1 Answer


You may reset the index, merge on the index column and date column, and restore the index:

df1.reset_index().merge(df2.reset_index(), 
                        on=['index', 'date'], how='left')\
                 .set_index('index')
#             date var1 var2
#index                      
#ge     2016-11-29   x1   y3
#jpm    2016-11-29   x2   y4
#fb     2016-11-29   x3  NaN
#msft   2016-11-29   x4   y2
#ge     2016-11-30   x5   y7
#jpm    2016-11-30   x6   y8
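For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the two frames from the question and applies the same reset/merge/restore steps. The `validate='one_to_one'` argument is an addition that is not part of the snippet above; it assumes, as the question states, that index plus date uniquely identify a row in both frames, and it makes pandas raise an error if that assumption does not hold.

```python
import pandas as pd

# Rebuild the frames from the question (same values, more compact form).
df1 = pd.DataFrame(
    {'date': ['2016-11-29'] * 4 + ['2016-11-30'] * 2,
     'var1': ['x1', 'x2', 'x3', 'x4', 'x5', 'x6']},
    index=['ge', 'jpm', 'fb', 'msft', 'ge', 'jpm'],
)
df2 = pd.DataFrame(
    {'date': ['2016-11-29'] * 4 + ['2016-11-30'] * 4,
     'var2': ['y1', 'y2', 'y3', 'y4', 'y5', 'y6', 'y7', 'y8']},
    index=['aapl', 'msft', 'ge', 'jpm'] * 2,
)

# reset_index() turns the unnamed index into a column called 'index',
# so both keys are ordinary columns during the merge.
merged = (
    df1.reset_index()
       .merge(df2.reset_index(),
              on=['index', 'date'],
              how='left',               # keep every row of df1, drop extras from df2
              validate='one_to_one')    # assumption: (index, date) is unique in both frames
       .set_index('index')
)
print(merged)
```

The left join is what keeps 'fb' (missing from df2) with a missing var2 and drops 'aapl' (missing from df1).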

3 Comments

Wow, I did not realize that merge could do that, thanks so much for the help. Just to make sure I understand correctly, why did you have to reset the index of both dataframes? Couldn't you have merged on the actual index and date? Or does on= only accept variables?
You can merge only on indexes or columns, but you cannot mix and match.
Ah I see. Thank you so much for the explanation and for the solution above. Worked very well for me!!
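Following up on the index-versus-column point in these comments: one way to stay entirely on the index side is to move date into the index of both frames and then join on the resulting two-level index. This is only a sketch of that alternative, not the approach from the answer; the index name 'ticker' is a label chosen here for illustration and does not appear in the question. As far as I know, recent pandas versions also accept index level names in `on=`, but the variant below does not rely on that.

```python
import pandas as pd

# Same data as in the question; the index is given the illustrative name 'ticker'.
df1 = pd.DataFrame(
    {'date': ['2016-11-29'] * 4 + ['2016-11-30'] * 2,
     'var1': ['x1', 'x2', 'x3', 'x4', 'x5', 'x6']},
    index=['ge', 'jpm', 'fb', 'msft', 'ge', 'jpm'],
).rename_axis('ticker')
df2 = pd.DataFrame(
    {'date': ['2016-11-29'] * 4 + ['2016-11-30'] * 4,
     'var2': ['y1', 'y2', 'y3', 'y4', 'y5', 'y6', 'y7', 'y8']},
    index=['aapl', 'msft', 'ge', 'jpm'] * 2,
).rename_axis('ticker')

# Append 'date' to the existing index so both frames are keyed by (ticker, date).
left = df1.set_index('date', append=True)
right = df2.set_index('date', append=True)

# Index-on-index left join, then move 'date' back out to a regular column.
joined = left.join(right, how='left').reset_index(level='date')
print(joined)
```

Both routes should give the same rows; which one reads better is mostly a matter of whether the data already lives in a MultiIndex.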
