0

I want to add the values in Dataframe_B, column 'Mdist', to the corresponding ones in Dataframe_A, column 'Pdist'. The correspondence is based on matching the names of pairs in rows, in the columns 'ind_comp_a' and 'ind_comp_b' in both dataframes. I want the pairs in Dataframe_A that are not represented in Dataframe_B to remain unchanged.

Below is an example dataset:

Dataframe_A <- data.frame(ind_comp_a = c("OP2645ii_cPOR","OP2645ii_cPOR","OP2645ii_cPOR","OP2645ii_cPOR","OP2645ii_cPOR","OP2645ii_cPOR","OP2645ii_cPOR","OP2645ii_cPOR","OP2645ii_cPOR","OP2645ii_cPOR","OP2645ii_cPOR","OP2645ii_cPOR","OP2645ii_cPOR","OP2645ii_cPOR","OP2645ii_cPOR","OP2645ii_cPOR","OP2645ii_cPOR","OP2645ii_cPOR","OP2645ii_cPOR","OP2645ii_aPOR","OP2645ii_cPOR","OP2645ii_cPOR","OP2645ii_cPOR","OP2645ii_cPOR","OP2645ii_aPOR","OP2645ii_cPOR","OP2645ii_ePOR","OP2645ii_cPOR","OP2645ii_ePOR","OP2645ii_dPOR","OP2645ii_aPOR","OP2645ii_cPOR","OP2645ii_cPOR","OP2645ii_cPOR","OP2645ii_dPOR","OP2645ii_cPOR","OP2645ii_ePOR","OP2413iiiaMOU","OP5046___eWAT","OP2413iiicMOU","OP2413iiibMOU","OP2645ii_aPOR","OP2645ii_aPOR","OP5046___eWAT","OP5048___bPHA","OP5046___eWAT","OP2645ii_aPOR","OP5046___dWAT","OP5046___aWAT","OP5046___aWAT","OP2645ii_aPOR","OP5048___fPHA","OP2645ii_aPOR","OP5046___dWAT","OP2645ii_ePOR","OP2645ii_dPOR","OP2645ii_aPOR","OP2645ii_cPOR","OP2645ii_bPOR","OP2645ii_aPOR","OP2645ii_aPOR","OP5046___aWAT","OP2645ii_dPOR","OP5048___bPHA","OP5048___fPHA","OP5046___dWAT","OP5046___eWAT","OP5046___eWAT","OP2645ii_aPOR","OP2645ii_ePOR","OP2645ii_dPOR","OP5048___bPHA","OP2645ii_ePOR","OP2645ii_ePOR","OP2645ii_bPOR","OP2645ii_ePOR","OP2645ii_aPOR","OP5048___ePHA","OP5046___eWAT","OP2645ii_cPOR","OP2645ii_dPOR","OP2645ii_cPOR","OP5048___fPHA","OP2645ii_dPOR","OP2645ii_dPOR","OP2645ii_ePOR","OP2645ii_bPOR","OP5048___ePHA","OP5046___eWAT","OP2645ii_ePOR","OP2645ii_ePOR","OP5046___aWAT","OP2645ii_aPOR","OP5044___cMOU","OP2645ii_cPOR","OP2645ii_ePOR","OP5046___dWAT","OP2645ii_dPOR","OP5046___aWAT","OP5046___bWAT","OP2645ii_aPOR","OP2645ii_aPOR","OP5046___bWAT","OP5053DNAaPHA","OP5048___ePHA","OP2645ii_dPOR","OP5048___dPHA","OP5046___bWAT","OP5046___aWAT","OP2645ii_ePOR","OP2413iiibMOU","OP2413iiicMOU","OP2645ii_dPOR","OP5044___cMOU","OP2645ii_bPOR","OP2645ii_aPOR","OP2413iiiaMOU","OP2645ii_dPOR","OP5046___bWAT","OP5046___dWAT","OP2645ii_aPOR","OP5048___bPHA","OP5051DNAbCOM","OP5046___dWAT","OP2413iiibMOU","OP2413iiicMOU","OP2645ii_bPOR","OP5049___bWAT","OP5046___aWAT","OP2413iiiaMOU","OP5046___cWAT","OP5046___bWAT","OP5046___bWAT","OP2645ii_dPOR","OP2645ii_bPOR","OP5043___aWAT","OP5048___fPHA","OP2645ii_dPOR","OP5046___bWAT","OP5046___eWAT","OP5048___hPHA","OP5046___bWAT","OP5048___hPHA","OP2645ii_ePOR","OP5048___cPHA","OP5046___cWAT","OP5048___ePHA","OP5048___ePHA","OP5046___bWAT","OP5046___cWAT","OP2645ii_aPOR","OP2645ii_ePOR","OP2645ii_aPOR","OP5048___bPHA","OP5048___cPHA","OP5043___bWAT","OP5046___dWAT","OP5048___hPHA","OP2645ii_cPOR","OP5048___fPHA","OP5048___bPHA","OP2645ii_dPOR","OP2645ii_bPOR","OP5048___dPHA","OP5053DNAaPHA","OP5048___fPHA","OP2645ii_aPOR","OP2645ii_ePOR","OP5044___cMOU","OP5046___eWAT","OP2645ii_bPOR","OP2645ii_bPOR","OP5048___fPHA","OP5044___cMOU","OP2645ii_ePOR"),
                          ind_comp_b = c("OP5048___bPHA","OP5051DNAbCOM","OP5046___bWAT","OP5048___fPHA","OP5043___bWAT","OP5043___aWAT","OP5047___bPHA","OP5052DNAaWAT","OP5053DNAcPHA","OP5048___ePHA","OP5049___aWAT","OP5046___cWAT","OP5053DNAaPHA","OP5046___eWAT","OP5044___aMOU","OP5051DNAaCOM","OP5048___cPHA","OP5048___gPHA","OP5054DNAbMOU","OP5048___bPHA","OP5048___hPHA","OP5047___aPHA","OP5053DNAbPHA","OP5051DNAcCOM","OP5048___fPHA","OP5046___aWAT","OP5048___bPHA","OP5049___bWAT","OP5048___fPHA","OP5048___bPHA","OP5048___ePHA","OP5044___bMOU","OP3088i__aPOR","OP5046___dWAT","OP5048___fPHA","OP5054DNAcMOU","OP5048___ePHA","OP2645ii_cPOR","OP5048___fPHA","OP2645ii_cPOR","OP2645ii_cPOR","OP5051DNAbCOM","OP5053DNAaPHA","OP5048___bPHA","OP5054DNAaMOU","OP5048___ePHA","OP5043___aWAT","OP5048___bPHA","OP5048___bPHA","OP5048___fPHA","OP5046___bWAT","OP5054DNAaMOU","OP5043___bWAT","OP5048___fPHA","OP5053DNAaPHA","OP5048___ePHA","OP5048___cPHA","OP5044___cMOU","OP5048___bPHA","OP5048___hPHA","OP5047___bPHA","OP5048___ePHA","OP5051DNAbCOM","OP5049___bWAT","OP5049___bWAT","OP5048___ePHA","OP5053DNAaPHA","OP5048___hPHA","OP5053DNAcPHA","OP5043___aWAT","OP5046___bWAT","OP5054DNAcMOU","OP5048___cPHA","OP5048___hPHA","OP5048___fPHA","OP5051DNAbCOM","OP5047___aPHA","OP5054DNAaMOU","OP5048___cPHA","OP5048___dPHA","OP5053DNAaPHA","OP5054DNAaMOU","OP5054DNAcMOU","OP5043___aWAT","OP5043___bWAT","OP5043___bWAT","OP2645ii_cPOR","OP5049___bWAT","OP5047___aPHA","OP5047___aPHA","OP5046___bWAT","OP5053DNAaPHA","OP5052DNAaWAT","OP5048___bPHA","OP2645ii_ePOR","OP5047___bPHA","OP5053DNAaPHA","OP5047___bPHA","OP5048___hPHA","OP5048___dPHA","OP5048___gPHA","OP5049___aWAT","OP5048___ePHA","OP5054DNAaMOU","OP5054DNAcMOU","OP5048___cPHA","OP5051DNAbCOM","OP5048___fPHA","OP5048___cPHA","OP5053DNAcPHA","OP5048___bPHA","OP5048___bPHA","OP5048___hPHA","OP5048___fPHA","OP5048___ePHA","OP5053DNAbPHA","OP5048___bPHA","OP5053DNAcPHA","OP5048___hPHA","OP5048___hPHA","OP5046___cWAT","OP5048___dPHA","OP5054DNAaMOU","OP5048___cPHA","OP5048___fPHA","OP5048___fPHA","OP5051DNAbCOM","OP5053DNAaPHA","OP5047___aPHA","OP5048___fPHA","OP5048___fPHA","OP5048___cPHA","OP5048___bPHA","OP5047___aPHA","OP5046___bWAT","OP5054DNAaMOU","OP5052DNAaWAT","OP5052DNAaWAT","OP5047___aPHA","OP5048___dPHA","OP5049___bWAT","OP5054DNAaMOU","OP5054DNAaMOU","OP5048___gPHA","OP5054DNAaMOU","OP5048___bPHA","OP5051DNAbCOM","OP5052DNAaWAT","OP5053DNAaPHA","OP5048___ePHA","OP5051DNAaCOM","OP5053DNAbPHA","OP5044___aMOU","OP5051DNAcCOM","OP5049___bWAT","OP5054DNAaMOU","OP5047___aPHA","OP5051DNAbCOM","OP2645ii_dPOR","OP5051DNAcCOM","OP5052DNAaWAT","OP5049___aWAT","OP5053DNAaPHA","OP5048___fPHA","OP5054DNAcMOU","OP5051DNAbCOM","OP5054DNAbMOU","OP5052DNAaWAT","OP5048___ePHA","OP5053DNAbPHA","OP5043___bWAT","OP5043___aWAT","OP5049___aWAT","OP5051DNAbCOM","OP5049___aWAT"),
                           Pdist = c(0.12653736,0.12545262,0.12409420,0.12023167,0.11852507,0.11574044,0.11371805,0.11165877,0.11096499,0.11000436,0.10860921,0.10716355,0.10648404,0.10457088,0.10043985,0.10043419,0.09902992,0.09809625,0.09742466,0.09706079,0.09691789,0.09532336,0.09374877,0.09359057,0.09352572,0.09191749,0.09136457,0.08965083,0.08872891,0.08630526,0.08531594,0.08454861,0.08453494,0.08312192,0.08258318,0.08140542,0.08140466,0.08083571,0.08036883,0.07964833,0.07964736,0.07930556,0.07916955,0.07909909,0.07871759,0.07749702,0.07735318,0.07692221,0.07663146,0.07655228,0.07610728,0.07601355,0.07589804,0.07586683,0.07475816,0.07427158,0.07295387,0.07264578,0.07239881,0.07239652,0.07230148,0.07213147,0.07178486,0.07143912,0.07102923,0.07034595,0.07017927,0.07009262,0.06990277,0.06953688,0.06945218,0.06933059,0.06923690,0.06918330,0.06905105,0.06894675,0.06886782,0.06873706,0.06835633,0.06827398,0.06818929,0.06815169,0.06781528,0.06755839,0.06709807,0.06673160,0.06651507,0.06631521,0.06577319,0.06527915,0.06521944,0.06479374,0.06450183,0.06444880,0.06439217,0.06363232,0.06313289,0.06312447,0.06301823,0.06299480,0.06277461,0.06277369,0.06274871,0.06205441,0.06190890,0.06190525,0.06183778,0.06180255,0.06174675,0.06142775,0.06142015,0.06141977,0.06132026,0.06126746,0.06121289,0.06106807,0.06069853,0.06060409,0.06057873,0.06002828,0.05988876,0.05983741,0.05952482,0.05916929,0.05912005,0.05911979,0.05906816,0.05899453,0.05865145,0.05853252,0.05818659,0.05785562,0.05784148,0.05781387,0.05760903,0.05755058,0.05742954,0.05731918,0.05701451,0.05701384,0.05698890,0.05686745,0.05665475,0.05662290,0.05661457,0.05648999,0.05641717,0.05638154,0.05633743,0.05630275,0.05624860,0.05594854,0.05594397,0.05581496,0.05577077,0.05576073,0.05571763,0.05552730,0.05545187,0.05541380,0.05508725,0.05495578,0.05481013,0.05478274,0.05476202,0.05470291,0.05452429,0.05403781,0.05369966,0.05355532,0.05337705,0.05334701,0.05318317,0.05289062,0.05281420))


Dataframe_B <- data.frame(ind_comp_a = c("OP5054DNAbMOU","OP5044___cMOU","OP5051DNAbCOM","OP5044___bMOU","OP5047___aPHA","OP5049___aWAT","OP5044___aMOU","OP5046___eWAT","OP5048___dPHA","OP5048___bPHA","OP5047___bPHA","OP5053DNAaPHA","OP5048___hPHA","OP5048___fPHA","OP2645ii_bPOR","OP5048___cPHA","OP5046___cWAT","OP2645ii_dPOR","OP5043___bWAT","OP2645ii_cPOR","OP3088i__aPOR","OP5048___ePHA","OP5046___aWAT","OP5046___dWAT","OP5046___bWAT","OP2413iiicMOU"),
                          group_a = c(2,2,3,2,5,3,2,5,2,5,1,5,5,5,2,5,4,4,3,4,1,5,4,4,3,1),
                          ind_comp_b = c("OP5054DNAcMOU","OP5046___dWAT","OP5053DNAbPHA","OP5048___ePHA","OP5049___bWAT","OP5054DNAaMOU","OP5049___bWAT","OP5054DNAcMOU","OP5048___hPHA","OP5049___bWAT","OP5049___bWAT","OP5053DNAcPHA","OP5052DNAaWAT","OP5049___bWAT","OP5049___bWAT","OP5049___bWAT","OP5051DNAcCOM","OP5048___ePHA","OP5046___dWAT","OP5053DNAcPHA","OP5048___ePHA","OP5051DNAaCOM","OP5054DNAaMOU","OP5052DNAaWAT","OP5054DNAaMOU","OP2645ii_aPOR"),
                          group_b = c(2,4,5,5,1,1,4,2,5,4,4,1,1,4,4,4,1,5,4,1,5,1,1,1,1,4),
                          Mdist = c(2.092198,15.490914,15.702724,16.853118,25.458256,18.765831,25.949452,28.394883,20.628395,22.628845,28.756217,28.890253,29.616514,31.830067,28.096185,26.786987,27.274954,22.497309,19.044275,16.573599,31.643230,21.878292,15.605599,29.423213,19.672345,27.408871))

Many thanks in advance Deon

2 Answers 2

2

I'm not sure if this is the most elegant way to accomplish this, but you can use tidyr::unite to create a new unique id based on ind_comp_a and ind_comp_b then base::merge using this id?

library(tidyr)
#Create new_id column for merge
add_ida=tidyr::unite(Dataframe_A,new_id,ind_comp_a,ind_comp_b,remove=F)
add_idb=tidyr::unite(Dataframe_B,new_id,ind_comp_a,ind_comp_b,remove=F)

#Now do a left join to keep all Dataframe_A pairs that don't appear in Dataframe_B pairs
left = merge(add_ida,add_idb,by = 'new_id',all.x=T)

If I'm understanding you correctly then I hope this helps!

Sign up to request clarification or add additional context in comments.

Comments

0

I would select the columns using dplyr::select() then use tidyr::spread() to make the keys column names and then summarize

https://tidyr.tidyverse.org/reference/spread.html

https://dplyr.tidyverse.org/reference/select.html

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.