2

I have 2 data frames and I would like to add a column to one and add values from another data frame based on matching conditions. I would then like to repeat this for many large datasets.

# Example dataframe

DF1 <- data.frame(Batch = c('a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b'),
           Patch1 = c(1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3),
           Patch2 = c(2, 3, 1, 3, 1, 2, 2, 3, 1, 3, 1, 2))

DF2 <- data.frame(Batch = c('a', 'a', 'a', 'b', 'b', 'b'),
                             Patch = c(1, 2, 3, 1, 2, 3),
                             Pop_size = sample(1:300, size=6, replace=TRUE)) 

To DF1 I would like to add 2 columns (Patch1_popsize and Patch2_popsize) where the Pop_size is given in DF2 with corresponding Patch numbers for a given Batch

3 Answers 3

2

Try using match() in this style:

#Code
DF1$Patch1_Pop_size <- DF2[match(paste(DF1$Batch,DF1$Patch1),paste(DF2$Batch,DF2$Patch)),"Pop_size"]
DF1$Patch2_Pop_size <- DF2[match(paste(DF1$Batch,DF1$Patch2),paste(DF2$Batch,DF2$Patch)),"Pop_size"]

Output:

DF1
   Batch Patch1 Patch2 Patch1_Pop_size Patch2_Pop_size
1      a      1      2             137             254
2      a      1      3             137             211
3      a      2      1             254             137
4      a      2      3             254             211
5      a      3      1             211             137
6      a      3      2             211             254
7      b      1      2              78              81
8      b      1      3              78              43
9      b      2      1              81              78
10     b      2      3              81              43
11     b      3      1              43              78
12     b      3      2              43              81
Sign up to request clarification or add additional context in comments.

Comments

2

Try this:

DF3 <- merge(DF1,DF2, by.x = c("Batch","Patch1"), by.y=c("Batch","Patch"), all.x=TRUE) %>%
      rename(Pop_size1=Pop_size)
DF3 <- merge(DF3,DF2, by.x = c("Batch","Patch2"), by.y=c("Batch","Patch"), all.x=TRUE) %>%
      rename(Pop_size2=Pop_size)

Comments

0

We can use data.table methods which would be faster and efficient

library(data.table)
setDT(DF1)[DF2, Patch1_Pop_size := Pop_size,
  on = .(Batch, Patch1 = Patch)][DF2, 
   Patch2_Pop_size := Pop_size, on = .(Batch, Patch2 = Patch)]

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.