1

I have a data frame df1

df1<- data.frame(ID = c("A","B","A","A","B"),CLASS = c(1,1,2,1,4))
 ID CLASS
1  A     1
2  B     1
3  A     2
4  A     1
5  B     4

and another two data frames A and B

   > A<- data.frame(CLASS = c(1,2,3), DESCRIPTION = c("Unknown", "Tall", "Short"))
  CLASS DESCRIPTION
1     1     Unknown
2     2        Tall
3     3       Short

> B <- data.frame(CLASS = c(1,2,3,4), DESCRIPTION = c("Big", "Small", "Medium", "Very Big"))
  CLASS DESCRIPTION
1     1         Big
2     2       Small
3     3      Medium
4     4    Very Big

I want to merge these three data frames depending on the ID and class of df1 to have something like this:

      ID CLASS DESCRIPTION
1  A     1     Unknown
2  B     1         Big
3  A     2        Tall
4  A     1     Unknown
5  B     4    Very Big

I know I can merge it as df1 <- merge(df1, A, by = "CLASS") but I can't find a way to add the conditional (maybe an "if" is too much) to also merge B according to the ID. I need to have an efficient way to do this as I am applying it to over 2M rows.

2
  • Could you explain row 5 from df1? How does it end up with CLASS = 4? Commented Jul 31, 2018 at 21:12
  • @coffeinjunky Sorry, it was meant to be 4 in df1. Already edited. Commented Jul 31, 2018 at 21:14

3 Answers 3

2

Add the ID variable to A and B, rbind A and B together, and use ID and CLASS to merge:

A$ID = 'A'
B$ID = 'B'

AB <- rbind(A, B)

merge(df1, AB, by = c('ID', 'CLASS'))

  ID CLASS DESCRIPTION
1  A     1     Unknown
2  A     1     Unknown
3  A     2        Tall
4  B     1         Big
5  B     4    Very Big

I would suggest using stringsAsFactors = FALSE when creating the data:

df1 <- data.frame(ID = c("A","B","A","A","B"),CLASS = c(1,1,2,1,4),
                  stringsAsFactors = FALSE)
A <- data.frame(CLASS = c(1,2,3), 
                DESCRIPTION = c("Unknown", "Tall", "Short"),
                stringsAsFactors = FALSE)
B <- data.frame(CLASS = c(1,2,3,4), 
                DESCRIPTION = c("Big", "Small", "Medium", "Very Big"),
                stringsAsFactors = FALSE)
Sign up to request clarification or add additional context in comments.

Comments

1

To merge multiple dataframes in one go, Reduce is often helpful:

out <- Reduce(function(x,y) merge(x,y, by = "CLASS", all.x=T), list(df1, A, B))
out
  CLASS ID DESCRIPTION.x DESCRIPTION.y
1     1  A       Unknown           Big
2     1  B       Unknown           Big
3     1  A       Unknown           Big
4     2  A          Tall         Small
5     4  B          <NA>      Very Big

As you can see, columns that were present in all dataframes were added a suffix (default merge behavior). This allows you to apply whatever logic you want to get the final column you wish for. For instance,

out$Description <- ifelse(out$ID == "A", as.character(out$DESCRIPTION.x), as.character(out$DESCRIPTION.y))
> out
  CLASS ID DESCRIPTION.x DESCRIPTION.y Description
1     1  A       Unknown           Big     Unknown
2     1  B       Unknown           Big         Big
3     1  A       Unknown           Big     Unknown
4     2  A          Tall         Small        Tall
5     4  B          <NA>      Very Big    Very Big

Note that ifelse is vectorized and quite efficient.

Comments

1

A dplyr solution:

library(dplyr)
bind_rows(lst(A,B),.id="ID") %>% inner_join(df1)
#   ID CLASS DESCRIPTION
# 1  A     1     Unknown
# 2  A     1     Unknown
# 3  A     2        Tall
# 4  B     1         Big
# 5  B     4    Very Big

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.