0

Suppose that we have two data frames as shown below:

df1 <- data.frame(Team1 = c("A","B","C"), Team2 = c("D","E","F"), Winner = c("A","E","F"))

df2 <- data.frame(Country = c("A","B","C","D","E","F"), Index = c(1,2,3,4,5,6))

What i want is create three columns in df2 as Team1_index, Team2_index, and Winner_index.

Team1 Team2 Winner Team1_index Team2_index Winner_index
A     D      A           1           4            1
B     E      E           2           5            5
C     F      F           3           6            6

I tried many ways but failed. Tips and advice!

7 Answers 7

1

If you just have a small number of columns, you can use the match function as in the example:

df1$Team1_index <- df2$Index[match(df1$Team1, df2$Country)]
df1$Team2_index <- df2$Index[match(df1$Team2, df2$Country)]
df1$Winner_index <- df2$Index[match(df1$Winner, df2$Country)]
df1
Sign up to request clarification or add additional context in comments.

Comments

1

If you have more columns, you may look for more systematic solutions, but if it's really just three cases, this should do:

library("tidyverse")
df1 <- data.frame(Team1 = c("A","B","C"), Team2 = c("D","E","F"), Winner = c("A","E","F"))
df2 <- data.frame(Country = c("A","B","C","D","E","F"), Index = c(1,2,3,4,5,6))

df1 %>% 
  left_join(df2 %>% rename(Team1 = Country), by = "Team1") %>% 
  rename(Team1_Index = Index) %>% 
  left_join(df2 %>% rename(Team2 = Country), by = "Team2") %>% 
  rename(Team2_Index = Index) %>%
  left_join(df2 %>% rename(Winner = Country), by = "Winner") %>% 
  rename(Winner_Index = Index) 
#> Warning: Column `Team1` joining factors with different levels, coercing to
#> character vector
#> Warning: Column `Team2` joining factors with different levels, coercing to
#> character vector
#> Warning: Column `Winner` joining factors with different levels, coercing to
#> character vector
#>   Team1 Team2 Winner Team1_Index Team2_Index Winner_Index
#> 1     A     D      A           1           4            1
#> 2     B     E      E           2           5            5
#> 3     C     F      F           3           6            6

You can safely ignore the warnings.

1 Comment

The easy way out. Thanks!
1

To get new columns as factors :

df1[paste0(colnames(df1),"_index")] <- lapply(df1,factor,df2$Country,df2$Index)
#   Team1 Team2 Winner Team1_index Team2_index Winner_index
# 1     A     D      A           1           4            1
# 2     B     E      E           2           5            5
# 3     C     F      F           3           6            6

To get new columns as numeric :

df1[paste0(colnames(df1),"_index")] <-
  lapply(df1,function(x) as.numeric(as.character(factor(x,df2$Country,df2$Index))))
#   Team1 Team2 Winner Team1_index Team2_index Winner_index
# 1     A     D      A           1           4            1
# 2     B     E      E           2           5            5
# 3     C     F      F           3           6            6

Note that for this specific case (index from 1 incremented by 1), this shorter version works:

df1[paste0(colnames(df1),"_index")] <-
  lapply(df1,function(x) as.numeric(factor(x,df2$Country)))

Comments

0

I have an almost solution with data.table, using melt and dacst to change shape

library(data.table)

df1 <- data.table(Team1 = c("A","B","C"), Team2 = c("D","E","F"), Winner = c("A","E","F")) 
df2 <- data.table(Country = c("A","B","C","D","E","F"), Index = c(1,2,3,4,5,6))

melt(data = df1 , id.vars = )
plouf <- merge(df2,melt(df1,measure = 1:2), by.x = "Country", by.y = "value")
plouf[,winneridx := Index[Country == Winner]]
dcast(plouf,Country+winneridx~variable,value.var = "Index")


   Country winneridx Team1 Team2
1:       A         1     1    NA
2:       B         5     2    NA
3:       C         6     3    NA
4:       D         1    NA     4
5:       E         5    NA     5
6:       F         6    NA     6

Comments

0

This is basically the same as giocomai's answer, just uses purrr to help eliminate duplication:

library(rlang)
library(dplyr)

getIndexCols <- function(df1, df2, colName){
     idxColName <- sym(paste0(colName, "_Index"))
     df1 %>% left_join(df2 %>% rename(!! sym(colName) := Country, !! idxColName := Index))
}


names(df1) %>% purrr::map(~ getIndexCols(df1, df2, .)) %>% reduce(~ left_join(.x, .y))

1 Comment

Complex simplicity. Thanks!
0

You can use chartr This will take into consideration both the country column and the index column:

df3=as.matrix(setNames(df1,paste0(names(df1),"_index")))

cbind(df1,chartr(paste0(df2$Country,collapse=""),paste0(df2$Index,collapse=""),df3))

  Team1 Team2 Winner Team1_index Team2_index Winner_index
1     A     D      A           1           4            1
2     B     E      E           2           5            5
3     C     F      F           3           6            6

you can also do:

cbind(df1,do.call(chartr,c(as.list(sapply(unname(df2),paste,collapse="")),list(df3))))

  Team1 Team2 Winner Team1_index Team2_index Winner_index
1     A     D      A           1           4            1
2     B     E      E           2           5            5
3     C     F      F           3           6            6

1 Comment

This stores the index columns as factors
0

Here is another option for you that uses match and cbind.

df3 <- as.matrix(df1)
colnames(df3) <- paste0(colnames(df3), "_index")

# match the positions
df3[] <- match(df3, df2$Country)
cbind(df1, df3)
#  Team1 Team2 Winner Team1_index Team2_index Winner_index
#1     A     D      A           1           4            1
#2     B     E      E           2           5            5
#3     C     F      F           3           6            6

df3 is created as a matrix, i.e. a vector with dimensions attribute, such that we can replace its entries with the result of match (a vector) right away and don't need to repeat the code for every column.

Or in one go

df1[paste0(colnames(df1), "_index")] <- match(as.matrix(df1), df2$Country)

Note however, that this ignores the index column of df2.


Thanks to @Moody_Mudskipper we could also write this more general as

df1[paste0(colnames(df1), "_index")] <- lapply(df1, function(x) df2$Index[match(x, df2$Country)])

4 Comments

Cool! Thank you.
can be rewritten as df1[paste0(colnames(df3), "_index")] <- lapply(df1,match,df2$Country), but the column Index is ignored, values are just assumed to be equal to row number
This would give a general solution : df1[paste0(colnames(df3), "_index")] <- lapply(df1,function(x) df2$Index[match(x,df2$Country)])
@Moody_Mudskipper Thanks for your useful comments. (Just the df3 should be df1)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.