I want to generate a new variable named new_p which takes the values of other variables based on paste0. My approach below produces the new variable, but does not assign the values correctly. It only assigns one value and returns NA for the other values of lev. Thanks for any help.
My data:
tempDF <- structure(list(d1 = c("A", "B", "C"), d2 = c(40L, 50L, 20L),
d3 = c(20L, 40L, 50L), d4 = c(60L, 30L, 30L), p_A = c(1L,
3L, 2L), p_B = c(3L, 4L, 3L), p_C = c(2L, 1L, 1L), p4 = c(5L,
5L, 4L)), class = "data.frame", row.names = c(NA, -3L))
lev<-levels(as.factor(tempDF$d1))
View(tempDF)
My approach:
for(i in seq_along(lev)){
func<-function(tempDF, i, lev){
newDT<-tempDF%>%
mutate(.,
new_p = case_when (
d1 == paste0(lev[i]) ~ .[, paste0("p", "_", lev[i])]
))%>%
as.data.frame(.)
}
newDT<-func(tempDF, i, lev) %>%
as.data.frame(.)
}
View(newDT)
newDT
d1 d2 d3 d4 p_A p_B p_C p4 new_p
A 40 20 60 1 3 2 5 NA
B 50 40 30 3 4 1 5 NA
C 20 50 30 2 3 1 4 1
Expected output:
newDT
d1 d2 d3 d4 p_A p_B p_C p4 new_p
A 40 20 60 1 3 2 5 1
B 50 40 30 3 4 1 5 4
C 20 50 30 2 3 1 4 1
Edit: Barradas's function applied to a larger data:
tempDF <- structure(list(d1 = c("A", "B", "C", "A", "C"), d2 = c(40L, 50L, 20L, 50L, 20L),
d3 = c(20L, 40L, 50L, 40L, 50L), d4 = c(60L, 30L, 30L,60L, 30L), p_A = c(1L,
3L, 2L, 3L, 2L), p_B = c(3L, 4L, 3L, 3L, 4L), p_C = c(2L, 1L, 1L,2L, 1L), p4 = c(5L,
5L, 4L, 5L, 4L)), class = "data.frame", row.names = c(NA, -5L))
View(tempDF)
lev<-levels(as.factor(tempDF$d1))
func <- function(tempDF, lev){
i <- match(tempDF$d1, lev)
j <- match(paste0("p", "_", lev), names(tempDF))
tempDF$new_p <- tempDF[cbind(i, j)]
tempDF
}
newDT <- func(tempDF, lev)
Warning message:
In cbind(i, j) :
number of rows of result is not a multiple of vector length (arg 2)
View(newDT)
newDT
d1 d2 d3 d4 p_A p_B p_C p4 new_p
A 40 20 60 1 3 2 5 1
B 50 40 30 3 4 1 5 4
C 20 50 30 2 3 1 4 1
A 50 40 60 3 3 2 5 1 //wrong, new_p should be 3, not 1
C 20 50 30 2 4 1 4 3 //wrong, new_p should be 1, not 3