I want to create a new variable named new_p that, for each row, takes its value from the column whose name is built with paste0 from that row's d1 (p_A when d1 is "A", p_B when d1 is "B", and so on). My approach below creates the new variable but does not assign the values correctly: it assigns a value for only one level of lev and returns NA for the others. Thanks for any help.

My data:

tempDF <- structure(list(d1 = c("A", "B", "C"), d2 = c(40L, 50L, 20L), 
    d3 = c(20L, 40L, 50L), d4 = c(60L, 30L, 30L), p_A = c(1L, 
    3L, 2L), p_B = c(3L, 4L, 3L), p_C = c(2L, 1L, 1L), p4 = c(5L, 
    5L, 4L)), class = "data.frame", row.names = c(NA, -3L))

lev<-levels(as.factor(tempDF$d1))

View(tempDF) 

My approach:

for(i in seq_along(lev)){

func<-function(tempDF, i, lev){

newDT<-tempDF%>%
mutate(.,  
        new_p = case_when (
         d1  ==  paste0(lev[i]) ~ .[, paste0("p", "_", lev[i])]
        ))%>%
        as.data.frame(.)
        }

newDT<-func(tempDF, i, lev) %>%
        as.data.frame(.)

}

View(newDT)

newDT
  d1 d2 d3 d4 p_A p_B p_C p4 new_p
   A 40 20 60   1   3   2  5 NA
   B 50 40 30   3   4   1  5 NA
   C 20 50 30   2   3   1  4 1

Expected output:

newDT
  d1 d2 d3 d4 p_A p_B p_C p4 new_p
   A 40 20 60   1   3   2  5 1
   B 50 40 30   3   4   1  5 4
   C 20 50 30   2   3   1  4 1

Edit: Barradas's function applied to a larger data set:

tempDF <- structure(list(d1 = c("A", "B", "C", "A", "C"), d2 = c(40L, 50L, 20L, 50L, 20L), 
    d3 = c(20L, 40L, 50L, 40L, 50L), d4 = c(60L, 30L, 30L,60L, 30L), p_A = c(1L, 
    3L, 2L, 3L, 2L), p_B = c(3L, 4L, 3L, 3L, 4L), p_C = c(2L, 1L, 1L,2L, 1L), p4 = c(5L, 
    5L, 4L, 5L, 4L)), class = "data.frame", row.names = c(NA, -5L))

View(tempDF)    

lev<-levels(as.factor(tempDF$d1))

func <- function(tempDF, lev){
  i <- match(tempDF$d1, lev)
  j <- match(paste0("p", "_", lev), names(tempDF))
  tempDF$new_p <- tempDF[cbind(i, j)]
  tempDF
}

newDT <- func(tempDF, lev)

Warning message:
In cbind(i, j) :
  number of rows of result is not a multiple of vector length (arg 2)

View(newDT)

newDT
  d1 d2 d3 d4 p_A p_B p_C p4 new_p
   A 40 20 60   1   3   2  5     1
   B 50 40 30   3   4   1  5     4
   C 20 50 30   2   3   1  4     1
   A 50 40 60   3   3   2  5     1  //wrong, new_p should be 3, not 1
   C 20 50 30   2   4   1  4     3  //wrong, new_p should be 1, not 3

1 Answer

You don't need loops or pipes to do what the question asks for; match and plain data frame extraction can solve the problem.

func <- function(tempDF, lev){
  i <- match(tempDF$d1, lev)
  j <- match(paste0("p", "_", lev), names(tempDF))
  tempDF$new_p <- tempDF[cbind(i, j)]
  tempDF
}

newDT <- func(tempDF, lev)

newDT
#  d1 d2 d3 d4 p_A p_B p_C p4 new_p
#1  A 40 20 60   1   3   2  5     1
#2  B 50 40 30   3   4   1  5     4
#3  C 20 50 30   2   3   1  4     1
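
To see what the matrix index is doing: when a matrix is indexed with a two-column matrix, each row of the index is read as one (row, column) pair. The sketch below uses a small made-up data frame, toy, which is not from the question. It suggests that pairing the level position from match() with the columns in lev order only lines up with the rows when every level occurs exactly once and in sorted order, as in the three-row example; pairing actual row numbers with each row's own level position does not depend on that.

```r
# Hypothetical toy data, used only for illustration
toy <- data.frame(d1 = c("B", "A"), p_A = c(10L, 20L), p_B = c(30L, 40L))
lev <- levels(as.factor(toy$d1))          # "A" "B"
m <- as.matrix(toy[paste0("p_", lev)])    # integer matrix with columns p_A, p_B

level_pos <- match(toy$d1, lev)           # 2 1: which p_ column each row needs
row_pos <- seq_len(nrow(toy))             # 1 2: the row numbers themselves

# Level position used as the row, columns taken in lev order:
# pairs (2,1) and (1,2), which read the wrong cells here
first_try <- m[cbind(level_pos, seq_along(lev))]

# Row number paired with that row's own level position:
# pairs (1,2) and (2,1), i.e. p_B for row 1 and p_A for row 2
by_row <- m[cbind(row_pos, level_pos)]

first_try   # 20 30
by_row      # 30 20
```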

Edit.

The following function returns the correct output with both the original data and the larger data set.

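func <- function(DF, levs){
  # row numbers for each level; lapply always returns a list, whereas
  # sapply would simplify to a matrix when all levels have equal counts
  i <- lapply(levs, function(l) which(DF$d1 == l))
  # column index of each p_<level>, repeated once per matching row
  j <- rep(match(paste0("p", "_", levs), names(DF)), lengths(i))
  i <- unlist(i)
  o <- cbind(i, j)
  # put the (row, column) pairs back in the original row order
  o <- o[order(o[, 1]), , drop = FALSE]
  DF$new_p <- DF[o]
  DF
}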
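
A more compact variant of the same idea, offered as a sketch rather than as the answer's own method: index each row by its actual row number and pick the column from that row's own d1 value. The name add_new_p and the prefix argument are made up for this illustration, and the data below keeps only the d1 and p_ columns of the larger example from the question. Restricting the matrix to the p_ columns should keep new_p integer; as far as I know, indexing a whole data frame with a matrix goes through as.matrix(), which gives character values when the data frame contains a character column such as d1.

```r
# Hypothetical helper; the name and the `prefix` argument are assumptions
add_new_p <- function(DF, prefix = "p_") {
  cols <- paste0(prefix, DF$d1)          # column wanted for each row, e.g. "p_A"
  stopifnot(all(cols %in% names(DF)))    # fail early if a level has no p_ column
  m <- as.matrix(DF[unique(cols)])       # only the p_ columns, so the type is kept
  # one (row, column) pair per row of DF
  DF$new_p <- m[cbind(seq_len(nrow(DF)), match(cols, colnames(m)))]
  DF
}

# d1 and p_ columns of the larger data set from the question
tempDF <- data.frame(
  d1  = c("A", "B", "C", "A", "C"),
  p_A = c(1L, 3L, 2L, 3L, 2L),
  p_B = c(3L, 4L, 3L, 3L, 4L),
  p_C = c(2L, 1L, 1L, 2L, 1L)
)

newDT <- add_new_p(tempDF)
newDT$new_p   # 1 4 1 3 1
```

Because the lookup is driven by DF$d1 itself, no separate lev vector is needed, and repeated or unsorted levels are handled the same way as unique ones.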

4 Comments

Thanks, @Rui_Barradas. This is great and very helpful!
Hi, Barradas. Thanks again for this. But when I apply the function to a larger data set, it gives the wrong output. For an example, see my edited question.
@Rui_Barradas, any thoughts?
@Rui_Barradas, Great. Thanks again for your effort to help me. This indeed works. Well done.
