
I want to create new columns in my data.table based on a ratio calculation. My variable names follow a fairly standard pattern, so I think there must be an easy way to do this in data.table, but I can't figure out how. Below are my sample data and code:

library(data.table)
set.seed(1200)

ID <- seq(1001,1100)
region <- sample(1:10,100,replace = T)
Q21 <- sample(1:5,100,replace = T)
Q22 <- sample(1:15,100,replace = T)
Q24_LOC_1 <- sample(1:8,100,replace = T)
Q24_LOC_2 <- sample(1:8,100,replace = T)
Q24_LOC_3 <- sample(1:8,100,replace = T)
Q24_LOC_4 <- sample(1:8,100,replace = T)

Q21_PAN <- sample(1:5,100,replace = T)
Q22_PAN <- sample(1:15,100,replace = T)
Q24_LOC_1_PAN <- sample(1:8,100,replace = T)
Q24_LOC_2_PAN <- sample(1:8,100,replace = T)
Q24_LOC_3_PAN <- sample(1:8,100,replace = T)
Q24_LOC_4_PAN <- sample(1:8,100,replace = T)

df1 <- data.table(ID, region, Q21, Q22, Q24_LOC_1, Q24_LOC_2, Q24_LOC_3, Q24_LOC_4, Q21_PAN, Q22_PAN, Q24_LOC_1_PAN, Q24_LOC_2_PAN, Q24_LOC_3_PAN, Q24_LOC_4_PAN)

col_needed <- c("Q21","Q22","Q24_LOC_1","Q24_LOC_2","Q24_LOC_3","Q24_LOC_4")

check1 <- df1[,Q21_R := mean(Q21,na.rm = T)/mean(Q21_PAN,na.rm = T),by=region]

check1 works for one variable. I am looking for a solution where I can pass all the needed variables (here, col_needed) and get all the new variables calculated in a single line. I also tried the code below:

check2 <- df1[,`:=`(paste0(col_needed,"_R"),(mean(col_needed,na.rm = T)/mean(paste0(col_needed,"_PAN"),na.rm = T))),by=region][]

However, this produces multiple warnings and every value in the result is NA. The warnings are: In mean(col_needed, na.rm = T) : argument is not numeric or logical: returning NA
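To isolate the cause, here is a minimal sketch on a hypothetical three-row table (dt and cols are illustrative names, not from the question). Inside j, a variable that holds a column name is just a character string, so mean() averages the string itself, warns and returns NA; get() instead resolves the name to the column within each group.

```r
library(data.table)

# Hypothetical toy table, not the question's df1
dt   <- data.table(region = c(1, 1, 2), Q21 = c(2, 4, 6))
cols <- "Q21"  # a column *name*, i.e. a character string

# mean() receives the string "Q21", not the column: it warns and returns NA per group
bad <- suppressWarnings(dt[, mean(cols, na.rm = TRUE), by = region])

# get() looks the name up as a column within each group
good <- dt[, mean(get(cols), na.rm = TRUE), by = region]

print(bad)   # V1 is NA for both regions
print(good)  # V1 is 3 for region 1 and 6 for region 2
```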

Can you please point out where I am going wrong?

1 Answer


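If I understand correctly, you could do the following. In your attempt, col_needed is only a character vector of names, so mean(col_needed) tries to average the strings themselves, which is where the warnings and NAs come from. Here get() looks each name up as a column of the current group, and Map() pairs each name with its _PAN counterpart: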

df1[, paste(col_needed, "R", sep = "_") := 
      Map(function(x,y) mean(get(x), na.rm = TRUE)/mean(get(y), na.rm=TRUE), 
           col_needed, 
           paste(col_needed, "PAN", sep = "_")),
    by=region]
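A minimal sketch to check the Map() approach against the one-variable check1 logic. It assumes a small hypothetical table dt that follows the same X / X_PAN naming pattern, with two variables instead of six; Q21_R from the Map() version should match the per-region ratio computed directly.

```r
library(data.table)

# Hypothetical toy data following the question's naming pattern (X and X_PAN)
set.seed(1)
dt <- data.table(
  region  = rep(1:3, each = 4),
  Q21     = sample(1:5, 12, replace = TRUE),
  Q22     = sample(1:15, 12, replace = TRUE),
  Q21_PAN = sample(1:5, 12, replace = TRUE),
  Q22_PAN = sample(1:15, 12, replace = TRUE)
)
cols <- c("Q21", "Q22")

# Map() pairs each name with its _PAN partner and returns one ratio per pair;
# := assigns that list to the new _R columns, one region at a time
dt[, paste(cols, "R", sep = "_") :=
     Map(function(x, y) mean(get(x), na.rm = TRUE) / mean(get(y), na.rm = TRUE),
         cols, paste(cols, "PAN", sep = "_")),
   by = region]

# Cross-check Q21_R against the one-variable version from the question
check <- dt[, mean(Q21, na.rm = TRUE) / mean(Q21_PAN, na.rm = TRUE), by = region]
print(unique(dt[, .(region, Q21_R, Q22_R)]))
```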

5 Comments

Beat me to it. Alternatively, you could use .SD without needing get(), but it's really the same thing. Not suggesting a change, just pointing out a possible minor tweak.
@docendo, thank you, this works. As I understand it, Map lets us apply the function to each element of col_needed.
@MikeH. I had seen your post as well before you deleted it. Thank you!! However, I wanted to confirm the usage of with = FALSE with Map: I am not clear why we had to use with = FALSE. Sorry if I am asking something very basic... I'm still learning R and trying to get my basics clear.
@docendodiscimus Thank you for confirming!!
@user1412, I used with = FALSE to select the columns by name, so .SD[, col_needed, with = FALSE] was a data.table of just the columns we wanted.
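The .SD variant discussed in the comments can be sketched as follows. This is a reconstruction under assumptions, not the deleted post: the toy table dt and the names cols and pan_cols are hypothetical. .SDcols restricts .SD to the needed columns, with = FALSE selects them by name, and Map() then walks the two column sets pairwise, so no get() is needed.

```r
library(data.table)

# Hypothetical toy data following the question's naming pattern
set.seed(2)
dt <- data.table(
  region  = rep(1:2, each = 5),
  Q21     = sample(1:5, 10, replace = TRUE),
  Q22     = sample(1:15, 10, replace = TRUE),
  Q21_PAN = sample(1:5, 10, replace = TRUE),
  Q22_PAN = sample(1:15, 10, replace = TRUE)
)
cols     <- c("Q21", "Q22")
pan_cols <- paste(cols, "PAN", sep = "_")

# with = FALSE makes .SD[, cols, with = FALSE] select the columns named in cols;
# Map() then receives actual column vectors, paired Q21 with Q21_PAN, and so on
dt[, paste(cols, "R", sep = "_") :=
     Map(function(x, y) mean(x, na.rm = TRUE) / mean(y, na.rm = TRUE),
         .SD[, cols, with = FALSE], .SD[, pan_cols, with = FALSE]),
   by = region, .SDcols = c(cols, pan_cols)]

print(unique(dt[, .(region, Q21_R, Q22_R)]))
```

Without with = FALSE, j = cols would simply evaluate to the character vector held in cols rather than selecting those columns, which is why it is needed here.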
