
I'm currently using case_when to define a new variable in my data as such:

library(dplyr)  # case_when() comes from dplyr

data[,46] <- NA

data[,46] <- case_when(
   data[,35] ==  1 ~ data[,36],
   data[,35] ==  2 ~ data[,37],
   data[,35] ==  3 ~ data[,38],
   data[,35] ==  4 ~ data[,39],
   data[,35] ==  5 ~ data[,40],
   data[,35] ==  6 ~ data[,41],
   data[,35] ==  7 ~ data[,42],
   data[,35] ==  8 ~ data[,43],
   data[,35] ==  9 ~ data[,44],
   data[,35] ==  10 ~ data[,45]
)

I'm trying to replace this with a loop so it is less repetitive, but I'm running into trouble. Here is what I have tried:

for (j in 1:10) {
data[,46] <- case_when(
   data[,35] ==  j ~ data[,35+j]
)
}

However, this is returning NAs for all of my values of data[,46]. Any thoughts on what might be going wrong? I would be happy to provide sample data if necessary, but I'm thinking this is more related to me making a simple programming mistake. Thanks in advance!

  • This seems like a problem better solved by reshaping your data, perhaps with tidyr. It would be easier to help if you provided a simple reproducible example with sample input and desired output that can be used to test and verify possible solutions. Show what your real goal is rather than just the code you tried to write to solve it. Commented Oct 8, 2018 at 18:29
  • Just do data[, 35] <- data[, 35 + data[, 35]]? Commented Oct 8, 2018 at 18:37
  • @RuiBarradas, post your comment as an answer? Commented Oct 8, 2018 at 19:00
  • @BenBolker Will do. Commented Oct 8, 2018 at 19:26

2 Answers


All you have to do is remember that R is vectorized.
You are comparing data[, 35] to the integers 1 to 10 and, for each of them, assigning data[, 35 + <1 to 10>] back to data[, 35]. So all you have to do is

data[, 35] <- data[, 35 + data[, 35]]

If data[, 35] contains values outside 1:10, an ifelse will be more appropriate.

data[, 35] <- ifelse(data[, 35] %in% 1:10, data[, 35 + data[, 35]], data[, 35])
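Note that indexing as data[, 35 + data[, 35]] selects whole columns (one per row of data) rather than one cell per row, and an NA or out-of-range code in column 35 makes R report undefined columns. One way to pick a single cell per row is to index with a two-column (row, column) matrix. The sketch below assumes the question's layout (codes 1-10 in column 35 pointing at columns 36-45) and numeric lookup columns; the helper name pick_by_code() and the toy data are hypothetical, not from the thread.

```r
# Hypothetical helper: for each row i, return the value in column
# code_col + code[i] when code[i] is in 1:n_codes, and NA otherwise.
pick_by_code <- function(df, code_col, n_codes) {
  codes <- df[[code_col]]
  # Only the candidate columns; as.matrix() would coerce mixed types to character
  lookup <- as.matrix(df[, code_col + seq_len(n_codes), drop = FALSE])
  ok <- codes %in% seq_len(n_codes)     # FALSE for NA and out-of-range codes
  out <- rep(NA, nrow(df))
  # Matrix indexing: one (row, column) pair per valid row
  out[ok] <- lookup[cbind(which(ok), codes[ok])]
  out
}

# Toy data (hypothetical): codes in column 1, candidate values in columns 2-4
toy <- data.frame(code = c(1, 3, 2, NA, 7),
                  a = c(10, 11, 12, 13, 14),
                  b = c(20, 21, 22, 23, 24),
                  c = c(30, 31, 32, 33, 34))
res <- pick_by_code(toy, code_col = 1, n_codes = 3)
res  # 10 31 22 NA NA
```

With the question's layout this would be used as data[, 46] <- pick_by_code(data, code_col = 35, n_codes = 10).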

2 Comments

Not exactly. I'm checking whether data[,35] equals each of the values 1-10 and, depending on that, putting data[,36] into data[,46] for the rows where data[,35] == 1, data[,37] into data[,46] where data[,35] == 2, and so on. Doing data[, 35] <- data[, 35 + data[, 35]] gives me the following error: Error in `[.data.frame`(data, , 35 + data[, 35]) : undefined columns selected
@Zereg Then you must have values not in 1-10. See the edit.

You may need [j], as shown below, to store each iteration's result in data[,46]:

for (j in 1:10) {
data[,46][j]<- case_when(
   data[,35] ==  j ~ data[,35+j]
)}
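The all-NA result in the question comes from how case_when() works: it returns NA for every row where no condition is TRUE, and each pass of the loop assigns a whole new vector to data[,46], so only the last pass (j = 10) survives, and it fills only rows where data[,35] == 10. Writing to data[,46][j] replaces a single element, so R warns that the number of items to replace is not a multiple of the replacement length. A loop that keeps earlier results can instead write only the matching rows on each pass. This is a base-R sketch assuming the question's layout (codes in column 35, candidates in columns 36-45, result in column 46); the helper name fill_by_code() and the toy data are hypothetical.

```r
# Hypothetical helper: loop version of the question's case_when().
# Each pass writes only the rows whose code equals j, so values filled
# in by earlier passes are left alone.
fill_by_code <- function(df, code_col, out_col, n_codes) {
  df[[out_col]] <- NA                       # start all-NA, as in the question
  for (j in seq_len(n_codes)) {
    rows <- which(df[[code_col]] == j)      # which() skips NA comparisons
    df[rows, out_col] <- df[rows, code_col + j]
  }
  df
}

# Toy data (hypothetical): codes in column 1, candidate values in columns 2-4
toy <- data.frame(code = c(1, 3, 2, NA, 7),
                  a = c(10, 11, 12, 13, 14),
                  b = c(20, 21, 22, 23, 24),
                  c = c(30, 31, 32, 33, 34))
filled <- fill_by_code(toy, code_col = 1, out_col = 5, n_codes = 3)
filled[[5]]  # 10 31 22 NA NA
```

For the question's data this would be data <- fill_by_code(data, code_col = 35, out_col = 46, n_codes = 10).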

2 Comments

Thank you! Your solution worked for me about an hour ago... but now I feel like I'm going crazy because it's not replicating. I'm getting this error now: for (j in 2:10) { data[,46][j] <- case_when( data$since == 1 ~ lag(data[,31], 1), data$since == j ~ data[,36+j] ) } (I know the code is a bit different, I kept the example in the original post simple to make the question as easy to answer as possible). Any thoughts as to what's going on?
It’s hard to tell without knowing your data. The lag function may be causing the result stored in data[,46] to be shorter than the data frame, i.e. you have one result fewer than the number of rows in your data frame.
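For reference, dplyr::lag() pads the front with NA and returns a vector the same length as its input. Without dplyr attached, lag() resolves to stats::lag(), which does not shift the values of a plain vector. If lag is the suspect, one check is to compare length() of the lagged vector with nrow(data). The following base-R sketch mirrors the same-length behavior for a plain numeric vector; lag_same_length() is a hypothetical name, not a function from either package.

```r
# Base-R sketch of a same-length lag: shift x down by n positions and pad
# the front with NA, as dplyr::lag(x, n) does by default.
lag_same_length <- function(x, n = 1L) {
  len <- length(x)
  n <- min(n, len)                      # lagging by >= len leaves only NAs
  c(rep(x[NA_integer_], n), x[seq_len(len - n)])
}

x <- c(5, 6, 7, 8)
lagged <- lag_same_length(x, 1)
lagged                        # NA 5 6 7
length(lagged) == length(x)   # TRUE
```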
