This is the sample dataframe I'm working with:

numbers <- data.frame(value=c(-5000,1,2,100,200,300,400,500, 1000, 2000, 10000, 12000))

I'm looking to create a new column in this data frame called "output" that contains values as follows:
-Same value as in column "value" if value between 1 and 10000
-10000 if the value in column "value" is more than 10000 and
-1 if the value in column "value" is less than 1

Desired output in the new column "output": 1,1,2,100,200,300,400,500, 1000,2000, 10000, 10000.

I would really like to learn how to use a for loop with if, else if, and else statements to get this output, and have tried the following:

for (i in 1:nrow (numbers$value)){
  if (numbers$value[i] >10000){
    numbers$output <- 10000)
  } else if (numbers$value[i] < 1){
    numbers$output <- 1)
  } else {
    numbers$output <- numbers$value)
  }
}

Unfortunately, this gives me an error, Error: unexpected '}' in "}"

Appreciate your help in fixing this code!

  • This is not the question, but I suggest considering case_when when there are more than two conditions: library(dplyr); numbers %>% mutate(newCol = case_when(between(value, 1, 10000) ~ value, value > 10000 ~ 10000, value < 1 ~ 1)) Commented Dec 4, 2021 at 6:14

3 Answers

I see why you are trying to solve this problem with a for loop (I have been there). In R, there is a useful concept called vectorization. You can use the *apply family to apply a function to each element of an input vector: you pass in a vector and automatically get back an output of the same length.

sapply(numbers$value, function(x){
  if (x >10000) return(10000)
  else if (x < 1) return(1)
  else return(x)
}) -> numbers$output
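
For this particular task, a fully vectorized alternative is to clamp the values with base R's pmin() and pmax(), which work element-wise on whole vectors. This is only a minimal sketch on the question's sample data, not part of the answer above:

```r
numbers <- data.frame(value = c(-5000, 1, 2, 100, 200, 300, 400, 500, 1000, 2000, 10000, 12000))

# pmax() raises anything below 1 up to 1;
# pmin() then caps anything above 10000 at 10000
numbers$output <- pmin(pmax(numbers$value, 1), 10000)
numbers$output
```

No explicit loop or if/else is needed here because both functions compare every element against the bound in one call.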
3 Comments

Thanks so much! This works great.
As a follow up, can I ask how I could replicate this code for multiple columns in the same dataframe. Example dataframe: numbersnew <- data.frame(value1=c(-5000,1,2,100,200,300,400,500, 1000, 2000, 10000, 12000), value2= c(-4000, 3,4, 150, 250, 350, 450, 550, 1050, 2050, 10050, 12050))
purrr.tidyverse.org/reference/map_if.html - map_if will let you choose columns to apply the function to
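
As a rough base R sketch of the follow-up (an alternative to map_if, not taken from the linked page), the same per-element logic can be wrapped in a function and applied to several columns with lapply(). The helper name clamp_value and the "_output" column suffix are assumptions made for illustration:

```r
numbersnew <- data.frame(
  value1 = c(-5000, 1, 2, 100, 200, 300, 400, 500, 1000, 2000, 10000, 12000),
  value2 = c(-4000, 3, 4, 150, 250, 350, 450, 550, 1050, 2050, 10050, 12050)
)

# clamp_value() is a hypothetical helper name, not part of any package
clamp_value <- function(x) {
  sapply(x, function(v) {
    if (v > 10000) 10000 else if (v < 1) 1 else v
  })
}

# Apply the helper to each chosen column and store the results in new columns
cols <- c("value1", "value2")
numbersnew[paste0(cols, "_output")] <- lapply(numbersnew[cols], clamp_value)
numbersnew
```

Changing the cols vector is enough to pick which columns are processed.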

There are several errors in the original code: the output column is not initialized, there are unmatched and unneeded ")" characters, subscripts are missing where they are needed, and nrow() is called on the vector numbers$value instead of on the data frame. See the corrected code below.

numbers <- data.frame(value=c(-5000,1,2,100,200,300,400,500, 1000, 2000, 10000, 12000))
numbers$output<-NA
for (i in 1:nrow(numbers)){
   if (numbers$value[i] >10000){
      numbers$output[i] <- 10000
   } else if (numbers$value[i] < 1){
      numbers$output[i] <- 1
   } else {
      numbers$output[i] <- numbers$value[i]
   }
}
numbers
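
One optional refinement to the loop above, offered as a sketch rather than a required fix: 1:nrow(numbers) evaluates to c(1, 0) when the data frame has no rows, so seq_len() is the safer idiom. Initializing the column with NA_real_ also makes its numeric type explicit:

```r
numbers <- data.frame(value = c(-5000, 1, 2, 100, 200, 300, 400, 500, 1000, 2000, 10000, 12000))

# Pre-allocate a numeric output column
numbers$output <- NA_real_

# seq_len(0) is empty, so the loop body is skipped for an empty data frame
for (i in seq_len(nrow(numbers))) {
  if (numbers$value[i] > 10000) {
    numbers$output[i] <- 10000
  } else if (numbers$value[i] < 1) {
    numbers$output[i] <- 1
  } else {
    numbers$output[i] <- numbers$value[i]
  }
}
numbers
```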

Here is a more straightforward solution using the case_when function from the dplyr package:

numbers <- data.frame(value=c(-5000,1,2,100,200,300,400,500, 1000, 2000, 10000, 12000))
library(dplyr)
numbers$output <-case_when(
   numbers$value >10000 ~ 10000,
   numbers$value < 1 ~ 1,
   TRUE ~ numbers$value  # default case
)
numbers

1 Comment

Thanks so much for correcting the code and offering another solution! Appreciate it

In the spirit of ifelse-ness, you could also use the ifelse() function.

numbers$output <- ifelse(numbers$value > 10000, 10000, ifelse(numbers$value < 1, 1, numbers$value))

2 Comments

ifelse(number$value, < 1, -1, min(number$value, 10000))
smart, still a comma too much