1

I'm a relative beginner with R so apologies for the simplistic question.

I have a simple data frame with columns x, y and z. They all contain numerical values, and I'd like to write a piece of code that replaces every z value with 115 whenever 300 < x < 600, 0 < y < 100, and z > 160.

It's a very simple problem, but I am not sure why I am having so much trouble piecing together code for it. I'm sure it's some hodge-podge of replace and ifelse calls, but I can't seem to put it together.

Help is much appreciated! Thanks!

  • It would be easier to help if you create a small reproducible example along with expected output. Read about how to give a reproducible example. Commented May 17, 2021 at 4:27
  • Some of the answers already provided were able to give me the help I needed! Thank you though! Commented May 18, 2021 at 4:19

4 Answers

4

This is how I would do it:

library(tidyverse)
set.seed(1)
# tibble() replaces the deprecated data_frame()
df <- tibble("x" = sample(x = 200:700, size = 10, replace = TRUE),
             "y" = sample(x = 0:400, size = 10, replace = TRUE),
             "z" = sample(x = 0:200, size = 10, replace = TRUE))
df
#> A tibble: 10 x 3
#>       x     y     z
#>   <int> <int> <int>
#> 1   523    84   109
#> 2   366   276   164
#> 3   328   361    33
#> 4   617   329   105
#> 5   670   262   125
#> 6   498   328    88
#> 7   469    78   171
#> 8   665   212    32
#> 9   386    36    83
#>10   506   104   162

df$z <- ifelse((df$x > 300 & df$x < 600) & (df$y > 0 & df$y < 100) & (df$z > 160), 115, df$z)
df
#> A tibble: 10 x 3
#>       x     y     z
#>   <int> <int> <dbl>
#> 1   523    84   109
#> 2   366   276   164
#> 3   328   361    33
#> 4   617   329   105
#> 5   670   262   125
#> 6   498   328    88
#> 7   469    78   115
#> 8   665   212    32
#> 9   386    36    83
#>10   506   104   162

# Row 7 was updated to 115 because it met all the criteria.
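
One side effect is visible in the output above: z changes from <int> to <dbl>, because 115 is a double and ifelse() returns the wider of the two types. Here is a minimal sketch of that behaviour, using made-up vectors (z, hit, as_double and as_integer are illustrative names, not part of the data above). Writing 115L should keep the column integer, if that matters to you:

```r
# Illustrative values only: an integer column and a logical condition.
z   <- c(109L, 171L, 105L)
hit <- c(FALSE, TRUE, FALSE)

# A double replacement value promotes the whole result to double.
as_double <- ifelse(hit, 115, z)

# An integer replacement value (note the L suffix) keeps the result integer.
as_integer <- ifelse(hit, 115L, z)

class(as_double)   # "numeric"
class(as_integer)  # "integer"
```

The values are the same either way, so this only matters if later code depends on the column being integer.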

Edit

As usual, @TIC's answer is better than mine (fewer steps -> faster) but not by much on my system with a million rows. The data.table method is quickest:

library(tidyverse)
set.seed(1)
# tibble() replaces the deprecated data_frame()
df <- tibble("x" = sample(x = 0:700, size = 1000000, replace = TRUE),
             "y" = sample(x = 0:400, size = 1000000, replace = TRUE),
             "z" = sample(x = 0:200, size = 1000000, replace = TRUE))

ifelse_func <- function(df){
  df$z <- ifelse((df$x > 300 & df$x < 600) & (df$y > 0 & df$y < 100) & (df$z > 160), 115, df$z)
  df  # return the modified data frame rather than the assigned vector
}

transform_func <- function(df){
  transform(df, z = replace(z, 300 < x & x < 600 & 0 < y & y < 100 & z > 160, 115))
}

rowsums_func <- function(df){
  df$z[!rowSums(!(df > list(300, 0, 160) & df < list(600, 100, Inf)))] <- 115
  df  # return the modified data frame rather than the assigned value
}

library(data.table)
dt_func <- function(df){
  setDT(df)
  df[x > 300 & x < 600 & y > 0 & y < 100 & z > 160, z := 115]
}

mbm <- microbenchmark::microbenchmark(ifelse_func(df), transform_func(df),
                                      rowsums_func(df), dt_func(df))
autoplot(mbm)

[Figure: autoplot of the microbenchmark timings (example_2.png)]

Edit 2

> system.time(ifelse_func(df))
   user  system elapsed 
  0.064   0.020   0.085 
> system.time(transform_func(df))
   user  system elapsed 
  0.060   0.009   0.069 
> system.time(rowsums_func(df))
   user  system elapsed 
  0.090   0.021   0.110 
> system.time(dt_func(df))
   user  system elapsed 
  0.036   0.003   0.039 

2 Comments

Worked like a charm! Frustrated to see that I was nearly there with my own code but couldn't quite put it together! Thanks for smoothing out the details, really helped me out!
I think I figured it out @akrun - in your benchmark the dataframe isn't passed to the function i.e. system.time(rowsums_func) vs system.time(rowsums_func(df))
2

We can do this with a single ifelse() call:

Some sample data:

df <- data.frame(x=c(450, runif(10)*200),
                 y=c(50, runif(10)*100),
                 z=c(170, runif(10)*100))

> df
           x        y         z
1  450.00000 50.00000 170.00000
2   10.38674 93.33277  74.72619
3  117.66350 48.88015  27.60769
4  128.85086 35.74645  61.32745
5   93.21923 87.15894  53.37949
6   30.09869 86.72846  94.64611
7  104.03966 55.12932  89.78309
8   17.48741 16.50095  42.26284
9  183.52845 39.65171  27.60766
10  79.68355 18.14510  84.17454
11 110.14051 77.85835  33.67199

Then run this:

df$z <- ifelse(df$x > 300 & df$x < 600 & df$y > 0 & df$y < 100 & df$z > 160, 115, df$z)

And we get this:

> df
           x        y         z
1  450.00000 50.00000 115.00000
2   10.38674 93.33277  74.72619
3  117.66350 48.88015  27.60769
4  128.85086 35.74645  61.32745
5   93.21923 87.15894  53.37949
6   30.09869 86.72846  94.64611
7  104.03966 55.12932  89.78309
8   17.48741 16.50095  42.26284
9  183.52845 39.65171  27.60766
10  79.68355 18.14510  84.17454
11 110.14051 77.85835  33.67199
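
An equivalent way to write the same update, as a rough sketch, is to build the logical condition once and assign through it instead of calling ifelse(). The three-row data frame and the name hit below are made up for illustration:

```r
# Small illustrative data frame: only the first row meets every condition.
df <- data.frame(x = c(450, 100, 450),
                 y = c(50, 50, 150),
                 z = c(170, 170, 170))

# One TRUE/FALSE per row: TRUE only where all three criteria hold.
hit <- df$x > 300 & df$x < 600 & df$y > 0 & df$y < 100 & df$z > 160

# Overwrite z only in the matching rows; other rows are left untouched.
df$z[hit] <- 115

df$z  # 115 170 170
```

Keeping the condition in its own variable also makes it easy to check how many rows will change (sum(hit)) before overwriting anything.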

2 Comments

Yep! This was exactly what I was looking for. You and @jared_mamrot came up with pretty much the same solution and it worked perfectly. Really appreciate it!!
Great minds think alike! (And as my Mum would add, "and fools seldom differ" 🤣). Happy to help!
2

Do you want this?

transform(
  df,
  z = replace(z, 300 < x & x < 600 & 0 < y & y < 100 & z > 160, 115)
)
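
Note that transform() returns a modified copy and does not change df in place, so the result needs to be assigned if you want to keep it. A minimal sketch with a made-up two-row data frame (out is an illustrative name):

```r
# Illustrative data: the first row meets all criteria, the second does not.
df <- data.frame(x = c(450, 100),
                 y = c(50, 50),
                 z = c(170, 170))

# transform() evaluates the expression with the columns in scope and
# returns a new data frame; replace() swaps in 115 where the condition is TRUE.
out <- transform(
  df,
  z = replace(z, 300 < x & x < 600 & 0 < y & y < 100 & z > 160, 115)
)

out$z  # 115 170
df$z   # 170 170 (the original is unchanged)
```

To update the original, assign back to the same name: df <- transform(df, ...).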

2

Another base R option uses rowSums:

df$z[!rowSums(!(df >list(300, 0, 160) & df < list(600, 100, Inf)))] <- 115
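
Since the one-liner is dense, here is a step-by-step sketch of what it does. It assumes the data frame has exactly the columns x, y and z in that order, because the list bounds are matched to columns by position. The example data and the names lower, upper, in_range, fails and keep are made up for illustration:

```r
# Illustrative data: only the first row satisfies every bound.
df <- data.frame(x = c(450, 100, 450),
                 y = c(50, 50, 150),
                 z = c(170, 170, 170))

# Per-column bounds, in column order: x, y, z.
lower <- list(300, 0, 160)
upper <- list(600, 100, Inf)

# Logical matrix: TRUE where a cell lies strictly inside its column's bounds.
in_range <- df > lower & df < upper

# Number of failed conditions per row; 0 means the row passes all of them.
fails <- rowSums(!in_range)
keep  <- fails == 0

df$z[keep] <- 115
df$z  # 115 170 170
```

The original one-liner uses !rowSums(...) in place of fails == 0; both are TRUE exactly for the rows with no failed condition.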
