Replacing specific values in a data frame column according to multiple conditions in other columns in R

Question

I'm a relative beginner with R so apologies for the simplistic question.

I have a simple data frame with columns x, y and z. They all contain numerical values, and I'd like to write a piece of code that replaces all z values with "115" whenever 300 < x < 600, 0 < y < 100, and z > 160.

It's a very simple problem, but I am not sure why I am having so much trouble figuring out how to piece together code for this. I'm sure it's some hodge-podge of replace and ifelse arguments, but I can't seem to put it together.

Help is much appreciated! Thanks!

It would be easier to help if you create a small reproducible example along with expected output. Read about how to give a reproducible example.
– Ronak Shah, Commented May 17, 2021 at 4:27
Some of the answers already provided were able to give me the help I needed! Thank you though!
– Olivia Floyd, Commented May 18, 2021 at 4:19

jared_mamrot · Accepted Answer · 2021-05-17 01:01:25Z

4

This is how I would do it:

library(tidyverse)
set.seed(1)
# tibble() replaces the deprecated data_frame()
df <- tibble("x" = sample(x = 200:700, size = 10, replace = TRUE),
             "y" = sample(x = 0:400, size = 10, replace = TRUE),
             "z" = sample(x = 0:200, size = 10, replace = TRUE))
df
#> A tibble: 10 x 3
#>       x     y     z
#>   <int> <int> <int>
#> 1   523    84   109
#> 2   366   276   164
#> 3   328   361    33
#> 4   617   329   105
#> 5   670   262   125
#> 6   498   328    88
#> 7   469    78   171
#> 8   665   212    32
#> 9   386    36    83
#>10   506   104   162

df$z <- ifelse((df$x > 300 & df$x < 600) & (df$y > 0 & df$y < 100) & (df$z > 160), 115, df$z)
df
#> A tibble: 10 x 3
#>       x     y     z
#>   <int> <int> <dbl>
#> 1   523    84   109
#> 2   366   276   164
#> 3   328   361    33
#> 4   617   329   105
#> 5   670   262   125
#> 6   498   328    88
#> 7   469    78   115
#> 8   665   212    32
#> 9   386    36    83
#>10   506   104   162

# Row 7 was updated to 115 because it met all the criteria
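
Note that z changed from <int> to <dbl> in the output above, because 115 is a double. If keeping the integer type matters, one possible alternative is to assign through a logical index with the integer literal 115L. This is only a sketch; the three-row data frame and the name hit are made up for illustration.

```r
# Hypothetical toy data (not the sampled data from the answer above)
df <- data.frame(x = c(523L, 469L, 506L),
                 y = c(84L, 78L, 104L),
                 z = c(109L, 171L, 162L))

# Build the condition once so it can be inspected or reused
hit <- df$x > 300 & df$x < 600 & df$y > 0 & df$y < 100 & df$z > 160

# Assign only to the matching rows; 115L keeps z as an integer column
df$z[hit] <- 115L

df
```

Only the second row satisfies all three conditions here, so it is the only one that changes.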

Edit

As usual, @ThomasIsCoding's answer is better than mine (fewer steps, so faster), but not by much on my system with a million rows. The data.table method is quickest:

library(tidyverse)
library(data.table)
set.seed(1)
# tibble() replaces the deprecated data_frame()
df <- tibble("x" = sample(x = 0:700, size = 1000000, replace = TRUE),
             "y" = sample(x = 0:400, size = 1000000, replace = TRUE),
             "z" = sample(x = 0:200, size = 1000000, replace = TRUE))

# Each function works on its own local copy of df; only the timing matters here
ifelse_func <- function(df){
  df$z <- ifelse((df$x > 300 & df$x < 600) & (df$y > 0 & df$y < 100) & (df$z > 160), 115, df$z)
}

transform_func <- function(df){
  transform(df, z = replace(z, 300 < x & x < 600 & 0 < y & y < 100 & z > 160, 115))
}

rowsums_func <- function(df){
  df$z[!rowSums(!(df > list(300, 0, 160) & df < list(600, 100, Inf)))] <- 115
}

# Note: setDT() and := work by reference rather than on a copy
dt_func <- function(df){
  setDT(df)
  df[x > 300 & x < 600 & y > 0 & y < 100 & z > 160, z := 115]
}

mbm <- microbenchmark::microbenchmark(ifelse_func(df), transform_func(df),
                                      rowsums_func(df), dt_func(df))
autoplot(mbm)

Edit 2

> system.time(ifelse_func(df))
   user  system elapsed 
  0.064   0.020   0.085 
> system.time(transform_func(df))
   user  system elapsed 
  0.060   0.009   0.069 
> system.time(rowsums_func(df))
   user  system elapsed 
  0.090   0.021   0.110 
> system.time(dt_func(df))
   user  system elapsed 
  0.036   0.003   0.039

edited May 17, 2021 at 1:01

answered May 16, 2021 at 23:33

jared_mamrot


2 Comments

Olivia Floyd Over a year ago

Worked like a charm! Frustrated to see that I was nearly there with my own code but couldn't quite put it together! Thanks for smoothing out the details, really helped me out!

jared_mamrot Over a year ago

I think I figured it out @akrun - in your benchmark the dataframe isn't passed to the function i.e. system.time(rowsums_func) vs system.time(rowsums_func(df))

Elk · Accepted Answer · 2021-05-16 23:32:31Z

2

We can do this with an ifelse condition.

Some sample data:

df <- data.frame(x=c(450, runif(10)*200),
                 y=c(50, runif(10)*100),
                 z=c(170, runif(10)*100))

> df
           x        y         z
1  450.00000 50.00000 170.00000
2   10.38674 93.33277  74.72619
3  117.66350 48.88015  27.60769
4  128.85086 35.74645  61.32745
5   93.21923 87.15894  53.37949
6   30.09869 86.72846  94.64611
7  104.03966 55.12932  89.78309
8   17.48741 16.50095  42.26284
9  183.52845 39.65171  27.60766
10  79.68355 18.14510  84.17454
11 110.14051 77.85835  33.67199

Then run this:

df$z <- ifelse(df$x > 300 & df$x < 600 & df$y > 0 & df$y < 100 & df$z > 160, 115, df$z)

And we get this:

> df
           x        y         z
1  450.00000 50.00000 115.00000
2   10.38674 93.33277  74.72619
3  117.66350 48.88015  27.60769
4  128.85086 35.74645  61.32745
5   93.21923 87.15894  53.37949
6   30.09869 86.72846  94.64611
7  104.03966 55.12932  89.78309
8   17.48741 16.50095  42.26284
9  183.52845 39.65171  27.60766
10  79.68355 18.14510  84.17454
11 110.14051 77.85835  33.67199
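
One thing to watch for if the real data contains missing values: when the condition evaluates to NA for a row, ifelse returns NA for that row instead of keeping the old z. A possible workaround is to wrap the condition in which(), which drops the NA cases. The sketch below uses made-up data, and the names cond, z_ifelse and z_safe are only for illustration.

```r
# Hypothetical data with a missing x in the second row
df <- data.frame(x = c(450, NA, 450),
                 y = c(50, 50, 50),
                 z = c(170, 170, 100))

cond <- df$x > 300 & df$x < 600 & df$y > 0 & df$y < 100 & df$z > 160

# ifelse propagates the NA from the condition into the result
z_ifelse <- ifelse(cond, 115, df$z)

# which() keeps only the positions where cond is TRUE, so the NA row is left alone
z_safe <- replace(df$z, which(cond), 115)
```

Whether the NA row should keep its old value or become NA depends on the data, so treat this as one option rather than the right answer.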

edited May 16, 2021 at 23:32

answered May 16, 2021 at 23:24

Elk


2 Comments

Olivia Floyd Over a year ago

Yep! This was exactly what I was looking for. You and @jared_mamrot came up with pretty much the same solution and it worked perfectly. Really appreciate it!!

Elk Over a year ago

Great minds think alike! (And as my Mum would add, "and fools' seldom differ" 🤣). Happy to help!

ThomasIsCoding · Accepted Answer · 2021-05-16 23:34:01Z

2

Do you want this?

transform(
  df,
  z = replace(z, 300 < x & x < 600 & 0 < y & y < 100 & z > 160, 115)
)
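
Keep in mind that transform() returns a modified copy and leaves df itself untouched, so the result has to be assigned (back to df or to a new name). A minimal sketch, using a made-up three-row data frame and the hypothetical name out:

```r
# Hypothetical toy data: only the first row meets all three conditions
df <- data.frame(x = c(450, 450, 700),
                 y = c(50, 150, 50),
                 z = c(170, 170, 170))

# replace() swaps in 115 where the condition is TRUE;
# transform() evaluates the expression using the columns of df
out <- transform(
  df,
  z = replace(z, 300 < x & x < 600 & 0 < y & y < 100 & z > 160, 115)
)

out   # modified copy
df    # original, unchanged
```

Writing df <- transform(df, ...) instead would overwrite the original in one step.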

edited May 16, 2021 at 23:34

answered May 16, 2021 at 23:23

ThomasIsCoding


akrun · Accepted Answer · 2021-05-17 01:05:10Z

2

Another option in base R uses rowSums:

df$z[!rowSums(!(df > list(300, 0, 160) & df < list(600, 100, Inf)))] <- 115
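
The one-liner is compact, so here is the same idea unpacked into steps. This sketch assumes df has exactly the columns x, y and z in that order, since the lists of lower and upper bounds are matched to columns by position. The data and the intermediate names in_range, n_fail and hit are made up for illustration.

```r
# Hypothetical toy data: only the first row meets all three conditions
df <- data.frame(x = c(450, 450, 200),
                 y = c(50, 150, 50),
                 z = c(170, 170, 170))

# Compare each column with its own lower and upper bound;
# the result is a logical matrix with one cell per value in df
in_range <- df > list(300, 0, 160) & df < list(600, 100, Inf)

# Count, per row, how many cells fall outside their range
n_fail <- rowSums(!in_range)

# A row qualifies only when nothing failed (equivalent to !rowSums(!in_range))
hit <- n_fail == 0

df$z[hit] <- 115
```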

edited May 17, 2021 at 1:05

answered May 17, 2021 at 0:02

akrun
