Using if else on a dataframe across multiple columns

Question

I have a large dataset of samples, with a descriptor indicating whether each sample is viable. It looks roughly like this, where 'desc' is the description column and 'blank' indicates the sample is not viable:

     desc        x        y        z
1   blank 4.529976 5.297952 5.581013
2   blank 5.906855 4.557389 4.901660
3  sample 4.322014 4.798248 4.995959
4  sample 3.997565 5.975604 7.160871
5   blank 4.898922 7.666193 5.551385
6   blank 5.667884 5.195825 5.232072
7   blank 5.524773 6.726074 4.767475
8  sample 4.382937 5.926217 5.203737
9  sample 4.976908 3.079191 4.614121
10  blank 4.572954 4.772373 6.077195

I want to use an if else statement to set the rows with unusable data to NA. The final data set should look like this:

     desc        x        y        z
1   blank       NA       NA       NA
2   blank       NA       NA       NA
3  sample 4.322014 4.798248 4.995959
4  sample 3.997565 5.975604 7.160871
5   blank       NA       NA       NA
6   blank       NA       NA       NA
7   blank       NA       NA       NA
8  sample 4.382937 5.926217 5.203737
9  sample 4.976908 3.079191 4.614121
10  blank       NA       NA       NA

I have tried a for loop, but I'm having trouble getting it to change all the columns in a single pass. My real dataset has 40 columns, so I'd rather not process each one in a separate loop! Here is the code to change one column at a time:

for (i in seq_len(nrow(dat))) {
    if (dat$desc[i] == "blank") {
        dat$x[i] <- NA
    } else {
        dat$x[i] <- dat$x[i]
    }
}

I made the sample data with this script:

desc <- c("blank", "blank", "sample", "sample", "blank", "blank", "blank", "sample", "sample", "blank")
x <-  rnorm(10, mean=5, sd=1)
y <-  rnorm(10, mean=5, sd=1)
z <-  rnorm(10, mean=5, sd=1)

dat <- data.frame(desc,x,y,z)

Sorry if this is a basic question. I've spent all morning looking at forums and haven't been able to find a solution.

Any help is much appreciated!

dww · Accepted Answer · 2016-05-19 03:51:56Z

11

For your example dataset, this will work:

Option 1, name the columns to change:

dat[which(dat$desc == "blank"), c("x", "y", "z")] <- NA

In your actual data with 40 columns, if you just want to set the last 39 columns to NA, then the following may be simpler than naming each of the columns to change:

Option 2, select columns using a range:

dat[which(dat$desc == "blank"), 2:40] <- NA

Option 3, exclude the 1st column:

dat[which(dat$desc == "blank"), -1] <- NA

Option 4, exclude a named column:

dat[which(dat$desc == "blank"), !names(dat) %in% "desc"] <- NA

As you can see, there are many ways to do this kind of operation (this is far from a complete list), and understanding how each of these options works will help you to get a better understanding of the language.
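
A further variation on the options above, offered only as a minimal sketch: it assumes every column to be blanked is numeric and that 'desc' is the only non-numeric column, so the columns can be picked by type instead of by name or position. The set.seed() call and the helper name num_cols are additions for illustration and are not part of the original answer.

```r
# Rebuild sample data in the same shape as the question (values are random).
set.seed(1)  # assumption: seed added only so the sketch is reproducible
dat <- data.frame(
  desc = c("blank", "blank", "sample", "sample", "blank",
           "blank", "blank", "sample", "sample", "blank"),
  x = rnorm(10, mean = 5, sd = 1),
  y = rnorm(10, mean = 5, sd = 1),
  z = rnorm(10, mean = 5, sd = 1)
)

# Logical vector marking the numeric columns (hypothetical helper name).
num_cols <- vapply(dat, is.numeric, logical(1))

# Same row selection as the options above; only the column selection differs.
dat[which(dat$desc == "blank"), num_cols] <- NA
```

One reason to keep which() here is that it drops any NA values in the comparison, so the assignment still runs if 'desc' itself contains missing entries.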

edited May 19, 2016 at 3:51

answered May 19, 2016 at 3:24

1 Comment

mlcyo Over a year ago

Thank you so much, I think option 2 will be the one to go with :) And thanks for the additional examples! I hadn't come across which() before.

jennybryan · Answer · 2016-05-19 04:55:41Z

8

Here's another dplyr solution with a small custom function and mutate_each().

library(dplyr)

f <- function(x) if_else(dat$desc == "blank", NA_real_, x)
dat %>% 
  mutate_each(funs(f), -desc)
#>      desc        x        y        z
#> 1   blank       NA       NA       NA
#> 2   blank       NA       NA       NA
#> 3  sample 3.624941 6.430955 5.486632
#> 4  sample 3.236359 4.935453 4.319202
#> 5   blank       NA       NA       NA
#> 6   blank       NA       NA       NA
#> 7   blank       NA       NA       NA
#> 8  sample 5.058725 6.751650 4.750529
#> 9  sample 5.837206 4.323562 4.914780
#> 10  blank       NA       NA       NA

answered May 19, 2016 at 4:55

1 Comment

mlcyo Over a year ago

Thanks for the solution! I went with dww's one-line solution above, but this looks good too :)

cardosof · Answer · 2016-05-19 04:01:35Z

3

You can use dplyr and a custom function to mutate values on certain conditions.

library(dplyr)

# Apply mutate() only to the rows where `condition` is TRUE
mutate_cond <- function(.data, condition, ..., envir = parent.frame()) {
  condition <- eval(substitute(condition), .data, envir)
  .data[condition, ] <- .data[condition, ] %>% mutate(...)
  .data
}

dat <- dat %>%
  mutate_cond(desc == "blank", x = NA, y = NA, z = NA)

answered May 19, 2016 at 4:01

Carlos Ahumada · Answer · 2019-10-29 13:49:06Z

3

Building on your initial loop approach, I came up with this:

for (i in seq_len(nrow(dat))) {
  # column 1 is 'desc'; columns 2:4 are x, y and z
  if (dat[i, 1] == "blank") {
    dat[i, 2:4] <- NA
  }
}

I tested it with your data and it worked. I hope this is useful for anyone looping over rows and columns with conditions.

answered Oct 29, 2019 at 13:49

1 Comment

mlcyo Over a year ago

Awesome, thanks for commenting - I'm sure someone will find this useful one day :)

akrun · Answer · 2016-05-19 05:37:21Z

2

Here is an option using set() from data.table. It should be faster because the overhead of [.data.table is avoided. We convert the 'data.frame' to a 'data.table' (setDT(df1)), loop through the column names of 'df1' (excluding the 'desc' column), and assign NA to the elements where the logical condition in 'i' is met.

library(data.table)
setDT(df1)
for(j in names(df1)[-1]){
   set(df1, i= which(df1[["desc"]]=="blank"), j= j, value= NA)
}
df1
#      desc        x        y        z
# 1:  blank       NA       NA       NA
# 2:  blank       NA       NA       NA
# 3: sample 4.322014 4.798248 4.995959
# 4: sample 3.997565 5.975604 7.160871
# 5:  blank       NA       NA       NA
# 6:  blank       NA       NA       NA
# 7:  blank       NA       NA       NA
# 8: sample 4.382937 5.926217 5.203737
# 9: sample 4.976908 3.079191 4.614121
#10:  blank       NA       NA       NA

Or another option (based on @dww's comment)

setDT(df1, key = "desc")["blank", names(df1)[-1] := NA][]

edited May 19, 2016 at 5:37

answered May 19, 2016 at 4:21

6 Comments

dww Over a year ago

or, if using data tables, just df1[desc=="blank", c(2:NCOL(df1)):=NA, with=F] would do it.

akrun Over a year ago

@dww It could be done, but i think set would be fast

dww Over a year ago

microbenchmarking these, it seems that the version in my comment is an order of magnitude faster. As you say, set should be fast. Could it be the overhead of which(df1[[ that slows your one?

akrun Over a year ago

@dww microbenchmarking with a large dataset or the example showed by the OP?

dww Over a year ago

I used 100,000 rows, but just the 4 columns of OP.

SF Feldman · Answer · 2022-10-13 06:33:23Z

2

Here is another dplyr solution, using the new function across:

library(dplyr)

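# The lambda is evaluated inside mutate(), so desc refers to the column of dat
# rather than to a separate desc vector in the global environment.
dat %>%
  mutate(across(.cols = c(x, y, z), .fns = ~ ifelse(desc == "blank", NA, .x)))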
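
With 40 columns, listing each name inside c() gets tedious. The sketch below is one possible variation, assuming dplyr 1.0.0 or later and assuming every column to be blanked is numeric: it selects columns with where(is.numeric) and uses base replace() so the columns stay numeric. The set.seed() call and the name out are illustrative additions, not part of the original answer.

```r
library(dplyr)

# Rebuild sample data in the same shape as the question (values are random).
set.seed(1)  # assumption: seed added only so the sketch is reproducible
dat <- data.frame(
  desc = c("blank", "blank", "sample", "sample", "blank",
           "blank", "blank", "sample", "sample", "blank"),
  x = rnorm(10, mean = 5, sd = 1),
  y = rnorm(10, mean = 5, sd = 1),
  z = rnorm(10, mean = 5, sd = 1)
)

# For every numeric column, overwrite the values in "blank" rows with NA.
# desc is looked up in dat because the lambda runs inside mutate().
out <- dat %>%
  mutate(across(where(is.numeric), ~ replace(.x, desc == "blank", NA)))
```

Here the result is stored in out and dat itself is left unchanged; assign back to dat if the original should be overwritten.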

answered Oct 13, 2022 at 6:33

bramtayl · Answer · 2016-05-19 03:20:24Z

1

This should work. Though honestly, if the data is unusable, why not delete the rows altogether?

library(dplyr)

blanks = 
  dat %>%
  filter(desc == "blank") %>%
  select(desc)

dat %>%
  filter(desc == "sample") %>%
  bind_rows(blanks)

answered May 19, 2016 at 3:20

1 Comment

mlcyo Over a year ago

Thanks very much for taking the time to answer :) I definitely need to get more familiar with dplyr, it seems to be really useful. And as for deleting it, it's a timeseries (at 0.5 second intervals) and I think it would make my life harder in the long run if I deleted the bad rows!
