Subset data within function based on value in any column

Question

Let's say I want to write a function like:

Fn <- function(df, to_remove = NULL) {
  df <- df[!df %in% to_remove,]
}

The purpose is to remove every row in which at least one value equals a value specified in to_remove. The match is on the cell values themselves, not on row numbers, indices, or names.

Any idea why this doesn't work without specifying a column?

Example:

df <- data.frame(a = c("a", "a", "a"), b = c("a", "b", "a"))

  a b
1 a a
2 a b
3 a a

Expected output:

  a b
1 a a
3 a a

I'm looking for a base R or data.table solution.

Are you trying to remove a row if the to_remove argument appears in any column? You might check out the filter_* variants in dplyr, e.g. filter_all.
– zack, Commented Oct 23, 2018 at 22:13
Should work, but I would like to avoid writing functions with dplyr for the moment (a data.table approach would be fine, though).
– arg0naut91, Commented Oct 24, 2018 at 7:33

Shree · Accepted Answer · 2018-10-24 11:28:38Z

1

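To remove rows, you need to supply either negative row indices or a logical vector of TRUE and FALSE values (typically of length nrow(df)). Your code, !df %in% to_remove, does neither: %in% treats the data frame as a list of columns, so it returns one value per column rather than one per row. Try this instead: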

Fn <- function(df, to_remove = NULL) {
  df[!apply(df, 1, function(x) any(x %in% to_remove)), ]
}

Fn(df, "b")
  a b
1 a a
3 a a

Fn(df, c("a", "b"))
[1] a b
<0 rows> (or 0-length row.names)

Fn(df, "d")
  a b
1 a a
2 a b
3 a a
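
To make the difference concrete, here is a minimal check using the example data from the question. It is only a sketch; the names col_mask and row_mask are introduced here for illustration and are not part of the original answer.

```r
df <- data.frame(a = c("a", "a", "a"), b = c("a", "b", "a"))

# %in% on a data frame works column by column: one result per column
col_mask <- df %in% "b"
length(col_mask) == ncol(df)

# A row filter needs one logical value per row instead
row_mask <- apply(df, 1, function(x) any(x %in% "b"))
length(row_mask) == nrow(df)

# Only the row-wise mask can be used to drop the matching row
df[!row_mask, ]
```

Because col_mask has one entry per column, it cannot identify which rows contain the value; the row-wise mask can.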

edited Oct 24, 2018 at 11:28

answered Oct 23, 2018 at 22:00

Shree


7 Comments

arg0naut91 Over a year ago

Thanks! However, this only removes the rows with indices in to_remove. What I'm interested in is to remove the values within the columns. For instance, in above case I may be interested in removing all rows in the dataset where one of the rows is equal to 0.2.

Roman Over a year ago

@arg0naut, I think you are contradicting yourself: Removing values within the columns != removing all rows in the dataset where one of the rows is equal to 0.2

arg0naut91 Over a year ago

@Roman not sure where the contradiction lies as I'm not referring to row names or row indices or row numbers. Perhaps I should have specified row values; but I've added an example now.

Shree Over a year ago

@arg0naut...I misunderstood your question...anyways have updated the answer.

arg0naut91 Over a year ago

Thanks @Shree, this seems to be working. However, will wait a bit before accepting since I find any to be quite slow and I hesitate to use functions like that.


Roman · Answer · 2018-10-24 10:36:00Z

1

Why not a simple loop?

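rowrem <- function(x, val) {
    to_drop <- logical(nrow(x))  # TRUE marks a row for removal
    for (i in seq_len(nrow(x))) {
        for (j in seq_len(ncol(x))) {
            # Compare as character so factor and character columns behave alike
            if (as.character(x[i, j]) %in% val) {
                to_drop[i] <- TRUE
            }
        }
    }
    # Subset once, after the loops, so row indices do not shift mid-iteration
    x[!to_drop, , drop = FALSE]
}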

Result

> rowrem(df1, "b")
  a b
1 a a
3 a a

Explanation: What you want to do is check the value of every single cell and map any match back to its row number. With base R your choices are a bit limited in that regard. A sensible (i.e., maintainable) solution would probably be something like the above, but I'm sure someone will come up with an lapply or subsetting solution as well.

Data

df1 <- data.frame(a = c("a", "a", "a"), b = c("a", "b", "a"))
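
For readers who would rather avoid explicit loops, the same row filter can be sketched with lapply and Reduce. This is a minimal sketch and not part of either answer: the helper name remove_rows_with is hypothetical, and it assumes that comparing values as character strings is acceptable for the data at hand.

```r
# Hypothetical helper: builds one logical vector per column, then ORs them
remove_rows_with <- function(df, to_remove = NULL) {
  # Nothing to match against, or no columns to inspect: return input unchanged
  if (length(to_remove) == 0 || ncol(df) == 0) return(df)
  # For each column, flag the rows whose value appears in to_remove
  hits <- lapply(df, function(col) as.character(col) %in% as.character(to_remove))
  # A row is dropped if any column flagged it
  df[!Reduce(`|`, hits), , drop = FALSE]
}

df1 <- data.frame(a = c("a", "a", "a"), b = c("a", "b", "a"))
remove_rows_with(df1, "b")
```

Each column is tested in one vectorized %in% call, so the work scales with the number of columns rather than the number of cells visited one at a time. Whether it is faster than the apply version on a given dataset would need to be measured.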

edited Oct 24, 2018 at 10:36

answered Oct 24, 2018 at 10:26

Roman


1 Comment

arg0naut91 Over a year ago

While this works, I would like to avoid using loops inside functions; will rather stick to apply family, but thanks anyway!
