Let's say I want to write a function like:

Fn <- function(df, to_remove = NULL) {
  df <- df[!df %in% to_remove,]
}

The goal is to remove every row in which at least one value equals one of the value(s) specified in to_remove. I mean the cell values in the row, not row numbers, indices, or names.

Any idea why this doesn't work without specifying a column?

Example:

df <- data.frame(a = c("a", "a", "a"), b = c("a", "b", "a"))

  a b
1 a a
2 a b
3 a a

Expected output:

  a b
1 a a
3 a a

I'm looking for a base R or data.table solution.

2 Comments

  • Are you trying to remove a row if the to_remove argument appears in any column? You might check out the filter_* variants in dplyr, e.g. filter_all. Commented Oct 23, 2018 at 22:13
  • That should work, but I would like to avoid writing functions with dplyr for the moment (a data.table approach would be fine, though). Commented Oct 24, 2018 at 7:33

2 Answers


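To remove rows, you need to index with either negative row numbers or a logical vector of TRUE and FALSE values (typically of the same length as nrow(df)). Your code, !df %in% to_remove, does not produce that: a data frame is a list of columns, so %in% returns one value per column rather than one per row. Instead, test each row and keep only the rows without a match: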

Fn <- function(df, to_remove = NULL) {
  df[!apply(df, 1, function(x) any(x %in% to_remove)), ]
}

Fn(df, "b")
  a b
1 a a
3 a a

Fn(df, c("a", "b"))
[1] a b
<0 rows> (or 0-length row.names)

Fn(df, "d")
  a b
1 a a
2 a b
3 a a
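
To see why the original !df %in% to_remove cannot select rows, it may help to compare the lengths of the two logical vectors. This is only an illustrative sketch using the example data from the question; col_flags and row_flags are names introduced here, not part of the original code.

```r
# Example data from the question
df <- data.frame(a = c("a", "a", "a"), b = c("a", "b", "a"))

# %in% sees the data frame as a list of columns: one result per column
col_flags <- df %in% "b"
length(col_flags) == ncol(df)

# apply() over rows gives one result per row, which is what df[i, ] needs
row_flags <- apply(df, 1, function(x) any(x %in% "b"))
length(row_flags) == nrow(df)
```

Both comparisons should come out TRUE, which shows that only the apply() version lines up with the rows of df.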

7 Comments

Thanks! However, this only removes the rows with indices in to_remove. What I'm interested in is to remove the values within the columns. For instance, in above case I may be interested in removing all rows in the dataset where one of the rows is equal to 0.2.
@arg0naut, I think you are contradicting yourself: Removing values within the columns != removing all rows in the dataset where one of the rows is equal to 0.2
@Roman not sure where the contradiction lies as I'm not referring to row names or row indices or row numbers. Perhaps I should have specified row values; but I've added an example now.
@arg0naut...I misunderstood your question...anyways have updated the answer.
Thanks @Shree, this seems to be working. However, will wait a bit before accepting since I find any to be quite slow and I hesitate to use functions like that.

Why not a simple loop?

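rowrem <- function(x, val) {
    keep <- rep(TRUE, nrow(x))           # one flag per row
    for (i in seq_len(nrow(x))) {
        for (j in seq_len(ncol(x))) {
            # compare as text; %in% lets val hold several values
            if (paste(x[i, j]) %in% val) {
                keep[i] <- FALSE         # mark the row; deleting inside the loop shifts indices
            }
        }
    }
    x[keep, , drop = FALSE]
}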
Result
> rowrem(df1, "b")
  a b
1 a a
3 a a

Explanation: What you want to do is check the value of every single cell and map any match back to its row number. With base R your choices are a bit limited in that regard. A sensible (i.e., maintainable) solution would probably be something like the loop above, but I'm sure someone will come up with an lapply or subsetting solution as well.
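
For comparison, here is one way the lapply/subsetting idea mentioned above might look in base R. It is a sketch under the same assumptions as the loop (values are matched with %in%, so val may hold several values); rowrem_vec, hits_by_col, and drop_row are hypothetical names introduced for illustration.

```r
# rowrem_vec is a hypothetical name, not taken from the answers above
rowrem_vec <- function(x, val) {
    # For each column, flag the cells whose value is in val
    hits_by_col <- lapply(x, function(col) col %in% val)
    # Combine the columns with OR: TRUE where any cell in the row matched
    drop_row <- Reduce(`|`, hits_by_col, rep(FALSE, nrow(x)))
    # Keep the rows without a match; drop = FALSE keeps a data frame
    x[!drop_row, , drop = FALSE]
}

df1 <- data.frame(a = c("a", "a", "a"), b = c("a", "b", "a"))
rowrem_vec(df1, "b")
```

Because it works column by column, this avoids the row-wise conversion to a matrix that apply(df, 1, ...) performs, which may matter on wide or mixed-type data frames.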

Data

df1 <- data.frame(a = c("a", "a", "a"), b = c("a", "b", "a"))

1 Comment

While this works, I would like to avoid using loops inside functions; will rather stick to apply family, but thanks anyway!
