Why does the %in% operator not behave analogously to the == operator with data frame indexing

Question

I'm writing a function to clean some CEX data (the details don't really matter), and I cannot figure out why I am unable to use %in% to subset a data frame with a vector of values when I am able to perform the analogous operation with == on a single value. What I am attempting is shown in f_fails() below. Unless I'm mistaken, I need to be able to pass the column name as a string but cannot.

Is there something distinct about %in% in items 6 and 8 below that does not apply for ==? How can I perform 6 and 8 in another way?

# Test Data
set.seed(123)
df <- data.frame(
  NEWID = rep(1:10, 1, each = 10),
  COST = rnorm(100, 1000, 10),
  UCC = round(runif(100, 3995, 4005))
)

# All of these work except the 6th one
# 1.
df[df$UCC == 4000,]
# 2. 
df[df$"UCC" == 4000,]
# 3. 
df[df["UCC"] == 4000,]

# 4. 
df[df$UCC %in% c(4000,4001),]
# 5. 
df[df$"UCC" %in% c(4000,4001),]
# 6.  The one I need does not work
df[df["UCC"] %in% c(4000,4001),]

# 7. This works fine
f_works <- function(data, filter_var, one_val){
  # I can feed values with == and filter
  d <- data[data[filter_var] == one_val,]
  d
}
# 8. This (what I want) returns an empty data frame.
f_fails <- function(data = df, filter_var, many_vals){
  # I cannot feed 2+ values with %in% and filter
  d <- data[data[filter_var] %in% many_vals,]
  d
}

f_works(df, "UCC", 4000)
f_fails(df, "UCC", c(4000,4001))

L Tyrone · Accepted Answer · 2024-11-23 01:12:26Z

2

In this case, %in% expects a vector on either side, but data[filter_var] returns a data frame on the left. Use [[ ]] instead, which extracts the column as a vector:

f <- function(data = df, filter_var, many_vals){
  # [[ ]] extracts the column as a vector, which is what %in% needs
  data[data[[filter_var]] %in% many_vals,]
}

head(f(df, "UCC", c(4000, 4001)))
#    NEWID     COST  UCC
# 3      1 1015.587 4001
# 4      1 1000.705 4000
# 11     2 1012.241 4000
# 27     3 1008.378 4000
# 28     3 1001.534 4001
# 31     4 1004.265 4001
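
For item 6 outside a function, a few base R spellings extract the column as a plain vector before %in% is applied. This is a minimal sketch using the question's test data; the names `vals`, `col`, and `keep1` to `keep3` are illustrative only, and the `df[, col]` form assumes a plain data.frame (not a tibble), where selecting one column drops to a vector.

```r
# Rebuild the question's test data so the snippet is self-contained
set.seed(123)
df <- data.frame(
  NEWID = rep(1:10, each = 10),
  COST = rnorm(100, 1000, 10),
  UCC = round(runif(100, 3995, 4005))
)

vals <- c(4000, 4001)
col <- "UCC"  # column name held in a string, as in f_fails()

keep1 <- df[[col]] %in% vals                           # [[ ]] extracts the column itself
keep2 <- df[, col] %in% vals                           # matrix-style indexing drops to a vector
keep3 <- unlist(df[col], use.names = FALSE) %in% vals  # flatten the one-column data frame

# All three give the same logical row index, one value per row
stopifnot(identical(keep1, keep2), identical(keep1, keep3))

res <- df[keep1, ]
```

Any of the three can replace the data[filter_var] expression in f_fails(); the [[ ]] form is the shortest.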


6 Comments

dcoy Over a year ago

Nice, this works. I suppose a general takeaway is that == can test a string on a vector OR a df$column but %in% can only test vectors on vectors. I can't think of a reason this is necessary since df[["UCC"]] %in% c(4000,4001) != c(4000,4001) %in% df[["UCC"]]. The left side of %in% could take a vector or a df$column like ==. Anyone know of a reason?

Onyambu Over a year ago

@dcoy The %in% operator does take a vector. Note that df$column is a vector. But df[column] is not a vector; it is a list/data.frame of length 1, which behaves differently from df[[column]], a vector.
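
The distinction described in this comment can be checked directly. The sketch below uses a small made-up data frame (`toy` is a hypothetical name, not from the question).

```r
toy <- data.frame(UCC = c(3999, 4000, 4001), COST = c(10, 20, 30))

one_col_df <- toy["UCC"]    # single brackets keep the data frame wrapper
col_vec    <- toy[["UCC"]]  # double brackets return the column itself

is.data.frame(one_col_df)   # TRUE
is.list(one_col_df)         # TRUE: a data frame is a list of columns
length(one_col_df)          # 1: one column, so one list element
is.atomic(col_vec)          # TRUE
length(col_vec)             # 3: one element per row
identical(col_vec, toy$UCC) # TRUE: $ and [[ ]] return the same vector
```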

dcoy Dec 4, 2024 at 23:23

@Onyambu, I think I was unclear but can't edit now. I agree with all you said. I meant to ask something like "is there any reason to limit %in% by not allowing it to take either a vector or a non-vector on the LHS, as we allow with ==?" I.e., why not allow either df["UCC"] or df[["UCC"]] with %in%, as we allow when using ==? Is there any reason aside from "that's the way it is"?

Onyambu Dec 5, 2024 at 1:29

@dcoy Yes, there is a reason to limit %in%. Note that data frames have an == method, while lists do not. On the other hand, %in% does work on lists. This enables checks like list(1,1:2) %in% list(1:2,3:4,5:6). But you cannot do list(1,1:2) == list(1:2,3:4,5:6) and get FALSE back. Without the limitation, how would you know whether you are comparing equality or element inclusion? The two operators are different: %in% checks for membership, not equality.
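
The list example from this comment can be run as written. As a hedged illustration: %in% is defined in terms of match(), so each element on the left is looked up as a whole on the right, and the result has one value per left-hand element. The names `lhs`, `rhs`, and `res` are illustrative.

```r
lhs <- list(1, 1:2)
rhs <- list(1:2, 3:4, 5:6)

# Each element of lhs is looked up as a whole among the elements of rhs
res <- lhs %in% rhs
res
# [1] FALSE  TRUE

# The result length follows the left-hand side, not the right
length(res) == length(lhs)
```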

dcoy Dec 5, 2024 at 19:36

@Onyambu, thanks for the replies. I actually did not know you cannot use == on two lists with the same dimensions. I think for my broader question, I'd need more space to articulate and maybe simulate scenarios more. I might be misunderstanding something, but to me the answer to the question in your penultimate sentence is your final sentence. It's intuitive that == could never work for inclusion (lists/vectors/anything with differing dimensions). I simply do not understand why my question example #3 works, but example #6 does not. Not really an issue of equality vs inclusion, imo.


Katia · Answer · 2024-11-23 01:18:44Z

1

If you use the class() or str() functions, you will see that df$UCC is a numeric vector:

class(df$UCC)
## [1] "numeric"

At the same time, df["UCC"] is a data frame:

class(df["UCC"])
## [1] "data.frame"

You can compare a numeric vector with a value, or use the %in% operator on it:

df$UCC == 4000
##  [1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE 
## etc.

df$UCC %in% c(4000, 4001)
##  [1] FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE 
## etc.

If you compare a data frame with a value (here of the same "numeric" type), you get a matrix as the result:

class( df["UCC"] == 4000)
## [1] "matrix" "array"
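
This is also why item 3 in the question works. A sketch, using a hypothetical three-row data frame `toy`: the comparison returns a one-column logical matrix with one entry per row, and that is long enough to serve as a row index.

```r
toy <- data.frame(UCC = c(3999, 4000, 4001), COST = c(10, 20, 30))

m <- toy["UCC"] == 4000
dim(m)        # 3 rows, 1 column: one logical value per row of toy
as.vector(m)  # FALSE TRUE FALSE

# Used as a row index, the matrix acts like a logical vector of length nrow(toy)
rows_m <- toy[m, ]
rows_v <- toy[toy[["UCC"]] == 4000, ]  # the vector form selects the same row
```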

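When you use the %in% operator, you ask whether each element on the left matches one of the values on the right. A data frame is a list of columns, so df["UCC"] has a single element (the whole column), and that column as a whole is not one of the values 4000 or 4001. The result is a single logical value, not one value per row: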

class( df["UCC"]  %in% c(4000, 4001))
## [1] "logical"
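
Looking at the value rather than only the class shows where the empty result in items 6 and 8 comes from. A sketch with a hypothetical small data frame `toy`: %in% returns a single FALSE, and indexing rows with a single FALSE selects nothing.

```r
toy <- data.frame(UCC = c(3999, 4000, 4001), COST = c(10, 20, 30))

idx <- toy["UCC"] %in% c(4000, 4001)
idx
# [1] FALSE
length(idx)         # 1, not nrow(toy)

# A length-one FALSE is recycled across all rows, so no row is kept
empty <- toy[idx, ]
nrow(empty)         # 0
```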

If you use the numeric vector df$UCC instead, it works, because %in% then checks each element of the column against the values on the right:

df$UCC  %in% c(4000, 4001)
##  [1] FALSE FALSE  TRUE  TRUE FALSE FALSE

One way to implement your function is to use the dplyr package. Inside the function body:

library(dplyr)
d <- filter(data, get({{filter_var}}) %in% many_vals)


2 Comments

dcoy Over a year ago

Your answer is more thorough, and I appreciate the class() component. I am sorry for giving it to the other person, since they were first and the data[[filter_var]] change is a simpler fix. I appreciate your response.

Friede Over a year ago

Why dplyr + get?
