24

How can I avoid using a loop to subset a dataframe based on multiple factor levels?

In the following example my desired output is a dataframe. The dataframe should contain the rows of the original dataframe where the value in "Code" equals one of the values in "selected".

Working example:

#sample data
Code<-c("A","B","C","D","C","D","A","A")
Value<-c(1, 2, 3, 4, 1, 2, 3, 4)
data<-data.frame(cbind(Code, Value))

selected<-c("A","B") #want rows that contain A and B

#Begin subsetting
result<-data[which(data$Code==selected[1]),]
s1<-2
while(s1<length(selected)+1)
{
  result<-rbind(result,data[which(data$Code==selected[s1]),])
  s1<-s1+1
}

This is a toy example of a much larger dataset, so "selected" may contain a great number of elements and the data a great number of rows. Therefore I would like to avoid the loop.

0

3 Answers 3

44

You can use %in%

  data[data$Code %in% selected,]
  Code Value
1    A     1
2    B     2
7    A     3
8    A     4
Sign up to request clarification or add additional context in comments.

Comments

5

Here's another:

data[data$Code == "A" | data$Code == "B", ]

It's also worth mentioning that the subsetting factor doesn't have to be part of the data frame if it matches the data frame rows in length and order. In this case we made our data frame from this factor anyway. So,

data[Code == "A" | Code == "B", ]

also works, which is one of the really useful things about R.

1 Comment

The second part did not work for me in Jupyter notebook.
4

Try this:

> data[match(as.character(data$Code), selected, nomatch = FALSE), ]
    Code Value
1      A     1
2      B     2
1.1    A     1
1.2    A     1

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.