I have a data frame that contains a column of individual identifiers (6-letter strings) and several numeric columns.

I would like to subset it using a vector of identifiers (again 6-letter strings) taken from another data frame.

Here is what I did (simplified; my real data frame has over 200 columns and 64 rows):

n = c(2, 3, 5, 7, 8, 1) 

i = c("abazzz", "bbaxxx", "ccbeee","dddfre", "sdtyuo", "loatvz" ) 

c = c(10, 2, 10, 2, 12, 34) 

df1 = data.frame(n, i, c) 

attach(df1)

This is the vector whose elements I want to use for subsetting:

v<- c("abazzz", "ccbeee", "lllaaa")

This is what I do to subset:

df2<-df1[, i==abazzz | ccbeee | lllaaa]

This does not work; the error I get is that "abazzz" is not found. I tried with and without quotes, and I tried the subset function, but the same error appears.

Moreover, I would like to avoid the or operator, as the vector I need to use for subsetting has about 50 elements. So, in words, I would like to keep only those rows of df1 whose identifiers (column i) also appear in v.

Writing this makes me think this must be very easy to do, but I can't figure it out by myself, I tried looking up similar questions but could not find what I was looking for. I hope someone can help me, suggest other posts or manuals so I can learn. Thanks!

    You have to use quotes, i.e. "abazzz". Please don't attach the dataset. Use %in%, i.e. df1[df1$i %in% v,] Commented Jan 8, 2015 at 10:27

2 Answers

Here's another nice option using data.table's binary search (for efficiency):

library(data.table)
setkey(setDT(df1), i)[J(v), nomatch = 0]
#    n      i  c
# 1: 2 abazzz 10
# 2: 5 ccbeee 10

Or if you don't want to reorder the data set and keep the syntax similar to base R, you could set a secondary key instead (contributed by @Arun)

setindex(setDT(df1), i)  # set2key() was deprecated in later data.table releases; setindex() replaces it
df1[i %in% v]

Or dplyr (for simplicity)

library(dplyr)
df1 %>% filter(i %in% v)
#    n      i  c
# 1: 2 abazzz 10
# 2: 5 ccbeee 10

As a side note: as mentioned in comments, never use attach
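
One reason attach is discouraged can be sketched as follows: attach() places a copy of the data frame on the search path, so later changes to the data frame are not seen through the attached names. This is a minimal illustration only; the data frame toy and its column score are hypothetical names, not part of the question.

```r
# Hypothetical toy data frame, used only for this illustration
toy <- data.frame(score = c(1, 2, 3))

attach(toy)                   # puts a copy of toy's columns on the search path
toy$score <- toy$score * 10   # modifies toy itself, not the attached copy

seen_via_attach <- score      # bare name resolves to the attached (stale) copy
seen_via_dollar <- toy$score  # explicit reference sees the updated values

detach(toy)                   # always undo attach() to keep the search path clean
```

Referring to columns explicitly (toy$score) or through with() avoids this kind of silent mismatch.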

(1) Instead of

attach(df1)
df2<-df1[, i==abazzz | ccbeee | lllaaa]
detach(df1)

try

df2 <- with(df1, df1[i=="abazzz" | i=="ccbeee" | i=="lllaaa", ])

(2)

with(df1, df1[i %in% v, ])

Both yield

#   n      i  c
# 1 2 abazzz 10
# 3 5 ccbeee 10
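
To spell out what option (2) does step by step, here is a small base R sketch using the toy data from the question. The helper names keep and missing_ids are introduced here for illustration only. It also shows that an identifier in v with no match in df1 ("lllaaa") is simply ignored, and how to list such identifiers.

```r
# Toy data copied from the question
df1 <- data.frame(
  n = c(2, 3, 5, 7, 8, 1),
  i = c("abazzz", "bbaxxx", "ccbeee", "dddfre", "sdtyuo", "loatvz"),
  c = c(10, 2, 10, 2, 12, 34)
)
v <- c("abazzz", "ccbeee", "lllaaa")

# %in% returns one TRUE/FALSE per row of df1: is this identifier in v?
keep <- df1$i %in% v

# The logical vector goes in the row position (before the comma);
# leaving the column position empty keeps all columns
df2 <- df1[keep, ]

# Identifiers requested in v that do not occur in df1
missing_ids <- setdiff(v, df1$i)
```

Because %in% takes the whole vector at once, this works unchanged when v has 50 elements, with no chain of | operators.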
