In R, extract rows based on strings in different columns

Question

Sorry if the solution to my problem is already out there, and I overlooked it. There are a lot of similar topics which all helped me understand the basics of what I'm trying to do, but did not quite solve my exact problem.

I have a data frame df:

> type = c("A","A","A","A","A","A","B","B","B","B","B","B")
> place = c("x","y","z","x","y","z","x","y","z","x","y","z")
> value = c(1:12)
> 
> df=data.frame(type,place,value)
> df
   type place value
1     A     x     1
2     A     y     2
3     A     z     3
4     A     x     4
5     A     y     5
6     A     z     6
7     B     x     7
8     B     y     8
9     B     z     9
10    B     x    10
11    B     y    11
12    B     z    12
>

(my real data has 3 different values in type and 10 in place, if that makes a difference)

I want to extract rows based on the strings in two columns, here type and place. E.g. I want to extract all rows that contain A in type and x or z in place, or all rows with A or B in type and y in place.

This works perfectly with subset, but I want to run my scripts on different combinations of extracted rows, and adjusting the subset command every time isn't very effective.

I thought of using a vector containing as elements what to get from type and place, respectively.

I tried:

v=c("A","x","z")
df.extract <- df[df$type&df$place %in% v]

but this returns an error.

I'm a total beginner with R and programming, so please bear with me.

You can also use subset, which is a convenience function for interactive use only. For example: subset(df, type == "A" & place %in% v)
– talat, Commented Dec 16, 2014 at 15:54

akrun · Accepted Answer · 2014-12-16 16:11:12Z

4

You could try

df[df$type=='A' & df$place %in% c('x','y'),]
#   type place value
#1    A     x     1
#2    A     y     2
#4    A     x     4
#5    A     y     5

For the second case

df[df$type %in% c('A', 'B') & df$place=='y',]
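The attempt in the question fails for two reasons: `df$type & df$place %in% v` applies `&` directly to the type column, which is not logical, and the index lacks the comma that separates rows from columns. A minimal sketch of the pattern used above, with the row mask built one column at a time (the name `keep` is only illustrative, and `df` is rebuilt with the same values as in the question):

```r
# Rebuild the example data from the question
df <- data.frame(
  type  = rep(c("A", "B"), each = 6),
  place = rep(c("x", "y", "z"), times = 4),
  value = 1:12
)

# One logical test per column, combined with &
keep <- df$type %in% c("A", "B") & df$place == "y"

# The trailing comma means "these rows, all columns"
second_case <- df[keep, ]
second_case
```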

Update

Suppose you have many columns and need to subset the dataset based on values from several of them. For example:

 set.seed(24)
 df1 <- cbind(df, df[sample(1:nrow(df)),], df[sample(1:nrow(df)),])
 colnames(df1) <- paste0(c('type', 'place', 'value'), rep(1:3, each=3))
 row.names(df1) <- NULL

You can create a list of the values from the columns of interest

 v1 <- setNames(list('A', 'x', c('A', 'B'), 'x', 'B', 'z'),
                paste0(c('type', 'place'), rep(1:3, each=2)))

and then use Reduce

 df1[Reduce(`&`,Map(`%in%`, df1[names(v1)], v1)),]
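Here `Map` runs `%in%` on each selected column against its allowed values, giving one logical vector per column, and `Reduce` combines those vectors with `&`. As a rough sketch, the same idea can be applied to the original two-column `df` by wrapping it in a small helper. The names `filter_rows`, `criteria` and `crit` are hypothetical and not part of the answer above:

```r
# Rebuild the example data from the question
df <- data.frame(
  type  = rep(c("A", "B"), each = 6),
  place = rep(c("x", "y", "z"), times = 4),
  value = 1:12
)

# Hypothetical helper: `criteria` is a named list whose names are
# column names and whose elements are the values to keep.
filter_rows <- function(data, criteria) {
  # one logical vector per column named in `criteria`
  masks <- Map(`%in%`, data[names(criteria)], criteria)
  # a row is kept only if every column test is TRUE
  data[Reduce(`&`, masks), , drop = FALSE]
}

crit <- list(type = "A", place = c("x", "z"))
res <- filter_rows(df, crit)
res
```

Changing the selection then only means changing the list, not the subsetting code.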

edited Dec 16, 2014 at 16:11

answered Dec 16, 2014 at 15:20

akrun


1 Comment

Anca Over a year ago

Thank you, your first entry already does exactly what I need! I'll also have a look into reduce, thanks for the tip.

Cath · Answer · 2014-12-16 15:24:36Z

2

You can write a function extract:

extract <- function(df, type, place) {
  # keep rows whose type and place are both among the requested values
  df[df$type %in% type & df$place %in% place, ]
}

which will work for the different subsets you want:

df.extract<-extract(df=df,type="A",place=c("x","y")) # or just extract(df,"A",c("x","y"))
> df.extract
  type place value
1    A     x     1
2    A     y     2
4    A     x     4
5    A     y     5

df.extract<-extract(df=df,type=c("A","B"),place="y") # or just extract(df,c("A","B"),"y")
> df.extract
   type place value
2     A     y     2
5     A     y     5
8     B     y     8
11    B     y    11
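Since the goal is to run the same script on several combinations, one possible sketch is to keep the combinations in a named list and loop over them with `lapply`. The names `combos`, `subsets` and `totals` are assumptions for illustration, and summing `value` merely stands in for whatever the real script does:

```r
# Rebuild the example data from the question
df <- data.frame(
  type  = rep(c("A", "B"), each = 6),
  place = rep(c("x", "y", "z"), times = 4),
  value = 1:12
)

extract <- function(df, type, place) {
  df[df$type %in% type & df$place %in% place, ]
}

# Hypothetical list of selections to run through
combos <- list(
  A_xz = list(type = "A",         place = c("x", "z")),
  AB_y = list(type = c("A", "B"), place = "y")
)

# One extracted data frame per combination
subsets <- lapply(combos, function(k) extract(df, k$type, k$place))

# Placeholder analysis: sum of value within each subset
totals <- sapply(subsets, function(s) sum(s$value))
totals
```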

answered Dec 16, 2014 at 15:24

Cath
