specify different subsets or intervals of a variable in the by argument of data.table

Question

Using the following reaction time data (simplified for demonstrative purposes):

>dt
   subject trialnum blockcode values.trialtype latency correct
1        1        1  practice        cueswitch    3020       1
2        1        1      test           cuerep    4284       1
3        1       21      test        cueswitch    2094       1
4        1       34      test           cuerep    3443       1
5        1       50      test       taskswitch    3313       1
6        2        1  practice        cueswitch    3020       1
7        2        1      test           cuerep    1109       1
8        2       21      test        cueswitch    3470       1
9        2       34      test           cuerep    2753       1
10       2       50      test       taskswitch    3321       1

I have been using data.table to obtain reaction time variables for consecutive subsets of trials (specified by trialnum, which ranges from 1 to 170 in the full dataset):

dt1=dt[blockcode=="test" & correct==1, list(
RT1=.SD[trialnum>=1 & trialnum<=30 & values.trialtype=="cuerep", mean(latency)],
RT2=.SD[trialnum>=31 & trialnum<=60 & values.trialtype=="cuerep", mean(latency)]
), by="subject"]

The output is

   subject     RT1     RT2
1:       1    4284    3443
2:       2    1109    2753

However, it becomes tedious creating a variable for each subset when there are more than 2 or 3 subsets. How can I specify those subsets more efficiently?

You need to provide more complete example data, and a better example of what your expected outcome is.
– mnel, Commented Apr 14, 2014 at 2:18
My guess is that you are looking for something along these lines, right? (df1 <- dt[dt$subject == 1,]); (df2 <- df1[df1$blockcode == "test",]); (df3 <- df2[df2$correct == 1,]); (df4 <- df3[df3$trialnum %in% c(1, 30),]); (df5 <- df4[df4$values.trialtype == "cuerepetition",]);
– Sathish, Commented Apr 14, 2014 at 3:11
Not quite. I'm looking for a more automated way of generating RT variables for N subsets of the data.
– AlexR, Commented Apr 14, 2014 at 3:31

mnel · Accepted Answer · 2014-04-14 04:24:30Z

Use findInterval or cut to split your trialnum column into intervals, then group by those intervals.

An example

# set the key to use binary search
setkey(dt, blockcode,correct,values.trialtype)
# the subset you want
dt1 <- dt[.('test',1,'cuerep')]

# use cut  to define subsets

dt2 <- dt1[,list(latency = mean(latency)),
     by=list(subject, trialset = cut(trialnum,seq(0,180,by=30)))]
dt2
#    subject trialset latency
# 1:       1   (0,30]    4284
# 2:       1  (30,60]    3443
# 3:       2   (0,30]    1109
# 4:       2  (30,60]    2753

# If you want separate columns, it is as simple as using `dcast`
library(reshape2)

dcast(dt2,subject~trialset, value.var = 'latency')
#   subject (0,30] (30,60]
# 1       1   4284    3443
# 2       2   1109    2753
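
The answer mentions findInterval but only demonstrates cut. A minimal sketch of the findInterval route follows, assuming the sample data from the question and bins of 30 trials starting at trial 1. The `RT` prefix and the names `breaks`, `res`, and `wide` are illustrative choices, not part of the original answer, and it uses the `dcast` method shipped with data.table rather than reshape2.

```r
library(data.table)

# Sample data copied from the question
dt <- data.table(
  subject = rep(1:2, each = 5),
  trialnum = rep(c(1L, 1L, 21L, 34L, 50L), times = 2),
  blockcode = rep(c("practice", "test", "test", "test", "test"), times = 2),
  values.trialtype = rep(c("cueswitch", "cuerep", "cueswitch",
                           "cuerep", "taskswitch"), times = 2),
  latency = c(3020, 4284, 2094, 3443, 3313, 3020, 1109, 3470, 2753, 3321),
  correct = 1L
)

# Lower bounds of each bin (assumed width of 30): 1-30 -> 1, 31-60 -> 2, ...
breaks <- seq(1, 151, by = 30)

# findInterval() returns the bin index, which is pasted into a column label
res <- dt[blockcode == "test" & correct == 1 & values.trialtype == "cuerep",
          .(latency = mean(latency)),
          by = .(subject,
                 trialset = paste0("RT", findInterval(trialnum, breaks)))]

# One column per bin; for this sample: RT1 = 4284, 1109 and RT2 = 3443, 2753
wide <- dcast(res, subject ~ trialset, value.var = "latency")
```

The bin labels are character strings, so with ten or more bins a zero-padded label such as `sprintf("RT%02d", ...)` is probably a safer choice for keeping the columns in numeric order.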

edited Apr 14, 2014 at 4:24

answered Apr 14, 2014 at 2:26

2 Comments

AlexR Over a year ago

This is close, but the challenge for me is creating a separate column/variable for each subset as shown in the output (crepRT1,crepRT2... and many more in the real data) without actually typing out each variable.

AlexR Over a year ago

Should have realized that myself as I know dcast. Thanks, this works very well and is convenient.
