Filtering data in one data frame based on values from variables in a different data frame

Question

I want to filter RT data in one data frame based on the values in a different data frame. Here are the first 15 rows of the first data frame, let's call it df1:

    ExperimentName        Subject Block practisexperimental Trial  Congruency     Acc RT
1   Bayesmicro-53-SINGLE       1     2        experimental     1   Congruent        1 549
2   Bayesmicro-53-SINGLE       1     2        experimental     2 Incongruent        1 510
3   Bayesmicro-53-SINGLE       1     2        experimental     3   Congruent        1 476
4   Bayesmicro-53-SINGLE       1     2        experimental     4   Congruent        1 568
5   Bayesmicro-53-SINGLE       1     2        experimental     5   Congruent        1 401
6   Bayesmicro-53-SINGLE       1     2        experimental     6 Incongruent        1 458
7   Bayesmicro-53-SINGLE       1     2        experimental     7 Incongruent        1 494
8   Bayesmicro-53-SINGLE       1     2        experimental     8 Incongruent        1 876
9   Bayesmicro-53-SINGLE       1     2        experimental     9 Incongruent        1 567
10  Bayesmicro-53-SINGLE       1     2        experimental    11   Congruent        1 444
11  Bayesmicro-53-SINGLE       1     2        experimental    13 Incongruent        1 507
12  Bayesmicro-53-SINGLE       1     2        experimental    14 Incongruent        1 658
13  Bayesmicro-53-SINGLE       1     2        experimental    15 Incongruent        1 613
14  Bayesmicro-53-SINGLE       1     2        experimental    16   Congruent        1 529
15  Bayesmicro-53-SINGLE       1     2        experimental    18 Incongruent        1 513

Here is the complete second data frame, df2:

   Subject Mean_RT SD_RT  RTUpperLimit RTLowerLimit
 1 1          485. 102.          688.        281. 
 2 10         596. 143.          881.        311. 
 3 11         608. 149.          907.        309. 
 4 12         546.  89.9         726.        366. 
 5 13         465.  81.3         627.        302. 
 6 14         559. 232.         1024.         93.8
 7 15         464.  66.4         597.        332. 
 8 16         803. 174.         1152.        455. 
 9 17         598. 124.          846.        350. 
10 18         485.  83.1         651.        319. 
11 19         483. 204.          892.         74.3
12 2          548. 144.          835.        260. 
13 20         547. 111.          769.        326. 
14 3          496. 100.          696.        295. 
15 4          576. 165.          906.        245. 
16 5          546. 122.          789.        303. 
17 6          543. 169.          882.        204. 
18 7          514.  93.1         700.        328. 
19 8          578. 118.          814.        341. 
20 9          556.  99.4         755.        358.

So, df1 contains the raw data of 20 different subjects, and I need the RTs in df1 to be filtered following this logic: df1$RT > df2$RTLowerLimit & df1$RT < df2$RTUpperLimit

Importantly, the filtering needs to be done for each subject independently (i.e., each subject has its own values in the RTUpperLimit and RTLowerLimit columns in df2). Ideally, the output should be saved as a new variable (RT_filtered) in df1.

I have tried several things using dplyr "filter" and "mutate" but I cannot make it work. Any idea on how to get this done would be very much appreciated.

Thanks, Mikel

I would try joining the two data frames by subject, and then applying your filtering.
– Seth, Commented May 1, 2023 at 16:36

Seth · Accepted Answer · 2023-05-01 20:05:54Z

You can join the two data frames using the Subject variable as a common key. That will let you apply a filter to each row, keeping only trials where the RT is between the lower and upper bounds for that row's subject.

Edited to add some toy data to better demonstrate join and filter.

The data frame that results from left_join will have the upper and lower bounds for your RT added as extra columns, matched up by Subject ID. You should inspect the output prior to filtering to make sure it's working as you expect.

The filter operates on each row, checking whatever is in that row's RT column against the upper and lower bounds brought in by joining the tables.
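One way to carry out that inspection is sketched here with small made-up tables (df1_demo and df2_demo are hypothetical names, not the asker's data): run the join on its own, then list any trials that found no limits. The sort order of Subject in the question's df2 (1, 10, 11, ..., 2, 20) suggests the column may be stored as character there. That is only a guess, but if it holds, the key would need converting to the same type as in df1 before the join will run.

```r
library(dplyr)

# Hypothetical toy data: Subject is numeric in the trials table
# but character in the limits table (an assumed mismatch).
df1_demo <- tibble::tibble(
  Subject = c(1, 1, 2, 2, 3),
  RT      = c(549, 106, 444, 900, 500)
)
df2_demo <- tibble::tibble(
  Subject = c("1", "2"),
  RTUpper = c(688, 650),
  RTLower = c(281, 280)
)

# Bring the key to a common type before joining.
df2_demo <- df2_demo %>% mutate(Subject = as.numeric(Subject))

# Join only, no filter yet, so the added columns can be inspected.
joined <- df1_demo %>% left_join(df2_demo, by = "Subject")
joined

# Trials whose subject has no row in the limits table get NA bounds.
# filter() would silently drop them, so it is worth listing them first.
unmatched <- joined %>% filter(is.na(RTUpper) | is.na(RTLower))
unmatched
```

In this toy case Subject 3 has no limits, so its trial shows up in unmatched rather than disappearing unnoticed in the later filter step.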

library(dplyr)


df2 <- tibble::tibble(Subject = c(1,2,3,4),
              Mean_RT = c(485,485,485,485),
              SD_RT = c(102,102,102,102),
              RTUpper = c(688,650,640,700),
              RTLower = c(281,280,250,300))

# Toy trial data: 45 trials across 4 subjects (9, 10, 11 and 15 trials each).
# Length-1 values are recycled by tibble() to all 45 rows.
df1 <- tibble::tibble(
  Experiment = "Bayesmicro-53-SINGLE",
  Subject = rep(c(1, 2, 3, 4), times = c(9, 10, 11, 15)),
  Block = 2,
  practice_experimental = "experimental",
  Trial = rep(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13, 14, 15, 16, 18), times = 3),
  Congruency = rep(c("Congruent", "Incongruent", "Congruent", "Congruent", "Congruent",
                     "Incongruent", "Incongruent", "Incongruent", "Incongruent", "Congruent",
                     "Incongruent", "Incongruent", "Incongruent", "Congruent", "Incongruent"),
                   times = 3),
  Acc = 1,
  RT = c(549, 510, 106, 568, 401, 458, 494, 876, 567, 444, 507,
         658, 613, 150, 513, 549, 510, 476, 900, 401, 458, 494, 876, 567,
         444, 507, 658, 778, 129, 13, 549, 510, 476, 568, 401, 458, 494,
         876, 567, 444, 507, 658, 613, 529, 513)
)
df1
#> # A tibble: 45 × 8
#>    Experiment   Subject Block practice_experimental Trial Congruency   Acc    RT
#>    <chr>          <dbl> <dbl> <chr>                 <dbl> <chr>      <dbl> <dbl>
#>  1 Bayesmicro-…       1     2 experimental              1 Congruent      1   549
#>  2 Bayesmicro-…       1     2 experimental              2 Incongrue…     1   510
#>  3 Bayesmicro-…       1     2 experimental              3 Congruent      1   106
#>  4 Bayesmicro-…       1     2 experimental              4 Congruent      1   568
#>  5 Bayesmicro-…       1     2 experimental              5 Congruent      1   401
#>  6 Bayesmicro-…       1     2 experimental              6 Incongrue…     1   458
#>  7 Bayesmicro-…       1     2 experimental              7 Incongrue…     1   494
#>  8 Bayesmicro-…       1     2 experimental              8 Incongrue…     1   876
#>  9 Bayesmicro-…       1     2 experimental              9 Incongrue…     1   567
#> 10 Bayesmicro-…       2     2 experimental             11 Congruent      1   444
#> # ℹ 35 more rows

df1 %>%
  left_join(df2, by = 'Subject') %>% 
  filter(RTLower < RT & RT < RTUpper)
#> # A tibble: 34 × 12
#>    Experiment   Subject Block practice_experimental Trial Congruency   Acc    RT
#>    <chr>          <dbl> <dbl> <chr>                 <dbl> <chr>      <dbl> <dbl>
#>  1 Bayesmicro-…       1     2 experimental              1 Congruent      1   549
#>  2 Bayesmicro-…       1     2 experimental              2 Incongrue…     1   510
#>  3 Bayesmicro-…       1     2 experimental              4 Congruent      1   568
#>  4 Bayesmicro-…       1     2 experimental              5 Congruent      1   401
#>  5 Bayesmicro-…       1     2 experimental              6 Incongrue…     1   458
#>  6 Bayesmicro-…       1     2 experimental              7 Incongrue…     1   494
#>  7 Bayesmicro-…       1     2 experimental              9 Incongrue…     1   567
#>  8 Bayesmicro-…       2     2 experimental             11 Congruent      1   444
#>  9 Bayesmicro-…       2     2 experimental             13 Incongrue…     1   507
#> 10 Bayesmicro-…       2     2 experimental             15 Incongrue…     1   613
#> # ℹ 24 more rows
#> # ℹ 4 more variables: Mean_RT <dbl>, SD_RT <dbl>, RTUpper <dbl>, RTLower <dbl>

Created on 2023-05-01 with reprex v2.0.2
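The question also asks for the result as a new RT_filtered column rather than a shorter table. A minimal sketch of one way to get that, assuming the same join by Subject and using made-up data (the table names df1_demo, df2_demo and df1_flagged are hypothetical; the limit columns use the question's names RTUpperLimit and RTLowerLimit): replace out-of-range RTs with NA instead of dropping the rows.

```r
library(dplyr)

# Hypothetical toy data, not the asker's real values.
df1_demo <- tibble::tibble(
  Subject = c(1, 1, 1, 2, 2, 2),
  RT      = c(549, 106, 876, 650, 700, 900)
)
df2_demo <- tibble::tibble(
  Subject      = c(1, 2),
  RTUpperLimit = c(688, 881),
  RTLowerLimit = c(281, 311)
)

df1_flagged <- df1_demo %>%
  # Attach each subject's own limits to every one of that subject's trials.
  left_join(df2_demo, by = "Subject") %>%
  # Keep the RT when it lies strictly inside the limits, otherwise NA.
  mutate(RT_filtered = if_else(RT > RTLowerLimit & RT < RTUpperLimit,
                               RT, NA_real_)) %>%
  # Drop the helper columns so only RT_filtered is added to the trials table.
  select(-RTUpperLimit, -RTLowerLimit)

df1_flagged
```

All rows are kept, so the trial structure stays intact. Note that an RT of 700 is retained for Subject 2 here but would be set to NA for Subject 1, because each row is compared against the limits joined in for its own subject.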

Hi Seth, thanks! My only question is, the filtering needs to be done by Subject. That is, all the RT values for Subject 1 in df1 should be filtered by the upper and lower limits calculated for Subject 1 in df2, and the same for all the remaining subjects. Does your code take this into account? I'm not very familiar with the "left_join" function. Thanks
Hi Mikel - I've updated the answer with some made-up data to make it more apparent what's happening. I hope this helps!
Hi Seth, thank you so much for this. It seems to be working fine, but as you say I'll double-check that the code is dropping the values correctly. Thanks again!
