How do I subset a dataframe using values from another dataframe?

Question

Let's say I have one dataframe that represents a subset brief. Let's call this dataframe brief. Here, the values are the subset rules.

library(tidyverse)

brief <- data.frame(apple = 1, orange = 2, pear = 3)

In a second dataframe, called types, this is the data I'd like to subset.

types <- data.frame(apple_cake = rnorm(5,0,1),
                    apple_pie = rnorm(5,0, 1),
                    apple_ice = rnorm(5,0, 1),
                    orange_cake = rnorm(5,0,1),
                    orange_pie = rnorm(5, 0, 1),
                    orange_ice = rnorm(5,0, 1),
                    pear_cake = rnorm(5,0,1),
                    pear_pie = rnorm(5, 0, 1),
                    pear_ice = rnorm(5,0, 1)
)

In my example, I'd like to subset specific apple, orange, and pear food items using dataframe one against dataframe two. I'd like to use the values from the first dataframe brief to subset the columns from the second dataframe types.

I would like to end up with a final dataframe as follows:

final <- types %>% 
  select(apple_cake, orange_pie, pear_ice)

In the final solution, apple item 1, orange item 2, and pear item 3 remain.

I've tried various dplyr functions but to no avail. I have also looked for similar solutions but do not think they address my problem because these examples might have similar column names across two dataframes.

Dec · Accepted Answer · 2022-12-19 16:44:43Z

2

In base R, we split the data columns into a list of data.frame having similar prefix by splitting on the suffix removed column names, Extract ([) the columns from the corresponding index in brief data with Map and cbind the columns in the list with do.call

do.call(cbind,  Map(`[`, split.default(types, trimws(names(types), 
       whitespace = "_.*")), brief))

-output

 apple_cake   orange_pie   pear_ice
1  0.9065347  0.661994002  0.1988529
2  1.0429968  0.736754956  0.9714787
3  0.5517950 -0.003860467  0.4486231
4 -0.4304989  1.160817987 -0.3410640
5 -0.1701651  0.489398131 -1.1846220

Or another option is to loop over the data.frame brief with imap, select the columns with a select-helper (starts_with) on the names of the brief data column (.y) and then select based on the value index of the brief column (.x)

library(dplyr)
library(purrr)
imap_dfc(brief, ~ types %>%
     select(starts_with(.y)) %>% 
     select(all_of(.x)))
  apple_cake   orange_pie   pear_ice
1  0.9065347  0.661994002  0.1988529
2  1.0429968  0.736754956  0.9714787
3  0.5517950 -0.003860467  0.4486231
4 -0.4304989  1.160817987 -0.3410640
5 -0.1701651  0.489398131 -1.1846220

edited Dec 19, 2022 at 16:44

Dec

574 bronze badges

answered Dec 12, 2022 at 16:49

akrun

891k38 gold badges590 silver badges700 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Dec Over a year ago

Thanks for this! I particularly like the tidyverse solution because I can see the steps clearly. Because I had created an object with a similar name as a tidyverse function, I have now edited my original post and have named the object brief.

Collectives™ on Stack Overflow

How do I subset a dataframe using values from another dataframe?

1 Answer 1

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related