5

Let's say I have one dataframe that represents a subset brief. Let's call this dataframe brief. Here, the values are the subset rules.

library(tidyverse)

brief <- data.frame(apple = 1, orange = 2, pear = 3)

In a second dataframe, called types, this is the data I'd like to subset.

types <- data.frame(apple_cake = rnorm(5,0,1),
                    apple_pie = rnorm(5,0, 1),
                    apple_ice = rnorm(5,0, 1),
                    orange_cake = rnorm(5,0,1),
                    orange_pie = rnorm(5, 0, 1),
                    orange_ice = rnorm(5,0, 1),
                    pear_cake = rnorm(5,0,1),
                    pear_pie = rnorm(5, 0, 1),
                    pear_ice = rnorm(5,0, 1)
)

In my example, I'd like to subset specific apple, orange, and pear food items using dataframe one against dataframe two. I'd like to use the values from the first dataframe brief to subset the columns from the second dataframe types.

I would like to end up with a final dataframe as follows:

final <- types %>% 
  select(apple_cake, orange_pie, pear_ice)

In the final solution, apple item 1, orange item 2, and pear item 3 remain.

I've tried various dplyr functions but to no avail. I have also looked for similar solutions but do not think they address my problem because these examples might have similar column names across two dataframes.

1 Answer 1

2

In base R, we split the data columns into a list of data.frame having similar prefix by splitting on the suffix removed column names, Extract ([) the columns from the corresponding index in brief data with Map and cbind the columns in the list with do.call

do.call(cbind,  Map(`[`, split.default(types, trimws(names(types), 
       whitespace = "_.*")), brief))

-output

 apple_cake   orange_pie   pear_ice
1  0.9065347  0.661994002  0.1988529
2  1.0429968  0.736754956  0.9714787
3  0.5517950 -0.003860467  0.4486231
4 -0.4304989  1.160817987 -0.3410640
5 -0.1701651  0.489398131 -1.1846220

Or another option is to loop over the data.frame brief with imap, select the columns with a select-helper (starts_with) on the names of the brief data column (.y) and then select based on the value index of the brief column (.x)

library(dplyr)
library(purrr)
imap_dfc(brief, ~ types %>%
     select(starts_with(.y)) %>% 
     select(all_of(.x)))
  apple_cake   orange_pie   pear_ice
1  0.9065347  0.661994002  0.1988529
2  1.0429968  0.736754956  0.9714787
3  0.5517950 -0.003860467  0.4486231
4 -0.4304989  1.160817987 -0.3410640
5 -0.1701651  0.489398131 -1.1846220
Sign up to request clarification or add additional context in comments.

1 Comment

Thanks for this! I particularly like the tidyverse solution because I can see the steps clearly. Because I had created an object with a similar name as a tidyverse function, I have now edited my original post and have named the object brief.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.