
I have a large data frame that has been spread into wide format:

df: a1 a2 a3 a4 a5 ...............
    r  w  sd w  y ........

I have another input which is a subset of df.

subset_df: a3 a4 a5
           f  e  u 

My goal is to take the column names of subset_df, select these columns in df and continue from there (in my case to compare the values).

When I do this the simple way:

df[, names(subset_df)]

it works, but why does it refuse to work with dplyr's select?

Here is the error when running:

names_sub_df <- names(subset_df)
df %>% select(names_sub_df)


Error: All select() inputs must resolve to integer column positions.
The following do not:
*  as.vector(names_sub_df)

Here is a reproducible example:

key <- c("a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8", "a9", "a10", "a11", "a12", "a13", "a14", "a15", "a16", "a17", "a18")

value <- c("G", "CTT", "C", "C", "G", "C", "T", "C", "C", "C", "G", "T", "C", "G", "T", "A", "T", "G")


test2 <- data.frame(key, value, stringsAsFactors = FALSE)

library(tidyr)

  • Perhaps a reproducible example would help... Commented Nov 26, 2017 at 10:18
  • @Christoph updating my question, sorry. Commented Nov 26, 2017 at 11:25
  • @Christoph please tell me why the error occurs? What am I missing? Commented Nov 26, 2017 at 11:50
  • Possible duplicate of Pass a vector of variable names to arrange() in dplyr Commented Nov 26, 2017 at 11:57

2 Answers


In the absence of a minimal reproducible example, I'll use mtcars as an example.

You can wrap your subset dataframe in colnames so select uses the names, not the whole dataframe, for the subsetting:

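mtcars
# Column names to keep
subset_mtcars = c("hp", "drat", "wt")
# Overwrite the name vector with the actual subset data frame
subset_mtcars = mtcars[, subset_mtcars]
subset_mtcars

library("tidyverse")
# Pass the subset's column names, not the data frame itself, to select()
mtcars %>%
  select(colnames(subset_mtcars))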

#                      hp drat    wt
# Mazda RX4           110 3.90 2.620
# Mazda RX4 Wag       110 3.90 2.875
# Datsun 710           93 3.85 2.320
# ...
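The same selection also works when the names are held in a character vector, as in the question. This is a minimal sketch, assuming dplyr 1.0.0 or later, where all_of() is available; `wanted`, `via_dplyr` and `via_base` are hypothetical names that do not appear in the question. On older dplyr versions, one_of() played a similar role.

```r
library(dplyr)

# Hypothetical vector of column names, standing in for names(subset_df)
wanted <- c("hp", "drat", "wt")

# all_of() marks `wanted` as a character vector of column names;
# it raises an error if any of the names is missing from the data
via_dplyr <- mtcars %>% select(all_of(wanted))

# Base R equivalent, as used in the question
via_base <- mtcars[, wanted]

# Both approaches return the same columns in the same order
stopifnot(identical(names(via_dplyr), names(via_base)))
```

Wrapping the vector in all_of() also avoids ambiguity if the data happens to contain a column with the same name as the variable holding the names.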

5 Comments

Thanks a lot for the answer. I am sorry for not providing the example; I thought my explanation would be enough. Could you please tell me why the error occurred?
Because you are passing a list of quoted strings and select wants unquoted names. There are a number of answers to this question if you search.
@elin I don't think that's quite right; I think the OP was trying to pass a data frame to select. Admittedly that's still not the correct structure 😀
@Elin it worked perfectly before on my PC, but when I transferred the same code to my work PC, it failed. Will check again.
I would also suggest always using the namespace notation when you are moving things to different computers since you may or may not have dplyr loaded. @Phil names_sub_df <- names(subset_df) is a character vector. Here's another duplicate with a select() example stackoverflow.com/questions/33284790/….

From your example I am not sure whether you want to select columns or the values in a column. If it is the latter, the following will do the job:

subset_df <- c("a3", "a4", "a5")
test2[test2$key %in% subset_df, ]
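If columns rather than rows are wanted, the long table first has to be in wide form. Here is a rough base R sketch, assuming test2 as defined in the question with one value per key; `wide`, `rows` and `cols` are hypothetical names, and the reshaping step is an illustration, not necessarily what the question's truncated example did.

```r
key <- paste0("a", 1:18)
value <- c("G", "CTT", "C", "C", "G", "C", "T", "C", "C",
           "C", "G", "T", "C", "G", "T", "A", "T", "G")
test2 <- data.frame(key, value, stringsAsFactors = FALSE)

subset_df <- c("a3", "a4", "a5")

# Long format: keep the rows whose key is in the set
rows <- test2[test2$key %in% subset_df, ]

# Wide format: one column per key, a single row of values
wide <- as.data.frame(as.list(setNames(test2$value, test2$key)),
                      stringsAsFactors = FALSE)

# Column selection by a character vector of names
cols <- wide[, subset_df]
```

Both results hold the same three values; only the orientation differs.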

1 Comment

I am searching for column names in the big dataframe, I want to subset using dplyr select.
