Selecting specific columns from a data frame using data from another table

Question

I have a table with ~8000 observations and 65 variables. I have another table with 35 observations and 11 variables.

The larger table looks like this: [image: portion of the larger table]

and the smaller table looks like this: [image: portion of the smaller table]

As you can see, the first column of the smaller table contains some of the column names of the larger table. How can I, in a way more compact than simply writing out which columns I want to select, make R create a table that has the data in the larger table with only the columns specified in the smaller table?

Any help would be greatly appreciated!

UPDATE: Thank you to the answerer for the data. I was wondering if it would be possible to match the order of the columns in the large.df with the order the names appear in the small.df

large.df <- data.frame(A=rnorm(5), B=abs(rnorm(5, sd=0.08)),
             C=rnorm(5), D=abs(rnorm(5, sd=0.08)))


        A           B          C          D
1  0.2367193 0.002297593 -0.1958682 0.03877595
2 -1.2419638 0.034031808  0.3253622 0.02578829
3 -0.2718915 0.188627689  0.4844783 0.04405741
4 -0.6587699 0.024045926 -1.1209473 0.09849541
5  1.7890422 0.055520325  0.1093612 0.11637796

samll.df <- data.frame(Category = c("D","B"))
samll.df

  Category
1        D
2        B

I would like the output to have the columns ordered 'D', 'B', not 'B', 'D'. My example has ~35 columns so a way that is more compact than typing out the column names in the desired order would be ideal. Thank you

try to provide a reproducible example.

– user5249203, Commented Jun 1, 2016 at 16:15

Sowmya S. Manian · Accepted Answer · 2016-06-01 16:11:33Z

1

Use %in%

  > a <- data.frame(A=1:10,B=11:20,C=1:10)   # Small data frame
  > b <- data.frame(A=1:10,D=11:20,C=21:30,E=41:50) # Big data frame

  # The column names common to both data frames are A and C
  > R <- b[,names(b) %in% names(a)]
  > R
      A  C
  1   1 21
  2   2 22
  3   3 23
  4   4 24
  5   5 25
  6   6 26
  7   7 27
  8   8 28
  9   9 29
  10 10 30
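
In the question, the wanted names are stored as values in a column of the smaller table rather than as its column names. A minimal sketch of the same %in% idea for that layout follows. It reuses the `large.df` and `samll.df` names from this thread, but the numbers are arbitrary stand-ins chosen for illustration, and `keep` and `subset.df` are hypothetical helper names.

```r
# Illustrative data; the values are arbitrary stand-ins, not the thread's rnorm() data
large.df <- data.frame(A = 1:3, B = 4:6, C = 7:9, D = 10:12)
samll.df <- data.frame(Category = c("D", "B"), stringsAsFactors = FALSE)

# Logical mask over the columns of large.df: TRUE where the name is listed in Category
keep <- names(large.df) %in% samll.df$Category

# drop = FALSE keeps the result a data frame even if only one column matches
subset.df <- large.df[, keep, drop = FALSE]
names(subset.df)
```

With this approach the columns come back in the order they have in `large.df` (B, then D), not in the order listed in `Category`, because the mask is evaluated column by column over `large.df`.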

answered Jun 1, 2016 at 16:11

Sowmya S. Manian


1 Comment

Noah Over a year ago

thank you, the %in% function was a big part of what I was looking for

user5249203 · Accepted Answer · 2016-06-01 18:13:46Z

0

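# Column names to keep, in the order they appear in samll.df
cols.small_table <- as.character(samll.df$Category)

Solution 1: keep the same column order as in small.df

# match() returns the positions of those names in large.df, in the order of
# cols.small_table, so the subset comes out ordered D, B.
# Note: the values printed below do not match the Data section, presumably
# because rnorm() was re-run before this output was produced.
large.df[ , match(cols.small_table, names(large.df))]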
            D           B
1 0.0007403109 0.080096733
2 0.0528159794 0.045623426
3 0.0327912984 0.038420719
4 0.0976794958 0.108335834
5 0.0974624753 0.008220431
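
As a hedged variation on Solution 1, a data frame can also be indexed directly by a character vector of names, which returns the columns in the order of that vector. The sketch below assumes the small table might list a name that does not exist in the large table (the name "Z" is hypothetical and added only to show the guard) and filters such names out first, since indexing by a missing column name normally stops with an error. The data values are arbitrary stand-ins.

```r
# Illustrative data; "Z" is a hypothetical name that is absent from large.df
large.df <- data.frame(A = 1:3, B = 4:6, C = 7:9, D = 10:12)
samll.df <- data.frame(Category = c("D", "B", "Z"), stringsAsFactors = FALSE)

cols <- as.character(samll.df$Category)

# Keep only the names that exist in large.df, preserving the order of cols
cols.present <- cols[cols %in% names(large.df)]

# Character indexing returns the columns in the order of cols.present
ordered.df <- large.df[, cols.present, drop = FALSE]
names(ordered.df)
```

Here the result has the columns D and B, in that order, matching the order in `Category`.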

Solution 2

# Keep the columns in large table based on match in small table 
large.df[ , which(names(large.df) %in% cols.small_table)] 
            B          D
1 0.002297593 0.03877595
2 0.034031808 0.02578829
3 0.188627689 0.04405741
4 0.024045926 0.09849541
5 0.055520325 0.11637796

# Remove the columns in large table based on match in small table
large.df[ , -which(names(large.df) %in% cols.small_table)] 

           A          C
1  0.2367193 -0.1958682
2 -1.2419638  0.3253622
3 -0.2718915  0.4844783
4 -0.6587699 -1.1209473
5  1.7890422  0.1093612
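
One caveat with the `-which(...)` form, sketched below under the assumption that none of the listed names occur in the large table (the names in `no.match` are hypothetical): `which()` then returns an empty integer vector, negating it leaves it empty, and the subset ends up with no columns rather than all of them. Negating the logical mask with `!` avoids this.

```r
# Illustrative data with arbitrary values
large.df <- data.frame(A = 1:3, B = 4:6, C = 7:9, D = 10:12)
no.match <- c("X", "Y")   # hypothetical names, none of which occur in large.df

# which() returns integer(0) here, and -integer(0) is still integer(0),
# so this selects zero columns instead of keeping all four
n.which <- ncol(large.df[, -which(names(large.df) %in% no.match), drop = FALSE])

# Negating the logical mask keeps every column when nothing matches
n.mask <- ncol(large.df[, !(names(large.df) %in% no.match), drop = FALSE])

c(n.which, n.mask)
```

In this sketch `n.which` is 0 and `n.mask` is 4, so the logical-mask form is the safer choice when the list of names to remove might not match anything.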

Data

large.df <- data.frame(A=rnorm(5), B=abs(rnorm(5, sd=0.08)),
                 C=rnorm(5), D=abs(rnorm(5, sd=0.08)))


            A           B          C          D
1  0.2367193 0.002297593 -0.1958682 0.03877595
2 -1.2419638 0.034031808  0.3253622 0.02578829
3 -0.2718915 0.188627689  0.4844783 0.04405741
4 -0.6587699 0.024045926 -1.1209473 0.09849541
5  1.7890422 0.055520325  0.1093612 0.11637796

samll.df <- data.frame(Category = c("D","B"))
samll.df

  Category
1        D
2        B

edited Jun 1, 2016 at 18:13

answered Jun 1, 2016 at 16:31

user5249203


7 Comments

Noah Over a year ago

fantastic! your cols.small_table solution was exactly what I was looking for, thank you!

Noah Over a year ago

I have a followup question - is there any way to match the order of the columns in large.df with the order in small.df? Thank you!

user5249203 Over a year ago

There will certainly be a way, but I cannot understand your goal unless it is worked out in a reproducible example that shows your expected output. If it is a related Q, update your Q with what you expect the output to look like; otherwise open a new Q.

Noah Over a year ago

would you mind if I updated my question using the dataset you created in the solution?

user5249203 Over a year ago

feel free to use it
