Manipulate dataframe columns by ID in column name

Question

I have a dataset that looks like this:

DT <- data.frame( rnorm(5),
              rnorm(5),
              rnorm(5),
              rnorm(5),
              rnorm(5),
              rnorm(5))
names(DT) = c('a1[1]','a1[2]','a1[3]','a2[1]','a2[2]','a2[3]')
str(DT)

I would like to create new columns like:

diffa1 = a1[1] - a2[1] 
diffa2 = a1[2] - a2[2]
diffa3 = a1[3] - a2[3]

Is there any way to do this without manually writing out each ID in the brackets? I have a1[1] up to a1[100], a2[1] up to a2[100], etc. Thanks!

I'd avoid square brackets in column names. Use something like a1_1. You can also assign names inside data.frame, so there is no need to add them later: data.frame(a1_1 = rnorm(5)). For adding columns, look at dplyr::mutate.
– neilfws, Commented Aug 15, 2018 at 0:08
1. Thank you for pointing that out. The data was imported from elsewhere and the brackets were already in the column names, so I had to deal with them. 2. To replicate what the column names actually look like, I had to assign the bracketed names afterwards; this cannot be done inside the data.frame call (a1[1] would be coerced to a1.1. by default).
– helen, Commented Aug 15, 2018 at 0:20
dplyr::rename is another friend for those issues. Basically, if you're dealing with data frames, it's worth getting to know dplyr.
– neilfws, Commented Aug 15, 2018 at 0:22

www · Accepted Answer · 2018-08-15 00:06:49Z

We can use lapply to loop through the numbers in your column name.

diffa <- as.data.frame(lapply(1:3, function(x){
  DT[paste0("a1[", x, "]")] - DT[paste0("a2[", x, "]")] 
}))
diffa
#        a1.1.      a1.2.       a1.3.
# 1  0.9160836 -0.3508354  0.04981186
# 2  0.7397111  1.9147110 -1.47307780
# 3  0.6889159 -0.7672135 -4.24234927
# 4 -0.2701030 -1.3199004  2.55248732
# 5  1.2267170 -2.0815192 -1.97941609
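
To get the column names asked for in the question (diffa1, diffa2, ...) and attach them to DT, the loop result can be named before binding. This is a minimal sketch, assuming the same DT as in the DATA section and that 1:3 stands in for 1:100 in the real data; ids, diffs and DT_out are illustrative names, not part of the original answer.

```r
set.seed(158)
DT <- data.frame(rnorm(5), rnorm(5), rnorm(5), rnorm(5), rnorm(5), rnorm(5))
names(DT) <- c("a1[1]", "a1[2]", "a1[3]", "a2[1]", "a2[2]", "a2[3]")

ids <- 1:3  # assumption: use 1:100 for the full data

# [[ ]] extracts each column as a plain numeric vector
diffs <- lapply(ids, function(i) {
  DT[[paste0("a1[", i, "]")]] - DT[[paste0("a2[", i, "]")]]
})
names(diffs) <- paste0("diffa", ids)

# cbind on data frames leaves the bracketed names of DT untouched
DT_out <- cbind(DT, as.data.frame(diffs))
names(DT_out)
```

Using [[ ]] rather than [ ] returns vectors instead of one-column data frames, which is why the new names can be set directly on the list.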

Or use grep to select the columns into two data frames, and then subtract one from the other.

DT1 <- DT[grep("^a1", names(DT))]
DT2 <- DT[grep("^a2", names(DT))]
diffa <- DT1 - DT2
diffa
#        a1[1]      a1[2]       a1[3]
# 1  0.9160836 -0.3508354  0.04981186
# 2  0.7397111  1.9147110 -1.47307780
# 3  0.6889159 -0.7672135 -4.24234927
# 4 -0.2701030 -1.3199004  2.55248732
# 5  1.2267170 -2.0815192 -1.97941609
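
Note that DT1 - DT2 pairs columns by position, so it relies on the a1 and a2 columns appearing in the same order. If the imported data does not guarantee that, one option is to sort both sets by the number in the brackets first. The sketch below uses a small made-up data frame with the a2 columns deliberately out of order, and assumes every name has the form prefix[id]; id_of is a hypothetical helper.

```r
DT <- data.frame(1:3, 4:6, 7:9, 10:12)
# The a2 columns are deliberately out of order here
names(DT) <- c("a1[1]", "a1[2]", "a2[2]", "a2[1]")

# Hypothetical helper: pull the integer between the square brackets out of each name
id_of <- function(nm) as.integer(sub("^.*\\[([0-9]+)\\]$", "\\1", nm))

DT1 <- DT[grep("^a1\\[", names(DT))]
DT2 <- DT[grep("^a2\\[", names(DT))]

# Sort both sets by id so that the columns line up
DT1 <- DT1[order(id_of(names(DT1)))]
DT2 <- DT2[order(id_of(names(DT2)))]
stopifnot(identical(id_of(names(DT1)), id_of(names(DT2))))

diffa <- DT1 - DT2
names(diffa) <- paste0("diffa", id_of(names(DT1)))
diffa
```

The stopifnot call fails early if an id is present in one set but missing from the other, instead of silently subtracting mismatched columns.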

DATA

set.seed(158)

DT <- data.frame( rnorm(5),
                  rnorm(5),
                  rnorm(5),
                  rnorm(5),
                  rnorm(5),
                  rnorm(5))
names(DT) = c('a1[1]','a1[2]','a1[3]','a2[1]','a2[2]','a2[3]')

akrun · Accepted Answer · 2018-08-15 01:07:58Z

Here is another option, using map2 to subtract the corresponding columns:

library(tidyverse)

# Anchor the patterns so that "a1" does not also match names such as a10[1]
map2_df(DT %>% 
          select(matches("^a1\\[")),
        DT %>% 
          select(matches("^a2\\[")), `-`)
