I have two data tables, dat and dat2. There are a few columns that appear in both tables, though the values are not necessarily the same in each.

When I merge the two tables using dat[dat2] everything works as expected, except that I have some duplicate column names. For instance, a column named Status appears in both tables and, when merged, the column from dat2 appears as i.Status. Rather than renaming these columns, I simply want to drop them from the table altogether. What is the simplest way to do this?

  • You can drop them after the merge, or you can subset during the merge. Commented Jun 29, 2016 at 16:27
  • The simplest way is to pass the columns you want to keep from the join to the j argument. Commented Jun 29, 2016 at 16:33
  • I'm trying to avoid manually typing out all of the column names. I can drop the columns that start with i. after the merge, but that's a hack. Commented Jun 29, 2016 at 16:59

1 Answer

The code below illustrates a method for each of the two scenarios I mentioned in the comments (tested with data.table version 1.9.6); there may be fancier, more efficient data.table approaches.

Both methods will dynamically adapt to the variable overlap, so you don't have to worry about manually typing out the names.

# load the package and get some data
library(data.table)
set.seed(1234)
dt <- data.table(id=1:10, a=letters[1:10], b=rnorm(10), d=rnorm(10))
dt2 <- data.table(id=1:10, a=letters[5:14], c=rnorm(10), d=rnorm(10))

Here's the data without dropping:

dt[dt2, on="id"]

    id a          b           d i.a          c        i.d
 1:  1 a -1.2070657 -0.47719270   e  0.1340882  1.1022975
 2:  2 b  0.2774292 -0.99838644   f -0.4906859 -0.4755931
 3:  3 c  1.0844412 -0.77625389   g -0.4405479 -0.7094400
 4:  4 d -2.3456977  0.06445882   h  0.4595894 -0.5012581
 5:  5 e  0.4291247  0.95949406   i -0.6937202 -1.6290935
 6:  6 f  0.5060559 -0.11028549   j -1.4482049 -1.1676193
 7:  7 g -0.5747400 -0.51100951   k  0.5747557 -2.1800396
 8:  8 h -0.5466319 -0.91119542   l -1.0236557 -1.3409932
 9:  9 i -0.5644520 -0.83717168   m -0.0151383 -0.2942939
10: 10 j -0.8900378  2.41583518   n -0.9359486 -0.4658975

Method 1: subset dt2 during the merge / join, using the intersect and mget functions so that only its id column and its non-overlapping columns take part.

# assuming your id variable is the first column in both sets:
dropVars <- intersect(names(dt), names(dt2))[-1]

dt[dt2[, mget(names(dt2)[-which(names(dt2) %in% dropVars)])], on="id"]
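
One caveat with the negative-index form: if the two tables share no columns besides the id, dropVars is empty, -which(...) selects zero columns, and the join fails. A minimal sketch of a variant that avoids this, assuming the join column is known by name (key_col and keepVars are names introduced here for illustration, not part of the code above):

```r
library(data.table)

set.seed(1234)
dt  <- data.table(id = 1:10, a = letters[1:10], b = rnorm(10), d = rnorm(10))
dt2 <- data.table(id = 1:10, a = letters[5:14], c = rnorm(10), d = rnorm(10))

# Name the join column explicitly instead of assuming it is the first column.
key_col <- "id"

# Keep the key plus every dt2 column that does not already exist in dt.
keepVars <- union(key_col, setdiff(names(dt2), names(dt)))

# with = FALSE tells data.table to treat keepVars as a vector of column names.
res <- dt[dt2[, keepVars, with = FALSE], on = key_col]
names(res)
```

Because setdiff returns an empty vector when nothing overlaps, keepVars always contains at least the key, so the inner subset never ends up with zero columns.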

Method 2: join first, then drop the i.-prefixed columns from the result using grep.

dt3 <- dt[dt2, on="id"]
dt3[, grep("^i\\.", names(dt3), value=TRUE) := NULL]
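
Two things are worth keeping in mind with this approach: := modifies dt3 by reference, and the pattern also matches any column that was genuinely named with an i. prefix before the join. A small sketch on toy data (the tables and the dupCols name are made up for illustration) that stores the matches first so they can be inspected, and skips the deletion when there is nothing to drop:

```r
library(data.table)

# Toy tables: only "a" overlaps besides the join column "id".
dt  <- data.table(id = 1:3, a = c("x", "y", "z"), b = c(1, 2, 3))
dt2 <- data.table(id = 1:3, a = c("p", "q", "r"), c = c(10, 20, 30))

dt3 <- dt[dt2, on = "id"]

# Collect the columns that the join prefixed with "i." so they can be checked.
dupCols <- grep("^i\\.", names(dt3), value = TRUE)

# Drop them by reference; the parentheses make := use the names stored in dupCols.
if (length(dupCols) > 0) {
  dt3[, (dupCols) := NULL]
}
names(dt3)
```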

Both of these methods return:

    id a          b           d          c
 1:  1 a -1.2070657 -0.47719270  0.1340882
 2:  2 b  0.2774292 -0.99838644 -0.4906859
 3:  3 c  1.0844412 -0.77625389 -0.4405479
 4:  4 d -2.3456977  0.06445882  0.4595894
 5:  5 e  0.4291247  0.95949406 -0.6937202
 6:  6 f  0.5060559 -0.11028549 -1.4482049
 7:  7 g -0.5747400 -0.51100951  0.5747557
 8:  8 h -0.5466319 -0.91119542 -1.0236557
 9:  9 i -0.5644520 -0.83717168 -0.0151383
10: 10 j -0.8900378  2.41583518 -0.9359486
2 Comments

or keep <- union(names(dt), names(dt2)) ; dt[dt2, mget(keep), on = "id"]
@DavidArenburg That's better, it avoids the irritating -which() syntax.
