1

I have 3 .csv files that I need to analyse in R. File one contains columns with user id and signupdate. File two contains columns with user id, purchase date and amount of purchases. File three contains columns with user id, message date and number of messages.

Please note that the order of the user id is not the same in each of the three files, thus cop.

Would love some help merging these files so that the large dataset has order user id, signupdate, purchase date, amount of purchases, message date and number of messages. Can't seem to find code to do this in R Thanks in advance

3
  • I'm assuming you know how to read them in separately. It would be a good idea to post some sample data so it would be easier to offer specific suggestions. Commented Aug 21, 2014 at 2:30
  • @MatthewLundberg: Thanks for correcting. One might do: df=merge(df1,df2); df=merge(df,df3) Commented Aug 21, 2014 at 3:09
  • @rnso Of course. That's what Reduce will do. Commented Aug 21, 2014 at 3:29

1 Answer 1

3

While merge doesn't take three arguments, Reduce is made for the task of iterating over a list and passing pairs to a function. Here's an example of a three-way merge:

d1 <- data.frame(id=letters[1:3], x=2:4)
d2 <- data.frame(id=letters[3:1], y=5:7)
d3 <- data.frame(id=c('b', 'c', 'a'), z=c(5,6,8))
Reduce(merge, list(d1, d2, d3))
##   id x y z
## 1  a 2 7 8
## 2  b 3 6 5
## 3  c 4 5 6

Note that the order of the column id is not the same, but the values are matched.

In the case where you have non-matching data and want to keep all possible rows, you need an outer join, by supplying all=TRUE to merge. As Reduce does not have a way to pass additional arguments to the function, a new function must be created to call merge:

d1 <- data.frame(id=letters[1:3], x=2:4)
d2 <- data.frame(id=letters[3:1], y=5:7)
d3 <- data.frame(id=c('b', 'c', 'd'), z=c(5,6,8))
Reduce(function(x,y) merge(x,y,all=TRUE), list(d1, d2, d3))
##   id  x  y  z
## 1  a  2  7 NA
## 2  b  3  6  5
## 3  c  4  5  6
## 4  d NA NA  8

NA is used to indicate data in non-matched rows.

Sign up to request clarification or add additional context in comments.

1 Comment

Hi, I just played with your sample and have a question. I changed one of the ids in d3 from a to d and ran the script. I did not see rows for id a and d in the outcome. If you want to preserve all ids in the outcome, what could we do?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.