I am relatively new to R, and the solution to this problem is probably rather simple.
I have a dataframe that looks like this:
id1 id2 v1 v2 v3 ... v100
A X 1 NA NA ... 1
B Y 1 3 4 ... 1
C X 1 3 4 ... 1
D X 1 3 4 ... 1
E Y 1 3 4 ... 1
A X NA 3 4 ... NA
What I would like to do is 'merge' observations that share the same id (both id1 and id2) into a single observation. The missing values of one observation should be filled in with the values from the other observation.
For example, in the data frame above these are 'observation 1' and 'observation 6', and the result should look something like this:
id1 id2 v1 v2 v3 ... v100
A X 1 3 4 ... 1
B Y 1 3 4 ... 1
C X 1 3 4 ... 1
D X 1 3 4 ... 1
E Y 1 3 4 ... 1
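To make the desired result concrete, here is a minimal base R sketch of the merge described above. It rebuilds only the first three value columns of the example (the elided columns are left out), and the helper name `first_non_na` is made up for illustration. It assumes that taking the first non-missing value within each (id1, id2) group is the intended behaviour.

```r
# Example data: only v1..v3 of the table above are reproduced here
df <- data.frame(
  id1 = c("A", "B", "C", "D", "E", "A"),
  id2 = c("X", "Y", "X", "X", "Y", "X"),
  v1  = c(1, 1, 1, 1, 1, NA),
  v2  = c(NA, 3, 3, 3, 3, 3),
  v3  = c(NA, 4, 4, 4, 4, 4)
)

# Hypothetical helper: first non-missing value of a vector, or NA if there is none
first_non_na <- function(x) x[!is.na(x)][1]

# All columns except the two id columns are treated as value columns
value_cols <- setdiff(names(df), c("id1", "id2"))

# Collapse every (id1, id2) group to one row, column by column
merged <- aggregate(df[value_cols], by = df[c("id1", "id2")], FUN = first_non_na)
merged
```

Note that `aggregate` returns the groups sorted by the id columns rather than in their original order, so the rows may need to be reordered afterwards.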
Currently I am using loops for this, and I know it is very slow and probably not the best solution. I have more than 1000 observations with approximately 100 duplicate observations and a few thousand variables. If anyone could suggest how to speed things up, I would be really happy.
Many thanks in advance!
Edit: 03/10/2014
Many thanks for all the helpful comments! The answer by David Armstrong is what I wanted! Thank you so much!
I am sorry for not being precise enough in my first post, so here are some clarifications.
Observations with identical ids can occur more than twice in the dataset.
Further, among the observations that share an id, at most one will have a non-missing value for any given variable. It can also be the case that all of them are missing for a variable, but it can never be the case that two observations both have a non-missing value. The following example might make things clearer.
id1 id2 v1 v2 v3 v4 v5 v6 v7
A X 6 9 3 1 2 1 1
B X 2 2 1 4 2 3 3
C X 1 6 7 1 3 4 5
D X 4 2 9 2 3 6 2
E X NA 3 NA NA NA NA NA
E X NA NA 4 NA NA NA NA
E X NA NA NA 3 NA NA NA
E X NA NA NA NA 6 NA NA
E X NA NA NA NA NA 4 NA
E X NA NA NA NA NA NA 1
And the result I would like to have would be:
id1 id2 v1 v2 v3 v4 v5 v6 v7
A X 6 9 3 1 2 1 1
B X 2 2 1 4 2 3 3
C X 1 6 7 1 3 4 5
D X 4 2 9 2 3 6 2
E X NA 3 4 3 6 4 1
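Because at most one observation per id has a non-missing value for each variable, the merge can also be sketched without loops using group sums. This is only a sketch under that assumption. It also assumes that all value columns are numeric and that the `"|"` separator used to build the key does not occur in the ids. The names `key`, `sums` and `counts` are made up for illustration.

```r
# The second example from above
df <- data.frame(
  id1 = c("A", "B", "C", "D", rep("E", 6)),
  id2 = rep("X", 10),
  v1  = c(6, 2, 1, 4, NA, NA, NA, NA, NA, NA),
  v2  = c(9, 2, 6, 2,  3, NA, NA, NA, NA, NA),
  v3  = c(3, 1, 7, 9, NA,  4, NA, NA, NA, NA),
  v4  = c(1, 4, 1, 2, NA, NA,  3, NA, NA, NA),
  v5  = c(2, 2, 3, 3, NA, NA, NA,  6, NA, NA),
  v6  = c(1, 3, 4, 6, NA, NA, NA, NA,  4, NA),
  v7  = c(1, 3, 5, 2, NA, NA, NA, NA, NA,  1)
)

value_cols <- setdiff(names(df), c("id1", "id2"))

# One key per (id1, id2) pair; assumes "|" never appears in the ids
key <- paste(df$id1, df$id2, sep = "|")

# Replace NA by 0 so that the group sum equals the single non-missing value
vals <- as.matrix(df[value_cols])
present <- !is.na(vals)
vals[!present] <- 0

sums   <- rowsum(vals, key)          # per-group sum of each variable
counts <- rowsum(present * 1, key)   # per-group count of non-missing values
sums[counts == 0] <- NA              # variables missing in every row of a group stay NA

# Keep the ids of the first row of each group and attach the merged values
first_row <- !duplicated(key)
merged <- cbind(
  df[first_row, c("id1", "id2")],
  as.data.frame(sums[key[first_row], , drop = FALSE])
)
rownames(merged) <- NULL
merged
```

`rowsum` works on the whole matrix at once, so this should scale better to a few thousand variables than looping over rows, though I have not timed it on the real data.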
I hope this helps.
Thank you very much!
x[is.na(x)] <- na.omit(y)?
Summarise the rows in the same group that have non-missing values. In the example you provided, the values are just identical, which may not be the case in your original dataset. What if there are triplicates etc.?