In R: Replace values of a row if missing with values of another row

Question

I am relatively new to R and probably the solution to this problem is rather simple.

I have a dataframe that looks like this:

id1    id2    v1    v2    v3    ...    v100
  A      X     1    NA    NA    ...       1
  B      Y     1     3     4    ...       1
  C      X     1     3     4    ...       1
  D      X     1     3     4    ...       1
  E      Y     1     3     4    ...       1
  A      X    NA     3     4    ...      NA

What I would like to do is 'merge' two observations with the same ids (id1 and id2) into one observation. The missing values of one observation should be replaced by the values of the other observation.

For example in the dataframe from above these are 'observation 1' and 'observation 6' and the result should look something like this:

id1    id2    v1    v2    v3    ...    v100
  A      X     1     3     4    ...       1
  B      Y     1     3     4    ...       1
  C      X     1     3     4    ...       1
  D      X     1     3     4    ...       1
  E      Y     1     3     4    ...       1

Currently I am using loops for this and I know it is very slow and probably not the best solution. I have more than 1000 observations with approximately 100 duplicate observations and a few thousand variables. If anyone could provide an idea how to speed up things, I would be really happy.

Many thanks in advance!

Edit: 03/10/2014

Many thanks for all the helpful comments! The answer by David Armstrong is what I wanted! Thank you so much!

I am sorry for not being precise enough in my first post, so here are some clarifications.

Observations with identical ids can occur multiple times in the dataset and not only twice.

Further, of all those identical observations, only one observation will have a non-missing value per variable (if at all). It can also be the case that all observations of a variable are missing, but it can never be the case that two observations have a non-missing value. The following example might make things clearer.

id1    id2    v1    v2    v3    v4    v5    v6    v7
  A      X     6     9     3     1     2     1     1
  B      X     2     2     1     4     2     3     3
  C      X     1     6     7     1     3     4     5
  D      X     4     2     9     2     3     6     2
  E      X    NA     3    NA    NA    NA    NA    NA
  E      X    NA    NA     4    NA    NA    NA    NA
  E      X    NA    NA    NA     3    NA    NA    NA
  E      X    NA    NA    NA    NA     6    NA    NA
  E      X    NA    NA    NA    NA    NA     4    NA
  E      X    NA    NA    NA    NA    NA    NA     1

And the result I would like to have would be:

id1    id2    v1    v2    v3    v4    v5    v6    v7
  A      X     6     9     3     1     2     1     1
  B      X     2     2     1     4     2     3     3
  C      X     1     6     7     1     3     4     5
  D      X     4     2     9     2     3     6     2
  E      X    NA     3     4     3     6     4     1

I hope this helps.

Thank you very much!

Can we assume that there are always pairs of observations with missing values such that missing values of one observation are always values in the other observation and the other way around? E.g., can we do something like x[is.na(x)] <- na.omit(y)?
– Roland, Commented Oct 2, 2014 at 13:56
@vandm It is not clear how you want to summarise the rows in the same group that have non-missing values. In the example you provided, the values are just identical, which may not be the case in your original dataset. What if there are triplicates etc.?
– akrun, Commented Oct 2, 2014 at 15:28
@vandm, you don't need to create a completely new account here. Just add another account to your already existing CrossValidated account.
– David Arenburg, Commented Oct 4, 2014 at 23:39

David Arenburg · Accepted Answer · 2014-10-02 15:37:07Z

2

Also, maybe

library(data.table)
setDT(df)[, lapply(.SD, na.omit), by = list(id1, id2)]
#    id1 id2 v1 v2 v3 v100
# 1:   A   X  1  3  4    1
# 2:   B   Y  1  3  4    1
# 3:   C   X  1  3  4    1
# 4:   D   X  1  3  4    1
# 5:   E   Y  1  3  4    1

If we can't always assume that there are missing values (as mentioned in @Roland's comment), you can add unique (if you always want only one row per id pair). Something like

unique(setDT(df)[, lapply(.SD, na.omit), by = list(id1, id2)])
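For the clarified case in the question, where a group can have no non-missing value at all for some variable, a variant that picks the first non-missing value per group may be easier to reason about. This is only a sketch: the helper name first_non_na and the reduced sample data are made up for illustration, and it assumes at most one non-missing value per variable and group, as the question states.

```r
library(data.table)

# Reduced version of the second example in the question (three value columns)
dt <- data.table(
  id1 = c("A", "E", "E", "E"),
  id2 = c("X", "X", "X", "X"),
  v1  = c(6L, NA, NA, NA),
  v2  = c(9L, 3L, NA, NA),
  v3  = c(3L, NA, 4L, NA)
)

# Hypothetical helper: first non-missing value, or NA if the group has none.
# Indexing an empty vector with [1L] yields NA of the same type.
first_non_na <- function(x) x[!is.na(x)][1L]

# One row per (id1, id2) pair; every value column is collapsed by the helper
res <- dt[, lapply(.SD, first_non_na), by = .(id1, id2)]
print(res)
```

Because the helper always returns exactly one value, every group collapses to a single row, and a variable that is missing in all rows of a group stays NA.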

answered Oct 2, 2014 at 15:37

David Arenburg


1 Comment

David Arenburg Over a year ago

Thanks @akrun, it is actually hard to tell what they exactly want, so added unique too

akrun · Accepted Answer · 2014-10-02 15:49:02Z

1

Try:

library(dplyr) 
df %>%
    group_by(id1, id2) %>%
    summarise_each(funs(mean=mean(., na.rm=TRUE)))

#    id1 id2 v1 v2 v3
# 1   A   X  1  3  4
# 2   B   Y  1  3  4
# 3   C   X  1  3  4
# 4   D   X  1  3  4
# 5   E   Y  1  3  4

Or perhaps

df %>% 
    group_by(id1, id2) %>%
    mutate_each(funs(replace(., is.na(.), stats::na.omit(.)))) %>%
    unique()

data

df <- structure(list(id1 = c("A", "B", "C", "D", "E", "A"), id2 = c("X", 
"Y", "X", "X", "Y", "X"), v1 = c(1L, 1L, 1L, 1L, 1L, NA), v2 = c(NA, 
3L, 3L, 3L, 3L, 3L), v3 = c(NA, 4L, 4L, 4L, 4L, 4L)), .Names = c("id1", 
"id2", "v1", "v2", "v3"), class = "data.frame", row.names = c(NA, 
-6L))
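summarise_each(), mutate_each() and funs() have since been deprecated in dplyr. A rough equivalent in more recent versions might look like the sketch below. It assumes dplyr 1.0.0 or later (for across() and the .groups argument), and the small data frame is made up for illustration. It also takes the first non-missing value instead of the mean, so a group with no value at all gives NA rather than NaN.

```r
library(dplyr)

# Small made-up sample: rows 1 and 3 share the same id pair
df_small <- data.frame(
  id1 = c("A", "B", "A"),
  id2 = c("X", "Y", "X"),
  v1  = c(1L, 1L, NA),
  v2  = c(NA, 3L, 3L)
)

res <- df_small %>%
  group_by(id1, id2) %>%
  # For every non-grouping column, keep the first non-missing value (NA if none)
  summarise(across(everything(), ~ .x[!is.na(.x)][1L]), .groups = "drop")

print(res)
```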

edited Oct 2, 2014 at 15:49

answered Oct 2, 2014 at 15:19

akrun


rnso · Accepted Answer · 2014-10-02 15:24:35Z

0

If ddf is your data frame:

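# sum() recovers the value only because each group has at most one
# non-missing value per variable; an all-missing group gives 0, not NA
t(sapply(split(ddf, paste(ddf$id1, ddf$id2)), 
           function(x) sapply(x[3:ncol(ddf)], sum, na.rm=TRUE)))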
    v1 v2 v3 v4
A X  1  3  4  1
B Y  1  3  4  1
C X  1  3  4  1
D X  1  3  4  1
E Y  1  3  4  1
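If an all-missing variable should stay NA and the id columns should be kept as columns, a base R alternative could use aggregate() with a helper that returns the first non-missing value. This is a sketch under the question's assumption of at most one non-missing value per variable and group. The helper name first_non_na and the sample data are made up for illustration.

```r
# Made-up sample: group (E, X) has no value at all for v1
ddf_small <- data.frame(
  id1 = c("A", "B", "E", "E", "A"),
  id2 = c("X", "Y", "X", "X", "X"),
  v1  = c(1L, 1L, NA, NA, NA),
  v2  = c(NA, 3L, 3L, NA, 3L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-missing value, or NA if there is none
first_non_na <- function(x) x[!is.na(x)][1L]

# Data-frame method of aggregate(): value columns in x, grouping columns in by.
# (The formula interface would drop rows containing NA by default.)
res <- aggregate(
  x   = ddf_small[c("v1", "v2")],
  by  = ddf_small[c("id1", "id2")],
  FUN = first_non_na
)
print(res)
```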

edited Oct 2, 2014 at 15:24

answered Oct 2, 2014 at 15:18

rnso
