1

I have the following problem (I guess there is an easy answer to it, but I can't figure it out...).

I want to combine multiple columns into a single column. I have 3 variables and there are cases that answered variable 1, cases that answered variable 2, cases that answered variable 3 and cases that answered none of the variables.

Now, I want to combine them all in a single variable, that looks like column vx:

Ideal result: 

  v1 v2 v3 vx
1   1 NA NA  1
2   3 NA NA  3
3   6 NA NA  6
4  NA  5 NA  5
5  NA  1 NA  1
6  NA  3 NA  3
7  NA NA  4  4
8  NA NA  2  2
9  NA NA  1  1
10 NA NA NA NA

v1 <- c(1, 3, 6, NA, NA, NA, NA, NA, NA, NA)
v2 <- c(NA, NA, NA, 5, 1, 3, NA, NA, NA, NA)
v3 <- c(NA, NA, NA, NA, NA, NA, 4, 2, 1, NA)

df <- data.frame(v1, v2, v3)

I tried it with df$vx <- paste(df$v1, df$v2, df$v3) but then I get the following result:

My result: 

 v1 v2 v3       vx
1   1 NA NA  1 NA NA
2   3 NA NA  3 NA NA
3   6 NA NA  6 NA NA
4  NA  5 NA  NA 5 NA
5  NA  1 NA  NA 1 NA
6  NA  3 NA  NA 3 NA
7  NA NA  4  NA NA 4
8  NA NA  2  NA NA 2
9  NA NA  1  NA NA 1
10 NA NA NA NA NA NA

Can someone tell me how I get a result like the one above (ideal result) without the NAs (except if there are only NAs then I would like to have only one NA in column vx)

I hope I made clear what my issue is.

Thanks a lot!

3 Answers 3

2

That is what dplyr::coalesce was made for:

library(dplyr)
df$v4 <- coalesce(!!!df)

#Also works:
df %>% 
  mutate(v4 = coalesce(v1, v2, v3))

output

   v1 v2 v3 v4
1   1 NA NA  1
2   3 NA NA  3
3   6 NA NA  6
4  NA  5 NA  5
5  NA  1 NA  1
6  NA  3 NA  3
7  NA NA  4  4
8  NA NA  2  2
9  NA NA  1  1
10 NA NA NA NA
Sign up to request clarification or add additional context in comments.

Comments

1

Using apply()

# Your data.
v1 = c(1, 3, 6, NA, NA, NA, NA, NA, NA, NA)
v2 = c(NA, NA, NA, 5, 1, 3, NA, NA, NA, NA)
v3 = c(NA, NA, NA, NA, NA, NA, 4, 2, 1, NA)

df = data.frame(v1, v2, v3)
df

# Solution: writing a function to be passed in apply().
useful.function = function(x)
{
  # The input "x" is a row of a dataframe.
  
  # If all the values are NA, return NA.
  if(sum(!is.na(x)) == 0)
    return(NA)
  
  # Otherwise, return the non-NA value.
  return(x[!is.na(x)])
}

df$vx = apply(df, MARGIN = 1, useful.function)
df

Clearly, other solutions may be faster and require less coding (as those relying on the dplyr package, posted by @Maël). However, I really suggest you to get confident in using apply() and the other functions from the same family (see lapply() and sapply()), as they are really flexible (and sometimes you may not be aware of the existence of a certain function or package).

1 Comment

Thank you, I went for the easy sollution with coalesce but nevertheless thanks for the advice! :-)
1

Using max.col in base R -

df$vx <- df[cbind(1:nrow(df), max.col(!is.na(df)))]
df

#   v1 v2 v3 vx
#1   1 NA NA  1
#2   3 NA NA  3
#3   6 NA NA  6
#4  NA  5 NA  5
#5  NA  1 NA  1
#6  NA  3 NA  3
#7  NA NA  4  4
#8  NA NA  2  2
#9  NA NA  1  1
#10 NA NA NA NA

max.col returns the index of max value in each row. With !is.na(df) we'll get the index of TRUE value in each row because TRUE > FALSE. We create a matrix with cbind to extract the max value from each row.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.