Combining multiple columns/variables into a single column

Question

I have the following problem (I guess there is an easy answer to it, but I can't figure it out...).

I want to combine multiple columns into a single column. I have 3 variables: some cases answered variable 1, some answered variable 2, some answered variable 3, and some answered none of them.

Now I want to combine them all into a single variable that looks like column vx:

Ideal result: 

   v1 v2 v3 vx
1   1 NA NA  1
2   3 NA NA  3
3   6 NA NA  6
4  NA  5 NA  5
5  NA  1 NA  1
6  NA  3 NA  3
7  NA NA  4  4
8  NA NA  2  2
9  NA NA  1  1
10 NA NA NA NA

v1 <- c(1, 3, 6, NA, NA, NA, NA, NA, NA, NA)
v2 <- c(NA, NA, NA, 5, 1, 3, NA, NA, NA, NA)
v3 <- c(NA, NA, NA, NA, NA, NA, 4, 2, 1, NA)

df <- data.frame(v1, v2, v3)

I tried it with df$vx <- paste(df$v1, df$v2, df$v3) but then I get the following result:

My result: 

   v1 v2 v3       vx
1   1 NA NA  1 NA NA
2   3 NA NA  3 NA NA
3   6 NA NA  6 NA NA
4  NA  5 NA  NA 5 NA
5  NA  1 NA  NA 1 NA
6  NA  3 NA  NA 3 NA
7  NA NA  4  NA NA 4
8  NA NA  2  NA NA 2
9  NA NA  1  NA NA 1
10 NA NA NA NA NA NA

Can someone tell me how to get a result like the ideal one above, without the NAs (unless a row contains only NAs, in which case I would like a single NA in column vx)?

I hope I made clear what my issue is.

Thanks a lot!

Maël · Accepted Answer · 2022-02-09 10:45:12Z

2

That is what dplyr::coalesce was made for:

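library(dplyr)

# Splice every column of df into coalesce(); the first non-NA value in each row wins.
df$v4 <- coalesce(!!!df)

# Also works: name the columns explicitly (assign the result if you want to keep it).
df %>%
  mutate(v4 = coalesce(v1, v2, v3))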

output

   v1 v2 v3 v4
1   1 NA NA  1
2   3 NA NA  3
3   6 NA NA  6
4  NA  5 NA  5
5  NA  1 NA  1
6  NA  3 NA  3
7  NA NA  4  4
8  NA NA  2  2
9  NA NA  1  1
10 NA NA NA NA
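
For readers who cannot load dplyr, the same "first non-NA value per row" logic can be sketched in base R. This only illustrates what coalesce() does for this data, not how dplyr implements it; the helper name coalesce2 is made up for the example.

```r
# Rebuild the example data from the question.
df <- data.frame(
  v1 = c(1, 3, 6, NA, NA, NA, NA, NA, NA, NA),
  v2 = c(NA, NA, NA, 5, 1, 3, NA, NA, NA, NA),
  v3 = c(NA, NA, NA, NA, NA, NA, 4, 2, 1, NA)
)

# Hypothetical helper: keep a where it is not NA, otherwise fall back to b.
coalesce2 <- function(a, b) ifelse(is.na(a), b, a)

# Fold the helper over the columns from left to right.
vx <- Reduce(coalesce2, df)
vx
```

Because the fold runs left to right, earlier columns take priority when a row has more than one non-NA value, which matches the order of the arguments in coalesce(v1, v2, v3).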

riccardo-df · Answer · 2022-02-09 10:54:24Z

1

Using apply()

# Your data.
v1 = c(1, 3, 6, NA, NA, NA, NA, NA, NA, NA)
v2 = c(NA, NA, NA, 5, 1, 3, NA, NA, NA, NA)
v3 = c(NA, NA, NA, NA, NA, NA, 4, 2, 1, NA)

df = data.frame(v1, v2, v3)
df

# Solution: write a function to pass to apply().
useful.function <- function(x) {
  # The input "x" is one row of the data frame.

  # If all the values are NA, return NA.
  if (all(is.na(x))) {
    return(NA)
  }

  # Otherwise, return the first non-NA value, so that apply() always
  # gets exactly one value per row and returns a plain vector.
  x[!is.na(x)][1]
}

df$vx = apply(df, MARGIN = 1, useful.function)
df

Clearly, other solutions may be faster and require less code (such as the dplyr-based one posted by @Maël). However, I really suggest you get comfortable with apply() and the other functions in the same family (see lapply() and sapply()), as they are very flexible, and you may not always know that a dedicated function or package exists.
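
As a small illustration of that family, the same row-wise idea could also be written with vapply(), which checks that every row yields exactly one numeric value. This is only a sketch: it keeps the first non-NA value in each row, and first_non_na is a name invented here.

```r
# Rebuild the example data from the question.
df <- data.frame(
  v1 = c(1, 3, 6, NA, NA, NA, NA, NA, NA, NA),
  v2 = c(NA, NA, NA, 5, 1, 3, NA, NA, NA, NA),
  v3 = c(NA, NA, NA, NA, NA, NA, 4, 2, 1, NA)
)

# Hypothetical helper: first non-NA element of x, or NA if there is none.
first_non_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_real_ else x[1]
}

# Loop over row indices; numeric(1) makes vapply() stop with an error
# if a row ever returns something other than a single number.
vx <- vapply(
  seq_len(nrow(df)),
  function(i) first_non_na(unlist(df[i, ])),
  numeric(1)
)
vx
```

Unlike sapply(), vapply() never silently changes the shape or type of its result, which helps when the data may later contain rows with unexpected contents.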

edited Feb 9, 2022 at 10:54

answered Feb 9, 2022 at 10:49

1 Comment

Twizzle Over a year ago

Thank you, I went for the easy solution with coalesce, but thanks for the advice nevertheless! :-)

Ronak Shah · Answer · 2022-02-09 11:03:08Z

1

Using max.col in base R:

df$vx <- df[cbind(seq_len(nrow(df)), max.col(!is.na(df), ties.method = "first"))]
df

#   v1 v2 v3 vx
#1   1 NA NA  1
#2   3 NA NA  3
#3   6 NA NA  6
#4  NA  5 NA  5
#5  NA  1 NA  1
#6  NA  3 NA  3
#7  NA NA  4  4
#8  NA NA  2  2
#9  NA NA  1  1
#10 NA NA NA NA

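max.col returns the column index of the maximum value in each row. Applied to !is.na(df), it gives the column index of a TRUE value in each row, because TRUE > FALSE. cbind then builds a two-column matrix of (row, column) positions, and indexing df with that matrix extracts one value per row. A row that contains only NAs has no TRUE, so whichever column is picked, the extracted value is NA. By default max.col breaks ties at random; if a row can hold more than one non-NA value, pass ties.method = "first" to always take the leftmost one.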
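
To make the intermediate steps visible, here is a sketch that splits the one-liner into its parts. The variable names present, col_idx and idx are invented for the illustration, and ties.method = "first" is used so that the result is deterministic.

```r
# Rebuild the example data from the question.
df <- data.frame(
  v1 = c(1, 3, 6, NA, NA, NA, NA, NA, NA, NA),
  v2 = c(NA, NA, NA, 5, 1, 3, NA, NA, NA, NA),
  v3 = c(NA, NA, NA, NA, NA, NA, 4, 2, 1, NA)
)

# Step 1: logical matrix, TRUE where a value is present.
present <- !is.na(df)

# Step 2: column index of the first TRUE in each row.
# An all-NA row contains only FALSE, so the tie resolves to column 1.
col_idx <- max.col(present, ties.method = "first")
col_idx
# 1 1 1 2 2 2 3 3 3 1

# Step 3: two-column (row, column) matrix that selects one cell per row.
idx <- cbind(seq_len(nrow(df)), col_idx)

# Matrix indexing returns one value per row; row 10 selects an NA.
vx <- as.matrix(df)[idx]
vx
```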
