39

I have a data frame with columns that, when concatenated (row-wise) as a string, would allow me to partition the data frame into a desired form.

> str(data)
'data.frame':   680420 obs. of  10 variables:
 $ A              : chr  "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ...
 $ B              : chr  "2011-01-26" "2011-01-27" "2011-02-09" "2011-02-10" ...
 $ C              : chr  "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ...
 $ D              : chr  "AAA" "AAA" "BCB" "CCC" ...
 $ E              : chr  "A00001" "A00002" "B00002" "B00001" ...
 $ F              : int  9 9 37 37 37 37 191 191 191 191 ...
 $ G              : int  NA NA NA NA NA NA NA NA NA NA ...
 $ H              : int  4 4 4 4 4 4 4 4 4 4 ...

For each row, I would like to concatenate the data in columns F, E, D, and C into a string (with the underscore character as separator). Below is my unsuccessful attempt at this:

data$id <- sapply(as.data.frame(cbind(data$F,data$E,data$D,data$C)), paste, sep="_")

And below is the undesired result:

  > str(data)
    'data.frame':   680420 obs. of  10 variables:
     $ A              : chr  "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ...
     $ B              : chr  "2011-01-26" "2011-01-27" "2011-02-09" "2011-02-10" ...
     $ C              : chr  "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ...
     $ D              : chr  "AAA" "AAA" "BCB" "CCC" ...
     $ E              : chr  "A00001" "A00002" "B00002" "B00001" ...
     $ F              : int  9 9 37 37 37 37 191 191 191 191 ...
     $ G              : int  NA NA NA NA NA NA NA NA NA NA ...
     $ H              : int  4 4 4 4 4 4 4 4 4 4 ...
     $ id             : chr [1:680420, 1:4] "9" "9" "37" "37" ...
      ..- attr(*, "dimnames")=List of 2
      .. ..$ : NULL
      .. ..$ : chr  "V1" "V2" "V3" "V4"

Any help would be greatly appreciated.

3 Answers 3

70

Try

 data$id <- paste(data$F, data$E, data$D, data$C, sep="_")

instead. The beauty of vectorized code is that you do not need row-by-row loops, or loop-equivalent *apply functions.

Edit Even better is

 data <- within(data,  id <- paste(F, E, D, C, sep=""))
Sign up to request clarification or add additional context in comments.

7 Comments

Wow. That was quick and correct. It looks like I was over-complicating the process. I am really trying to wean myself off of complete 'loop dependency' and am still learning to appreciate vectorized code.
And try with(data, paste(...)) for even more succinctness.
Correct. Or even within() as the less-loved cousin of with(). Amended my answer accordingly.
So that's how within differs!?! (Why the hell wasn't it named with(..., modify=TRUE) ?)
What if you use data$id <- paste(data[, c(1)], data[,c(2)]) oddly then the code doesn't do what expected. Is there something I'm missing here?
|
13

Use unite of tidyr package:

require(tidyr)
data <- data %>% unite(id, F, E, D, C, sep = '_')

First parameter is the desired name, all next up to sep - columns to concatenate.

1 Comment

When I needed to collapse dynamic vectors by groups, this method was the only one that consistently worked as such mycols <- c("F","E","D","C") data <- data %>% unite_("id", mycols, sep = "_")
7

Either stringr::str_c() or paste() will work.

require(stringr)
data <- within(data, str_c(F,E,D,C, sep="_")

or else

data <- within(data, paste(F,E,D,C, sep="_")

(stringr is better performance on large datasets)

1 Comment

@the serial downvoter: stringr::str_c can be used and noone said that. What's your problem?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.