
(Related question that does not include sorting. It's easy to just use paste when you don't need to sort.)

I have a less-than-ideally-structured table with character columns that are generic "item1","item2" etc. I would like to create a new character variable that is the alphabetized, comma-separated concatenation of these columns. So for example, in row 5, if item1 = "milk", item2 = "eggs", and item3 = "butter", the new variable in row 5 might be "butter, eggs, milk"

I wrote a function f() below that works on two character variables. However, I am having trouble

  • Using mapply or other "vectorization" (I know it's really just a for loop)
  • Generalizing the function to an arbitrary number of columns

Any help much appreciated.

df <- data.frame(a =c("foo","bar"), 
                 b= c("baz","qux"))   
paste(df$a,df$b, sep=", ")
# returns [1] "foo, baz" "bar, qux" ... but I want [1] "baz, foo" "bar, qux"

f <- function(a,b) paste(c(a,b)[order(c(a,b))],collapse=", ")
f("foo","baz") 
# returns [1] "baz, foo" ... which is what I want ... how to vectorize?

df$new_var <- mapply(f, df$a, df$b)
df 
#     a   b new_var      <- new_var is not what I want
# 1 foo baz    1, 2
# 2 bar qux    1, 2

# Interestingly, data.table is smart enough to fix my bad mapply
library(data.table)
dt <- data.table(a =c("foo","bar"), 
                 b= c("baz","qux"))  
dt[,new_var:=mapply(f, a, b)]
dt
#     a    b  new_var    <- new var IS what I want
# 1: foo baz baz, foo
# 2: bar qux bar, qux

2 Answers

7

Just apply down rows:

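# apply() coerces the data frame to a character matrix and hands each row to the function
apply(df, 1, function(x) {
  paste(sort(x), collapse = ", ")   # ", " matches the separator asked for in the question
})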

Wrap it in a function if you want. You'll either have to specify which columns to pass or assume all of them, e.g. apply(df[, 2:3], 1, f), where f is the sort-and-paste function above. Note that you pass f itself, not f().

sort(x) is the same as x[order(x)]
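
As a rough sketch of the "wrap it in a function" idea, generalized to any number of columns: the helper name sort_paste_cols and its cols/sep arguments are made up for illustration, and the third column c is added only to show more than two inputs.

```r
# Hypothetical helper (name and arguments are illustrative, not from the question).
sort_paste_cols <- function(data, cols = names(data), sep = ", ") {
  # Coerce the chosen columns to a character matrix, one row per record
  m <- as.matrix(data[, cols, drop = FALSE])
  # Sort each row alphabetically, paste it together, and drop any row names
  unname(apply(m, 1, function(x) paste(sort(x), collapse = sep)))
}

df <- data.frame(a = c("foo", "bar"),
                 b = c("baz", "qux"),
                 c = c("abc", "zzz"),
                 stringsAsFactors = FALSE)

df$new_var <- sort_paste_cols(df, c("a", "b", "c"))
df$new_var
# [1] "abc, baz, foo" "bar, qux, zzz"
```

Passing the column names explicitly keeps new_var itself (or any other unrelated column) out of the concatenation if you run it a second time.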


4

My first thought would've been to do this:

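# .SD is a one-row data.table here, so unlist() it into a plain character vector before sorting;
# .SDcols keeps any existing new_var column out of the paste
dt[, new_var := paste(sort(unlist(.SD)), collapse = ", "), by = 1:nrow(dt), .SDcols = c("a", "b")]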

But you could make your function work with a couple of simple modifications:

f = function(...) paste(c(...)[order(c(...))],collapse=", ")

dt[, new_var := do.call(function(...) mapply(f, ...), .SD)]
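
As for why the plain data.frame attempt gave "1, 2": most likely data.frame() turned the strings into factors (the default before R 4.0), so f() was ordering and pasting the integer codes, whereas data.table leaves character columns alone. A minimal sketch of the same mapply idea on a data.frame, assuming you build it with stringsAsFactors = FALSE:

```r
# Same variadic f as above
f <- function(...) paste(c(...)[order(c(...))], collapse = ", ")

# Keep the columns as character rather than factor
df <- data.frame(a = c("foo", "bar"),
                 b = c("baz", "qux"),
                 stringsAsFactors = FALSE)

# Hand every selected column to mapply() as a separate argument
cols <- unname(as.list(df[c("a", "b")]))
df$new_var <- do.call(mapply, c(list(FUN = f), cols, list(USE.NAMES = FALSE)))
df$new_var
# [1] "baz, foo" "bar, qux"
```

If the columns are already factors, converting them with as.character() first should have the same effect.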
