
I have downloaded data from Lending Club and loaded it into R using data.table's fread() function.

For each row, I would like data.table to collect the values from all of the columns into a single string, as efficiently as possible. My current function works, but I think it is probably quite slow and could be improved by some of the data.table experts on SO.

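foo <- function(y, dt_obj, col_names = colnames(dt_obj)) {
  # Build "col=value" pairs for row y, join them with "&",
  # and prepend the prediction endpoint
  paste0("http://localhost:8080/predict?",
         paste0(col_names, "=", unlist(dt_obj[y, ], use.names = FALSE),
                collapse = "&"))
}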

In the function above, y is the row number and dt_obj is the CSV data that was read into R with fread().

I then go through each row and add the resulting string to my original data.table object dt with the following line:

dt[,strg:=sapply(seq(nrow(dt)),function(x){foo(x,dt_obj=dt)})]

However, this takes a while, and I believe the speed could be improved with a more efficient foo() function or by using data.table more efficiently.

As always, any help would be greatly appreciated.

  • I have a very naive suggestion. Did you try removing the column delimiter while reading the CSV? Each row would then be read whole, and the commas separating the fields would be treated as text. Commented Nov 4, 2015 at 14:15
  • Your link is blocked from my location. I posted sample data, if it helps. Commented Nov 4, 2015 at 14:34
  • Thanks for the suggestion, quite neat, but unfortunately the CSV data is just an example in my situation. My data is already in R; it is normally read in with readRDS, as it is in RDS format. Commented Nov 4, 2015 at 14:39
  • Could you add an example dataset and output? Commented Nov 4, 2015 at 15:47

1 Answer


I think you are using data.table for a task that doesn't play to its unique strengths. Here's a straightforward base R matrix method that took 3.5 seconds on the dataset I downloaded from Lending Club:

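system.time({
  mat <- as.matrix(dt)  # one row per record, one column per field
  # for each row, build its "name=value" pairs (result has one column per row)
  a <- apply(mat, 1, function(x) paste(colnames(mat), unlist(x), sep = "="))
  # join each row's pairs with "&" and prepend the endpoint
  newvec <- paste0("http://localhost:8080/predict?", apply(a, 2, paste, collapse = "&"))
})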
#   user  system elapsed 
#   3.50    0.03    3.54 

# compare to your original function
system.time(
  dt[, strg := sapply(seq(nrow(dt)), function(x) { foo(x, dt_obj = dt) })]
)
#   user  system elapsed 
# 135.45    0.03  136.02

all.equal(newvec[1], dt[1,strg])
#[1] TRUE
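The same strings can also be built without any per-row function calls. The sketch below is a swapped-in technique, not the method from this answer: it works column by column, turning each column into a vector of name=value pieces with paste0() and then joining the pieces of each row with paste(..., sep = "&") through do.call(). The toy table and the helper name build_query() are hypothetical, and, like the original foo(), it assumes the values need no URL encoding. Because no matrix is built, each column is converted to text on its own by paste0().

```r
library(data.table)

# Hypothetical toy table standing in for the Lending Club data
dt <- data.table(id = c(1L, 2L),
                 grade = c("A", "B"),
                 loan_amnt = c(5000, 12000))

# Hypothetical helper: build one query string per row without looping over rows.
# Each column becomes a vector of "name=value" pieces, and paste() then
# joins the pieces belonging to the same row with "&".
build_query <- function(dt_obj, base = "http://localhost:8080/predict?") {
  pieces <- Map(function(nm, col) paste0(nm, "=", col),
                names(dt_obj), dt_obj)
  # unname() so that no column name is mistaken for a paste() argument
  paste0(base, do.call(paste, c(unname(pieces), sep = "&")))
}

urls <- build_query(dt)  # compute first, so strg itself is not part of the query
dt[, strg := urls]
print(dt$strg)
```

Here the number of R-level function calls depends on the number of columns rather than the number of rows, which is likely where the sapply() loop in the question spends most of its time.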

2 Comments

Thanks for this; using matrices is a good idea, but I wanted to keep the character strings, which I think get converted in your example.
They are character strings. What conversion are you referring to?
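One way to settle what the matrix conversion does is to inspect a small mixed-type table. The toy table below is hypothetical, with one character and one numeric column. Character columns come through as.matrix() unchanged and the resulting matrix is character, so it is the numeric columns that get turned into text; exactly how they are formatted is decided by as.matrix(), so it is worth printing that column on the real data.

```r
library(data.table)

# Hypothetical mixed-type table: one character and one numeric column
dt <- data.table(grade = c("A", "B"), loan_amnt = c(5000, 12000))

mat <- as.matrix(dt)
typeof(mat)          # the whole matrix is character
mat[, "grade"]       # character column, passed through as-is
mat[, "loan_amnt"]   # numeric column, now stored as text: check its formatting
```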
