How to optimize a code by avoiding loops?

Question

I have some troubles with a code which take a huge amount of time to run. Could someone give me some help? Thanks in advance!

all_dist=c()
ddim=dim(b)[1]
ddimi=ddim-1

for (k in 1:dim(b)[2]){
    for (i in seq(1,ddimi,1)){
        for (j in seq(i+1,ddim,1)){
        ze=(b[i,k])-(b[j,k])*(b[i,k])-(b[j,k])
        all_dist=c(all_dist,ze)
}}}

Note that:

str(b)

num [1:5, 1:30007] -0.000292 -0.001384 -0.001412 -0.002603 -0.002848 ... - attr(*, "dimnames")=List of 2 ..$ : NULL ..$ : chr [1:30007] "V1" "V2" "V3" "V4" ...

If your code is working then this will be better asked on codereview.stackexchange.com — B001ᛦ
– B001ᛦ, Commented Jul 8, 2019 at 10:02

user10191355 · Accepted Answer · 2019-07-08 13:32:49Z

1

For-loops and growing your vector with c() are slowing you down. It's best to try to take advantage of vectorization, and to use *apply (or map) functions as much as possible. Here's something that does a little of both by iterating over the columns with sapply, creating combinations and computing the products and differences of those combinations:

mat <- sapply(b, function(x) {y <- combn(x, 2); y[1,] - y[2,] * y[1,] - y[2,]})

It should be fast – maybe not quite as fast as user10488504's very efficient solution, but still pretty fast. It also has very tight syntax, and you might also find it useful that the output is a matrix, with each column corresponding to a column from b.

Data:

set.seed(12345)
b <- as.data.frame(matrix(runif(5*30007, -.001, -.0003), byrow = T, nrow = 5))

answered Jul 8, 2019 at 13:32

user10191355

Sign up to request clarification or add additional context in comments.

Comments

GKi · Accepted Answer · 2019-07-09 09:17:51Z

set.seed(0)
b <- matrix(rnorm(5*30007), nrow=5)

all_dist=c()
ddim=dim(b)[1]
ddimi=ddim-1

system.time(
#With foor-Loop
for (k in 1:dim(b)[2]){
    for (i in seq(1,ddimi,1)){
        for (j in seq(i+1,ddim,1)){
        ze=(b[i,k])-(b[j,k])*(b[i,k])-(b[j,k])
        all_dist=c(all_dist,ze)
        }}}
)
#       User      System verstrichen 
#    104.568       3.636     108.206 


#Vectorized with matrix indices
system.time({
K <- 1:dim(b)[2]     #for (k in 1:dim(b)[2]){... creates this vector
I <- seq(1,ddimi,1)  #for (i in seq(1,ddimi,1)){... creates this vector
J <- unlist(lapply(I+1, function(x) seq(x,ddim,1)))  #for (j in seq(i+1,ddim,1)){... creates this vector

IK <- as.matrix(expand.grid(I, K))  #Get all combinations between I and K as you will have with the nested for loops of k and i
IK <- IK[rep(seq_len(nrow(IK)), rep((ddim-1):1,length.out=nrow(IK))),]  #IK-rows need to be repeated, as it is used repeatedly in the "for (j in seq(i+1,ddim,1)){" loop
JK <- as.matrix(expand.grid(j=J, k=K)) #Get all combinations between J and K as you will have with the nested for loops of k and j

#Now you have all the indexes of your for loop as vectors and can make the calculations
tt <- b[IK] - b[JK] * b[IK] - b[JK]
})
#      User      System verstrichen 
#      0.056       0.000       0.097 


identical(all_dist, tt)
#[1] TRUE

As you are using k only on the left side without interaction with the other loops you can partly vectorize by simply leaving the k loop and the index away.

system.time({
tt=c()
for (i in seq(1,ddimi,1)){
  for (j in seq(i+1,ddim,1)){
    tt=c(tt, (b[i,])-(b[j,])*(b[i,])-(b[j,]))
  }}
dim(tt)  <- c(30007, 10)
tt <- as.vector(t(tt))
})
#       User      System verstrichen 
#      0.017       0.000       0.017 
identical(all_dist, tt)
#[1] TRUE

Or you can replace the inner two for loops with index vectors and make an apply loop instead of the k-for loop:

system.time({
I <- seq(1,ddimi,1)
J <- unlist(lapply(I+1, function(x) seq(x,ddim,1)))
I <- I[rep(seq_along(I), rep((ddim-1):1,length.out=length(I)))]
tt  <- as.vector(apply(b, 2, function(x) {x[I] - x[J] * x[I] - x[J]}))
})
#       User      System verstrichen 
#      0.085       0.000       0.085 
identical(all_dist, tt)
#[1] TRUE

Used time of the nice solution from gersht:

system.time({
mat <- as.vector(sapply(as.data.frame(b), function(x) {y <- combn(x, 2); y[1,] - y[2,] * y[1,] - y[2,]}))
})
#       User      System verstrichen 
#      1.083       0.000       1.082 
identical(all_dist, mat)
#[1] TRUE

Collectives™ on Stack Overflow

How to optimize a code by avoiding loops?

2 Answers 2

Data:

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Data:

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related