25

I was clustering around 40000 points using kmean algorithm. In the first version of the program I wrote the euclidean distance function like this

var euclideanDistance = function( p1, p2 ) { // p1.length === p2.length == 3
    var sum = 0;
    for( var i in p1 ){
        sum += Math.pow( p1[i] - p2[i], 2 );
    }
    return Math.sqrt( sum );
};

The overall program was quite slow taking on average 7sec to execute. After some profiling I rewrote the above function like this

var euclideanDistance = function( p1, p2 ) { // p1.length === p2.length == 3
    var sum = 0;
    for( var i = 0; i < p1.length; i++ ) {
        sum += Math.pow( p1[i] - p2[i], 2 );
    }
    return Math.sqrt( sum );
};

Now the programs on average take around 400ms. That's a huge time difference just because of the way I wrote the for loop. I normally don't use for..in loop for arrays but for some reason I used it while writing this function.

Can someone explain why there is this huge difference in performance between these 2 styles?

11
  • 2
    Please note that using for..in loops on arrays can behave different from a regular for loop. Commented Nov 30, 2012 at 13:11
  • 1
    for..in enumerates object keys, for loop increases integer and checks a simple condition .. isn't obvious? Commented Nov 30, 2012 at 13:12
  • p1 and p2 look like dense arrays of values. I would suspect the interpreter to optimize exactly this case. Also try pulling the querying of p1.length (you know it doesn't change, but the interpreter cannot assume that) out of the loop - should improve even more. Commented Nov 30, 2012 at 13:14
  • Duplicate of stackoverflow.com/questions/242841/javascript-for-in-vs-for Commented Nov 30, 2012 at 13:16
  • This answer here is explanatory: stackoverflow.com/questions/5263847/… Commented Nov 30, 2012 at 13:17

4 Answers 4

29

Look at what's happening differently in each iteration:

for( var i = 0; i < p1.length; i++ ) 
  1. Check if i < p1.length
  2. Increment i by one

Very simple and fast.

Now look at what's happening in each iteration for this:

for( var i in p1 )

Repeat

  1. Let P be the name of the next property of obj whose [[Enumerable]] attribute is true. If there is no such property, return (normal, V, empty).

It has to find next property in the object that is enumerable. With your array you know that this can be achieved by a simple integer increment, where as the algorithm to find next enumerable is most likely not that simple because it has to work on arbitrary object and its prototype chain keys.

Sign up to request clarification or add additional context in comments.

2 Comments

@DeletedComment, yeah only the first line in the specification is really relevant. What I am trying to say that running the complex algorithm of finding the next enumerable key in an object and its protoype chain is far less performant than an integer addition and a boolean check.
this is vague, not sure what's going on here
9

As a side note, if you cache the length of p1:

var plen = p1.length;
for( var i = 0; i < plen; i++ )

you will get a slight speed increase.

...And if you memoize the function, it will cache results, so if the user tries the same numbers you will see a massive speed increase.

var eDistance = memoize(euclideanDistance);  

function memoize( fn ) {  
    return function () {  
        var args = Array.prototype.slice.call(arguments),  
            hash = "",  
            i = args.length;  
        currentArg = null;  
        while (i--) {  
            currentArg = args[i];  
            hash += (currentArg === Object(currentArg)) ?  
            JSON.stringify(currentArg) : currentArg;  
            fn.memoize || (fn.memoize = {});  
        }  
        return (hash in fn.memoize) ? fn.memoize[hash] :  
        fn.memoize[hash] = fn.apply(this, args);  
    };  
}

eDistance([1,2,3],[1,2,3]);
eDistance([1,2,3],[1,2,3]); //Returns cached value

credit: http://addyosmani.com/blog/faster-javascript-memoization/

1 Comment

Why is the conditional object initialization happening inside the while loop?
6

First You should be aware of this in the case of for/in and arrays. No big deal if You know what You are doing.

I run some very simple tests to show the difference in performance between different loops: http://jsben.ch/#/BQhED

That is why prefer to use classic for loop for arrays.

1 Comment

In both FF and Chrome for loop works well without caching. However Opera is different story. Version without caching is 2 times slower in that browser...
0

The For/In loop, simply loops through all properties in an object. Since you are not specifying the number of iterations the loop needs to take, it simply 'guesses' at it, and continues on until there are no more objects.

With the second loop, you are specifying all possible variable... a)a starting point, b) the number of iterations the loop should take before stopping, c) increasing the count of the starting point.

You can think of it this way... For/In = guesses the number of iterations, For(a,b,c) you are specifying

2 Comments

so, the for/in loop already "knows" the number of iterations it needs to make? (sorry, I don't speak technical... too much time spent talking to clients and dumming it down for them)
yep, it does. Internally there is a property list storing all the properties of a object, and there used to have __count__ property for the number of enumerable properties in Mozilla's implementation. :)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.