2

When processing a very large in-memory array of uniform (same-type) JavaScript objects (each with just a few columns)...

Is there any impact on performance and/or or any other penalty to consider when choosing to iterate through it as row->column versus column->row?

Example

We have 100,000 data rows from a CSV file, each is an object with 10 integer values, and we need to touch every value in it for a certain calculation.

Will it make any difference whether we iterate through it vertically or horizontally? Does the modern V8 care about such things in the slightest?

var data = [...]; // array of many same-type objects;

// horizontal iteration:
for (var p in data[0]) {
    for (var i = 0; i < data.length; i++) {
        var value = data[i][p]; // current value;
        // calculate here from the value;
    }
}

// vertical iteration:
for (var i = 0; i < data.length; i++) {
    for (var p in data[0]) {
        var value = data[i][p]; // current value;
        // calculate here from the value;
    }
}
8
  • how can you iterate through an array by column without iterating through the rows on the inner loop? That doesn't make sense to me. The notion of column isn't well defined in javascript. So, yes, any method that I can imagine for iterating "by column" would be much slower than by row. Can you show some code to explain what you mean? Commented Feb 13, 2016 at 3:17
  • @ChristianFritz: If the members (properties/indices) of the inner structure are consistent, that can be defined in the outer loop. If the inner structure is an Object and the outer is an Array, I think it can make a performance difference to have the object on the outside since for-in iteration is usually slower than for. Commented Feb 13, 2016 at 3:21
  • At a guess, I'd assume that by row is probably faster. Given that the data is probably constructed by row and not by column, then the likelihood that the data for a row is stored in a contiguous block of memory (IE: More likely to have the rest of the row's data in cache already), I'd guess that it would be faster. Just a thought, may be wrong, but is what I'd be thinking. Overall, wouldn't expect much difference. Commented Feb 13, 2016 at 3:21
  • @ChristianFritz, I added an example to your question. Commented Feb 13, 2016 at 3:21
  • I just saw the code edit, given that, and Squint's comment, I would choose the horizontal iteration, only need to loop for-in numColumns times, vs numColumns * numRows if using vertical. Commented Feb 13, 2016 at 3:23

3 Answers 3

2

Only one way to be sure is to run some benchmarks, I made a jsperf: http://jsperf.com/vertical-vs-horizontal-loop The results depends on the engine like expected. Early results from tests I did (Chrome 42 is Edge on Window 10):

|     UserAgent    | horizontal iteration | vertical iteration | vertical iteration with caching | # Tests |
|:----------------:|:--------------------:|:------------------:|:-------------------------------:|:-------:|
| Chrome 42.0.2311 |         1,067        |         287        |               226               |    2    |
|   Firefox 43.0   |         5,621        |         415        |               443               |    2    |
|      IE 11.0     |          976         |         441        |               313               |    2    |
|  Iron 46.0.2450  |         1,557        |         901        |              1,907              |    2    |
(numbers are ops/s, the higher the better)

Interestingly enough, horizontal iteration ranges from twice faster to dozen of times faster (on Firefox). But vertical iteration with caching is the fastest only on Iron 46 (Chromium fork so V8 engine).

Benchmarks with node v5.1.0:

horizontal iteration x 1,140 ops/sec ±1.11% (63 runs sampled)
vertical iteration x 833 ops/sec ±0.92% (68 runs sampled)
vertical iteration with caching x 1,678 ops/sec ±1.13% (67 runs sampled)
Fastest is vertical iteration with caching

Sign up to request clarification or add additional context in comments.

3 Comments

Your follow-up contradicts the result, i.e. you reversed it, by mistake, I believe. You meant to say vertical is faster, not horizontal.
@vitaly-t No, without caching, horizontal is faster in all browsers/engines, if you add caching, vertical is fastest only with V8. The numbers are ops/s not timing so the higher, the better :)
Got it. I wrongfully assumed the figures were delays, with smaller being better, but it is the other way round. Good test! Thank you!
0

Vertical in reverse is better for better memory utilisation and performance. I would say iterate through data array in reverse. And few small things to taken into consideration is incrementing i with i += 1 instead of i++ or ++i. But these would show very small performance benefits as you are iterating through a large array. If you ask me, I follow either of below approaches

  1. Use Async library's each function to iterate through data array and do operations. The advantages are they are truly asynchronous, so wont block the execution thread. I have used this library to compare two large arrays where each array operation includes drawing html to canvas and comparing. Async js did that job very well for me.
  2. Create worker (n) threads depending on user machine, then slice the large array into smaller chunks, feed workers with chunked arrays. Whichever worker completes first will pick another chunked array from queue. At the end you can aggregate the result. I have tested this approach myself. I tried to sort an array of 50K array items. With regular execution browser was blocked but with this approach it was able to complete perfectly fine. I even tried with 300K items. I would say, this is better approach if you are not supporting lower end browsers.

4 Comments

Why in reverse is better? Any link to read about it? I just want to understand why it is better.
"And few small things to taken into consideration is incrementing i with i += 1 instead of i++ or ++i. But these would show very small performance benefits" - Why on earth would +=1 give a performance benefit over ++?
@nnnnnn Check this link jsperf.com/for-loop-i-vs-i/4. Earlier Crockford's approach used to be faster. Now browsers seems to be improved in that area. And Zakas also mentioned to use i += 1 instead of i++ or ++i in his design patterns book.
Modern JS engines are pretty smart about that sort of thing. Crockford doesn't recommend +=1 for performance reasons though.
0

Vertical is obviously much better since you can cache data[i] and only look it up once before checking the properties.

// vertical iteration:
for (var i = 0; i < data.length; i++) {
    var obj = data[i] // <--- here
    for (var p in obj) {
        var value = obj[p]; // current value;
        // calculate here from the value;
    }
}

2 Comments

"obviously much better" is subjective and not objective. The cost of assigning a variable for each iteration of the outer loop most likely completely negates any performance benefit you actually get. Calling data[i] several times is more efficient than assigning it to a variable once and calling that variable several times.
@michael, how it can be "subjective" when it is plain and simple "not doing unnecessary operation" vs. "doing it repeatedly". "Not doing" will always objectively be faster. Eliminating work completely is the basics of optimization.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.