Optimal array setup

Question

Problem

I have a few arrays, each of 2000 numbers which are any one of 0, 1, or 2.

e.g:

array1 = [0,0,0,0,1,2,2,2,2,0,2,2,0,1,..etc]
array2 = [1,1,1,1,1,1,2,2,2,2,0,2,2,1,..etc]

I want to test for relatedness, so ideally each element in these arrays should have two properties:
- the data: 0, 1,or 2
- a pointer to another array (should the data match for the same index, or null)
I'd prefer not a direct copy, but it's a not a must.

My attempt 1

I thought of converting each element in the array into an Object

{
  data: 0,   // or 1 or 2
  pointer: other_array[some_index].data
}

(I'm hoping here that blah.data returns a pointer and not the data)

But this feels like it would slow down the retrieval of data, since they are now likely scattered about in non-contiguous memory.

My attempt 2

Instead of having an Object array I thought I'd convert each array into two associative arrays:

my_array = {
   data_array: [0,1,2,2,2,2,1,1,1,1,1,2,2,2],
   pter_array: [o_a.data_array[0], o_a.data_array[1], o_a.data_array[2], ... ]
};

(where o_a is some other_array)

This way it seems like the data_array at least will be more contiguous, and retrieval will be hopefully faster, but my pointer array might not be pointers.

Questions:

Which implementation is better? Is the second attempt any faster than the first?
JS seems to only have 64-bit numeric type. Do I have to use 64-bit floats to store my numbers when a byte would more than do?
- Given that my data is no more than 3 different values, would it be any more efficient to implement an enum (via bools?), or would that waste just as much space as the JS numeric type?

"But this feels like it would slow down the retrieval of data, since they are now likely scattered about in non-contiguous memory." If that's a factor for you, look at typed arrays; the standard JavaScript array you're using isn't an array at all. — T.J. Crowder
– T.J. Crowder, Commented Mar 5, 2015 at 12:09

user1919238 · Accepted Answer · 2015-03-05 12:10:57Z

2

The first attempt is the one that clearly makes sense: it logically groups related information. The second one does not.

This might be just an aesthetic issue, but it could become more than that depending on your use case. For example, what if you needed to delete or insert an element? Your second attempt would require duplicated operations to make sure both arrays remain in sync.

You are worried about performance, but it is not clear why. Have you actually created code that didn't perform well, or is this just speculation?

My advice is:

Use the logically correct design (attempt 1).
Address performance when it actually becomes an issue.

There is no particular reason to assume the object approach will be inefficient. You can make an array of these objects, so it doesn't imply everything will be "scattered in non-contiguous memory".

If performance is a problem, take a step back and make a wider consideration of the design than just this one issue.

answered Mar 5, 2015 at 12:10

user1919238

Sign up to request clarification or add additional context in comments.

1 Comment

tetris11 Over a year ago

I've implemented it the first way, but speed is a crucial factor (despite the choice of language). Deleting and inserting is not neccesary. But I think you're right. The scale of the project requires clarity, and my first method is quick enough. I just wish there were smaller numeric types

Collectives™ on Stack Overflow

Optimal array setup

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related