1

I have 2 very large sets of data that, due to the limitations of my environment, I need to compare in the client side.

The size of the corresponding Array of objects are over 450k each, I have been testing different ways to compare them (For loops, .find, .indexOf, .reduce, $.grep) and all of them are running very slow (Around 700 calculations per minute).

The check consists to find out if each of the objects in one of the array is already included in the other one such as:

var Arr1 = [{ID:2, Name: Bar}, {ID:1, Name: Foo}]
var Arr2 = [{ID:2, Name: Fu}, {ID:2, Name: Bar}] 

If any of the objects in Arr2 is included in the first one by any property, in this case (Arr2[1].Name == Arr1[0].Name)? would return true

And in that case I would push it to a new Array of objects we can name Found: Found.push(Arr1[0])

I of course need to perform this check for all the 400k+ objects in my array so it gets pretty slow.

I know there are several "buts" in my request, such as available RAM and Processor speed but assuming the perfect environment, what would be the fastest way?

6
  • Could you give an example of inputs and expected output? Commented Aug 8, 2019 at 15:14
  • Just curious, do you need to keep these arrays as 400k+ or can you break them down into some smaller logical groupings? I'm also curious what the limitations are of your environment... some pre-processing on the server side might really help. Commented Aug 8, 2019 at 15:14
  • I think your task is for Backend side Commented Aug 8, 2019 at 15:17
  • Yes, sure give me one minute and I can get some sample data. AS for the other questions, I am working with extracts coming directly from my client's database and they refuse to perform any change in their side to reduce the load from mine, I can't set another DB in the middle to help with this due to economical limitations. And yes, we can divide the arrays into as many chunks as we want, I tried dividing them but the results were pretty much the same as an overall procedure. Commented Aug 8, 2019 at 15:21
  • Just a side question: is the data coming from the server or are you generating it on the client side? If so, are elements ordered already? Is that the only format you can acquire? Commented Aug 8, 2019 at 15:25

1 Answer 1

5

I think the most important thing is making sure your complexity doesn't go to O(n * m) (n being the length of Arr1, and m being the length of Arr2).

Looping over the second array and using indexOf or find on the first one, will give you the worst case of m * n operations (if none of the items in Arr2 appear in Arr1).

Therefore, you should create an index of Arr2 first, to ensure your lookups when going over Arr1 are inexpensive.

The hard part is determining how to index your array to support fast access. One way is to create a hash function:

// Include the properties that determine equality in this hash function
const hash = ({ Name, Results }) => `${Name}|${Results}`;

console.log(
  hash({ Name: "john.doe", Results: "Check", Timestamp: "-", Period: "Q2" })
);

Using this method, you can create an index of { string: Object } by going over all items in Arr2 once.

const hash = ({ Name, Results }) => `${Name}|${Results}`;
const arr2 = [
  { Name: "john", Results: "Check", Timestamp: "-", Period: "Q2" },
  { Name: "jane", Results: "Check", Timestamp: "-", Period: "Q2" },
  { Name: "aisha", Results: "Check", Timestamp: "-", Period: "Q2" }
];

console.log(
  Object.fromEntries(arr2.map(x => [hash(x), x])) 
);

Note: depending on the javascript engine, it might be better to rewrite this using a for or while loop. Creating the entry-array first will also consume some memory. Here, I'm just trying to explain the general approach.


Using this index, finding a match to an element of Arr2 will be (almost?) of constant time complexity.

const hash = ({ Name, Results }) => `${Name}|${Results}`;
const arr2 = [
  { Name: "john", Results: "Check", Timestamp: "-", Period: "Q2" },
  { Name: "jane", Results: "Check", Timestamp: "-", Period: "Q2" },
  { Name: "aisha", Results: "Check", Timestamp: "-", Period: "Q2" }
];

const arr1 = [
  { Name: "john", Results: "Check", Timestamp: "-", Period: "Q2" },
  { Name: "jane", Results: "Check", Timestamp: "-", Period: "Q2" },
  { Name: "aisha", Results: "Check", Timestamp: "-", Period: "Q2" },
  { Name: "robert", Results: "Check", Timestamp: "-", Period: "Q2" },
  { Name: "ellen", Results: "Check", Timestamp: "-", Period: "Q2" },
  { Name: "tin", Results: "Check", Timestamp: "-", Period: "Q2" }
];


const index = Object.fromEntries(arr2.map(x => [hash(x), x]));

const results = arr1.filter(p => index.hasOwnProperty(hash(p)));

console.log(`In both arrays: ${results.map(p => p.Name).join(", ")}`);

I'm no computer science graduate, but I think this will bring you close to O(n + m) complexity, which should be doable for 2 x 450k items?


P.S. If Object.fromEntries, map and filter slow things down, you can rewrite to:

const arr2 = [
  { Name: "john", Results: "Check", Timestamp: "-", Period: "Q2" },
  { Name: "jane", Results: "Check", Timestamp: "-", Period: "Q2" },
  { Name: "aisha", Results: "Check", Timestamp: "-", Period: "Q2" }
];

const arr1 = [
  { Name: "john", Results: "Check", Timestamp: "-", Period: "Q2" },
  { Name: "jane", Results: "Check", Timestamp: "-", Period: "Q2" },
  { Name: "aisha", Results: "Check", Timestamp: "-", Period: "Q2" },
  { Name: "robert", Results: "Check", Timestamp: "-", Period: "Q2" },
  { Name: "ellen", Results: "Check", Timestamp: "-", Period: "Q2" },
  { Name: "tin", Results: "Check", Timestamp: "-", Period: "Q2" }
];


const index = {};
for (let i = 0; i < arr2.length; i += 1) {
  const item = arr2[i];
  index[`${item.Name}|${item.Results}`] = item;
}

const results = [];
for (let i = 0; i < arr1.length; i += 1) {
  const item = arr1[i];
  const match = index[`${item.Name}|${item.Results}`];
  if (match) {
    results.push(match);
  }
}

console.log(`In both arrays: ${results.map(p => p.Name).join(", ")}`);

Sign up to request clarification or add additional context in comments.

1 Comment

This sounds very promising! I will be giving it a shot and will mark it as answered if it works! Thank you so much.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.