0

I have an array of arrays and need to filter them to have unique based off the first 2 items in the arrays.

[
    ["18", "-66"],
    ["26", "-80"],
    ["26", "-80"],
    ["26", "-80"],
    ["27", "-80"],
    ["27", "-80"],
    ["28", "-97"],
    ["29", "-99"],
    ["30", "-81"],
    ["30", "-87"],
    ["30", "-99"],
    ["31", "-110"],
    ["31", "-110"],
    ["31", "-111"],
    ["31", "-84"],
    ["31", "-87"],
    ["31", "-95"],
    ["31", "-95"],
    ["31", "-95"]
]

While a function such as this does a great job with a single unique, I fail to see how to modify it to search the first two items:

function filterArray(incomingArray){
  var unique = {};
  var distinct = [];
  for( var i in incomingArray ){
   if( typeof(unique[incomingArray[i][0]]) == "undefined"){
    distinct.push([incomingArray[i][0],{}]);
   }
   unique[incomingArray[i][0]] = 0;
  }
  return distinct;
}

The entire array doesn't need to be unique, just the first two items, so for example the following would match:

[
    ["26", "-80", 2],
    ["26", "-80", 3]
]

## Update ##

I tried each suggested method and found some interesting things. First let me say thank you for each of your ideas / ways of solving the problem. In performance testing, I supplied 4000 records and did a performance test on each method.

Mark's method of using Set() with 4000 records finished in 19ms, when supplied only about 50 records it finished in 2ms.

D. Seah's method with with 4000 records finished in 149ms, but when supplied only 50 records finished in under 1ms.

vol7ron's method with 4000 records finished in 30ms, but oddly took 52ms with 50 records

Ben's method I am still working through how it works and so far haven't had it return what I expected, but with it returning a 3 dimensional array, I am interested to see what possibilities there are in other applications, but it took around 38 seconds and I ended up with a very large array of undefined. It is possibly something I am doing wrong, but at least in this setting, it may be more capable than I need, and since the other solutions are fast enough then I may table it till next time.

Using Set() looks like the best way to go for a performance dependent solution where the dataset may keep growing, but in a smaller dataset vol7tron wins.

4 Answers 4

1

If you can find a character you know won't be in your data, you can use that to join your first two elements into a key. For example 18_-66. Then you can keep a Set of known values and use it to filter. Key lookups in Sets are constant time, so this should run linear time in n. For example:

let arr = [["18", "-66"],["26", "-80"],["26", "-80"],["26", "-80"],["27", "-80"],["27", "-80"],["28", "-97"],["29", "-99"],["30", "-81"],["30", "-87"],["30", "-99"],["31", "-110"],["31", "-110"],["31", "-111"],["31", "-84"],["31", "-87"],["31", "-95"],["31", "-95"],["31", "-95"]]

let s = new Set()
let f = arr.filter(item => {
    let key = item[0] + '_' +item[1]  // join elements 1 and 2 with _
    return !s.has(key) && s.add(key)  // add to set and return true if it's not already there
})
console.log(f)

Sign up to request clarification or add additional context in comments.

4 Comments

I think I like this answer the best. I always forget to use Set, maybe because of its lack of backwards compatibility in IE? Probably not, since that's less of a reason these days. Probably due to lack of exercise in thinking about them :P
@vol7ron the same technique should work about as well using a regular object.
Thanks Mark, I also haven't been accustomed to using Set() but am looking forward to trying it. The data is XY coordinates, so any letter would be a great separator, I will give it a try :)
@MarkMeyer yes, I had provided a regular object solution before coming across your answer. Using Set and filter is why this should be the accepted solution.
1

The following function will iterate through the array twice, and if the first two items match, it will push an array containing the two items to an output array.

function filterArray(arr) {
  var matching = []

  for (i in arr) {
    for (j in arr) {
      if (arr[i][0] === arr[j][0] && arr[i][1] === arr[j][1]) {
        matching.push([arr[i], arr[j]])
      }
    }
  }

  for (k in matching) {
    matching[k] = [matching[k][0][0], matching[k][0][1], matching[k].count]
  }

  return matching
}

11 Comments

This seems to return a 3 dimensional array.
Actually it will iterate through the array n-squared times. And you're also comparing items to themselves when i and j refer to the same index. If you want to use nested for loops, you'd be better off using an n-1 technique: for(var i=0;i<arr.length;i++) for(var j=i+1;j<arr.length;j++) { ... }
That will only look at the item and the one directly after it.
@MarkMeyer mine now returns an array of matching pairs formatted as [<first element>, <second element>, <amount of matches>]
@Alan if performance is a necessity, you need to specify that. Nothing is going to beat a for loop and a good algorithm.
|
1

The easy way to manage is to keep a cache variable of your previously processed list. If while evaluating your array it is already in the cache list, don't store it. If it is not in the cache, then keep it and make an entry in the cache.

let cache = {};
let data = getData();
let unique = data.reduce((acc,curr)=>{
  let key = [curr[0],curr[1]].join(',')
  if (!cache[key]){
    cache[key] = true;
    acc.push(curr)
  }
  return acc;
},[])

console.log('unique: ', prettyPrint(unique))
console.log('cache: ', prettyPrint(cache))

function getData(){ return [
    ["18", "-66"],
    ["26", "-80"],
    ["26", "-80"],
    ["26", "-80"],
    ["27", "-80"],
    ["27", "-80"],
    ["28", "-97"],
    ["29", "-99"],
    ["30", "-81"],
    ["30", "-87"],
    ["30", "-99"],
    ["31", "-110"],
    ["31", "-110"],
    ["31", "-111"],
    ["31", "-84"],
    ["31", "-87"],
    ["31", "-95"],
    ["31", "-95"],
    ["31", "-95"]
]}

function prettyPrint(data){
   if(Array.isArray(data)){
     return '\n' + JSON.stringify(data,null,2)
                     .replace(/,\n\s+[^[\s]/g, ',')
                     .replace(/(\[)\s+([^[\s])/g, '$1$2')
                     .replace(/"\s+/g,'"')
   }
   else if (typeof data === 'object')
     return '\n' + JSON.stringify(data,null,2)
   else 
     return data; 
   
}

  • let key = [curr[0],curr[1]].join(',') is part of the magic. It joins the two values by a delimiter so only one lookup is performed. This value is easier to see in the cache output

Comments

0

you can use a reduce function

var oDistinct = oData.reduce(function (acc, cur) {
  if (!acc.some(function(i) {
    return i[0] === cur[0] && i[1] === cur[1];
  })) {
    acc.push(cur);
  }
  return acc;
}, []);

2 Comments

I haven't ever seen oData before, I am going to look into it to compare it with JSON for performance / readability
I think oData is his Hungarian notation for your data. In this case data is an object denoted by the prefix “o”. Perhaps it should be “aData”?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.