Check similarity of array values

Question

I have an array of different values, and I'd like to calculate a percentage that represents how similar all of its elements are, possibly controlled by a threshold parameter.

An array could look like this:

var array = [42.98, 42.89, 42.91, 42.98, 42.88] // should return nearly 100%

var array = [42.98, 22.89, 42.91, 42.98, 42.88] // should return maybe 80%

var array = [42.98, 332.89, 122.91, 5512.98, -12.88] // should return nearly 0%

So 100% means all elements are the same, and 0% means the elements are very different. The sensitivity would be adjusted by changing the threshold.

I don't really know how to solve this (I'm an absolute beginner). This is all I have so far, and obviously it doesn't work:

function checkSimilarity(array, threshold) {
    var sum = array.reduce((a, b) => a + b, 0),
        percentage = 0;
    for (var i =0; i< array.length; i++) {
       var diff = (sum / array.length) * i
       percentage += diff

    }
    return percentage * (threshold/100)
}

Any help with creating a working algorithm would be much appreciated!

The for ... in loop gives you the indexes into the array, not the values. Don't use for ... in for plain arrays; use for ... of, or a simple for loop with an index variable, or .forEach().
– Pointy, Commented Mar 12, 2018 at 22:39
It seems you want something like standard deviation (then a little math to map an SD of 0 to 100%), though I'm not sure how you expect the threshold to change things.
– Jaromanda X, Commented Mar 12, 2018 at 23:38
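
The standard-deviation idea from the comment above can be sketched as follows. This is only one possible reading: the function name similarityFromSpread, the use of the coefficient of variation (standard deviation divided by the mean), and the mapping to a percentage are all assumptions made for illustration, not something given in the question.

```javascript
// Sketch only: similarityFromSpread and its percentage mapping are
// assumptions for illustration, not defined anywhere in the question.
function similarityFromSpread(values) {
  if (values.length === 0) return 0;
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  // Population variance: the average squared distance from the mean.
  const variance =
    values.reduce((acc, v) => acc + (v - mean) ** 2, 0) / values.length;
  const sd = Math.sqrt(variance);
  if (sd === 0) return 100; // all elements identical
  if (mean === 0) return 0; // relative spread is undefined; treat as dissimilar
  // Coefficient of variation: spread relative to the size of the mean.
  const cv = sd / Math.abs(mean);
  // Map cv = 0 to 100% and clamp to 0% once the spread reaches the mean.
  return Math.max(0, 100 * (1 - cv));
}

console.log(similarityFromSpread([42.98, 42.89, 42.91, 42.98, 42.88]));
console.log(similarityFromSpread([42.98, 22.89, 42.91, 42.98, 42.88]));
console.log(similarityFromSpread([42.98, 332.89, 122.91, 5512.98, -12.88]));
```

For the three sample arrays this gives roughly 99.9, roughly 79 and 0. Dividing by the mean makes the score scale-independent, but it also means the result is not meaningful for data centred on zero.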

Jack Dalton · Accepted Answer · 2018-03-12 23:54:15Z

var array1 = [42.98, 42.89, 42.91, 42.98, 42.88] // should return nearly 100%
var array2 = [42.98, 22.89, 42.91, 42.98, 42.88] // should return maybe 80%
var array3 = [42.98, 332.89, 122.91, 5512.98, -12.88] // should return nearly 0%

function calculateRange(data) {
  var dissimilarity;
  var sum = data.reduce((a, b) => a + b, 0);
  var mean = sum / data.length;

  // loop through the passed array
  data.forEach(function(item) {

    // absolute percentage difference of this item from the mean
    var percentageDiff = Math.abs(100 - (item / mean * 100));

    // fold the diff into a running average (later items weigh more);
    // compare against undefined so a legitimate 0 is not treated as unset
    if (dissimilarity !== undefined) {
      dissimilarity = (dissimilarity + percentageDiff) / 2;
    } else {
      dissimilarity = percentageDiff;
    }
  });

  // subtract the aggregated dissimilarity from 100%
  return 100 - dissimilarity;
}

var array1DOM = document.getElementById("array1")
var array2DOM = document.getElementById("array2")
var array3DOM = document.getElementById("array3")

array1DOM.innerHTML = calculateRange(array1)
array2DOM.innerHTML = calculateRange(array2)
array3DOM.innerHTML = calculateRange(array3)

<div>
    <div id="array1"></div>
    <div id="array2"></div>
    <div id="array3"></div>
</div>

In simple terms, this solution aggregates each element's percentage difference from the mean of the data set to determine similarity. You will notice that the first two arrays give results near 100% and 80%, as requested. The issue arises with the final array: because this model is based on variation from the mean, the values in array3 are spread so widely that the dissimilarity score exceeds 100 and the result is negative.

I cannot resolve this issue because I cannot guess what your maximum difference value is. If that value is known, I can normalise the values using it so that the returned range is 0 - 100. If you can never know the maximum difference, the only potential solutions I can suggest are:

Using my method as is, and noting that the lower the score, the less similar the values are (in theory it can go arbitrarily far below 0)
Flooring anything below 0 to 0
Calculating several data sets and then using the lowest scoring one as your 0, and the highest as your 100. That way you have a relative degree of similarity between sets.
Estimating what the highest level of dissimilarity could be and passing it into the function, i.e. the minimum or maximum array value you will ever receive in this process.

If you could supply information on the purpose/context of this task, we may be able to be more specific.
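
As a rough illustration of the flooring and "pass in the maximum" options listed above, here is a minimal sketch. The name calculateBoundedSimilarity, the maxDissimilarity parameter and its default of 100 are hypothetical. It also averages the percentage differences with a plain arithmetic mean rather than the pairwise running average used in calculateRange, so its numbers differ slightly from the snippet above.

```javascript
// Sketch only: calculateBoundedSimilarity and maxDissimilarity are
// hypothetical names; the default of 100 is an assumption.
function calculateBoundedSimilarity(data, maxDissimilarity = 100) {
  if (data.length === 0) return 0;
  const mean = data.reduce((a, b) => a + b, 0) / data.length;
  if (mean === 0) return 0; // percentage difference from a zero mean is undefined

  // Average absolute percentage difference of each item from the mean.
  const dissimilarity =
    data.reduce((acc, item) => acc + Math.abs(100 - (item / mean) * 100), 0) /
    data.length;

  // Rescale so that maxDissimilarity maps to 0, then clamp into 0..100.
  const score = 100 * (1 - dissimilarity / maxDissimilarity);
  return Math.min(100, Math.max(0, score));
}

console.log(calculateBoundedSimilarity([42.98, 42.89, 42.91, 42.98, 42.88]));
console.log(calculateBoundedSimilarity([42.98, 22.89, 42.91, 42.98, 42.88]));
console.log(calculateBoundedSimilarity([42.98, 332.89, 122.91, 5512.98, -12.88]));
console.log(calculateBoundedSimilarity([42.98, 332.89, 122.91, 5512.98, -12.88], 200));
```

With the default maximum, array3 is floored to 0; passing a larger maxDissimilarity stretches the scale so that very scattered arrays can still be ranked against each other.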

Geuis · Accepted Answer · 2018-03-12 23:31:29Z

Slightly different approach. By no means meant to be the most efficient, but it does work for your sample data.

https://codepen.io/anon/pen/RMWjRL?editors=0010

const array1 = [42.98, 42.89, 42.91, 42.98, 42.88]; // should return nearly 100%
const array2 = [42.98, 22.89, 42.91, 42.98, 42.88]; // should return maybe 80%
const array3 = [42.98, 332.89, 122.91, 5512.98, -12.88]; // should return nearly 0%

const similarity = (arr) => {
  const dict = {};

  arr.forEach(item => {
    const val = Math.round(item);
    dict[val] ? dict[val]++ : dict[val] = 1;
  });

  let largest = 1;

  Object.keys(dict).forEach(key => largest = dict[key] > largest ? dict[key] : largest);

  return largest / arr.length;
};

console.log(similarity(array1)); // 1
console.log(similarity(array2)); // 0.8
console.log(similarity(array3)); // 0.2

2 Comments

Jack Dalton Over a year ago

Interesting approach to the problem. I do want to note that this solution is length-dependent: [50, 60] gets the result 0.5, and so does, say, [50, 99]. This method determines the answer only from the frequency of certain rounded values. So if accuracy, or a more "absolute" similarity value, is required, this answer may not work.

Geuis Over a year ago

@JackDalton Agreed.
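
One way to make the frequency idea less sensitive to rounding, and to bring in the threshold the question mentions, is to count how many elements lie within a tolerance of the median. This is a sketch under assumptions: similarityWithinThreshold and its threshold parameter are hypothetical and are not part of the answer above.

```javascript
// Sketch only: similarityWithinThreshold and its threshold parameter are
// hypothetical; they are not part of the answer above.
function similarityWithinThreshold(arr, threshold = 0.5) {
  if (arr.length === 0) return 0;
  // Use the median as the reference point so one outlier cannot shift it much.
  const sorted = [...arr].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median =
    sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  // Count the elements that lie within the threshold of the median.
  const close = arr.filter(v => Math.abs(v - median) <= threshold).length;
  return close / arr.length;
}

console.log(similarityWithinThreshold([42.98, 42.89, 42.91, 42.98, 42.88])); // 1
console.log(similarityWithinThreshold([42.98, 22.89, 42.91, 42.98, 42.88])); // 0.8
console.log(similarityWithinThreshold([50, 60], 10)); // 1
console.log(similarityWithinThreshold([50, 99], 10)); // 0
```

With a threshold of 10, [50, 60] and [50, 99] no longer receive the same score, although the result is still a count of "close enough" elements rather than an absolute measure of distance.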

wisn · Accepted Answer · 2018-03-12 23:35:52Z

I'm using Euclidean distance for this problem. However, I don't know whether it will satisfy your requirements.

const similarity = list => {
  if (list.length < 1) return 0;
  if (list.length < 2) return 100;
  
  let listPair = [];
  for (let i = 0; i < list.length - 1; i++)
    listPair.push({ a: list[i], b: list[i + 1] });
  
  const sum = listPair.reduce((acc, { a, b }) => acc + Math.pow(a - b, 2), 0);
  
  const calculation = 100 - Math.sqrt(sum);
  
  return calculation < 0 ? 0 : calculation;
};

let list = [];
console.log(similarity(list)); // 0%

list = [42.98, 42.89, 42.91, 42.98, 42.88];
console.log(similarity(list)); // ~99%

list = [42.98, 22.89, 42.91, 42.98, 42.88];
console.log(similarity(list)); // ~71%

list = [10, 10, 10, 20, 10];
console.log(similarity(list)); // ~85%

list = [42.98, 332.89, 122.91, 5512.98, -12.88];
console.log(similarity(list)); // 0%

list = [45.51, 45.51, 45.51, 45.51, 45.51];
console.log(similarity(list)); // 100%

list = [10];
console.log(similarity(list)); // 100%

2 Comments

Jack Dalton Over a year ago

Very nice approach, although I would like to see higher accuracy in the result values. The second-to-last list and the last list are considered just as equal, despite having dissimilarity. Additionally, 0% should not be returned for the fourth list. Overall, though, great approach!

wisn Over a year ago

@JackDalton thanks, Jack! I would love to improve it later.
