1

I have an array with different values and I'd like to calculate a percentage value that represents the similarity of all its elements, perhaps using a threshold property.

An array could look like this:

var array = [42.98, 42.89, 42.91, 42.98, 42.88] // should return nearly 100%

var array = [42.98, 22.89, 42.91, 42.98, 42.88] // should return maybe 80%

var array = [42.98, 332.89, 122.91, 5512.98, -12.88] // should return nearly 0%

So 100% means all elements are the same, and 0% means the elements are wildly different. The sensitivity is adjusted by changing the threshold.

I don't really know how to solve the problem (I'm an absolute newbie). This is all I have so far, and obviously it does not work:

function checkSimilarity(array, threshold) {
    var sum = array.reduce((a, b) => a + b, 0),
        percentage = 0;
    for (var i =0; i< array.length; i++) {
       var diff = (sum / array.length) * i
       percentage += diff

    }
    return percentage * (threshold/100)
}

Any help with creating a working algorithm would be much appreciated!

2
  • 1
    The for ... in loop gives you the indexes into the array, not the value. Don't use for ... in for plain arrays; use for ... of, or a simple for loop with an index variable, or .forEach() Commented Mar 12, 2018 at 22:39
  • It seems you want something like the standard deviation (plus a little math to map an SD of 0 to 100%), though I'm not sure how you expect the threshold to change things. Commented Mar 12, 2018 at 23:38
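
The standard-deviation idea from the comment above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `similarityFromStdDev` is hypothetical, the spread is measured relative to the mean (the coefficient of variation), and anything with a relative spread of 1 or more is clamped to 0%. It does not use a threshold.

```javascript
// Hypothetical helper, not part of the original question or answers.
// Maps "standard deviation relative to the mean" onto a 0-100 scale.
function similarityFromStdDev(values) {
  if (values.length === 0) return 0; // assumption: empty input counts as 0%

  const mean = values.reduce((a, b) => a + b, 0) / values.length;

  // population variance: average squared distance from the mean
  const variance =
    values.reduce((acc, v) => acc + (v - mean) ** 2, 0) / values.length;
  const stdDev = Math.sqrt(variance);

  if (stdDev === 0) return 100; // all elements identical
  if (mean === 0) return 0; // assumption: relative spread is undefined here

  // coefficient of variation: spread expressed as a fraction of the mean
  const relativeSpread = stdDev / Math.abs(mean);

  // clamp so that very scattered data does not go negative
  return Math.max(0, 100 * (1 - relativeSpread));
}

console.log(similarityFromStdDev([42.98, 42.89, 42.91, 42.98, 42.88])); // close to 100
console.log(similarityFromStdDev([42.98, 22.89, 42.91, 42.98, 42.88])); // roughly 80
console.log(similarityFromStdDev([42.98, 332.89, 122.91, 5512.98, -12.88])); // 0
```

Dividing by the mean makes the score scale independent, but it also means data centred near zero will look very dissimilar; whether that is acceptable depends on the data.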

3 Answers 3

2

var array1 = [42.98, 42.89, 42.91, 42.98, 42.88] // should return nearly 100%
var array2 = [42.98, 22.89, 42.91, 42.98, 42.88] // should return maybe 80%
var array3 = [42.98, 332.89, 122.91, 5512.98, -12.88] // should return nearly 0%

function calculateRange(data) {
  var dissimilarity;
  var sum = data.reduce((a, b) => a + b, 0);
  var mean = sum / data.length;

  // loop through the passed array
  data.forEach(function (item) {
    // calculate the percentage difference from the mean (always positive)
    var percentageDiff = Math.abs(100 - (item / mean) * 100);

    // fold the difference into a running average; compare against undefined
    // so that a legitimate dissimilarity of 0 is not treated as "unset"
    if (dissimilarity !== undefined) {
      dissimilarity = (dissimilarity + percentageDiff) / 2;
    } else {
      dissimilarity = percentageDiff;
    }
  });

  // subtract the aggregated dissimilarity from 100%
  return 100 - dissimilarity;
}

var array1DOM = document.getElementById("array1")
var array2DOM = document.getElementById("array2")
var array3DOM = document.getElementById("array3")

array1DOM.innerHTML = calculateRange(array1)
array2DOM.innerHTML = calculateRange(array2)
array3DOM.innerHTML = calculateRange(array3)
<div>
    <div id="array1"></div>
    <div id="array2"></div>
    <div id="array3"></div>
</div>

In simple terms, this solution aggregates each element's percentage difference from the mean of the data set to determine similarity. You will notice that the first two arrays give results close to the requested "nearly 100%" and "80%". The issue arises with the final array: because this model is based on variation from the mean, the values in array3 are so scattered that the dissimilarity score exceeds 100 and the result is negative.

I cannot resolve this issue as I cannot guess what your maximum difference value is. If that value is known, I can normalise values using it such that the range returned is 0 - 100. If you can never know the maximum difference, the only potential solutions I can suggest are:

  • Using my method as is, and noting that the lower the score, the less similar the set is (in theory the score has no lower bound)
  • Flooring anything below 0 to 0
  • Calculating several data sets and then using the lowest scoring one as your 0, and the highest as your 100. That way you have a relative degree of similarity between sets.
  • Estimating what the highest level of dissimilarity could be and passing it into the function, i.e. what is the minimum or maximum array value you will ever receive in this process.
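
The flooring and "pass in an estimated maximum" options above could be combined along these lines. This is a minimal sketch, not a definitive implementation: `normalisedSimilarity` and its `maxPercentageDiff` parameter are hypothetical names, the caller is assumed to supply a sensible estimate, and the sketch uses a plain mean of the percentage differences. It also assumes a non-empty array whose mean is not 0.

```javascript
// Hypothetical helper; `maxPercentageDiff` is an assumed, caller-supplied
// estimate of the largest mean percentage difference that can occur.
function normalisedSimilarity(data, maxPercentageDiff) {
  const mean = data.reduce((a, b) => a + b, 0) / data.length;

  // mean absolute percentage difference of the elements from the mean
  const meanDiff =
    data.reduce((acc, item) => acc + Math.abs(100 - (item / mean) * 100), 0) /
    data.length;

  // rescale so that `maxPercentageDiff` corresponds to 0% similarity
  const scaled = (meanDiff / maxPercentageDiff) * 100;

  // floor at 0 and cap at 100
  return Math.min(100, Math.max(0, 100 - scaled));
}

const array3 = [42.98, 332.89, 122.91, 5512.98, -12.88];
console.log(normalisedSimilarity(array3, 100)); // 0 (floored)
console.log(normalisedSimilarity(array3, 200)); // a small positive score
```

A larger `maxPercentageDiff` makes the scale more forgiving, so the value chosen effectively defines what "0% similar" means for a given data source.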

If you could supply information on the purpose/context of this task we may be able to specify more.


1

Slightly different approach. By no means meant to be the most efficient, but it does work for your sample data.

https://codepen.io/anon/pen/RMWjRL?editors=0010

const array1 = [42.98, 42.89, 42.91, 42.98, 42.88]; // should return nearly 100%
const array2 = [42.98, 22.89, 42.91, 42.98, 42.88]; // should return maybe 80%
const array3 = [42.98, 332.89, 122.91, 5512.98, -12.88]; // should return nearly 0%

const similarity = (arr) => {
  const dict = {};

  arr.forEach(item => {
    const val = Math.round(item);
    dict[val] ? dict[val]++ : dict[val] = 1;
  });

  let largest = 1;

  Object.keys(dict).forEach(key => largest = dict[key] > largest ? dict[key] : largest);

  return largest / arr.length;
};

console.log(similarity(array1)); // 1
console.log(similarity(array2)); // 0.8
console.log(similarity(array3)); // 0.2

2 Comments

Interesting approach to the problem. I do want to note that this solution is length dependent, in that [50, 60] will get the result 0.5, as will, say, [50, 99]. This method only determines the answer based on the frequency of certain rounded values. So if accuracy, or a more "absolute" similarity value, is required, this answer will perhaps not work.
@JackDalton Agreed.
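
One way to bring the question's threshold into a frequency-based measure like this is to count the elements that lie within a tolerance of the median, instead of counting equal rounded values. The sketch below is an assumption-laden illustration: `shareWithinThreshold` and `threshold` are hypothetical names, and the choice of the median as the reference point is mine. It is still a frequency measure, so it does not remove the limitation described in the comment above: for a small threshold, [50, 60] and [50, 99] both score 0.

```javascript
// Hypothetical helper: fraction of elements within `threshold` of the median.
function shareWithinThreshold(values, threshold) {
  if (values.length === 0) return 0; // assumption: empty input counts as 0

  // sort a copy numerically so the caller's array is left untouched
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median =
    sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;

  // count the elements that lie within the tolerance band around the median
  const close = values.filter(v => Math.abs(v - median) <= threshold).length;

  return close / values.length;
}

console.log(shareWithinThreshold([42.98, 42.89, 42.91, 42.98, 42.88], 0.5)); // 1
console.log(shareWithinThreshold([42.98, 22.89, 42.91, 42.98, 42.88], 0.5)); // 0.8
console.log(shareWithinThreshold([42.98, 332.89, 122.91, 5512.98, -12.88], 0.5)); // 0.2
```

Widening `threshold` raises the score for scattered data, which is one possible reading of "the adjustment is set by editing the threshold" in the question.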
0

I'm using Euclidean distance for this problem. However, I don't know whether this will meet your needs.

const similarity = list => {
  if (list.length < 1) return 0;
  if (list.length < 2) return 100;
  
  let listPair = [];
  for (let i = 0; i < list.length - 1; i++)
    listPair.push({ a: list[i], b: list[i + 1] });
  
  const sum = listPair.reduce((acc, { a, b }) => acc + Math.pow(a - b, 2), 0);
  
  const calculation = 100 - Math.sqrt(sum);
  
  return calculation < 0 ? 0 : calculation;
};

let list = [];
console.log(similarity(list)); // 0%

list = [42.98, 42.89, 42.91, 42.98, 42.88];
console.log(similarity(list)); // ~99%

list = [42.98, 22.89, 42.91, 42.98, 42.88];
console.log(similarity(list)); // ~71%

list = [10, 10, 10, 20, 10];
console.log(similarity(list)); // ~85%

list = [42.98, 332.89, 122.91, 5512.98, -12.88];
console.log(similarity(list)); // 0%

list = [45.51, 45.51, 45.51, 45.51, 45.51];
console.log(similarity(list)); // 100%

list = [10];
console.log(similarity(list)); // 100%

2 Comments

Very nice approach, although I would like to see higher accuracy in the result values. The second-to-last and last lists are considered exactly equal, despite having dissimilarity. Additionally, 0% should not be returned for the fourth list. Overall, though, great approach!
@JackDalton thanks, Jack! I would love to improve it later.
