Java: distance metric algorithm design

Question

i am trying to work out the following problem in Java (although it could be done in pretty much any other language):

I am given two arrays of integer values, xs and ys, representing dataPoints on the x-axis. Their length might not be identical, though both are > 0, and they need not be sorted. What I want to calculate is the minimum distance measure between two data set of points. What I mean by that is, for each x I find the closest y in the set of ys and calculate distance, for instance (x-y)^2. For instance:

xs = [1,5]
ys = [10,4,2]

should return (1-2)^2 + (5-4)^2 + (5-10)^2

Distance measure is not important, it's the algorithm I am interested in. I was thinking about sorting both arrays and advance indexes in both of these arrays somehow to achieve something better than bruteforce (for each elem in x, scan all elems in ys to find min) which is O(len1 * len2).

This is my own problem I am working on, not a homework question. All your hints would be greatly appreciated.

In your example, it actually looks as if for each y you find the closest x -- you probably mean that for each element of the larger set you find the closest element of the smaller set, since you seem to expect there to be as many terms in the distance calculation as there are elements in the larger set. — High Performance Mark
– High Performance Mark, Commented Jun 11, 2012 at 13:13
Yes, it is larger set i compare against the smaller set. Any ideas how to make it better than O(len1*len2)? — Bober02
– Bober02, Commented Jun 11, 2012 at 13:21

brimborium · Accepted Answer · 2012-06-11 13:48:29Z

2

I assume that HighPerformanceMark (first comment on your question) is right and you actually take the larger array, find for each element the closest one of the smaller array and sum up some f(dist) over those distances.

I would suggest your approach:

Sort both arrays 
indexSmall=0 

// sum up
for all elements e in bigArray {
  // increase index as long as we get "closer"
  while (dist(e,smallArray(indexSmall)) > dist(e,smallArray(indexSmall+1)) {
    indexSmall++
  }
  sum += f(dist(e,smallArray(indexSmall)));
}

which is O(max(len1,len2)*log(max(len1,len2))) for the sorting. The rest is linear to the larger array length. Now dist(x,y) would be something like abs(x-y), and f(d)=d^2 or whatever you want.

answered Jun 11, 2012 at 13:48

brimborium

9,56210 gold badges50 silver badges77 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Attila Over a year ago

You will need to ensure that indexSmall+1 does not overindex smallArray

brimborium Over a year ago

of course, this is only pseudocode... But this is a helpful hint for the implementor... ;)

jeff · Accepted Answer · 2012-06-11 13:31:52Z

1

You're proposed idea sounds good to me. You can sort the lists in O(n logn) time. Then you can do a single iteration over the longer list using a sliding index on the other to find the "pairs". As you progress through the longer list, you will never have to backtrack on the other. So now your whole algorithm is O(n logn + n) = O(n logn).

answered Jun 11, 2012 at 13:31

jeff

4,33319 silver badges27 bronze badges

Comments

NPE · Accepted Answer · 2012-06-11 13:37:09Z

1

Your approach is pretty good and has O(n1*log(n1)+n2*log(n2)) time complexity.

If the arrays has different lengths, another approach is to:

sort the shorter array;
traverse the longer array from start to finish, using binary search to locate the nearest item in the sorted short array.

This has O((n1+n2)*log(n1)) time complexity, where n1 is the length of the shorter array.

answered Jun 11, 2012 at 13:37

NPE

503k114 gold badges970 silver badges1k bronze badges

Collectives™ on Stack Overflow

Java: distance metric algorithm design

3 Answers 3

2 Comments

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

2 Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related