0

i am trying to work out the following problem in Java (although it could be done in pretty much any other language):

I am given two arrays of integer values, xs and ys, representing dataPoints on the x-axis. Their length might not be identical, though both are > 0, and they need not be sorted. What I want to calculate is the minimum distance measure between two data set of points. What I mean by that is, for each x I find the closest y in the set of ys and calculate distance, for instance (x-y)^2. For instance:

xs = [1,5]
ys = [10,4,2]

should return (1-2)^2 + (5-4)^2 + (5-10)^2

Distance measure is not important, it's the algorithm I am interested in. I was thinking about sorting both arrays and advance indexes in both of these arrays somehow to achieve something better than bruteforce (for each elem in x, scan all elems in ys to find min) which is O(len1 * len2).

This is my own problem I am working on, not a homework question. All your hints would be greatly appreciated.

2
  • 1
    In your example, it actually looks as if for each y you find the closest x -- you probably mean that for each element of the larger set you find the closest element of the smaller set, since you seem to expect there to be as many terms in the distance calculation as there are elements in the larger set. Commented Jun 11, 2012 at 13:13
  • Yes, it is larger set i compare against the smaller set. Any ideas how to make it better than O(len1*len2)? Commented Jun 11, 2012 at 13:21

3 Answers 3

2

I assume that HighPerformanceMark (first comment on your question) is right and you actually take the larger array, find for each element the closest one of the smaller array and sum up some f(dist) over those distances.

I would suggest your approach:

Sort both arrays 
indexSmall=0 

// sum up
for all elements e in bigArray {
  // increase index as long as we get "closer"
  while (dist(e,smallArray(indexSmall)) > dist(e,smallArray(indexSmall+1)) {
    indexSmall++
  }
  sum += f(dist(e,smallArray(indexSmall)));
}

which is O(max(len1,len2)*log(max(len1,len2))) for the sorting. The rest is linear to the larger array length. Now dist(x,y) would be something like abs(x-y), and f(d)=d^2 or whatever you want.

Sign up to request clarification or add additional context in comments.

2 Comments

You will need to ensure that indexSmall+1 does not overindex smallArray
of course, this is only pseudocode... But this is a helpful hint for the implementor... ;)
1

You're proposed idea sounds good to me. You can sort the lists in O(n logn) time. Then you can do a single iteration over the longer list using a sliding index on the other to find the "pairs". As you progress through the longer list, you will never have to backtrack on the other. So now your whole algorithm is O(n logn + n) = O(n logn).

Comments

1

Your approach is pretty good and has O(n1*log(n1)+n2*log(n2)) time complexity.

If the arrays has different lengths, another approach is to:

  1. sort the shorter array;
  2. traverse the longer array from start to finish, using binary search to locate the nearest item in the sorted short array.

This has O((n1+n2)*log(n1)) time complexity, where n1 is the length of the shorter array.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.