Given an array containing N points, find the K points closest to the origin (0, 0) in the 2D plane. You can assume K is much smaller than N and N is very large.

E.g:

    given array: (1,0), (3,0), (2,0), K = 2 
        Result = (1,0), (2,0)  

(result should be in ascending order by distance)

Code:

import java.util.*;

class CPoint {
    double x;
    double y;
    public CPoint(double x, double y) {
        this.x = x;
        this.y = y;
    }
}

public class KClosest {
    /**
     * @param myList the array of points to search
     * @param k the number of closest points to return
     * @return the k points closest to the origin, in ascending order of distance
     */
    public static CPoint[] getKNearestPoints(CPoint[] myList, int k) {

        // Check for null before touching myList.length to avoid a NullPointerException.
        if (myList == null || myList.length == 0)  return myList;
        if (k <= 0 || k > myList.length)  return new CPoint[]{};

        final CPoint o = new CPoint(0, 0); // origin point

        // use a Max-Heap of size k for maintaining K closest points
        PriorityQueue<CPoint> pq = new PriorityQueue<CPoint> (k, new Comparator<CPoint> () {
            @Override
            public int compare(CPoint a, CPoint b) {
                return Double.compare(distance(b, o), distance(a, o));  
            }
        });

        for (CPoint p : myList) {   // Line 33
            // Add the point to the heap.                        // Line 34
            pq.offer(p);            // Line 35
            // If the heap now holds more than k points         // Line 36
            if (pq.size() > k) {    // Line 37
                // remove the head, i.e. the point with the largest distance in the PQ. // Line 38
                pq.poll();          // Line 39  
            }  // Line 40
        }       
        CPoint[] res = new CPoint[k];
        // Drain the heap (farthest first), filling res from the back so it ends up in ascending order of distance.
        while (!pq.isEmpty()) {     // Line 44
            res[--k] = pq.poll();   // Line 45                   
        }                           // Line 46

        return res;
    }

    // Returns the squared Euclidean distance; skipping the square root preserves the ordering.
    private static double distance(CPoint a, CPoint b) {
        return (a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y);
    }

}

Questions:

  1. What is the time complexity of line 35 and of line 39, each considered separately?

  2. What is the time complexity of lines 35 - 40 as a whole?

  3. What is the time complexity of lines 44 - 46 as a whole?

  4. What is the overall time complexity of the entire method getKNearestPoints(), in the best, worst and average case? What if n >> k, and what if n is not much larger than k?

These questions came up during a technical interview, and I'm still confused about them. Any help is appreciated.

  • This reads a lot like homework. What exactly are you confused by? What do you think the answer is and why? Commented Oct 8, 2017 at 5:00
  • I answered as follows during the interview: Q1: log(K), log(K); Q2: log(K) or log(K)^2; Q3: K log(K); Q4: N log K + K log K, but is the overall N log K? I'm not sure. Commented Oct 8, 2017 at 5:03
  • I know that for a PQ, operations like add/offer and remove/poll take O(log K), while peek is O(1). But for these specific questions I'm really lost. Commented Oct 8, 2017 at 5:14

1 Answer

From the looks of it, whoever wrote this code probably knows the answers to these questions.

Anyway, the PriorityQueue here behaves as a max-heap, because the comparator orders points by descending distance from the origin.

So, complexities are as follows:

Line 35: O(log k). This is the time to insert an element into a heap of at most k elements; the new element is placed at the bottom and sifted up toward the root.

Line 37: O(1). This is the time to check the size of the heap; the size is typically stored alongside the heap.

Line 39: O(log k). This is the time to remove the head of the heap; the last element is moved to the root and sifted down (heapify) to restore the heap property.

Lines 35-40: From the complexities above, one iteration costs O(log k). The loop runs once for each of the n elements, so the loop as a whole is O(n log k).
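To make the per-iteration cost concrete, here is a minimal sketch that counts comparator calls in the same offer/poll loop. It works on plain squared distances instead of CPoint objects, and the names BoundedHeapCost, run, comparisons and maxHeapSize are hypothetical helpers introduced only for this illustration.

```java
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.Random;

public class BoundedHeapCost {
    // Hypothetical counters, reset on every call to run().
    static long comparisons = 0;
    static int maxHeapSize = 0;

    // Runs the offer/poll loop over squared distances and returns
    // the total number of comparator calls it triggered.
    static long run(double[] squaredDistances, int k) {
        comparisons = 0;
        maxHeapSize = 0;
        // Reversed comparison, so the largest value sits at the head (max-heap).
        Comparator<Double> maxFirst = (a, b) -> {
            comparisons++;
            return Double.compare(b, a);
        };
        PriorityQueue<Double> pq = new PriorityQueue<>(k, maxFirst);
        for (double d : squaredDistances) {
            pq.offer(d);                                   // sift-up: O(log k)
            maxHeapSize = Math.max(maxHeapSize, pq.size());
            if (pq.size() > k) {
                pq.poll();                                 // sift-down: O(log k)
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        int n = 100_000;
        int k = 16;
        double[] data = new double[n];
        for (int i = 0; i < n; i++) {
            data[i] = rng.nextDouble();
        }
        long total = run(data, k);
        System.out.println("comparisons per element: " + (double) total / n);
        System.out.println("largest heap size seen: " + maxHeapSize);
    }
}
```

The heap never grows beyond k + 1 elements, so the number of comparisons per element stays bounded by a small multiple of log k no matter how large n becomes.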

Lines 44-46: Checking whether the heap is empty is again O(1), and each poll is O(log k). The loop polls k times, so its overall complexity is O(k log k).
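As a small illustration of why the result comes out in ascending order, the sketch below drains a max-heap of integers into an array from the last index down. The class DrainOrder and the method drainAscending are hypothetical names, and integers stand in for the points' distances.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.PriorityQueue;

public class DrainOrder {
    // Polls the max-heap until it is empty, writing each value from the
    // last index toward the first, so the array ends up ascending.
    static int[] drainAscending(PriorityQueue<Integer> maxHeap) {
        int[] res = new int[maxHeap.size()];
        int i = res.length;
        while (!maxHeap.isEmpty()) {
            res[--i] = maxHeap.poll();   // each poll is O(log k)
        }
        return res;
    }

    public static void main(String[] args) {
        // reverseOrder() turns Java's default min-heap into a max-heap.
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder());
        Collections.addAll(maxHeap, 4, 1, 9, 2);
        System.out.println(Arrays.toString(drainAscending(maxHeap))); // prints [1, 2, 4, 9]
    }
}
```

Each poll returns the largest remaining value, which is why writing from the back toward the front produces ascending order without a separate sort.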

The overall complexity is O(n log k + k log k), which simplifies to O(n log k) because k ≤ n.
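The simplification can be spelled out as a short worked bound. This is a sketch under stated assumptions: T(n, k) is a hypothetical name for the total cost, c is a hypothetical constant such that one heap operation costs at most c log k, and 2 ≤ k ≤ n (the early return in getKNearestPoints already rules out k > n).

```latex
\begin{aligned}
T(n, k) &\le \underbrace{n \,(c \log k + c \log k)}_{\text{lines 35--40}}
           + \underbrace{k \, c \log k}_{\text{lines 44--46}} \\
        &= 2 c \, n \log k + c \, k \log k \\
        &\le 2 c \, n \log k + c \, n \log k \qquad \text{since } k \le n \\
        &= 3 c \, n \log k = O(n \log k).
\end{aligned}
```

Note that the bound only needs k ≤ n, not n >> k.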

This is an awesome place to study this topic.

5 Comments

Hi! This answer is really helpful, but I'm still confused about lines 35-40. Once the PQ is full, pq.offer(p) and pq.poll() are executed together (n - k) times. That should be O(log k) + O(log k), right? Why do we still consider it O(log k)?
OK, to put it mathematically: O(log k) + O(log k) = O(2 log k) = O(log(k^2)) = O(log k). It can be written in any of those ways, because constant factors are dropped.
That makes perfect sense, thanks! Just one more question: why can we drop the time of the second pass that copies the k closest points into the result, O(k log k)? The overall complexity is given as O(n log k), not O(n log k) + O(k log k).
Oh! Is that because N >> K, so it can be dropped? I got it. Thanks so much!
Yes, kind of, that is the reason.
