1

the problem statement is:

given an array A of N integers, returns the smallest positive integer (greater than 0) that does not occur in A.

For example, given A = [1, 3, 6, 4, 1, 2], the function should return 5.

Given A = [1, 2, 3], the function should return 4.

Given A = [−1, −3], the function should return 1.

Can anyone tell me what the time-complexity of the following solution to the code be:

def solution(A):
    m = max(A)
    if m < 1:
       return 1

    A = set(A)
    B = set(range(1, m + 1))
    D = B - A
    if len(D) == 0:
        return m + 1
    else:
        return min(D)

It has to be O(n) or greater because we are finding the max. I had a doubt if creating a set from range 1 to the max element will be nlogn as the result will be a sorted set of elements..?

4
  • 1
    Why all the complexity? Isn't eh answer just m+1? Commented Aug 26, 2021 at 18:26
  • 1
    no, it's not, a number could be skipped like the first example Commented Aug 26, 2021 at 18:30
  • 1
    You might get a better answer if you ask a more focused question. It seems like you're most interested in set(range(1, m + 1)), though I think set(A) and B - A are also very interesting for this question. Commented Aug 26, 2021 at 18:35
  • 1
    set(B) is not sorted, because sets are not sorted in python. Consider solution([1,2,3]) and solution([1,2,1_000_000]): 1.21µs vs 107ms. Your algorithm's runtime depends mostly on the largest number in your array, because of set(range(1, m+1)). Commented Aug 26, 2021 at 18:41

1 Answer 1

1

The time complexity of this solution is linear: (O(m + n)) where m is the maximum element in A and n is the length of the input array A).

Here's a break down:

  1. max(A) is O(n) since you have to look at each element of A.
  2. set(A) is O(n) since you have to look at each lement of A.
  3. set(range(1, m + 1)) is O(m) since you iterate over every element between 1 and m.

The result is O(m + n). While we can ignore any multiple of n or m (you don't say O(2n)), we can't ignore the minimum of n and m. The reason is that there is no guaranteed relation between n and m. They can be as different as 1 and 1e64, or vice versa.

On a sidenote, this algorithm is really really bad. You can solve this problem in O(n) time and O(n) additional space.

A solution with O(n) time and space complexity could be as follows:

def solve(A):
  seen = set()
  res = 1
  for el in A:
    seen.add(el):
    while res in seen:
      res += 1
  return res

It works as follows:

  1. Init res to the best possible result (1)
  2. keep track of seen elements with a set for constant time lookups.
  3. iterate over As elements
  4. add them to the set.
  5. increment res while it is present in seen

The reason this is of O(n) time complexity is that in the worst case the elements of the input array don't have any gap between them, start at one, and we have to increment res n times.

If there's any gap, we've found it already and sit there not incrementing further.

Sign up to request clarification or add additional context in comments.

3 Comments

In the literal sense, "O(m + n) where m is the maximum element in A" is exponential, not linear.
Thank you so much. So what would a better algorithm be?
added a better algorithm @prison-mike

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.