7

I was attempting LeetCode question 128. Longest Consecutive Sequence:

Given an unsorted array of integers nums, return the length of the longest consecutive elements sequence.

You must write an algorithm that runs in O(n) time.

Example 1:

Input: nums = [100,4,200,1,3,2]
Output: 4
Explanation: The longest consecutive elements sequence is [1, 2, 3, 4]. Therefore its length is 4.

Constraints:

  • 0 <= nums.length <= 105
  • -109 <= nums[i] <= 109

My first attempt did sorting followed by counting. This has O(nlogn) time complexity, but it surprisingly gave me 93.93% percentile for time complexity (40ms).

I then re-read the question and realised the answer must be in O(n) time complexity. So I wrote the following code:

def longestConsecutive(self, nums: List[int]) -> int:
        s = set(nums)
        longest_streak = 0
        
        for num in nums:
            if (num - 1) not in s:
                current_streak = 1
                while (num + 1) in s:
                    num += 1
                    current_streak += 1
                longest_streak = max(longest_streak, current_streak)
            
        return longest_streak

(I know, it's not a great practice to reuse num variable name in the nested loop, but that's beside the point of the question, I have tested using a separate variable as well with the same result as below)

While this should theoretically be O(n) time complexity, faster than my first solution, this actually resulted in time limit exceeded for a couple of the cases and my code was rejected.

I eventually submitted a passing solution after referring to the solution

class Solution:
    def longestConsecutive(self, nums: List[int]) -> int:
        nums = set(nums)
        longest_streak = 0
        
        for num in nums:
            if (num - 1) not in nums:
                next_num = num + 1
                while next_num in nums:
                    next_num += 1
                longest_streak = max(longest_streak, next_num - num)
            
        return longest_streak

where I identified 2 key differences:

  1. I reassigned nums to a set in-place instead of a new variable
  2. I used next_num instead of keeping a current_streak variable

However, both of these changes does not seem like it should have significant impact on the runtime, enough to cross the line between time-limit exceeded into a passing solution. To puzzle me even more, this O(n) solution still performed worse than my sorting solution, ranking only at 75.73% percentile (46 ms).

So my questions are:

  1. Why does a O(nlogn) algorithm perform faster than O(n) in practice?
  2. Why is my first O(n) algorithm so slow that it reached time-limit exceeded while my second algorithm with minimal changes could pass?
1
  • Although it wouldn't reduce the big-O time complexity, before counting a sequence to measure how long it is, you could check to see whether it's possible that it's longer than the longest-so-far: if num-1 not in nums and n+longest_sequence in nums:. The cost-benefit tradeoff depends on the actual data, but I believe the cost in the worst case is pretty low and the savings in the best could be substantial. Commented Apr 3 at 14:42

3 Answers 3

11

Why does a O(nlogn) algorithm perform faster than O(n) in practice?

Time complexity does not say anything about actual running times for concrete input sizes. It only says something about how running times will evolve asymptotically as the input size grows large.

In general (unrelated to the actual code you presented), we can imagine a O(𝑛) algorithm that needs 1000 + 10𝑛 milliseconds to complete, and a O(𝑛log𝑛) algorithm that needs 1 + 𝑛log10𝑛 milliseconds to complete. And now it becomes clear how the O(𝑛log𝑛) algorithm will beat the O(𝑛) one for lots of realistic input sizes.

See how many milliseconds they would need for concrete values of 𝑛:

𝑛 O(𝑛) O(𝑛log𝑛)
1 1 010 1
10 1 100 11
100 2 000 201
1000 11 000 3 001
10000 101 000 40 001
100000 1 001 000 500 001
1000000 10 001 000 6 000 001
10000000 100 001 000 70 000 001
100000000 1 000 001 000 800 000 001
1000000000 10 000 001 000 9 000 000 001
10000000000 100 000 001 000 100 000 000 001
100000000000 1 000 000 001 000 1 100 000 000 001

By definition, there is always an 𝑛 above which the better time complexity will win out, but as you can see, it can be a very high threshold for 𝑛: in the table above only the last row shows a win for the theoretically more efficient algorithm.

Why is my first O(n) algorithm so slow that it reached time-limit exceeded while my second algorithm with minimal changes could pass?

It is because of the outer loop. In the slow version, it iterates over the input list, while in the faster version, it iterates over the set. This means that if the input is large and has lots of duplicate values, the slow version is repeating work that does not bring any benefit. Your initial version only needs to replace that in order to finish the job within the given time limit:

Change:

for num in nums:

to:

for num in s:
Sign up to request clarification or add additional context in comments.

Comments

1

Your first O(n) attempt isn't O(n) but O(n^2), even assuming that set lookups are O(1). Consider an input like this:

nums = [0] * 50000 + list(range(50000))

That would take your code several minutes.

Your

        for num in nums:
            if (num - 1) not in s:

gets num = 0 50001 times and the if condition is true every time, so you check the whole streak up to 49999 every time.

Your improved version avoids that because it gets num = 0 only once, as the set removes all its duplicates. It is O(n) as intended (assuming that set lookups are O(1)).

Comments

-1

First of all, none of those code variants has the complexity of O (N), since it contains the nested cycle. This is o (n * m), and at n == m it turns out O (n ** 2). Only the 'flat' cycle has the complexity of O (n). Or if m is always equal to 1/0, which simply cancel iteration.

Now we go to your first code. First: the FOR cycle passes along the list, and along set. Random list may look like this: [2, 3, 3, 3, ... 100,500 More Times ..., 1]. And your cycle will pass all 100,500 of the same elements and each time will launch a useless check. Second:

while (num + 1) in s:
    num += 1

D.R.Y., Don`t Repeat Yourself))) From scratch you increased the number of operations by 100%. All these 'minor little things' gave the result that you see.

If we talk about optimization, then this is not the best code option (yes, I found the source where you got this decision from :)). It is not necessary to check the same elements many times in a row when it is already known for sure that they will no longer be needed. Set () is the same fast data structure and when performing comparison/addition/subtraction operations. Therefore, it will be logical after finding each sequence to remove all its elements from the set, they are no longer needed. Launch this code (or create a list of any length, for example, [random.randrange (1000) for _ in Range 100_000] and compare the number of operations that have completed the first and the second fragments of the code to get the result.

P. S. I apologize for the grammar, I write through the translator.

lst = [44, 17, 57, 64, 90, 5, 44, 43, 99, 33, 31, 52, 55, 50, 2, 58, 12, 32, 27,
 73, 5, 12, 45, 86, 52, 40, 3, 68, 17, 87, 34, 71, 23, 17, 60, 47, 66, 47, 95,
 67, 82, 92, 75, 13, 35, 7, 48, 13, 35, 74, 10, 63, 26, 48, 14, 38, 85, 72, 73,
 0, 75, 61, 88, 49, 90, 65, 80, 35, 55, 96, 73, 88, 69, 59, 29, 5, 4, 36, 84,
 21, 36, 34, 44, 91, 97, 3, 50, 31, 54, 27, 79, 7, 22, 38, 85, 76, 80, 35, 2, 87]

set_1 = set(lst)
set_2 = set()
res = 0
steps = 0

def longest_consecurtive(arg: set[int]) -> int:
    global steps
    streak = 0
    x = 0
    for x in arg:
        steps += 1
        if x - 1 not in arg:
            set_2.add(x)
            streak += 1
            break
    while (x := x + 1) in arg:
        steps += 1
        set_2.add(x)
        streak += 1
    return streak


while s := set_1 - set_2:
    steps += 1
    res = max(longest_consecurtive(s), res)

print(res)
print(f'Solution by sets finished in {steps} steps\n')

steps = 0


class Solution:
    def longestConsecutive(self, nums: list[int]) -> int:
        # Create a set from the list for O(1) lookups
        num_set = set(nums)
        longest_streak = 0

        # Iterate over each number in the list
        for number in nums:
            global steps
            steps += 1
            # Check if it's the start of a sequence
            if number - 1 not in num_set:

                # Initialize the current number as the possible start of a sequence
                current_num = number
                current_streak = 1

                # Increment the current_num to find the length of the streak
                while current_num + 1 in num_set:
                    steps += 1
                    current_num += 1
                    current_streak += 1

                # Update the longest_streak with the maximum streak found
                longest_streak = max(longest_streak, current_streak)

        # Return the length of the longest consecutive sequence
        return longest_streak


res = Solution()
print(res.longestConsecutive(lst))
print(f'Soluton by list finished in {steps} steps')

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.