This question was motivated from another Stack Overflow question - How do I improve remove duplicate algorithm?
The requirement posted in the questions was -
need to return the length of an array that removed duplicates but we can leave at most 2 duplicates.
Example - [1, 1, 1, 2, 2, 3] , the new array would be [1, 1, 2, 2, 3] . So the new length would be 5 .
The solution given by the OP -
def removeDuplicates(nums):
if nums is None:
return 0
if len(nums) == 0:
return 0
if len(nums) == 1:
return 1
new_array = {}
for num in nums:
new_array[num] = new_array.get(num, 0) + 1
new_length = 0
for key in new_array:
if new_array[key] > 2:
new_length = new_length + 2
else:
new_length = new_length + new_array[key]
return new_length
I tried coming up with a solution that reduced the amount of loops to a single loop.
def removeDuplicates1(nums):
if nums is None:
return 0
if len(nums) == 0:
return 0
if len(nums) == 1:
return 1
new_array = {}
length = 0
for num in nums:
n = new_array.get(num, 0)
new_array[num] = n + 1
if n <= 1:
length += 1
return length
After that, I was trying to time the solution vs the original solution , I thought my solution should have provided atleast a little improvement on the original solution , but the result of timeit showed that the original solution was always better (even when the array contained all unique elements) . Timings taken -
In [3]: l = list(range(1000))
In [4]: %timeit removeDuplicates(l)
1000 loops, best of 3: 390 s per loop
In [5]: %timeit removeDuplicates1(l)
1000 loops, best of 3: 412 s per loop
In [6]: l1 = [1] * 1000
In [7]: %timeit removeDuplicates(l1)
1000 loops, best of 3: 224 s per loop
In [9]: %timeit removeDuplicates1(l1)
1000 loops, best of 3: 304 s per loop
Could someone please advice why this is happenning? Am I overlooking something obvious?
new_arrayisn't an array, it's adict-- OP of the other thread is bad at naming things :/sum(min(2, count) for count in collections.Counter(nums).values())In [19]: %timeit removeDuplicates(l) 1000 loops, best of 3: 373 s per loop In [20]: %timeit rD2(l) 1000 loops, best of 3: 385 s per loop. WhererD2()is the function i defined for your method