6

I did an experiment in which I tried to find the time it takes to search a python list. I have a list arr with random integers. arr_s has the same elements only sorted.

arr = np.random.randint(low = 0, high = 1000, size = 500)
arr_s = sorted(arr)

Now I create a random array of integers find which has elements that I want to search in arr and arr_s.

>>> %%timeit
...:find = np.random.randint(0, 1000, 600)
...:for i in find:
...:    if i in arr:
...:        continue

[OUT]:100 loops, best of 3: 2.18 ms per loop


>>> %%timeit
...:find = np.random.randint(0, 1000, 600)
...:for i in find:
...:    if i in arr_s:
...:        continue

[OUT]:100 loops, best of 3: 5.15 ms per loop

Now I understand that I have not used any specific method to search in the sorted array (e.g. binary search). So it might be doing the standard linear search but why does it take significantly longer to search in the sorted array that in the unsorted array? I would think that it should take nearly the same time. I have tried all sorts of find array. Arrays which have integers from (0, 1000), (-1000, -100) and (-10000, 10000) the loops always take longer for the sorted array.

1

3 Answers 3

7
arr = np.random.randint(low = 0, high = 1000, size = 500)
arr_s = sorted(arr)

arr is an array. arr_s is a list. Searching an array can be handled efficiently by numpy, but searching a list requires following pointers and performing type checks. It has nothing to do with sorting.

Note: in does weird things in numpy. Using in with numpy ndarrays may be a bad idea.

Sign up to request clarification or add additional context in comments.

3 Comments

I converted the array to list. Now they both take the same time.
This answer is correct. Python lists are unfortunately...pretty inefficient. :\
Iterating over a numpy array is slow as heck because numpy has to create wrapper objects for array elements when you access them. This is one of many reasons why you should always use vectorized operations instead of loops when working with ndarrays.
0

Python lists are not like C arrays. They are not just a simple block of memory where element 1 always comes after element 0, and so on. Instead, under the hood Python is storing things in a flexible way so that you can add and remove elements of arbitrary types and move things around at will.

In this case, my guess is that the act of sorting the list changes the underlying organization, making it somewhat less efficient to access the elements.

Comments

0

I do not have an exact answer but a possible starting point is to check at the iterators used by each object.



    In [9]: it = arr.__iter__()
    In [10]: its = arr_s.__iter__()
    In [11]: type(it)
    Out[11]: iterator
    In [12]: type(its)
    Out[12]: listiterator

They apparently use two different iterators which could explain the difference in speed.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.