Why does searching a sorted list in Python take longer?

Question

I ran an experiment to measure how long it takes to search a Python list. I have a list arr of random integers; arr_s has the same elements, only sorted.

import numpy as np
arr = np.random.randint(low=0, high=1000, size=500)
arr_s = sorted(arr)

Now I create a random array of integers, find, holding the elements that I want to search for in arr and arr_s.

>>> %%timeit
...:find = np.random.randint(0, 1000, 600)
...:for i in find:
...:    if i in arr:
...:        continue

[OUT]:100 loops, best of 3: 2.18 ms per loop


>>> %%timeit
...:find = np.random.randint(0, 1000, 600)
...:for i in find:
...:    if i in arr_s:
...:        continue

[OUT]:100 loops, best of 3: 5.15 ms per loop

I understand that I have not used any specific method to search the sorted array (e.g. binary search), so it is probably doing a standard linear search. But why does it take significantly longer to search the sorted array than the unsorted one? I would expect nearly the same time. I have tried all sorts of find arrays, with integers drawn from (0, 1000), (-1000, -100) and (-10000, 10000), and the loops always take longer for the sorted array.

You can perhaps find a partial answer in stackoverflow.com/questions/12905513/…
– Fredrik Pihl, Commented Sep 5, 2013 at 18:51

Community · Accepted Answer · 2017-05-23 12:02:22Z

7

arr = np.random.randint(low = 0, high = 1000, size = 500)
arr_s = sorted(arr)

arr is a NumPy array. arr_s is a list, because sorted() always returns a list. Searching an array can be handled efficiently by numpy, but searching a list requires following pointers and performing type checks. It has nothing to do with sorting.
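One way to check this is to inspect the types and time the same membership loop against all three containers. The sketch below is only illustrative: the name arr_list, the helper search, and the repeat count are assumptions, and absolute timings will vary with the machine and the NumPy version.

```python
import timeit

import numpy as np

arr = np.random.randint(low=0, high=1000, size=500)
arr_s = sorted(arr)    # sorted() always returns a list
arr_list = list(arr)   # unsorted list with the same elements (hypothetical name)

print(type(arr).__name__)    # ndarray
print(type(arr_s).__name__)  # list
# The list elements are still NumPy scalars, not plain Python ints.
print(isinstance(arr_s[0], np.integer))

find = np.random.randint(0, 1000, 600)


def search(container):
    """Run the same membership loop as in the question."""
    for i in find:
        if i in container:
            continue


for name, container in [("ndarray", arr),
                        ("sorted list", arr_s),
                        ("unsorted list", arr_list)]:
    seconds = timeit.timeit(lambda: search(container), number=20)
    print(f"{name}: {seconds / 20 * 1e3:.2f} ms per loop")
```

If the explanation holds, the two lists should land close to each other, and the gap should be between the ndarray and the lists rather than between sorted and unsorted data.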

Note: in does weird things in numpy. Using in with numpy ndarrays may be a bad idea.
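To make the note concrete: for an ndarray, x in arr behaves like (arr == x).any(). That is fine for a 1-D array of scalars, as in the question, but it can surprise on multi-dimensional arrays. A small sketch with made-up values (the name grid is illustrative):

```python
import numpy as np

grid = np.array([[1, 2], [3, 5]])  # example values chosen for illustration

# [1, 5] is not a row of grid, yet `in` reports True because the comparison
# is broadcast element-wise and any single match is enough.
print([1, 5] in grid)           # True
print((grid == [1, 5]).any())   # True, the same computation spelled out

# A plain list of lists compares whole rows instead.
print([1, 5] in grid.tolist())  # False
```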

edited May 23, 2017 at 12:02

CommunityBot

answered Sep 5, 2013 at 18:53

user2357112

3 Comments

Shishir Pandey Over a year ago

I converted the array to list. Now they both take the same time.

Shashank Over a year ago

This answer is correct. Python lists are unfortunately...pretty inefficient. :\

user2357112 Over a year ago

Iterating over a numpy array is slow as heck because numpy has to create wrapper objects for array elements when you access them. This is one of many reasons why you should always use vectorized operations instead of loops when working with ndarrays.
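As a sketch of what a vectorized version of the question's loop might look like, np.isin tests every element of find against arr in a single call. This assumes NumPy 1.13 or later, where np.isin is available; the names found and looped are illustrative.

```python
import numpy as np

arr = np.random.randint(low=0, high=1000, size=500)
find = np.random.randint(0, 1000, 600)

# One vectorized call replaces the Python-level loop over `find`.
found = np.isin(find, arr)  # boolean array, one entry per element of find

# Cross-check against the explicit loop from the question.
looped = np.array([i in arr for i in find])
print(np.array_equal(found, looped))
print(found.sum(), "of", find.size, "values are present in arr")
```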

user1919238 · Accepted Answer · 2013-09-05 18:51:25Z

0

Python lists are not like C arrays. They are not just a simple block of memory where element 1 always comes after element 0, and so on. Instead, under the hood Python is storing things in a flexible way so that you can add and remove elements of arbitrary types and move things around at will.

In this case, my guess is that the act of sorting the list changes the underlying organization, making it somewhat less efficient to access the elements.

answered Sep 5, 2013 at 18:51

user1919238

user2719127 · Accepted Answer · 2013-09-05 19:17:14Z

0

I do not have an exact answer, but a possible starting point is to look at the iterators used by each object.



    In [9]: it = arr.__iter__()
    In [10]: its = arr_s.__iter__()
    In [11]: type(it)
    Out[11]: iterator
    In [12]: type(its)
    Out[12]: listiterator

They apparently use two different iterators which could explain the difference in speed.
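For readers on Python 3, the same check can be sketched as below. The transcript above is from Python 2, where the list iterator was called listiterator. One caveat: when a type defines __contains__, the in operator calls that method instead of iterating, so the iterator types alone may not account for the timing difference.

```python
import numpy as np

arr = np.random.randint(low=0, high=1000, size=500)
arr_s = sorted(arr)

# Python 3 names the list iterator `list_iterator`.
print(type(iter(arr)).__name__)
print(type(iter(arr_s)).__name__)

# Both containers implement __contains__, which is what the `in` operator
# calls when it is available, so membership tests do not go through iter().
print(hasattr(type(arr), "__contains__"), hasattr(type(arr_s), "__contains__"))
```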

answered Sep 5, 2013 at 19:17

user2719127
