168

What is the complexity of the in operator in Python? Is it theta(n)?

Is it the same as the following?

def find(L, x):
   for e in L:
       if e == x:
           return True
   return False

L is a list.

7
  • 7
    It depends on the type of container, since using it with a dictionary or set will be much faster than with an array. Commented Dec 14, 2012 at 18:19
  • 1
    @BasicWolf I have used L, so it is list Commented Dec 14, 2012 at 18:29
  • 11
    @Rastegar L doesn't imply a list. seq is the most common choice where one wants to imply a list. L is a terrible variable name. Single letter ones are bad, and the capital implies it's a class. Even if it was something in particular, Python is dynamic, so state it explicitly in a case like this. Commented Dec 14, 2012 at 18:50
  • 5
    L means list? My libtelepathy.so is probably outdated. Commented Dec 14, 2012 at 18:56
  • 2
    @GarethLatty Using lst is also a good name to define a list Commented Sep 24, 2020 at 20:02

3 Answers 3

245

The complexity of in depends entirely on what L is. e in L will become L.__contains__(e).

See this time complexity document for the complexity of several built-in types.

Here is the summary for in:

  • list - Average: O(n)
  • set/dict - Average: O(1), Worst: O(n)

The O(n) worst case for sets and dicts is very uncommon, but it can happen if __hash__ is implemented poorly. This only happens if everything in your set has the same hash value.

Sign up to request clarification or add additional context in comments.

7 Comments

Does anyone happen to know the complexity of the "in" operator for an OrderedDict?
After some testing, I can confirm that the complexity of OrderedDict in Python 2.7 appears to be O(1) in the average case.
@Josh Sherick you don't have to provide tests, all you need are the sources of the OrderedDict, and as you could find out: OrderedDict is inherited from dict, so the most operations (of course, with exceptions) have the same complexity.
Is the time complexity of "in" operator O(n) for tuple as well?
@whitehat linear.
|
25

It depends entirely on the type of the container. Hashing containers (dict, set) use the hash and are essentially O(1). Typical sequences (list, tuple) are implemented as you guess and are O(n). Trees would be average O(log n). And so on. Each of these types would have an appropriate __contains__ method with its big-O characteristics.

7 Comments

of value is to include the overhead of generating the hash.
Hashing data types include dict and set (as wells as potentially others)
@Woot4Moo: When you're talking about asymptotic complexity, that isn't relevant. The overhead of generating the hash is constant. When you're dealing with small values of N, profiling becomes important, because, say, 100 >> 2N for small N. But that's a separate issue from what the OP was asking about; for huge N, 100 << 2N, which is what complexity is all about.
@abarnert well it actually is quite relevant, as you don't arbitrarily choose data structures. You must consider the use and most common ways the structure will be used, so it actually is important to consider the amount of time for a hash function, especially in a scenario where the has must be computed per iteration of a program.
@Woot4Moo: If someone is asking about asymptotic complexity, either (a) they expect to deal with a large N, or (b) they're an idiot. I'm assuming the OP is case (a), but either way, the constant factor isn't relevant to the answer.
|
-1

It depends on the container you're testing. It's usually what you'd expect - linear for ordered datastructures, constant for the unordered. Of course, there are both types (ordered or unordered) which might be backed by some variant of a tree.

5 Comments

@ZoranPavlovic A in B tests whether A is in B.
I'd definitely expect logarithmic time in an ordered structure.
@dedObed Why would you expect that? Do you expect python to already know whether or not your data are sorted?
Because if there is a container designed to be ordered, the obvious reason is to allow for logarithmic lookups. But I guess it's just a naming issue, I'd use "linear" where you wrote "ordered" and all would be fine. (In my head -- English as second language here.)
"Ordered" in Python usually means that the iterator yields elements in the order they were added to the collection. It doesn't usually refer to the case where they are yielded in sorted order (like from a heap), where you'd indeed expect logarithmic time. Maybe this is a case of different terminology between languages, because C++ often seems to use "ordered" in the sense you mean it here.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.