1

I have a list of values:

a = [1,3,4,5,2]

I now want the following function:

does_segment_exist(a, [1,3,4]) #True
does_segment_exist(a, [3,4,5]) #True
does_segment_exist(a, [4,5,2]) #True
does_segment_exist(a, [1,4,5]) #False
does_segment_exist(a, [1,3]) #True
does_segment_exist(a, [1,4]) #False

So the values must be found in consecutive order.

I there a clever way of doing this in Python 3?

1

3 Answers 3

4

You can use a rolling window iterator, in this case one from an old version of the itertools docs:

from itertools import islice

def window(seq, n=2):
    "Returns a sliding window (of width n) over data from the iterable"
    "   s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   "
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

def does_segment_exist(iterable, sublist):
    return tuple(sublist) in window(iterable, len(sublist))

print(does_segment_exist([1,3,4,5,2], [3,4,5]))

If you only need it to work on lists, not any iterable, you can use:

def does_segment_exist(seq, sublist):
    # seq and sublist must both be lists
    n = len(sublist)
    return sublist in (seq[i:i+n] for i in range(len(seq) + 1 - n))

A basic implementation of the method mentioned by Raymond:

def does_segment_exist(seq, sublist):
    first = sublist[0]
    i = 0
    n = len(sublist)
    while True:
        try:
            i = seq.index(first, i)
        except ValueError:
            return False
        if sublist == seq[i:i+n]:
            return True
        i += 1

print(does_segment_exist([1,3,4,5,2], [3,4,5]))

The advantage of this method is that it doesn't have to slice for every index up to the first match, just for indexes corresponding to matches for the first value in the segment.

Sign up to request clarification or add additional context in comments.

Comments

1

There are many ways to do this and they are all isomorphic to substring search algorithms.

The easiest way is the naïve search using list.index() to find a common start point and then using a slice to check for a full match. If there is not a match, repeat the search until you hit the end of the list.

3 Comments

>>> help(list.index) Help on method_descriptor: index(...) L.index(value, [start, [stop]]) -> integer -- return first index of value. Raises ValueError if the value is not present.
Any particular reason that's not shown on the standard types page of the docs?
File a bug tracker report for a doc update. It was likely overlooked (at one point, not all sequences had index()).
1

This should work with Python 2.5 and newer:

def does_segment_exist(sequence, segment):
    n, m = len(sequence), len(segment)
    return any(segment == sequence[i:i+m] for i in range(n+1-m))

3 Comments

If you look at my second one it is the same -- short circuiting slice comparison. in just automatically does the ==.
Sorry, overlooked that. Yes, it's essentially the same, and there should not be noteworthy performance differences.
Actually your solution using "in" even looks more elegant and works with older Python versions.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.