28

How to test if one string is a subsequence of another?

This is a weaker condition than being a substring. For example 'iran' is not a substring of 'ireland', but it is a subsequence IRelANd. The difference is a subsequence doesn't have to be contiguous.

More examples:

  • 'indonesia' contains 'india'. INDonesIA
  • 'romania' contains 'oman'. rOMANia
  • 'malawi' contains 'mali'. MALawI

Movitation: My friends like word games. Yesterday we played 'countries within countries'. I am curious if there are any pairs we missed.

Edit: If you aren't familiar with the mathematical definition of subsequence

A subsequence is a sequence that can be derived from another sequence by deleting some elements without changing the order of the remaining elements

0

4 Answers 4

55
def is_subseq(x, y):
    it = iter(y)
    return all(any(c == ch for c in it) for ch in x)

assert is_subseq('india', 'indonesia')
assert is_subseq('oman', 'romania')
assert is_subseq('mali', 'malawi')
assert not is_subseq('mali', 'banana')
assert not is_subseq('ais', 'indonesia')
assert not is_subseq('ca', 'abc')

Also works for any iterables:

assert is_subseq(['i', 'n', 'd', 'i', 'a'],
                 ['i', 'n', 'd', 'o', 'n', 'e', 's', 'i', 'a'])

UPDATE

Stefan Pochmann suggested this.

def is_subseq(x, y):
    it = iter(y)
    return all(c in it for c in x)

Both versions uses iterators; Iterator yields items that was not yielded in previous iteration.

For example:

>>> it = iter([1,2,3,4])
>>> for x in it:
...     print(x)
...     break
...
1
>>> for x in it:  # `1` is yielded in previous iteration. It's not yielded here.
...     print(x)
...
2
3
4
Sign up to request clarification or add additional context in comments.

15 Comments

@user189, According to the quote in the question, order is important; ca is not a subsequence of abc.
Ok, misunderstood the use of an iterator here, seems quite clever.
What do you think of all(c in it for c in x)? Does something speak against it?
@user2361174, it in the answer is an iterator. Item that is iterated is not iterated again. (consumed)
Just brilliant!! took a while to grok this, but I love it
|
11

Just keep on looking for the next character of your potential subsequence, starting behind the last found one. As soon as one of the characters can't be found in the remainder of the string, it's no subsequence. If all characters can be found this way, it is:

def is_subsequence(needle, haystack):
    current_pos = 0
    for c in needle:
        current_pos = haystack.find(c, current_pos) + 1
        if current_pos == 0:
            return False
    return True

An advantage over the top answer is that not every character needs to be iterated in Python here, since haystack.find(c, current_pos) is looping in C code. So this approach can perform significantly better in the case that the needle is small and the haystack is large:

>>> needle = "needle"
>>> haystack = "haystack" * 1000
>>> %timeit is_subseq(needle, haystack)
296 µs ± 2.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit is_subsequence(needle, haystack)
334 ns ± 1.51 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

2 Comments

@wim Can this be made to work with lists?
@r.e.s. See stackoverflow.com/a/52709319/674039 - I don't believe the string fast path that this answer exploits will be possible with lists.
0

I did

def is_subsequence(x, y):
    """Test whether x is a subsequence of y"""
    x = list(x)
    for letter in y:
        if x and x[0] == letter:
            x.pop(0)

    return not x

1 Comment

Deleting from the left of a list is awfully inefficient, especially if you do it in a loop. And it's completely unnecessary here; you could/should just use an iterator instead.
-3
def subsequence(seq, subseq):
    seq = seq.lower()
    subseq = subseq.lower()
    for char in subseq:
        try:      
            seq = seq[seq.index(char)+1:]            
        except:
            return False
    return True

1 Comment

Repeatedly slicing the input sequence is needlessly inefficient; using an iterator (or index) would be much better.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.