7

I have a string abccddde

I need to find substrings like: a, b, c, cc, d, dd, ddd, e

substrings ab or cd are not valid.

I tried finding all the substrings from a string but its not efficient

def get_all_substrings(input_string):
    length = len(input_string)
    return [input_string[i:j+1] for i in range(length) for j in range(i,length)]

This is outputting:

['a', 'ab', 'abc', 'abcc', 'abccd', 'abccdd', 'abccddd', 'abccddde', 'b', 'bc', 'bcc', 'bccd', 'bccdd', 'bccddd', 'bccddde', 'c', 'cc', 'ccd', 'ccdd', 'ccddd', 'ccddde', 'c', 'cd', 'cdd', 'cddd', 'cddde', 'd', 'dd', 'ddd', 'ddde', 'd', 'dd', 'dde', 'd', 'de', 'e']

This was the method i followed to find the substrings but it gives all the possiblities but that is what makes it inefficient Please Help!

4
  • Are the same symbols in the source string always grouped together? If yes, the solution with collections.Counter is ok. If no, it isn't. Commented Jan 28, 2017 at 6:27
  • if i take a string 'ccdc' its giving me output as 'c', 'cc', 'ccc', 'd' which is wrong. It should not print 'ccc' as its not a contiguous substring. So the collections.Counter solution is not completely correct. Commented Jan 28, 2017 at 6:29
  • @Sagar Check mine. Commented Jan 28, 2017 at 6:41
  • were you trying to solve "Weighted Uniform Strings"? :) Commented Jan 29, 2017 at 0:45

5 Answers 5

4

You can use itertools.groupby() for this:

from itertools import groupby

s = 'abccdddcce'
l1 = ["".join(g) for k, g in groupby(s)]
l2 = [a[:i+1] for a in l1 for i in range(len(a))]
print l2

Output:

['a', 'b', 'c', 'cc', 'd', 'dd', 'ddd', 'c', 'cc', 'e']

For larger input data, replace the Lists with Generators,

l1=()
l2=()
Sign up to request clarification or add additional context in comments.

3 Comments

this answer is really cool. But is it efficient enough for larger inputs ? I tried with larger inputs and its taking a bit time
@Sagar What you can do is make it a generator instead of list l1=(..); l2=(..) and then you can loop over l2 to get the strings.
This solution is a silverbullet for my problem.thanks a lot mate.
1

itertools.groupby can tell you the number of consecutive chars. After that for each group you have the char repeated upto that number.

from itertools import groupby

def substrings(s):
    for char, group in groupby(s):
        substr = ''
        for i in group:
            substr += i
            yield substr

for result in substrings('abccdddcce'):
    print(result)

2 Comments

You can leave out the key function in groupby(s, lambda x: x) as it defaults to identity: groupby(s).
thanks a lot for the solution.even this worked for me.
0

here is one way using regex:

In [85]: [j for i in re.findall(r'((\w)(\2+)?)', s) for j in set(i) if j]
Out[85]: ['a', 'b', 'c', 'cc', 'ddd', 'dd', 'd', 'e']

1 Comment

this also partially worked for me. anyway thanks for the solution. :)
0
from itertools import groupby

def runlength_compress(src):
    return ((k, sum(1 for _ in g)) for k,g in groupby(src))

def contiguous_substrings(src):
    return [c*(i+1) for c, count in runlength_compress(src) for i in range(count)]

print(contiguous_substrings('abccddde'))

Comments

0

The following will do what you want. I don't know if its efficient compared to other solutions though.

def get_all_substrings(text):
    res = []
    prev = ''
    s = ''

    for c in text:
        if c == prev:
            s += c
        else:
            s = prev = c
        res.append(s)

    return res

# Output
>>> get_all_substrings('abccddde')
['a', 'b', 'c', 'cc', 'd', 'dd', 'ddd', 'e']
>>> get_all_substrings('abccdddec')
['a', 'b', 'c', 'cc', 'd', 'dd', 'ddd', 'e', 'c']

Timings

import timeit
import random

size = 100
values = 'abcde'
s = ''.join(random.choice(values) for _ in range(size))

print(s)

print(timeit.timeit("get_all_substrings(s)",
                    setup = 'from __main__ import s, get_all_substrings',
                    number = 10000) )

# Example for size 100 input
abbaaebacddbdedbdbbacadcdddabaeabacdcbeebbccaadebdcecadcecceececcacebacecbbccdedddddabaeeceeeccabdcc
0.16761969871275967

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.