efficient string char elimination in python

Question

I have a python for loop over a string, that have to decide char-by-char if the current char should stay or be removed.

Let's say I have a function fun(character,state) which receives a char, and some state parameter, and returns True if the char should be kept, and a new state. for example:

def fun(c,state):
    if state%5==0:
        return False, state+1
    else:
        return True, state+2

I have two ideas how to iterate (sequentially) over the string with this function, but I can't decide between them (or others) in terms of runtime complexity and space usage tradeoff:

Option 1:

def option1(string):
    state = 0
    i =0
    while i<len(string):
        answer, state = fun(string[i],state)
        if not answer:
             string = string[:i]+string[i+1:]
        else:
            i+=1
    return string

Option 2:

def option2(string):
    result  = ''
    state = 0
    for c in string:
        answer, state = fun(c,state)
        if answer:
             result+=c
   return result

Does anyone has a better approach/ solution?

edit: some timing results:

on a long string ( 67976 chars) the results are:

option1 - 205 ms
option2 - 46 ms

looks like eliminating the i'th char with string[:i]+string[i+1:] is not a good idea.

You can just measure the performance and see what happens. Your best bet in making this faster (if it matters) is avoiding the function call per character altogether. — pvg
– pvg, Commented Jun 29, 2017 at 13:36
I'm looking for a theoretical point of view, since just timing won't find a better approach. — Dimgold
– Dimgold, Commented Jun 29, 2017 at 13:39
I'm not sure what the theoretical differences here are. You have to iterate over the entire string and apply your condition. The rest are implementation details that are very much a matter of measurement. They'll also tend to dominate the running time (i.e. how many function calls you make, how many new strings you instantiate and instantly turn to garbage, etc). — pvg
– pvg, Commented Jun 29, 2017 at 13:46
Your option 1 may be slow because you are calculating len(string) every loop. — BoilingFire
– BoilingFire, Commented Jun 29, 2017 at 14:21
Now that you're measuring things, try taking out the function call as an experiment - you'll see it completely dominates the time taken, making your option2 (and the one in the join answer below) nearly two times faster. — pvg
– pvg, Commented Jun 29, 2017 at 14:51

Mike Müller · Accepted Answer · 2017-06-29 13:57:07Z

3

String concatenation with += is not recommended. Better use a list and join it.

def option2(string):
    result  = []
    state = 0
    for c in string:
        answer, state = fun(c, state)
        if answer:
             result.append(c)
   return ''.join(result)

This a good explanation why. While += is optimized in CPython (for certain cases?), it can be really slow in other implementations such as PyPy, for example.

edited Jun 29, 2017 at 13:57

answered Jun 29, 2017 at 13:49

Mike Müller

86k21 gold badges174 silver badges165 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Dimgold Over a year ago

indeed, the time elapsed is 38 ms - the best by now

pvg · Accepted Answer · 2017-06-29 18:38:02Z

In CPython, a good performance rule of thumb is 'avoid function calls in tight loops'. The overhead tends to take a significant amount of time compared to simple iterative computations. In your case, a 'brute force' approach is to simply inline the logic in the loop that traverses your string:

def option4(strng):
    result  = []
    state = 0
    for c in strng:
        if state % 5 == 0:
            state += 1
        else:
            result.append(c)
            state += 2
    return ''.join(result)

This may feel a little clunky, a more 'pythonic' approach might be to use a python generator that takes an iterable and initial state as input and yields the matching members.

def filtergen(iter, initstate):
    state = initstate
    for c in iter:
        if state % 5 == 0:
            state += 1
        else:
            state += 2
            yield c

def option5(strng):
    return ''.join(filtergen(strng, 0))

This reduces call overhead - you only need as many calls as matching elements and as a side benefit, frees the caller from the need to keep track of state. Additionally, you no longer need the intermediate list. The filter remains generic - it works on any kind of iterable, rather than just strings.

Putting all of these together and measuring performance:

s = 'T0kzcZ0x8VQY8ulgFKU8D1MlIUULRsVsNZMnXbjliUEES6sEIVUzpjxlHLG59z' * 1200

def fun(c, state):
    if state % 5 == 0:
        return False, state + 1
    else:
        return True, state + 2

def option3(strng):
    result  = []
    state = 0
    for c in strng:
        answer, state = fun(c, state)
        if answer:
             result.append(c)
    return ''.join(result)

def option4(strng):
    result  = []
    state = 0
    for c in strng:
        if state % 5 == 0:
            state += 1
        else:
            result.append(c)
            state += 2
    return ''.join(result)

def filtergen(iter, initstate):
    state = initstate
    for c in iter:
        if state % 5 == 0:
            state += 1
        else:
            state += 2
            yield c

def option5(strng):
    return ''.join(filtergen(strng, 0))

import timeit
print("string length is " + str(len(s)))
print(timeit.timeit('option3(s)', globals=globals(), number=100))
print(timeit.timeit('option4(s)', globals=globals(), number=100))
print(timeit.timeit('option5(s)', globals=globals(), number=100))

print(option3(s) == option(4) == option5(s))

On my dinky laptop, the output looks like this (times are in seconds):

pvg /tmp ➤  python3 s.py
string length is 74400
2.488818967998668
1.3291878789968905
1.268602224998176
True

The 'inline logic' and generator both easily outperform the naive call-per-element approach.

Collectives™ on Stack Overflow

efficient string char elimination in python

Option 1:

Option 2:

2 Answers 2

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

Option 1:

Option 2:

2 Answers 2

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related