
I want to replace all occurrences of a set of strings in a line of text, ignoring case. I came up with this approach, but I am sure there is a better way of doing it:

import re

# Map each replacement word to a compiled, case-insensitive pattern
# for the word it should replace.
myDict = {}
myDict['car'] = re.compile(re.escape('pig'), re.IGNORECASE)
myDict['airplane'] = re.compile(re.escape('horse'), re.IGNORECASE)
myDict['bus'] = re.compile(re.escape('cow'), re.IGNORECASE)

mystring = 'I have this Pig and that pig with a hOrse and coW'

# Apply each pattern in turn; the dict key is the replacement text.
for key in myDict:
    regex_obj = myDict[key]
    mystring = regex_obj.sub(key, mystring)

print(mystring)

I have this car and that car with a airplane and bus

Based on @Paul Rooney's answer below, ideally I would do this:

def init_regex():
    rd = {'pig': 'car', 'horse':'airplane', 'cow':'bus'}
    myDict = {}
    for key,value in rd.iteritems():
        pattern = re.compile(re.escape(key), re.IGNORECASE)
        myDict[value] = pattern

    return myDict

def strrep(mystring, patternDict):
    for key in patternDict:
        regex_obj = patternDict[key]
        mystring = regex_obj.sub(key, mystring)

    return mystring

1 Answer


Try this, which caches each compiled pattern in a dict so it is compiled only once:

import re

mystring = 'I have this Pig and that pig with a hOrse and coW'

rd = {'pig': 'car', 'horse':'airplane', 'cow':'bus'}

# Compiled patterns, keyed by search string, reused across calls.
cachedict = {}

def strrep(orig, repdict):
    for k,v in repdict.iteritems():
        if k in cachedict:
            pattern = cachedict[k]
        else:
            # re.escape makes k match literally, as in the question.
            pattern = re.compile(re.escape(k), re.IGNORECASE)
            cachedict[k] = pattern
        orig = pattern.sub(v, orig)
    return orig

print(strrep(mystring, rd))

This answer was originally written for Python 2; in Python 3, use repdict.items() instead of repdict.iteritems().
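A different technique, not the one used above, is to join all the search strings into a single alternation and let a callback pick the replacement, so the text is scanned once and already-replaced text is never matched again. The following is only a sketch for Python 3: `build_replacer` and `strrep_once` are hypothetical names, and it assumes the dictionary is non-empty and its keys are lowercase.

```python
import re

def build_replacer(replacements):
    """Return a function that applies all replacements in a single pass.

    Assumes `replacements` is non-empty and its keys are lowercase.
    """
    # Try longer keys first so a longer key wins over a shorter prefix.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys), re.IGNORECASE)

    def replace(text):
        # Look up the replacement via the lowercased form of the match.
        return pattern.sub(lambda m: replacements[m.group(0).lower()], text)

    return replace

rd = {'pig': 'car', 'horse': 'airplane', 'cow': 'bus'}
strrep_once = build_replacer(rd)
print(strrep_once('I have this Pig and that pig with a hOrse and coW'))
# prints: I have this car and that car with a airplane and bus
```

Because the pattern is compiled once inside `build_replacer`, the returned function can be called many times without any per-call caching logic.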


Comments

Is it possible to cache the compiled patterns to save on computation, since my strrep will be called multiple times?
Based on your answer, I improved the solution to cache the compiled patterns in an array so I don't have to compile them every time; I then just call .sub on each pattern.
Yes, that works. I was looking at a solution that caches the compiled regexes in a dict.
The dict caching solution actually seems to be a bit slower than without the caching.
The reason your dictionary approach degrades performance is probably that you specify a default for when the key is not found. I would venture to guess that Python performs the compile before calling get, which causes the additional computation unnecessarily every time.
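The guess in the last comment matches how Python evaluates call arguments: they are computed before the function is called, so a default passed to `dict.get` is built even when the key is present. The slower code itself is not shown in this thread, so the sketch below (Python 3) assumes it looked something like `cachedict.get(k, re.compile(...))`; `counting_compile` and `compile_calls` are hypothetical helpers used only to count calls.

```python
import re

compile_calls = []  # hypothetical log of every compile request

def counting_compile(key):
    # Hypothetical helper: record the call, then compile as in the answer.
    compile_calls.append(key)
    return re.compile(re.escape(key), re.IGNORECASE)

cache = {'pig': counting_compile('pig')}  # fill the cache once

# Variant 1: the default is computed before get() is even called,
# so this compiles again although 'pig' is already cached.
before = len(compile_calls)
pattern = cache.get('pig', counting_compile('pig'))
eager_calls = len(compile_calls) - before

# Variant 2: an explicit membership test compiles only on a miss.
before = len(compile_calls)
if 'pig' in cache:
    pattern = cache['pig']
else:
    pattern = cache['pig'] = counting_compile('pig')
lazy_calls = len(compile_calls) - before

print(eager_calls, lazy_calls)
```

Here `eager_calls` ends up as 1 and `lazy_calls` as 0, which is why the `if k in cachedict` form in the answer avoids the repeated work.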