Should I continue to use str.replace over re.sub if the string manipulation becomes "complicated"?

Question

For example, I have several thousand strings similar to:

zz='/cars-for-sale/vehicledetails.xhtml?dealerId=54222147&zip=90621&endYear=2015&location=Buena%2BPark%2BCA-90621&startYear=1981&dealerName=CarMax%2BBuena%2BPark&numRecords=100&searchRadius=10&listingId=389520333&Log=0'

I want to truncate it so that it becomes:

zz='/cars-for-sale/vehicledetails.xhtml?&listingId=389520333&Log=0'

I have two ways to accomplish this:

zz.replace(zz[36:zz.strip('&Log=0').rfind('&')],'')

OR

re.sub('dealer.+Radius=10','',zz)

From a "good engineering practices" standpoint, which one is preferable when weighing readability, maintainability, and speed?

I am using Python 2.7.

In Python 2, replace takes 1.91 µs vs. 3.37 µs for re; in Python 3 it is 2.1 µs vs. 2.4 µs.
– Padraic Cunningham, Commented Jan 8, 2015 at 0:18
@PadraicCunningham I am using Python 2.7, sorry I did not include that. I know that replace is faster than using a regex, but if the replace manipulation looks this complex, is it still preferable?
– lollerskates, Commented Jan 8, 2015 at 0:20
From a "good engineering practices" standpoint, I would probably use urlparse.urlparse() and urlparse.parse_qs()... It would be slower though.
– thebjorn, Commented Jan 8, 2015 at 0:24
FWIW, spl = zz.rsplit("&",2); (zz[:36] + "&{}&{}".format(spl[-2], spl[-1])) takes 1.17 µs and spl = zz.rsplit("&",2); (zz[:36] + "&"+spl[-2]+"&" + spl[-1]) takes 935 ns, but any version would break quite easily.
– Padraic Cunningham, Commented Jan 8, 2015 at 0:33
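
thebjorn's suggestion to parse the URL instead of slicing it could look roughly like the sketch below. It is not the commenter's code. It assumes Python 3's urllib.parse (in Python 2.7 the equivalents live in the urlparse and urllib modules), and the helper name keep_params and its default parameter list are hypothetical, chosen only for illustration. Note that it yields `?listingId=...` without the stray `&` after the `?` that appears in the question's desired output.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

zz = ('/cars-for-sale/vehicledetails.xhtml?dealerId=54222147&zip=90621'
      '&endYear=2015&location=Buena%2BPark%2BCA-90621&startYear=1981'
      '&dealerName=CarMax%2BBuena%2BPark&numRecords=100&searchRadius=10'
      '&listingId=389520333&Log=0')


def keep_params(url, wanted=("listingId", "Log")):
    """Return url with only the query parameters named in `wanted`.

    Hypothetical helper: the name and defaults are assumptions.
    """
    parts = urlsplit(url)
    # parse_qsl returns (name, value) pairs in their original order.
    pairs = [(name, value)
             for name, value in parse_qsl(parts.query, keep_blank_values=True)
             if name in wanted]
    # SplitResult is a namedtuple, so _replace swaps in the new query string.
    return urlunsplit(parts._replace(query=urlencode(pairs)))


short_url = keep_params(zz)
print(short_url)
```

This version does not depend on the position of the `?`, the order of the parameters, or the value of searchRadius, which is what makes it less likely to break than the slicing or regex variants, at the cost of speed.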

Community · Accepted Answer · 2020-06-20 09:12:55Z

5

This question is difficult to answer because it is opinion-based. On speed alone, str.replace is faster. Using %timeit in IPython with Python 3.4.2:

In []: %timeit zz.replace(zz[36:zz.strip('&Log=0').rfind('&')],'')
100000 loops, best of 3: 2.04 µs per loop

In []: %timeit re.sub('dealer.+Radius=10','',zz)
100000 loops, best of 3: 2.83 µs per loop

As Padraic Cunningham pointed out, the difference is even greater in Python 2:

In []: %timeit zz.replace(zz[36:zz.strip('&Log=0').rfind('&')],'')
100000 loops, best of 3: 2 µs per loop

In []: %timeit re.sub('dealer.+Radius=10','',zz)
100000 loops, best of 3: 3.11 µs per loop

Which one is better depends on the program. Generally, for Python, readability is more important than speed (because the standard PEP 8 style is based on the notion that code is read more than written). If speed is vital for the program, the faster option str.replace would be better. Otherwise, the more readable option re.sub would be better.
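
To make the readability trade-off concrete, here is a minimal sketch of a string-only version with the steps named, loosely based on the rsplit idea from the comments. The function name strip_search_params and the keep_last parameter are hypothetical, and the sketch assumes the parameters to keep are always the last ones in the query string and that at least one other parameter precedes them.

```python
zz = ('/cars-for-sale/vehicledetails.xhtml?dealerId=54222147&zip=90621'
      '&endYear=2015&location=Buena%2BPark%2BCA-90621&startYear=1981'
      '&dealerName=CarMax%2BBuena%2BPark&numRecords=100&searchRadius=10'
      '&listingId=389520333&Log=0')


def strip_search_params(url, keep_last=2):
    """Keep the path plus the last `keep_last` query parameters.

    Hypothetical helper; assumes the wanted parameters come last and
    that the query holds more than `keep_last` parameters.
    """
    # Split once at the "?" instead of relying on the magic index 36.
    path, _, query = url.partition('?')
    # rsplit from the right: the first piece is everything to discard.
    tail = query.rsplit('&', keep_last)[1:]
    # "?&" reproduces the exact output format asked for in the question.
    return path + '?&' + '&'.join(tail)


truncated = strip_search_params(zz)
print(truncated)
```

The behavior matches the one-liner from the question for this input, but the intent of each step is visible without decoding slice arithmetic.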

EDIT

As Anony-Mousse pointed out, precompiling the pattern with re.compile is faster and more readable than either option above. (You added that you're using Python 2, but I'll put the Python 3 test first to match the order of my other tests above.)

With Python 3:

In []: z_match = re.compile('dealer.+Radius=10')
In []: %timeit z_match.sub('', zz)
1000000 loops, best of 3: 1.36 µs per loop

With Python 2:

In []: z_match = re.compile('dealer.+Radius=10')
In []: %timeit z_match.sub('', zz)
100000 loops, best of 3: 1.68 µs per loop
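
The measurements above can be reproduced outside IPython with the standard timeit module. The following is a sketch under stated assumptions: the wrapper function names are hypothetical, the extra function call adds a small constant overhead to every variant, and absolute numbers will differ by machine and Python version. It also checks that all three approaches return the same string before timing them.

```python
import re
import timeit

zz = ('/cars-for-sale/vehicledetails.xhtml?dealerId=54222147&zip=90621'
      '&endYear=2015&location=Buena%2BPark%2BCA-90621&startYear=1981'
      '&dealerName=CarMax%2BBuena%2BPark&numRecords=100&searchRadius=10'
      '&listingId=389520333&Log=0')

EXPECTED = '/cars-for-sale/vehicledetails.xhtml?&listingId=389520333&Log=0'

# Compiled once at import time, as in the EDIT above.
z_match = re.compile('dealer.+Radius=10')


def with_replace(s):
    # Slice-based version from the question.
    return s.replace(s[36:s.strip('&Log=0').rfind('&')], '')


def with_re_sub(s):
    # Module-level re.sub: looks the pattern up in re's internal cache per call.
    return re.sub('dealer.+Radius=10', '', s)


def with_compiled(s):
    # Precompiled pattern: skips the per-call cache lookup.
    return z_match.sub('', s)


variants = (with_replace, with_re_sub, with_compiled)
results = {func.__name__: func(zz) for func in variants}

for func in variants:
    seconds = timeit.timeit(lambda: func(zz), number=100000)
    # Convert total seconds for 100000 calls into microseconds per call.
    print('{:<14} {:.2f} µs per call'.format(func.__name__, seconds * 10))
```

Because the re module caches recently compiled patterns, the gap between re.sub and the precompiled pattern mostly reflects the cache lookup rather than recompilation.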

edited Jun 20, 2020 at 9:12 by CommunityBot

answered Jan 8, 2015 at 0:36 by GreenRaccoon23

1 Comment

Has QUIT--Anony-Mousse Over a year ago

How about regexps precompiled with re.compile, if you want speed?
