
For example, I have several thousand strings similar to:

zz='/cars-for-sale/vehicledetails.xhtml?dealerId=54222147&zip=90621&endYear=2015&location=Buena%2BPark%2BCA-90621&startYear=1981&dealerName=CarMax%2BBuena%2BPark&numRecords=100&searchRadius=10&listingId=389520333&Log=0'

I wish to truncate it so that it becomes:

zz='/cars-for-sale/vehicledetails.xhtml?&listingId=389520333&Log=0'

I have two ways to accomplish this:

zz.replace(zz[36:zz.strip('&Log=0').rfind('&')],'')

OR

re.sub('dealer.+Radius=10','',zz)

From a "good engineering practices" standpoint, which one is preferable when weighing readability, maintainability, and speed?

I am using Python 2.7

  • In Python 2, replace takes 1.91 µs vs. 3.37 µs for re; in Python 3 it is 2.1 µs vs. 2.4 µs. Commented Jan 8, 2015 at 0:18
  • @PadraicCunningham I am using python2.7, sorry I did not include that. So I know that replace is faster than using regex, but if the replace manipulation is so complex-looking, is it still preferable? Commented Jan 8, 2015 at 0:20
  • From a "good engineering practices" standpoint, I would probably use urlparse.urlparse() and urlparse.parse_qs()... It would be slower though. Commented Jan 8, 2015 at 0:24
  • FWIW, spl = zz.rsplit("&", 2); zz[:36] + "&{}&{}".format(spl[-2], spl[-1]) takes 1.17 µs, and spl = zz.rsplit("&", 2); zz[:36] + "&" + spl[-2] + "&" + spl[-1] takes 935 ns, but any version would break quite easily. Commented Jan 8, 2015 at 0:33

1 Answer

This question is difficult to answer because it is opinion-based. str.replace is definitely faster than re.sub. Using timeit in IPython with Python 3.4.2:

In []: %timeit zz.replace(zz[36:zz.strip('&Log=0').rfind('&')],'')
100000 loops, best of 3: 2.04 µs per loop

In []: %timeit re.sub('dealer.+Radius=10','',zz)
100000 loops, best of 3: 2.83 µs per loop

As Padraic Cunningham pointed out, the difference is even greater in Python 2:

In []: %timeit zz.replace(zz[36:zz.strip('&Log=0').rfind('&')],'')
100000 loops, best of 3: 2 µs per loop

In []: %timeit re.sub('dealer.+Radius=10','',zz)
100000 loops, best of 3: 3.11 µs per loop
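To reproduce these measurements outside IPython, something like the following sketch with the standard timeit module may help. The helper names with_replace and with_sub are made up for illustration, and the absolute numbers will differ by machine and Python version, so no expected timings are shown.

```python
import re
import timeit

zz = ('/cars-for-sale/vehicledetails.xhtml?dealerId=54222147&zip=90621'
      '&endYear=2015&location=Buena%2BPark%2BCA-90621&startYear=1981'
      '&dealerName=CarMax%2BBuena%2BPark&numRecords=100&searchRadius=10'
      '&listingId=389520333&Log=0')


def with_replace(s):
    # Hypothetical wrapper around the slicing/str.replace version from the question.
    return s.replace(s[36:s.strip('&Log=0').rfind('&')], '')


def with_sub(s):
    # Hypothetical wrapper around the re.sub version from the question.
    return re.sub('dealer.+Radius=10', '', s)


if __name__ == '__main__':
    loops = 100000
    for fn in (with_replace, with_sub):
        # Best of 3 runs, mirroring what %timeit reports.
        best = min(timeit.repeat(lambda: fn(zz), number=loops, repeat=3))
        print('%s: %.2f us per call' % (fn.__name__, best / loops * 1e6))
```

The lambda adds a small, constant call overhead to both measurements, so the comparison between the two is more meaningful than either absolute figure.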

Which one is better depends on the program. In Python, readability generally matters more than speed (PEP 8 is built on the observation that code is read much more often than it is written). If speed is vital for the program, the faster option, str.replace, would be better. Otherwise, the more readable option, re.sub, would be better.
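One detail that bears on the readability of the slicing version: str.strip('&Log=0') does not remove the literal suffix '&Log=0'. It removes any of the characters &, L, o, g, = and 0 from both ends. The slice still comes out right because only the position of the last remaining '&' is used, but the code reads as if it did something else. A small sketch (the URL and the drop_suffix helper are hypothetical):

```python
# A made-up URL whose listing ID ends in zeros.
example = '/search?listingId=389520300&Log=0'

# strip() treats its argument as a set of characters, so the trailing
# zeros of the ID are removed along with '&Log=0'.
print(example.strip('&Log=0'))  # /search?listingId=3895203


def drop_suffix(s, suffix):
    """Remove suffix only when s actually ends with it (hypothetical helper)."""
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s


print(drop_suffix(example, '&Log=0'))  # /search?listingId=389520300
```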

EDIT

As Anony-Mousse pointed out, precompiling the pattern with re.compile is both faster and more readable than either option above. (You added that you're using Python 2, but I'll put the Python 3 test first to match the order of my other tests above.)

With Python 3:

In []: z_match = re.compile('dealer.+Radius=10')
In []: %timeit z_match.sub('', zz)
1000000 loops, best of 3: 1.36 µs per loop

With Python 2:

In []: z_match = re.compile('dealer.+Radius=10')
In []: %timeit z_match.sub('', zz)
100000 loops, best of 3: 1.68 µs per loop
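For completeness, a comment on the question suggests parsing the URL with the urlparse module instead of slicing or pattern matching. Below is a minimal sketch of that idea, not a benchmarked recommendation. It uses urlsplit and parse_qsl rather than the urlparse() and parse_qs() named in the comment, because parse_qsl returns an ordered list of pairs. The function name keep_params and the KEEP tuple are assumptions for illustration. Note that the result has no stray '&' after the '?', so it differs slightly from the target string in the question, and, as the commenter says, it is likely slower.

```python
try:
    # Python 3
    from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode
except ImportError:
    # Python 2.7
    from urlparse import urlsplit, urlunsplit, parse_qsl
    from urllib import urlencode

# Assumption: these are the only query parameters worth keeping.
KEEP = ('listingId', 'Log')


def keep_params(url, keep=KEEP):
    """Return url with every query parameter not named in keep removed."""
    parts = urlsplit(url)
    pairs = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key in keep]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(pairs), parts.fragment))


zz = ('/cars-for-sale/vehicledetails.xhtml?dealerId=54222147&zip=90621'
      '&endYear=2015&location=Buena%2BPark%2BCA-90621&startYear=1981'
      '&dealerName=CarMax%2BBuena%2BPark&numRecords=100&searchRadius=10'
      '&listingId=389520333&Log=0')

print(keep_params(zz))
# /cars-for-sale/vehicledetails.xhtml?listingId=389520333&Log=0
```

This version does not depend on the fixed offset 36 or on the parameters appearing in a particular order, which is the kind of fragility the comments warn about.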

1 Comment

How about regexps precompiled with re.compile, if you want speed?
