Suppose you have a string that you want to normalize into a specific format, that is, replace every ' ', '.', '-', etc. with '_'.

I know that I could do this:

>s = "Hello----.....    World"
>s = s.replace('-','_').replace('.', '_').replace(' ', '_')
>print(s)
>Hello_____________World

And get what I want. But is there a cleaner, more pythonic way? I tried passing a list as the first argument of replace, but that didn't work.

4 Answers

2

Use Regular Expressions.

Ex:

import re

s = "Hello----.....    World"
print(re.sub(r"[ .-]", "_", s))

Here is the Python tutorial.

1

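You can do it with str.translate and string.maketrans, which was the most efficient approach in the timings below and avoids chaining calls. (This is Python 2 code; in Python 3, maketrans is a static method of str rather than a function in the string module.)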

In [6]: from string import maketrans

In [7]: s = "Hello----.....    World"

In [8]: table = maketrans(' .-',"___")

In [9]: print(s.translate(table))
Hello_____________World
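
For readers on Python 3, where `from string import maketrans` no longer works, the same idea could be sketched with `str.maketrans`. This is an adaptation, not the code that produced the timings in this answer.

```python
s = "Hello----.....    World"

# Two equal-length strings: each character in the first maps to the
# character at the same position in the second.
table = str.maketrans(" .-", "___")
result = s.translate(table)
print(result)

# Equivalent table built from a dict, handy when the characters to
# replace are held in a list or another iterable.
table_from_dict = str.maketrans(dict.fromkeys([" ", ".", "-"], "_"))
assert s.translate(table_from_dict) == result
```

The dict form avoids having to keep the two argument strings the same length by hand.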

The timings:

In [12]: %%timeit
   ....: s = "Hello----.....    World"
   ....: table = maketrans(' .-',"___")
   ....: s.translate(table)
   ....: 

1000000 loops, best of 3: 1.14 µs per loop

In [13]: timeit  s.replace('-','_').replace('.', '_').replace(' ', '_')
100000 loops, best of 3: 2.2 µs per loop
In [14]: %%timeit                                                      
text = "Hello----.....    World"
for ch in [' ', '.', '-']:
    if ch in text:
        text = text.replace(ch,'_')
   ....: 
100000 loops, best of 3: 3.51 µs per loop

In [18]: %%timeit
....: s = "Hello----.....    World"
....: re.sub(r"[ .-]", "_", s)
....: 
100000 loops, best of 3: 11 µs per loop

Even with a precompiled pattern the call still takes around 10 µs, so the regex is by far the least efficient approach.

In [20]: patt=  re.compile(r"[ .-]")

In [21]: %%timeit            
s = "Hello----.....    World"
patt.sub( "_", s)
   ....: 
100000 loops, best of 3: 9.98 µs per loop

Creating the table once, ahead of time, gets us down to nanoseconds:

In [22]: %%timeit                                                      
s = "Hello----.....    World"
s.translate(table)
   ....: 

1000000 loops, best of 3: 590 ns per loop
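
To repeat this kind of comparison outside IPython, something like the following sketch with the standard `timeit` module could be used. It assumes Python 3, the `measure` helper and the `candidates` dict are illustrative names, and the numbers it prints depend on the machine and interpreter, so they will not match the figures above and the ranking may differ.

```python
import re
import timeit

s = "Hello----.....    World"
table = str.maketrans(" .-", "___")   # built once, outside the timed code
patt = re.compile(r"[ .-]")           # precompiled, as in the answer above

# Each candidate is a zero-argument callable so timeit can call it directly.
candidates = {
    "translate": lambda: s.translate(table),
    "chained replace": lambda: s.replace("-", "_").replace(".", "_").replace(" ", "_"),
    "compiled regex": lambda: patt.sub("_", s),
}

def measure(number=100000):
    """Return average microseconds per call for each candidate (hypothetical helper)."""
    results = {}
    for name, func in candidates.items():
        total_seconds = timeit.timeit(func, number=number)
        results[name] = total_seconds / number * 1e6
    return results

for name, usec in measure().items():
    print("{}: {:.3f} us per call".format(name, usec))
```

Each measurement includes the small overhead of calling the lambda, which is the same for all three candidates.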

1

Use re

>>> import re
>>> print(re.sub(r' |\.|-', '_', "Hello----.....    World"))
Hello_____________World

Bonus solution not using regex:

>>> keys = [' ', '.', '-']
>>> print(''.join('_' if c in keys else c for c in "Hello----.....    World"))
Hello_____________World

0

This answer lays out a variety of different ways to accomplish this task, contrasting different functions and inputs by speed.

If you are replacing only a few characters, the fastest way is the one in your question, chaining multiple replace calls; regular expressions are the slowest.

If you want to make this more 'pythonic', the best way to balance speed and readability is to make a list of the characters you want to replace and loop through them.

text = "Hello----.....    World"
for ch in [' ', '.', '-']:
    if ch in text:
        text = text.replace(ch,'_')

1 Comment

This is incorrect: chaining is not the fastest way. Also, the if ch in text check is redundant when you could just do text = text.replace(ch,'_'); nothing is replaced if there is nothing there to replace.
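
The simplification suggested in this comment could look like the sketch below. The function name `replace_each` and its parameters are hypothetical, chosen only for illustration.

```python
def replace_each(text, chars, replacement="_"):
    """Replace each character in `chars` with `replacement`, one pass per character.

    `replace_each` is a hypothetical helper. No membership check is needed:
    str.replace returns the text unchanged when `ch` does not occur in it.
    """
    for ch in chars:
        text = text.replace(ch, replacement)
    return text

print(replace_each("Hello----.....    World", [" ", ".", "-"]))
```

Whether the extra `in` check saves or costs time depends on how often the characters are actually present, so it is worth measuring on real data before keeping it.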
