Most Efficient Way to Replace Multiple Characters in a String [duplicate]

Question

Let's say there is a string of any length, and it only contains the letters A through D:

s1 = 'ACDCADBCDBABDCBDAACDCADCDAB'

What is the most efficient/fastest way to replace every 'B' with a 'C' and every 'C' with a 'B'?

Here's what I am doing now:

replacedString = ''
for i in s1:
    if i == 'B':
        replacedString += 'C'
    elif i == 'C':
        replacedString += 'B'
    else:
        replacedString += i

This works, but it is obviously not very elegant. The problem is that I am dealing with strings that can be millions of characters long, so I need a better solution.

I can't think of a way to do this with the .replace() method. This suggests that maybe a regular expression is the way to go. Is that applicable here as well? If so what is a suitable regular expression? Is there an even faster way?

Thank you.

The 'duplicate' question seems to address removal of characters not replacement.
– Malonge, Commented Feb 27, 2015 at 22:02
Yes, I was about to tell you about string translation before Cyber marked as duplicate, but basically, you don't want to use a dictionary because you will replace already replaced values.
– Malik Brahimi, Commented Feb 27, 2015 at 22:07
If the post linked as duplicate doesn't help you, see this: tutorialspoint.com/python/string_translate.htm
– Fred Larson, Commented Feb 27, 2015 at 22:11
"efficient" can mean different things to different people. Do you only want to iterate once? If you can afford to iterate multiple times use str.replace otherwise use translate.
– notorious.no, Commented Feb 27, 2015 at 22:17

Malik Brahimi · Accepted Answer · 2015-02-27 22:36:43Z

4

I wanted to show you the effects of improper translation. Let's pretend the string is a DNA sequence that we want to convert to an RNA string. One method below uses chained replacement, which gives the wrong result, whereas the other builds a new string by concatenation.

string = 'GGGCCCGCGCCCGGG' # DNA string ready for transcription

Replacement

The problem with replacement is that letters that have already been replaced get replaced again in a later iteration. For example, once the loop has finished you'll have a string made of a single letter rather than a complete inversion.

string = 'GGGCCCGCGCCCGGG'

coding = {'A': 'U', 'T': 'A',
          'G': 'C', 'C': 'G'}

for k, v in coding.items():
    string = string.replace(k, v)

print(string)
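If several passes over the string are acceptable, chained replace() calls can still be made safe for a two-letter swap by routing one letter through a placeholder that does not occur in the input. The following is only a sketch: the function name swap_with_placeholder and the choice of '\0' as the placeholder are assumptions made for illustration, not part of the original answer.

```python
def swap_with_placeholder(s, a, b, placeholder="\0"):
    """Swap every occurrence of a and b using three str.replace passes."""
    # The placeholder must be absent from the input, otherwise the last
    # pass would also rewrite characters that were there from the start.
    if placeholder in s:
        raise ValueError("placeholder already occurs in the input")
    # a -> placeholder, then b -> a, then placeholder -> b
    return s.replace(a, placeholder).replace(b, a).replace(placeholder, b)


dna = 'GGGCCCGCGCCCGGG'
print(swap_with_placeholder(dna, 'G', 'C'))  # CCCGGGCGCGGGCCC
```

This only covers a pairwise swap; a mapping with more entries needs a single-pass approach.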

Concatenation

Instead, build a separate string by concatenation. This way the original string is left intact and nothing is replaced incorrectly. You can of course use a string translation, but I tend to prefer dictionaries because, by definition, they map values.

string = 'GGGCCCGCGCCCGGG'

coding = {'A': 'U', 'T': 'A',
          'G': 'C', 'C': 'G'}

answer = ''

for char in string:
    answer += coding[char]

print(answer)
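For very long strings, repeated += may be slower than building the result in one go. A minimal sketch of two single-pass variants that reuse the same coding dictionary, assuming Python 3 (where str.maketrans accepts a dict with single-character keys); the function names transcribe_join and transcribe_translate are made up for this example:

```python
coding = {'A': 'U', 'T': 'A',
          'G': 'C', 'C': 'G'}


def transcribe_join(dna):
    # Look up each base once and join the pieces in a single pass,
    # instead of growing the result with repeated +=.
    return ''.join(coding[base] for base in dna)


def transcribe_translate(dna):
    # str.maketrans turns the dict into a translation table, and
    # str.translate applies it to every character in one call.
    return dna.translate(str.maketrans(coding))


dna = 'GGGCCCGCGCCCGGG'
print(transcribe_join(dna))       # CCCGGGCGCGGGCCC
print(transcribe_translate(dna))  # CCCGGGCGCGGGCCC
```

Note that transcribe_join raises KeyError on a character missing from the dictionary, while transcribe_translate leaves such characters unchanged.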

edited Feb 27, 2015 at 22:36

answered Feb 27, 2015 at 22:28

Malik Brahimi


3 Comments

Adam Smith Over a year ago

IMO toss this in a gist and put it as a comment. Agreed that this is Not An Answer

Malik Brahimi Over a year ago

What does that mean?

Malik Brahimi Over a year ago

I'm going to add an alternative answer to this advice.

Adam Smith · Accepted Answer · 2015-02-27 22:23:48Z

2

Apart from the str.translate method, you could simply build a translation dict and run it yourself.

s1 = 'ACDCADBCDBABDCBDAACDCADCDAB'

def str_translate_method(s1):
    try:
        translationdict = str.maketrans("BC","CB")
    except AttributeError: # python2
        import string
        translationdict = string.maketrans("BC","CB")
    result = s1.translate(translationdict)
    return result

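def dict_method(s1):
    # "from" is a reserved word in Python, so name the two sides src/dst
    src, dst = "BC", "CB"
    translationdict = dict(zip(src, dst))
    # join on the empty string so no separators end up in the result
    result = ''.join([translationdict.get(c, c) for c in s1])
    return result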
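To see how the two approaches compare on a long input, they can be timed with the standard timeit module. This is a rough sketch assuming Python 3; the helper names with_translate and with_dict and the input size are choices made for this example, and the numbers will vary by machine. str.translate would usually be expected to come out ahead because its loop runs in C, but it is worth measuring on your own data.

```python
import timeit

# Repeat the sample string to get roughly 270,000 characters.
s1 = 'ACDCADBCDBABDCBDAACDCADCDAB' * 10000

table = str.maketrans("BC", "CB")
mapping = dict(zip("BC", "CB"))


def with_translate(s):
    # One call; the per-character work happens inside the interpreter.
    return s.translate(table)


def with_dict(s):
    # One Python-level dictionary lookup per character, then a single join.
    return ''.join([mapping.get(c, c) for c in s])


# Both approaches must agree before timing them.
assert with_translate(s1) == with_dict(s1)

for fn in (with_translate, with_dict):
    seconds = timeit.timeit(lambda: fn(s1), number=5)
    print(fn.__name__, round(seconds, 4))
```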

answered Feb 27, 2015 at 22:23

Adam Smith


1 Comment

Adam Smith Over a year ago

This is nearly identical to my answer here. The question is possibly a dupe, but since this question has overlapping translations (B <--> C) it seems different enough to answer.

Aamir Rind · Accepted Answer · 2015-02-27 23:00:23Z

0

Using a regular expression, this also handles case: if the character to be replaced is lowercase in the string, it is replaced with the lowercase replacement character; otherwise the uppercase one is used:

import re

chars_map = {'b': 'c', 'c': 'b'} # build a dictionary of replacement characters in lowercase

def rep(match):
    char = match.group(0)
    replacement = chars_map[char.lower()]
    return replacement if char.islower() else replacement.upper()

s = 'AbC'
print(re.sub('(?i)%s' % '|'.join(chars_map.keys()), rep, s)) # 'AcB'
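For the uppercase-only strings in the question, the case handling is not needed. A simpler sketch of the same idea, with a character class and a dictionary lookup in the replacement callback; the names swap, pattern and swap_bc are illustrative only:

```python
import re

swap = {'B': 'C', 'C': 'B'}
pattern = re.compile('[BC]')  # matches exactly one 'B' or 'C'


def swap_bc(s):
    # Each match is replaced in the same pass, so a character that has
    # already been swapped is never visited again.
    return pattern.sub(lambda m: swap[m.group(0)], s)


print(swap_bc('ACDCADB'))  # ABDBADC
```

The callback runs once per match, so on very long strings this is likely to be slower than str.translate.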

answered Feb 27, 2015 at 23:00

Aamir Rind

