1

Let's say there is a string of any length, and it only contains the letters A through D:

s1 = 'ACDCADBCDBABDCBDAACDCADCDAB'

What is the most efficient/fastest way to replace every 'B' with a 'C' and every 'C' with a 'B'?

Here's what I am doing now:

replacedString = ''
for i in s1:
    if i == 'B':
        replacedString += 'C'
    elif i == 'C':
        replacedString += 'B'
    else:
        replacedString += i

This works, but it is not very elegant. The problem is that I am dealing with strings that can be millions of characters long, so I need a better solution.

I can't think of a way to do this with the .replace() method. This suggests that maybe a regular expression is the way to go. Is that applicable here as well? If so what is a suitable regular expression? Is there an even faster way?

Thank you.

10
  • The 'duplicate' question seems to address removal of characters not replacement. Commented Feb 27, 2015 at 22:02
  • 8
    Same idea, s1.translate(str.maketrans('BC', 'XY')) Commented Feb 27, 2015 at 22:03
  • 1
    Yes, I was about to tell you about string translation before Cyber marked as duplicate, but basically, you don't want to use a dictionary because you will replace already replaced values. Commented Feb 27, 2015 at 22:07
  • If the post linked as duplicate doesn't help you, see this: tutorialspoint.com/python/string_translate.htm Commented Feb 27, 2015 at 22:11
  • "efficient" can mean different things to different people. Do you only want to iterate once? If you can afford to iterate multiple times use str.replace otherwise use translate. Commented Feb 27, 2015 at 22:17
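For the multi-pass str.replace route mentioned in the last comment, here is a minimal sketch. It assumes a placeholder character (here "\0", chosen only for illustration) that never occurs in the input. The function name swap_with_replace is made up for this example.

```python
def swap_with_replace(s, a="B", b="C", placeholder="\0"):
    """Swap every `a` with `b` using three str.replace passes."""
    # The placeholder must not occur in the input, otherwise the
    # final pass would rewrite characters that were never swapped.
    if placeholder in s:
        raise ValueError("placeholder occurs in the input string")
    # a -> placeholder, then b -> a, then placeholder -> b
    return s.replace(a, placeholder).replace(b, a).replace(placeholder, b)


s1 = "ACDCADBCDBABDCBDAACDCADCDAB"
swapped = swap_with_replace(s1)
print(swapped)
```

This walks the string three times, so it only makes sense when multiple passes are acceptable.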

3 Answers

4

I wanted to show you the effects of improper translation. Suppose we have a DNA sequence like the string below and want to transcribe it into an RNA string. One method uses sequential replacement, which gives the wrong result, whereas the other builds a new string by concatenation.

string = 'GGGCCCGCGCCCGGG' # DNA string ready for transcription

Replacement

The problem with replacement is that letters that have already been replaced can be replaced again in a later iteration. For example, once the loop finishes you end up with a string of a single repeated letter rather than a complete inversion.

string = 'GGGCCCGCGCCCGGG'

coding = {'A': 'U', 'T': 'A',
          'G': 'C', 'C': 'G'}

for k, v in coding.items():
    string = string.replace(k, v)

print(string)

Concatenation

Instead, build a new string by concatenation. This leaves the original string intact, so nothing gets replaced twice. You can of course use a string translation, but I tend to prefer dictionaries because, by definition, they map values.

string = 'GGGCCCGCGCCCGGG'

coding = {'A': 'U', 'T': 'A',
          'G': 'C', 'C': 'G'}

answer = ''

for char in string:
    answer += coding[char]

print(answer)
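For comparison, here is a minimal sketch of the same dictionary fed to str.translate, next to the sequential replace loop. It assumes Python 3.7+ (str.maketrans accepting a dict, and dicts keeping insertion order). The names dna, rna and clobbered are only for illustration.

```python
dna = "GGGCCCGCGCCCGGG"
coding = {"A": "U", "T": "A", "G": "C", "C": "G"}

# str.maketrans accepts a dict with single-character keys (Python 3).
table = str.maketrans(coding)
rna = dna.translate(table)  # every character is mapped exactly once

# Sequential replace, for contrast: later passes rewrite earlier results.
clobbered = dna
for k, v in coding.items():
    clobbered = clobbered.replace(k, v)

print(rna)        # CCCGGGCGCGGGCCC
print(clobbered)  # a single repeated letter
```

The dictionary stays as the readable definition of the mapping, and translate applies it in one pass.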

3 Comments

IMO toss this in a gist and put it as a comment. Agreed that this is Not An Answer
What does that mean?
I'm going to add an alternative answer to this advice.
2

Apart from the str.translate method, you could simply build a translation dict and run it yourself.

s1 = 'ACDCADBCDBABDCBDAACDCADCDAB'

def str_translate_method(s1):
    try:
        translationdict = str.maketrans("BC","CB")
    except AttributeError: # python2
        import string
        translationdict = string.maketrans("BC","CB")
    result = s1.translate(translationdict)
    return result

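def dict_method(s1):
    # "from" is a reserved keyword in Python, so use other names
    src, dst = "BC", "CB"
    translationdict = dict(zip(src, dst))
    # join on the empty string so no spaces are inserted between characters
    result = ''.join([translationdict.get(c, c) for c in s1])
    return result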
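Since the question is about speed on very long strings, a rough timing harness like the one below can help decide between the two approaches. This is only a sketch for Python 3: the repeat count, the string length and the function names with_translate and with_join are arbitrary choices. On CPython, str.translate tends to be the faster of the two, but measure on your own data.

```python
import timeit

# About 270,000 characters; scale up to match your real input.
s1 = "ACDCADBCDBABDCBDAACDCADCDAB" * 10000

table = str.maketrans("BC", "CB")
mapping = {"B": "C", "C": "B"}


def with_translate(s):
    # One pass using a prebuilt translation table.
    return s.translate(table)


def with_join(s):
    # Per-character dict lookup, then a single join.
    return "".join([mapping.get(c, c) for c in s])


# Both approaches must agree before timings mean anything.
assert with_translate(s1) == with_join(s1)

for fn in (with_translate, with_join):
    seconds = timeit.timeit(lambda: fn(s1), number=5)
    print(f"{fn.__name__}: {seconds:.4f} s for 5 runs")
```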

1 Comment

This is nearly identical to my answer here. The question is possibly a dupe, but since this question has overlapping translations (B <--> C) it seems different enough to answer.
0

Using a regular expression. This also handles case: if the character to be replaced is lowercase, it is replaced with the lowercase replacement character; otherwise the uppercase one is used:

import re

chars_map = {'b': 'c', 'c': 'b'} # build a dictionary of replacement characters in lowercase

def rep(match):
    char = match.group(0)
    replacement = chars_map[char.lower()]
    return replacement if char.islower() else replacement.upper()

s = 'AbC'
print(re.sub('(?i)%s' % '|'.join(chars_map.keys()), rep, s)) # 'AcB'
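If case does not matter, as in the original question where only uppercase A through D occur, a simpler variant is a character class with a dictionary lookup. This is a minimal sketch, and the names swap, pattern and swap_with_regex are only for illustration.

```python
import re

swap = {"B": "C", "C": "B"}
pattern = re.compile("[BC]")  # match only the characters that need swapping


def swap_with_regex(s):
    # Each match is looked up once, so nothing is replaced twice.
    return pattern.sub(lambda m: swap[m.group(0)], s)


print(swap_with_regex("ACDCADB"))  # ABDBADC
```

The callback runs once per match, so on very long strings this is likely to be slower than str.translate.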
