Could anyone please explain what is wrong here:

def get_complementary_sequence(string):
    dic = {'A':'T', 'C':'G', 'T':'A', 'G':'C'}
    for a, b in dic.items():
        string = string.replace(a, b)
    return string

I get proper results for 'T' and 'G', but 'A' and 'C' won't get replaced. I'm really stuck.

The input string looks like 'ACGTACG'.

  • You're iterating over each item of the dictionary in order, so every replacement does happen. The problem is that by the time you replace T, every A has already been turned into a T, so you turn those back into A as well. Commented Sep 10, 2013 at 15:51
  • Since you are working with DNA sequences, I suggest using Biopython instead. Commented Sep 6, 2015 at 9:35

2 Answers

You first replace all As with Ts, and then replace all Ts with As again (including the Ts you just created from As!):

>>> string = 'ACGTACG'
>>> string.replace('A', 'T')
'TCGTTCG'
>>> string.replace('A', 'T').replace('T', 'A')
'ACGAACG'

Use a translation map instead, fed to str.translate():

def get_complementary_sequence(string):
    transmap = {ord('A'): 'T', ord('C'): 'G', ord('T'): 'A', ord('G'): 'C'}
    return string.translate(transmap)

The str.translate() method takes a dictionary that maps codepoints (integers) to replacements: a string (a single character here, though Python 3 accepts strings of any length), another codepoint, or None (to delete the character from the output). The ord() function gives us the codepoints for the given 'from' letters.
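As an aside, str.maketrans() can build the same kind of table without spelling out the ord() calls. This is only a minimal sketch, assuming Python 3; the names complement_table and strip_gaps are made up for illustration and do not appear in the question:

```python
# Build the codepoint-keyed table from two equal-length strings:
# each character in the first string maps to the character at the
# same position in the second string.
complement_table = str.maketrans('ACGT', 'TGCA')

# The result is an ordinary dict keyed by codepoints.
assert complement_table[ord('A')] == ord('T')

print('ACGTACG'.translate(complement_table))  # TGCATGC

# Mapping a codepoint to None deletes that character from the output.
strip_gaps = {ord('-'): None}  # hypothetical gap-stripping table
print('AC-GT'.translate(strip_gaps))  # ACGT
```

Characters that are missing from the table are passed through unchanged, which is why 'A', 'C', 'G' and 'T' survive the second call.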

This looks up each character of string in the translation map, one by one (in C code, in CPython), so every character is translated exactly once, instead of replacing all As and then all Ts.

str.translate() has the added advantage of being much faster than a series of str.replace() calls.
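If you want to check the speed claim on your own machine, something like the following rough timeit sketch should do. The function names, the test sequence and the repeat count are all assumptions chosen for illustration, and the replace()-based variant goes through lowercase placeholders so that it returns the correct result (it assumes uppercase-only input). Timings vary by machine and Python version, so none are quoted here:

```python
import timeit

TABLE = {ord('A'): 'T', ord('C'): 'G', ord('T'): 'A', ord('G'): 'C'}

def with_translate(seq):
    # Single pass: each character is looked up once in TABLE.
    return seq.translate(TABLE)

def with_replace(seq):
    # Four passes through lowercase placeholders, so letters that were
    # already swapped are not swapped back. Assumes uppercase-only input.
    for old, new in (('A', 't'), ('C', 'g'), ('T', 'a'), ('G', 'c')):
        seq = seq.replace(old, new)
    return seq.upper()

sequence = 'ACGTACG' * 10_000  # arbitrary test input

# Both variants must agree before comparing their speed.
assert with_translate(sequence) == with_replace(sequence)

for func in (with_translate, with_replace):
    seconds = timeit.timeit(lambda: func(sequence), number=20)
    print(f"{func.__name__}: {seconds:.4f} s for 20 runs")
```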

Demo:

>>> string = 'ACGTACG'
>>> transmap = {ord('A'): 'T', ord('C'): 'G', ord('T'): 'A', ord('G'): 'C'}
>>> string.translate(transmap)
'TGCATGC'

1 Comment

str.translate() is also much faster than the naive ''.join(dic[c] for c in s).
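For reference, the join-based alternative mentioned in this comment might look like the sketch below. The names complements and complement_by_join are made up for illustration. It is also a correct single-pass approach, because each character is looked up in the original mapping exactly once:

```python
complements = {'A': 'T', 'C': 'G', 'T': 'A', 'G': 'C'}

def complement_by_join(seq):
    # Look up every base once and build a new string from the results.
    # Any character missing from the mapping raises KeyError.
    return ''.join(complements[base] for base in seq)

print(complement_by_join('ACGTACG'))  # TGCATGC
```

Unlike str.translate(), which passes unknown characters through unchanged, this version fails loudly on unexpected input.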

Mutable data is your enemy :)

See, you first replace all As with Ts, then, in another iteration, replace all Ts with As again.
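To watch this happen, you can print the string after every pass of the loop. This is just a tracing sketch of the code from the question, assuming Python 3.7+ (where dicts keep insertion order); the names pairs and steps are introduced here for illustration:

```python
pairs = {'A': 'T', 'C': 'G', 'T': 'A', 'G': 'C'}
sequence = 'ACGTACG'
steps = []  # intermediate strings, kept for inspection

for old, new in pairs.items():
    # Each pass rewrites the string produced by the previous pass.
    sequence = sequence.replace(old, new)
    steps.append(sequence)
    print(f"{old} -> {new}: {sequence}")

# Output on Python 3.7+:
# A -> T: TCGTTCG
# C -> G: TGGTTGG
# T -> A: AGGAAGG
# G -> C: ACCAACC
```

The third pass turns the Ts that used to be As back into As, and the fourth pass does the same to the Gs that used to be Cs.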

What works:

# for Crick and Watson's sake, name your variables sensibly
complements = {ord('A'):'T', ord('C'):'G', ord('T'):'A', ord('G'):'C'}
sequence = "AGCTTCAG"
print(sequence.translate(complements))

It prints TCGAAGTC.
