1

I want to write a function that takes a long string of characters (an RNA sequence like 'UGGUGUUAUUAAUGGUUU') and extracts three characters at a time from it (i.e. the codons). It can either return each set of three characters one after another or return a list containing all of them; either way would work. I'm having trouble figuring out how to do this cleanly.

Here's what I have so far:

def get_codon_list(codon_string):
    codon_start = 0
    codon_length = 3
    codon_end = 3
    codon_list = []
    for x in range(len(codon_string) // codon_length):
        codon_list.append(codon_string[codon_start:codon_end])
        codon_start += codon_length
        codon_end += codon_length
    return codon_list

It works to return a list of the codons, but it seems very inefficient. I don't like using hard-coded numbers and incrementing variables like that if there is a better way. I also don't like using for loops that don't actually use the variable in the loop. It doesn't seem like a proper use of it.

Any suggestions for how to improve this, either with a specific function/module, or just a better Pythonic technique?

Thanks!

3

5 Answers

4

You can use a list comprehension and take a slice of length 3 from the string at each step.

>>> s="UGGUGUUAUUAAUGGUUU"
>>> res = [s[i:i+3] for i in range(0,len(s),3)]
>>> res
['UGG', 'UGU', 'UAU', 'UAA', 'UGG', 'UUU']
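
The question also allows returning the codons one after another rather than as a list. A minimal sketch of that variant, using the same slicing idea inside a generator (the name `iter_codons` and the `codon_length` parameter are made up for illustration, not from the original answer):

```python
def iter_codons(sequence, codon_length=3):
    """Yield successive codon_length-sized slices of sequence, one at a time."""
    # Hypothetical helper: same slicing as the list comprehension, but lazy.
    for start in range(0, len(sequence), codon_length):
        yield sequence[start:start + codon_length]


for codon in iter_codons("UGGUGUUAUUAAUGGUUU"):
    print(codon)
```

Wrapping the call in `list(...)` gives the same list as the comprehension, so this is mostly useful when the sequence is long and the codons are consumed one by one.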

3

You can simply use the step argument of the range function to avoid maintaining the start and end variables by hand:

def get_codon_list(codon_string):
    codon_length = 3
    codon_list = []

    for codon_start in range(0, len(codon_string), codon_length):
        codon_end = codon_start + codon_length
        codon_list.append(codon_string[codon_start:codon_end])

    return codon_list

It can then be written as a list comprehension:

def get_codon_list(codon_string):
    codon_length = 3

    codon_list = [codon_string[x:x+codon_length] for x in range(0, len(codon_string), codon_length)]

    return codon_list
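
One behavioural difference is worth noting: the loop in the question runs `len(codon_string) // codon_length` times, so it drops a trailing incomplete codon, whereas `range(0, len(codon_string), codon_length)` keeps it as a shorter final string. A rough sketch that makes the choice explicit (the `keep_partial` flag is a hypothetical addition, not part of the answer above):

```python
def get_codon_list(codon_string, codon_length=3, keep_partial=False):
    # keep_partial is a hypothetical flag added for this sketch.
    end = len(codon_string)
    if not keep_partial:
        # Trim the leftover characters that do not form a full codon.
        end -= end % codon_length
    return [codon_string[i:i + codon_length] for i in range(0, end, codon_length)]


print(get_codon_list("UGGUGUUA"))
print(get_codon_list("UGGUGUUA", keep_partial=True))
```

With the default, the result matches what the code in the question produces; with `keep_partial=True`, it matches the list comprehension above.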

2

The itertools grouper recipe is perfect for that (https://docs.python.org/3/library/itertools.html#itertools-recipes):

In [1]: from itertools import zip_longest

In [2]: def grouper(iterable, n, fillvalue=None):
   ...:     "Collect data into fixed-length chunks or blocks"
   ...:     # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
   ...:     args = [iter(iterable)] * n
   ...:     return zip_longest(*args, fillvalue=fillvalue)
   ...:

In [3]: list(grouper('UGGUGUUAUUAAUGGUUU', 3))
Out[3]:
[('U', 'G', 'G'),
 ('U', 'G', 'U'),
 ('U', 'A', 'U'),
 ('U', 'A', 'A'),
 ('U', 'G', 'G'),
 ('U', 'U', 'U')]

1 Comment

Maybe finish that with [''.join(tup) for tup in grouper('UGGUGUUAUUAAUGGUUU', 3)].
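
Putting the recipe and the join together might look like the sketch below. Passing `fillvalue=""` is an assumption made here so that a sequence whose length is not a multiple of 3 still joins cleanly; with the default `None`, `''.join` would raise a `TypeError` on the padded final tuple.

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


def get_codon_list(codon_string, codon_length=3):
    # fillvalue="" (an assumption of this sketch) pads a short final
    # chunk with empty strings, so the join yields a shorter string.
    return ["".join(chunk) for chunk in grouper(codon_string, codon_length, fillvalue="")]


print(get_codon_list("UGGUGUUAUUAAUGGUUU"))
```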
0

You might want to use a while loop here: increment the index by 3 each iteration, print the next three letters, and exit when the index is within 3 of the length of the string.
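
A minimal sketch of that idea, collecting the codons into a list instead of printing them. The exit condition is one reading of "within 3 of the length": the loop stops as soon as fewer than three characters remain, so an incomplete trailing codon is dropped.

```python
def get_codon_list(codon_string, codon_length=3):
    codon_list = []
    index = 0
    # Continue only while a full codon is still available from index onward.
    while index + codon_length <= len(codon_string):
        codon_list.append(codon_string[index:index + codon_length])
        index += codon_length
    return codon_list


print(get_codon_list("UGGUGUUAUUAAUGGUUU"))
```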

0

With a regular expression:

import re

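def get_codon_list(codon_string):
    # re.findall already returns a list of the non-overlapping matches;
    # a trailing run of fewer than 3 characters is simply not matched.
    return re.findall(r"\w{3}", codon_string)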
