
Alright, I'm working on a little project for school, a 6-frame translator. I won't go into too much detail; I'll just describe what I wanted to add. The normal output would be something like:

TTCPTISPALGLAWS_DLGTLGFMSYSANTASGETLVSLYQLGLFEM_VVSYGRTKYYLICP_LFHLSVGFVPSD

The important parts of this string are the M and the _ (the start and stop codons, biology stuff). What I wanted to do was highlight them like so:

TTCPTISPALGLAWS_DLGTLGF 'MSYSANTASGETLVSLYQLGLFEM_' VVSYGRTKYYLICP_LFHLSVGFVPSD

Now here is where it gets tricky (for me). I got my output to look like this by adding a space and a ' around the start and stop, but it only does this once, for the first start and stop it finds. Any other M..._ combinations are not highlighted.

Here is my current code, attempting to make it highlight more than once:

def start_stop(translation):
    index_2 = 0
    while True:
        if 'M' in translation[index_2::1]:
            index_1 = translation[index_2::1].find('M')
            index_2 = translation[index_1::1].find('_') + index_1
            new_translation = translation[:index_1] + " '" + \
                              translation[index_1:index_2 + 1] + "' " + \
                              translation[index_2 + 1:]
        else:
            break
        return new_translation

I really thought this would work, but it doesn't, and now I'm stuck. If anyone is willing to help, here is a randomly generated string with more than one M..._ set:

'TTCPTISPALGLAWS_DLGTLGFMSYSANTASGETLVSLYQLGLFEM_VVSYGRTKYYLICP_LFHLSVGFVPSDGRRLTLYMPPARRLATKSRFLTPVISSG_DKPRHNPVARSQFLNPLVRPNYSISASKSGLRLVLSYTRLSLGINSLPIERLQYSVPAPAQITP_IPEHGNARNFLPEWPRLLISEPAPSVNVPCSVFVVDPEHPKAHSKPDGIANRLTFRWRLIG_VFFHNAL_VITHGYSRVDILLPVSRALHVHLSKSLLLRSAWFTLRNTRVTGKPQTSKT_FDPKATRVHAIDACAE_QQH_PDSGLRFPAPGSCSEAIRQLMI'

Thank you to anyone willing to help :)

  • Wouldn't it be easier to have one boolean that indicates you have started (encountered an M) and one that indicates you have ended (encountered an _)? Instead of while True, just iterate over the string; with a few if statements and the booleans you can tell when you are inside a sequence and when it finishes. The booleans keep track of what is happening, so you can handle not just one sequence but many. Hope this makes sense. Commented Dec 7, 2017 at 16:13
  • I think this is a good application of regex as in tzaman's answer, but the reason your code doesn't replace more than one instance is that you're returning after the first replacement, so you'll never reach the next iteration of the while loop. You may have intended for the return line to be dedented one level. There are other problems with the code, but that's the cause of your immediate question. Commented Dec 7, 2017 at 16:22
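A minimal sketch of the flag-based idea from the first comment, assuming the translation contains only amino-acid letters and _. The function name highlight_orfs is made up for illustration. Instead of a second boolean it buffers the current M..._ run, so that an M with no later _ (like the trailing "QLMI" in the sample string) is left unhighlighted, which mirrors what the regex answer does.

```python
def highlight_orfs(translation):
    """Wrap every M..._ run in " '...' " by scanning the string once."""
    out = []        # finished pieces of the result
    current = []    # characters of the currently open M..._ run
    in_orf = False  # True after an 'M' until the next '_'
    for ch in translation:
        if in_orf:
            current.append(ch)
            if ch == '_':
                # Stop codon reached: emit the buffered run wrapped in quotes.
                out.append(" '" + ''.join(current) + "' ")
                current = []
                in_orf = False
        elif ch == 'M':
            # Start codon: begin buffering a new run.
            in_orf = True
            current.append(ch)
        else:
            out.append(ch)
    # An 'M' with no later '_' is copied through unhighlighted.
    out.extend(current)
    return ''.join(out)


print(highlight_orfs("TTCPTISPALGLAWS_DLGTLGFMSYSANTASGETLVSLYQLGLFEM_VVSYGRTKYYLICP_LFHLSVGFVPSD"))
```

Because the state is reset after every stop codon, the loop naturally picks up the next M, so it handles any number of M..._ sets in one pass.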

2 Answers


Regular expressions are pretty handy here:

import re
sequence = "TTCP...."
highlighted = re.sub(r"(M\w*?_)", r" '\1' ", sequence)

# Output:
"TTCPTISPALGLAWS_DLGTLGF 'MSYSANTASGETLVSLYQLGLFEM_' VVSYGRTKYYLICP_LFHLSVGFVPSDGRRLTLY 'MPPARRLATKSRFLTPVISSG_' DKPRHNPVARSQFLNPLVRPNYSISASKSGLRLVLSYTRLSLGINSLPIERLQYSVPAPAQITP_IPEHGNARNFLPEWPRLLISEPAPSVNVPCSVFVVDPEHPKAHSKPDGIANRLTFRWRLIG_VFFHNAL_VITHGYSRVDILLPVSRALHVHLSKSLLLRSAWFTLRNTRVTGKPQTSKT_FDPKATRVHAIDACAE_QQH_PDSGLRFPAPGSCSEAIRQLMI"

Regex explanation:
We look for an M followed by any number of "word characters" (\w*) and then an _. Because \w also matches _, the ? is needed to make the match non-greedy, so each match stops at the first _ after its M (otherwise it would make one group from the first M to the last _).
The replacement is the matched group (\1 indicates "first group", there's only one), but surrounded by spaces and quotes.
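To see why the ? matters, here is a small comparison on the short sample string from the question. The third pattern, M[^_]*_, is an alternative that is not part of the answer above: it forbids _ inside the match explicitly instead of relying on a lazy quantifier, and gives the same result here.

```python
import re

sequence = "TTCPTISPALGLAWS_DLGTLGFMSYSANTASGETLVSLYQLGLFEM_VVSYGRTKYYLICP_LFHLSVGFVPSD"

# Non-greedy: each match stops at the first '_' after its 'M'.
lazy = re.sub(r"(M\w*?_)", r" '\1' ", sequence)

# Greedy: \w also matches '_', so the match runs on to the LAST '_'.
greedy = re.sub(r"(M\w*_)", r" '\1' ", sequence)

# Alternative: a character class that excludes '_' inside the match.
explicit = re.sub(r"(M[^_]*_)", r" '\1' ", sequence)

print(lazy)              # ... 'MSYSANTASGETLVSLYQLGLFEM_' VVSYGRTKYYLICP_ ...
print(greedy)            # ... 'MSYSANTASGETLVSLYQLGLFEM_VVSYGRTKYYLICP_' LFHLSVGFVPSD
print(lazy == explicit)  # True
```

The greedy version swallows the unrelated "VVSYGRTKYYLICP_" stretch into the highlight, which is exactly the "one group from the first M to the last _" problem described above.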


2 Comments

Wow, that worked, thank you so much! (I'll give you the mark when it lets me.) I still really need to learn all the stuff you can import, so thanks for teaching me another one :)
You're welcome! Regex is a very common and powerful tool for processing strings in all languages, not just Python, I'd highly recommend reading up a bit.

You only need a bit of string slicing; you don't need to import any module.

Python strings have a method called index; just use it.

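string_1 = 'TTCPTISPALGLAWS_DLGTLGFMSYSANTASGETLVSLYQLGLFEM_VVSYGRTKYYLICP_LFHLSVGFVPSD'

before = string_1.index('M')          # position of the first start codon
after = string_1[before:].index('_')  # offset of the first stop codon after it
# Print the text before the M, the M..._ run, and the rest, separated by spaces.
print('{}  {} {}'.format(string_1[:before],
                         string_1[before:before + after + 1],
                         string_1[before + after + 1:]))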

output:

TTCPTISPALGLAWS_DLGTLGF  MSYSANTASGETLVSLYQLGLFEM_ VVSYGRTKYYLICP_LFHLSVGFVPSD
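The index approach above only handles the first M..._ pair. Here is a sketch of one way to extend it to every pair, still without any import. It relies on the fact that str.find accepts a start position, so each search can resume just after the previous stop codon instead of searching a slice (searching a slice is what made the indices in the question's code relative to the wrong string). The function name highlight_all is hypothetical; it uses find rather than index because find returns -1 instead of raising ValueError when nothing is left, and it uses the " '...' " markers from the question.

```python
def highlight_all(string_1):
    """Wrap every M..._ pair in " '...' " using str.find with a start offset."""
    pieces = []
    pos = 0  # start of the part of the string not yet copied
    while True:
        start = string_1.find('M', pos)    # next start codon at or after pos
        if start == -1:
            break
        stop = string_1.find('_', start)   # first stop codon after that M
        if stop == -1:
            break                          # an M with no later '_' stays as-is
        pieces.append(string_1[pos:start])
        pieces.append(" '" + string_1[start:stop + 1] + "' ")
        pos = stop + 1                     # resume searching after the stop codon
    pieces.append(string_1[pos:])          # copy whatever is left unchanged
    return ''.join(pieces)


print(highlight_all('TTCPTISPALGLAWS_DLGTLGFMSYSANTASGETLVSLYQLGLFEM_VVSYGRTKYYLICP_LFHLSVGFVPSD'))
```

Building a list of pieces and joining once at the end avoids re-slicing the whole string on every iteration.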
