Python inserting spaces in string

Question

Alright, I'm working on a little project for school, a 6-frame translator. I won't go into too much detail, I'll just describe what I wanted to add. The normal output would be something like:

TTCPTISPALGLAWS_DLGTLGFMSYSANTASGETLVSLYQLGLFEM_VVSYGRTKYYLICP_LFHLSVGFVPSD

The important parts of this string are the M and the _ (the start and stop codons, biology stuff). What I wanted to do was highlight these like so:

TTCPTISPALGLAWS_DLGTLGF 'MSYSANTASGETLVSLYQLGLFEM_' VVSYGRTKYYLICP_LFHLSVGFVPSD

Now here is where (for me) it gets tricky. I got my output to look like this (adding a space and a ' to highlight the start and stop), but it only does this once, for the first start and stop it finds. If there are any other M....._ combinations, it won't highlight them.

Here is my current code, attempting to make it highlight more than once:

def start_stop(translation):
    index_2 = 0
    while True:
        if 'M' in translation[index_2::1]:
            index_1 = translation[index_2::1].find('M')
            index_2 = translation[index_1::1].find('_') + index_1
            new_translation = translation[:index_1] + " '" + \
                              translation[index_1:index_2 + 1] + "' " +\
                              translation[index_2 + 1:]
        else:
            break
        return new_translation

I really thought this would do it, guess not. So now I find myself being stuck. If any of you are willing to try and help, here is a randomly generated string with more than one M....._ set:

'TTCPTISPALGLAWS_DLGTLGFMSYSANTASGETLVSLYQLGLFEM_VVSYGRTKYYLICP_LFHLSVGFVPSDGRRLTLYMPPARRLATKSRFLTPVISSG_DKPRHNPVARSQFLNPLVRPNYSISASKSGLRLVLSYTRLSLGINSLPIERLQYSVPAPAQITP_IPEHGNARNFLPEWPRLLISEPAPSVNVPCSVFVVDPEHPKAHSKPDGIANRLTFRWRLIG_VFFHNAL_VITHGYSRVDILLPVSRALHVHLSKSLLLRSAWFTLRNTRVTGKPQTSKT_FDPKATRVHAIDACAE_QQH_PDSGLRFPAPGSCSEAIRQLMI'

Thank you to anyone willing to help :)

Wouldn't it be easier to have a boolean which indicates that you have started (encountered an M) and a boolean which indicates that you have ended (encountered a _)? Instead of while True, just iterate over the string; with a few if statements and the booleans you can tell when you are in a sequence and when it finishes. The booleans keep track of what is happening, so you can handle not just one sequence but many. Hope this makes sense.
– N. Ivanov, Commented Dec 7, 2017 at 16:13
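A minimal sketch of the single-pass idea from this comment, not the commenter's own code. It assumes one flag is enough (the comment suggests two), and the names highlight_with_flag, pieces, region and in_region are made up for illustration. Characters after an M are buffered so that an M with no later _ is left unquoted.

```python
def highlight_with_flag(translation):
    """Wrap every M..._ stretch in " '" and "' " in one pass over the string."""
    pieces = []        # finished output fragments
    region = []        # characters collected since the last unmatched 'M'
    in_region = False  # True between an 'M' and the next '_'
    for char in translation:
        if not in_region:
            if char == 'M':
                in_region = True
                region = [char]
            else:
                pieces.append(char)
        else:
            region.append(char)
            if char == '_':
                # stop codon reached: emit the buffered region with quotes
                pieces.append(" '" + ''.join(region) + "' ")
                in_region = False
    if in_region:
        # an 'M' that never met a '_' is emitted unchanged
        pieces.append(''.join(region))
    return ''.join(pieces)


print(highlight_with_flag(
    'TTCPTISPALGLAWS_DLGTLGFMSYSANTASGETLVSLYQLGLFEM_VVSYGRTKYYLICP_LFHLSVGFVPSD'))
```

Any M inside an open region (such as the one in FEM_) is treated as ordinary content, which matches the highlighted example in the question.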
I think this is a good application of regex as in tzaman's answer, but the reason your code doesn't replace more than one instance is that you're returning after the first replacement, so you'll never reach the next iteration of the while loop. You may have intended for the return line to be dedented one level. There are other problems with the code, but that's the cause of your immediate question.
– glibdud, Commented Dec 7, 2017 at 16:22
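For reference, one possible repair of the question's loop is sketched below. It is an interpretation, not code from the thread. The return is moved out of the loop, str.find is given a start position instead of being called on a slice (a slice yields positions relative to the slice), and the string is rebuilt from the already-highlighted result rather than from the original. The names result and search_from are made up for illustration.

```python
def start_stop(translation):
    result = translation
    search_from = 0  # index in result where the next search begins
    while True:
        index_1 = result.find('M', search_from)
        if index_1 == -1:
            break  # no further start codon
        index_2 = result.find('_', index_1)
        if index_2 == -1:
            break  # start codon without a stop codon: leave it alone
        result = (result[:index_1] + " '" +
                  result[index_1:index_2 + 1] + "' " +
                  result[index_2 + 1:])
        # continue after the stop codon; the 4 accounts for the
        # inserted " '" and "' " (two characters each)
        search_from = index_2 + 1 + 4
    return result  # returned once, after the loop has finished


print(start_stop(
    'TTCPTISPALGLAWS_DLGTLGFMSYSANTASGETLVSLYQLGLFEM_VVSYGRTKYYLICP_LFHLSVGFVPSD'))
```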

tzaman · Accepted Answer · 2017-12-07 16:25:00Z

4

Regular expressions are pretty handy here:

import re
sequence = "TTCP...."
highlighted = re.sub(r"(M\w*?_)", r" '\1' ", sequence)

# Output:
"TTCPTISPALGLAWS_DLGTLGF 'MSYSANTASGETLVSLYQLGLFEM_' VVSYGRTKYYLICP_LFHLSVGFVPSDGRRLTLY 'MPPARRLATKSRFLTPVISSG_' DKPRHNPVARSQFLNPLVRPNYSISASKSGLRLVLSYTRLSLGINSLPIERLQYSVPAPAQITP_IPEHGNARNFLPEWPRLLISEPAPSVNVPCSVFVVDPEHPKAHSKPDGIANRLTFRWRLIG_VFFHNAL_VITHGYSRVDILLPVSRALHVHLSKSLLLRSAWFTLRNTRVTGKPQTSKT_FDPKATRVHAIDACAE_QQH_PDSGLRFPAPGSCSEAIRQLMI"

Regex explanation:
We look for an M followed by any number of "word characters" (\w*, which covers letters, digits, and _ itself), then an _, using the ? to make the match non-greedy (otherwise it would make one group from the first M to the last _).
The replacement is the matched group (\1 indicates "first group", there's only one), but surrounded by spaces and quotes.
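To see the effect of the ?, here is a small comparison on a made-up string (sample is not from the question). The third pattern, using the negated class [^_], is an alternative that was not part of the original answer; it avoids relying on non-greedy matching.

```python
import re

sample = "AAMBB_CCMDD_EE"  # hypothetical short sequence with two M..._ pairs

# non-greedy: each match stops at the first '_' after its 'M'
non_greedy = re.sub(r"(M\w*?_)", r" '\1' ", sample)

# greedy: \w also matches '_', so one match runs from the first 'M' to the last '_'
greedy = re.sub(r"(M\w*_)", r" '\1' ", sample)

# negated class: "anything except '_'" cannot run past a stop codon
negated = re.sub(r"(M[^_]*_)", r" '\1' ", sample)

print(non_greedy)  # AA 'MBB_' CC 'MDD_' EE
print(greedy)      # AA 'MBB_CCMDD_' EE
print(negated)     # AA 'MBB_' CC 'MDD_' EE
```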

edited Dec 7, 2017 at 16:25

answered Dec 7, 2017 at 16:14

tzaman


2 Comments

GotYa Over a year ago

Wow, that worked, thank you so much! (I'll give you the mark when it lets me.) I still really need to learn all the stuff you can import, so thanks for teaching me another one :)

tzaman Over a year ago

You're welcome! Regex is a very common and powerful tool for processing strings in all languages, not just Python; I'd highly recommend reading up on it a bit.

Aaditya Ura · Answer · 2017-12-07 16:39:51Z

0

You only need a little string slicing; no extra module is required.

Python strings have a method called 'index'; just use it.

string_1 = 'TTCPTISPALGLAWS_DLGTLGFMSYSANTASGETLVSLYQLGLFEM_VVSYGRTKYYLICP_LFHLSVGFVPSD'

# position of the first 'M' (start codon)
before = string_1.index('M')
# offset of the next '_' (stop codon), counted from that 'M'
after = string_1[before:].index('_')
print('{}  {} {}'.format(string_1[:before], string_1[before:before + after + 1], string_1[before + after + 1:]))

output:

TTCPTISPALGLAWS_DLGTLGF  MSYSANTASGETLVSLYQLGLFEM_ VVSYGRTKYYLICP_LFHLSVGFVPSD
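The snippet above handles only the first M..._ pair. As a rough sketch of how the same 'index' method could cover every pair, the function below passes a start position as the second argument to str.index and collects (start, end) slices. The name find_regions and its return format are assumptions made for illustration, not part of this answer.

```python
def find_regions(sequence):
    """Return (start, end) slice bounds for every M..._ stretch, left to right."""
    regions = []
    position = 0
    while True:
        try:
            start = sequence.index('M', position)  # next start codon
            stop = sequence.index('_', start)      # first stop codon after it
        except ValueError:
            break  # str.index raises ValueError when nothing is found
        regions.append((start, stop + 1))
        position = stop + 1  # resume searching after the stop codon
    return regions


string_1 = 'TTCPTISPALGLAWS_DLGTLGFMSYSANTASGETLVSLYQLGLFEM_VVSYGRTKYYLICP_LFHLSVGFVPSD'
for start, end in find_regions(string_1):
    print(start, end, string_1[start:end])
```

Returning positions rather than a formatted string leaves the choice of highlighting (quotes, spaces, colour) to the caller.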

answered Dec 7, 2017 at 16:39

Aaditya Ura
