String = n76a+q80a+l83a+i153a+l203f+r207a+s211a+s215w+f216a+e283l

I want the script to look at one pair at a time, meaning:

Evaluate n76a+q80a. If abs(76-80) < 10, replace the '+' with '_'; otherwise, don't change anything. Then evaluate q80a+l83a and do the same thing.

The desired output should be:

n76a_q80a_l83a+i153a+l203f_r207a_s211a_s215w_f216a+e283l

What I tried is:

import re

def aa_dist(x):
    if abs(int(x[1:3]) - int(x[6:8])) < 10:
        print(re.sub(r'\+', '_', x))

with open(input_file, 'r') as alex:
    oligos_list = alex.read()
    aa_dist(oligos_list)

This is what I have so far. I know that my code will just replace every '+' with '_', because it only evaluates the first pair and then replaces them all. How should I do this?

  • Is it always '+' and lowercase letters? Commented Feb 11, 2015 at 23:50
  • Yes, that is always the case. Commented Feb 11, 2015 at 23:57
  • I think the index values would change in a case like i153a+l203f. Commented Feb 11, 2015 at 23:59
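
To illustrate the last comment: the fixed slices x[1:3] and x[6:8] from the question only line up when both positions have exactly two digits. The following is a minimal sketch of that point; the helper name positions is hypothetical and not part of the original question.

```python
import re

def positions(pair):
    """Return the integer positions found in a pair such as 'i153a+l203f'."""
    return [int(n) for n in re.findall(r"\d+", pair)]

two_digit = "n76a+q80a"
three_digit = "i153a+l203f"

# Fixed slices work when both numbers have two digits...
print(two_digit[1:3], two_digit[6:8])      # 76 80
# ...but cut through the numbers once they have three digits.
print(three_digit[1:3], three_digit[6:8])  # 15 l2
# Extracting the digit runs does not depend on their width.
print(positions(three_digit))              # [153, 203]
```

Calling int() on the second slice, 'l2', would raise a ValueError, which is why extracting the digits with a pattern is likely the safer route here.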

2 Answers

import itertools
import re

my_string = "n76a+q80a+l83a+i153a+l203f+r207a+s211a+s215w+f216a+e283l"
# first extract the numbers (as a list, so it can be sliced below)
my_numbers = list(map(int, re.findall("[0-9]+", my_string)))
# split the string on + (useless comment)
parts = my_string.split("+")

def get_filler(pair):
    '''this method decides on the joiner'''
    a, b = pair
    return "_" if abs(a - b) < 10 else "+"

# figure out what fillers we need
fillers = map(get_filler, zip(my_numbers, my_numbers[1:]))
# zip will always skip the last part, so it has to be added back
print("".join(itertools.chain.from_iterable(zip(parts, fillers))) + parts[-1])

This is one way you might accomplish this ... and it is also an example of worthless comments.


2 Comments

Thanks! I am really new to programming in general, and I'm not that familiar with the itertools module. Could you please explain a little more about what exactly the following two lines are doing?
fillers = map(get_filler,zip(my_numbers,my_numbers[1:])) #figure out what fillers we need
print "".join(itertools.chain.from_iterable(zip(parts,fillers)))+parts[-1]
itertools.chain.from_iterable simply takes a 2D list and flattens it; it is one of many ways to do that. The line above it zips the list of numbers with itself, offset by one, to get pairs of adjacent numbers, and maps each pair to a function that decides on + or _.
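
The explanation in that comment can be traced step by step. The following is a rough sketch on a shortened version of the input (the first four parts only, chosen here for brevity), with a list comprehension standing in for the map call:

```python
import itertools

numbers = [76, 80, 83, 153]
parts = ["n76a", "q80a", "l83a", "i153a"]

# Zipping a list with itself shifted by one yields the adjacent pairs.
pairs = list(zip(numbers, numbers[1:]))
print(pairs)  # [(76, 80), (80, 83), (83, 153)]

# Each pair maps to the joiner that should sit between the two parts.
fillers = ["_" if abs(a - b) < 10 else "+" for a, b in pairs]
print(fillers)  # ['_', '_', '+']

# zip stops at the shorter input, so the last part is left out here.
interleaved = list(zip(parts, fillers))

# chain.from_iterable flattens the list of (part, joiner) tuples.
flat = list(itertools.chain.from_iterable(interleaved))
print(flat)  # ['n76a', '_', 'q80a', '_', 'l83a', '+']

# The skipped last part is appended by hand.
result = "".join(flat) + parts[-1]
print(result)  # n76a_q80a_l83a+i153a
```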

Using the re module only:

>>> import re
>>> s = 'n76a+q80a+l83a+i153a+l203f+r207a+s211a+s215w+f216a+e283l'
>>> # This regex performs an overlapping match. See the demo (https://regex101.com/r/jO6zT2/13)
>>> m = re.findall(r'(?=\b([^+]+\+[^+]+))', s)
>>> m
['n76a+q80a', 'q80a+l83a', 'l83a+i153a', 'i153a+l203f', 'l203f+r207a', 'r207a+s211a', 's211a+s215w', 's215w+f216a', 'f216a+e283l']
>>> l = []
>>> for i in m:
...     if abs(int(re.search(r'^\D*(\d+)', i).group(1)) - int(re.search(r'^\D*\d+\D*(\d+)', i).group(1))) < 10:
...         l.append(i.replace('+', '_'))
...     else:
...         l.append(i)
...
>>> re.sub(r'([a-z0-9]+)\1', r'\1', ''.join(l))
'n76a_q80a_l83a+i153a+l203f_r207a_s211a_s215w_f216a+e283l'
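
The overlapping match relies on the lookahead being zero-width. As a small sketch on a shortened input (three parts only, picked here for illustration), compare it with an ordinary pattern that consumes what it matches:

```python
import re

s = "n76a+q80a+l83a"

# A plain pattern consumes 'n76a+q80a', so 'q80a' cannot start a second pair.
consumed = re.findall(r"[^+]+\+[^+]+", s)
print(consumed)  # ['n76a+q80a']

# A lookahead consumes nothing, so the scan can resume inside the previous pair.
# The \b restricts matches to positions where a part begins.
overlapping = re.findall(r"(?=\b([^+]+\+[^+]+))", s)
print(overlapping)  # ['n76a+q80a', 'q80a+l83a']
```

Because every middle part then appears in two pairs, joining the pairs duplicates it, which is what the final re.sub with the \1 backreference removes.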

Or by defining a separate function:

import re

def aa_dist(x):
    l = []
    # Overlapping match: every adjacent pair of parts joined by '+'
    m = re.findall(r'(?=\b([^+]+\+[^+]+))', x)
    for i in m:
        # Compare the first and second number in the pair
        if abs(int(re.search(r'^\D*(\d+)', i).group(1)) - int(re.search(r'^\D*\d+\D*(\d+)', i).group(1))) < 10:
            l.append(i.replace('+', '_'))
        else:
            l.append(i)
    # Joining the pairs repeats each middle part; collapse the repeats
    return re.sub(r'([a-z0-9]+)\1', r'\1', ''.join(l))

string = 'n76a+q80a+l83a+i153a+l203f+r207a+s211a+s215w+f216a+e283l'
print(aa_dist(string))

Output:

n76a_q80a_l83a+i153a+l203f_r207a_s211a_s215w_f216a+e283l
