When using the regex .search() method, I found that it matches only the first occurrence of a pattern in a string, and that .findall() is needed to find every occurrence of that pattern in the string.
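
A minimal illustration of the difference, on a made-up structure string (the string and the variable names are hypothetical). re.search() returns only the first match, re.findall() returns every match but only as plain strings, and re.finditer() yields one match object per occurrence, whose start() and end() give its position:

```python
import re

# Hypothetical dot-bracket string with two loops, for illustration only.
structure = "((..))((...))"
pattern = r"\(\.+\)"

first = re.search(pattern, structure)
print(first.start(), first.end(), first.group())  # 1 5 (..)  (first match only)

print(re.findall(pattern, structure))  # ['(..)', '(...)']  (all matches, no positions)

for m in re.finditer(pattern, structure):  # all matches, with positions
    print(m.start(), m.end(), m.group())   # 1 5 (..)  then  7 12 (...)
```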

So my question is: given two different strings that "talk" to each other, I need to find every occurrence of a specific pattern in the second string, get the position of each match, and take the elements at those positions from the first string, then print them or save them in a new list.

To make this clearer, here is an example:

ACGCUGAGAGGACGAUGCGGACGUGCUUAGGACGUUCACACGGUGGAAGUUCACAACAAGCAGACGACUCGCUGAGGAUCCGAGAUUGCUCGCGAUCGG

...((.((....(((..((....(((((.((((.(((((...))))).)))).....)))))..))..))))).))((((((((....)))).))))..

These are the two strings: the first with letters, the second with dots and brackets. The pattern I want to find, compiled as a regex, is "(\(\.+\))". Once the pattern is found in the second string, I want to take the position of each match and return the corresponding elements of the first string. With this input I'd expect two different outputs: CACGG and GAUUGC.

So far, the code I have written looks like this:

    import re

    for line in file:
        if (line[0] == "A") or (line[0] == "C") or (line[0] == "T") or (line[0] == "G"):
            apt.append(line)
            count = count + 1
        else:
            line = line.strip()
            pattern = r"(\(\.+\))"  # raw string, so the backslashes reach the regex engine
            match = re.search(pattern, line)
            if match:
                loop.append(apt[count][match.start():match.end()])
            else:
                continue

This retrieves only the first match of the pattern in the second line of the file, so the only output is CACGG.

How can I modify the code in order to retrieve also the second occurrence of the pattern?

Thank you, any help is appreciated.

Answer

If you don't mind using re.finditer:

>>> import re

>>> str1 = "ACGCUGAGAGGACGAUGCGGACGUGCUUAGGACGUUCACACGGUGGAAGUUCACAACAAGCAGACGACUCGCUGAGGAUCCGAGAUUGCUCGCGAUCGG"
>>> str2 = "...((.((....(((..((....(((((.((((.(((((...))))).)))).....)))))..))..))))).))((((((((....)))).)))).."

>>> pat = re.compile(r"\([^()]+\)")

>>> for m in pat.finditer(str2):
...     print('%02d-%02d: %s' % (m.start(), m.end(), m.group()))
...     print(str1[m.start():m.end()])

38-43: (...)
CACGG
83-89: (....)
GAUUGC
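
Since the question also asks about saving the pieces in a new list, the same loop can be written as a list comprehension. This is a small sketch; the name loops is only illustrative:

```python
import re

str1 = "ACGCUGAGAGGACGAUGCGGACGUGCUUAGGACGUUCACACGGUGGAAGUUCACAACAAGCAGACGACUCGCUGAGGAUCCGAGAUUGCUCGCGAUCGG"
str2 = "...((.((....(((..((....(((((.((((.(((((...))))).)))).....)))))..))..))))).))((((((((....)))).)))).."

pat = re.compile(r"\([^()]+\)")

# Slice str1 at the span of every match found in str2 and keep the pieces in a list.
loops = [str1[m.start():m.end()] for m in pat.finditer(str2)]
print(loops)  # ['CACGG', 'GAUUGC']
```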

The regex \([^()]+\) matches a parenthesized part that contains no further parentheses inside it; [^()] is a negated character class that matches any character except ( or ).

You could also use the pattern \(\.+\), which allows only dots between the parentheses.
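
On dot-bracket strings like str2 both patterns find the same loops, because a loop interior contains only dots. A small sketch on a made-up string (hypothetical, not from the question) shows where they differ:

```python
import re

# Hypothetical string whose first pair of parentheses encloses letters, not dots.
s = "(ab)(..)"

print(re.findall(r"\([^()]+\)", s))  # ['(ab)', '(..)']  any non-parenthesis content
print(re.findall(r"\(\.+\)", s))     # ['(..)']          dots only
```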


In your case, it could be something like:

    import re

    for line in file:
        if (line[0] == "A") or (line[0] == "C") or (line[0] == "T") or (line[0] == "G"):
            apt.append(line)
            count = count + 1
        else:
            line = line.strip()
            pattern = r"\(\.+\)"
            # re.finditer yields a match object for every occurrence, not only the first
            for match in re.finditer(pattern, line):
                loop.append(apt[count][match.start():match.end()])

It may be slightly faster to compile the pattern once with re.compile() before reading the file and then call pattern.finditer(line) inside the loop.

I cannot test this code, but keep in mind that every piece found will be appended to loop.
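
If the file holds several sequence lines, each followed by one or more dot-bracket lines, the same idea can be wrapped in a function. This is a minimal sketch under those assumptions: the names extract_loops and LOOP_PATTERN are hypothetical, sequence lines are assumed to start with A, C, G, T or U, and each structure line is matched against the most recent sequence (sequences[-1]) instead of a separate count variable.

```python
import re

# Compiled once, outside the loop: one or more dots enclosed by a single pair of parentheses.
LOOP_PATTERN = re.compile(r"\(\.+\)")


def extract_loops(lines):
    """Return every substring of a sequence line that lies under a (...) loop
    in the dot-bracket lines that follow it.

    Assumes (hypothetically) that sequence lines start with A, C, G, T or U
    and that each one is followed by one or more dot-bracket lines.
    """
    sequences = []  # sequence lines seen so far (the question's `apt`)
    loops = []      # extracted pieces (the question's `loop`)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line[0] in "ACGTU":
            sequences.append(line)
        elif sequences:
            # Slice every match, not just the first, out of the latest sequence line.
            for match in LOOP_PATTERN.finditer(line):
                loops.append(sequences[-1][match.start():match.end()])
    return loops
```

Because it accepts any iterable of lines, an open file object can be passed in directly, for example with open(path) as handle: loops = extract_loops(handle).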

4 Comments

Can I also apply this logic to a file with multiple letter strings, each followed by multiple dot-and-bracket strings?
@Mojito88 Yes, you should be able to use this loop to do exactly that.
@Mojito88 Okay, I took your current code and edited it a bit.
Thank you so much, @Jerry. Your answer was very helpful to me.
