Searching for a pattern multiple times in a string via regex in Python

Question

When using the regex .search(), I found that it matches only the first occurrence of a pattern in a string, and that .findall() is needed to find every occurrence of that pattern.

So, my question is: given two different strings that "talk" to each other, I need to find each occurrence of a specific pattern in the second string, take the position of each match, and extract the elements at those positions from the first string, then print them or save them in a new list.

To be clearer, I'll provide an example:

ACGCUGAGAGGACGAUGCGGACGUGCUUAGGACGUUCACACGGUGGAAGUUCACAACAAGCAGACGACUCGCUGAGGAUCCGAGAUUGCUCGCGAUCGG

...((.((....(((..((....(((((.((((.(((((...))))).)))).....)))))..))..))))).))((((((((....)))).))))..

These are the two strings: the first contains letters, the second dots and brackets. The pattern I want to find, compiled as a regex, is "(\(\.+\))". Once the pattern is found in the second string, I want to take the position of each match and return the corresponding elements of the first string. With this input I'd expect two different outputs: CACGG and GAUUGC.

So far, the code I have written looks like this:

for line in file:
    if (line[0] == "A") or (line[0] == "C") or (line[0] == "T") or (line[0] == "G"):
        apt.append(line)
        count = count + 1
    else:
        line = line.strip()
        pattern = "(\(\.+\))"
        match = re.search(pattern, line)
        if match:
            loop.append(apt[count][match.start():match.end()])
        else:
            continue

This obviously retrieves only the first match of the pattern that occurs in the second line of the file, giving only CACGG as output.

How can I modify the code to retrieve the second occurrence of the pattern as well?

Thank you, any help is appreciated.

Jerry · Accepted Answer · 2014-02-03 16:12:28Z

3

If you don't mind using re.finditer:

>>> import re

>>> str1 = "ACGCUGAGAGGACGAUGCGGACGUGCUUAGGACGUUCACACGGUGGAAGUUCACAACAAGCAGACGACUCGCUGAGGAUCCGAGAUUGCUCGCGAUCGG"
>>> str2 = "...((.((....(((..((....(((((.((((.(((((...))))).)))).....)))))..))..))))).))((((((((....)))).)))).."

>>> pat = re.compile(r"\([^()]+\)")

>>> for m in pat.finditer(str2):
...     print('%02d-%02d: %s' % (m.start(), m.end(), m.group()))
...     print(str1[m.start():m.end()])

38-43: (...)
CACGG
83-89: (....)
GAUUGC
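As a side note on why finditer fits here better than findall: findall returns only the matched substrings, while finditer yields match objects that also carry the positions. A minimal sketch on a shortened, made-up dot-bracket string (the string and variable names are illustrative, not from the question):

```python
import re

pat = re.compile(r"\(\.+\)")
structure = "((...))..(....)"  # made-up example string

# search() stops at the first match.
first = pat.search(structure)
first_span = (first.start(), first.end())

# findall() returns every matched substring, but no positions.
substrings = pat.findall(structure)

# finditer() yields one match object per occurrence, positions included.
spans = [(m.start(), m.end()) for m in pat.finditer(structure)]

print(first_span)   # (1, 6)
print(substrings)   # ['(...)', '(....)']
print(spans)        # [(1, 6), (9, 15)]
```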


The regex \([^()]+\) matches a parenthesized part that has no further parentheses inside it. [^()] is a negated character class: it matches any single character except a parenthesis.

You could also use the pattern: \(\.+\) by the way.
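On pure dot-bracket strings the two patterns find the same matches. They differ only when something other than dots sits between the parentheses, as this small sketch with a made-up input suggests:

```python
import re

any_inner = re.compile(r"\([^()]+\)")  # anything except parentheses inside
dots_only = re.compile(r"\(\.+\)")     # only dots inside

text = "(...)(.x.)()"  # made-up input, not from the question

any_matches = any_inner.findall(text)
dot_matches = dots_only.findall(text)

print(any_matches)  # ['(...)', '(.x.)']
print(dot_matches)  # ['(...)']
```

Neither pattern matches the empty pair "()", because + requires at least one character between the parentheses.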

In your case, it could be something like:

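if (line[0] == "A") or (line[0] == "C") or (line[0] == "T") or (line[0] == "G"):
    apt.append(line)
    count = count + 1
else:
    line = line.strip()
    # finditer is a method of a compiled pattern, not of a plain string
    pattern = re.compile(r"\(\.+\)")
    # one slice of the stored sequence per match found in this line
    for match in pattern.finditer(line):
        loop.append(apt[count][match.start():match.end()])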

It will be faster if you compile the pattern before reading the file.

I cannot test this code, but keep in mind that each match found will be appended to loop.
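For reference, here is a self-contained sketch of the same idea. It is not the asker's exact code: the function name hairpin_loops and the assumption that each sequence line is followed by one or more dot-bracket lines of the same length are mine, and it remembers the most recent sequence instead of indexing apt with count.

```python
import re

LOOP = re.compile(r"\(\.+\)")  # compiled once, before any line is read


def hairpin_loops(lines):
    """Return the sequence slice under every '(' + dots + ')' match.

    Assumes (hypothetically) that each sequence line is followed by one
    or more dot-bracket lines of the same length.
    """
    loops = []
    sequence = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line[0] in "ACGUT":
            sequence = line  # remember the most recent sequence
        elif sequence is not None:
            # one slice per occurrence of the pattern in this structure line
            for m in LOOP.finditer(line):
                loops.append(sequence[m.start():m.end()])
    return loops
```

Called with the two strings from the question, hairpin_loops([str1, str2]) returns ['CACGG', 'GAUUGC']. Since any iterable of lines works, an open file object can be passed in directly.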

edited Feb 3, 2014 at 16:12

answered Feb 3, 2014 at 15:57

Jerry


4 Comments

Mojito88 Over a year ago

Can I also apply this logic to a file with multiple strings of letters, each followed by multiple strings of dots and brackets?

Jerry Over a year ago

@Mojito88 Yes, you should be able to use this loop to do exactly that.

Jerry Over a year ago

@Mojito88 Okay, I took your current code and edited it a bit.

Harish Thalluri Over a year ago

Thank you so much, @Jerry. Your answer was very helpful to me.
