2

I have a regex like this:

query = "(A((hh)|(hn)|(n))?)"

and an input inp = "Ahhwps edAn". I want to extract all the matched substrings along with the unmatched (remaining) text, while preserving the order of the input. The output should look like ['Ahh', 'wps ed', 'An'] or ['Ahh', 'w', 'p', 's', ' ', 'e', 'd', 'An']. I searched online but found nothing. How can I do this?

1
  • @WiktorStribiżew, Thanks a lot. It worked like a charm. Commented Sep 6, 2017 at 18:05

3 Answers

2

The re.split method may output captured submatches in the resulting array.

Capturing groups are constructs formed with a pair of unescaped parentheses. Your pattern abounds in redundant capturing groups, and re.split will return the contents of every one of them. Remove the unnecessary groups or convert them to non-capturing ones, and keep only the outer pair of parentheses so that the whole pattern is a single capturing group.
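To see the effect, here is a small sketch (assuming Python 3 and the inp string from the question) that splits with the original pattern. Each capturing group contributes one item per match, and groups that did not take part in a match show up as None.

```python
import re

inp = "Ahhwps edAn"

# The original pattern has five capturing groups, so re.split inserts
# five extra items for every match; groups that did not participate are None.
noisy = re.split(r"(A((hh)|(hn)|(n))?)", inp)
print(noisy)
```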

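Use

re.split(r'(A(?:hh|hn|n)?)', inp)

Note that the output list may contain empty strings, for example when the input starts or ends with a match. Use filter(None, result) to drop them; in Python 3, filter returns an iterator, so wrap it in list() if you need a list.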
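Put together, a minimal sketch of this approach might look as follows. It assumes Python 3 and the inp string from the question; the names parts and result are only illustrative.

```python
import re

inp = "Ahhwps edAn"

# One capturing group around the whole pattern keeps each match in the output;
# the inner alternation is non-capturing, so it adds nothing extra.
parts = re.split(r"(A(?:hh|hn|n)?)", inp)

# Drop the empty strings that appear when a match sits at the start or end.
result = [p for p in parts if p]
print(result)  # ['Ahh', 'wps ed', 'An']
```

The list comprehension does the same job as list(filter(None, parts)).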


1 Comment

I observed empty strings and removed them. Thanks for suggesting filter method.
2

The match objects' span() method is really useful for what you're after.

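import re

pat = re.compile(r"(A((hh)|(hn)|(n))?)")
inp = "Ahhwps edAn"

result = []
i = 0  # end of the previous match
for m in pat.finditer(inp):
    j, k = m.span()
    if i < j:
        # unmatched text between the previous match and this one
        result.append(inp[i:j])
    result.append(inp[j:k])
    i = k
if i < len(inp):
    # unmatched text after the last match
    result.append(inp[i:])

print(result)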

Here's what the output looks like.

['Ahh', 'wps ed', 'An']

This technique handles any non-matching prefix and suffix text as well. If you use an inp value of "prefixAhhwps edAnsuffix", you'll get the output I think you'd want:

['prefix', 'Ahh', 'wps ed', 'An', 'suffix']
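The same idea can be wrapped in a reusable helper. This is only a sketch: the function name split_keep_matches is hypothetical, and the pattern is written with a non-capturing group, which does not change what it matches.

```python
import re

def split_keep_matches(pattern, text):
    """Return matched and unmatched pieces of text in their original order."""
    pieces = []
    pos = 0  # end of the previous match
    for m in re.finditer(pattern, text):
        if not m.group(0):
            continue  # skip empty matches (not possible with this pattern)
        if pos < m.start():
            # unmatched text before this match
            pieces.append(text[pos:m.start()])
        pieces.append(m.group(0))
        pos = m.end()
    if pos < len(text):
        # unmatched text after the last match
        pieces.append(text[pos:])
    return pieces

print(split_keep_matches(r"A(?:hh|hn|n)?", "prefixAhhwps edAnsuffix"))
```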

0

You can try this:

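import re
import itertools

inp = "Ahhwps edAn"
# Splits each whitespace-separated word into two halves by length;
# it does not use the regex from the question.
new_data = list(itertools.chain.from_iterable(re.findall(".{" + str(len(i) // 2) + "}", i) for i in inp.split()))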

Output:

['Ahh', 'wps', 'ed', 'An']
