
I'm trying to match multiple patterns using regex subgroups and replace each match with an asterisk, for a data file whose format is similar to the string below. However, I only get the desired result for the first match; the subsequent matches consume text that I did not expect them to. Is there a better approach to getting the desired output below?

    import re
    myString = '-fruit apple -number    123 -animal  cat  -name     bob'

    match = re.compile('(-fruit\s+)(\w+)|'
                       '(-animal\s+)(cat)|'
                       '(-name\s+)(bob)')
    print(match.sub('\g<1>*', myString))

Current Output:

    -fruit * -number    123 *  *

Desired Output:

    -fruit * -number    123 -animal  *  -name     *

1 Answer


Alternation does not reset the group numbers, so your groups are numbered (1)(2)|(3)(4)|(5)(6). You only reinsert group 1, but you need to reinsert groups 3 and 5 as well. Since Python 3.5, groups that did not participate in the match are replaced with an empty string, so you can simply reference all three prefix groups in the replacement string: \g<1>\g<3>\g<5>*. In any given match only one of them is non-empty.
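A minimal sketch of the fix applied to the question's own pattern (switched to raw strings, per the side note below). The second pattern, `single`, is an alternative not taken from the answer: it uses a single capturing group for the prefix and lookaheads to require the specific values `cat` and `bob`, so the replacement only ever needs `\1`.

```python
import re

my_string = '-fruit apple -number    123 -animal  cat  -name     bob'

# Same alternation as in the question. Groups are numbered left to right
# across all branches: (1)(2) | (3)(4) | (5)(6)
pattern = re.compile(r'(-fruit\s+)(\w+)|'
                     r'(-animal\s+)(cat)|'
                     r'(-name\s+)(bob)')

# Reinsert every "prefix" group; the ones from branches that did not
# match expand to '' (Python 3.5+).
result = pattern.sub(r'\g<1>\g<3>\g<5>*', my_string)
print(result)  # -fruit * -number    123 -animal  *  -name     *

# Alternative: one prefix group, with lookaheads checking the value,
# so only group 1 ever needs to be reinserted.
single = re.compile(r'(-fruit\s+|-animal\s+(?=cat\b)|-name\s+(?=bob\b))\w+')
alt_result = single.sub(r'\1*', my_string)
print(alt_result)  # -fruit * -number    123 -animal  *  -name     *
```

The single-group version avoids having to keep group numbers in sync with the replacement string as more branches are added.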

On a side note, I would recommend using raw strings (r'pattern') for regex patterns, so you do not have to wonder where to double a backslash (e.g. '\\b' for a word boundary, since a plain '\b' is a backspace character).
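A small illustration of why raw strings matter, using `\b` as the example: in a normal string literal Python turns `\b` into a backspace character before the regex engine ever sees it.

```python
import re

text = 'the cat sat'

# Without r'', '\b' is a backspace (\x08), not a word boundary: no match.
plain = re.findall('\bcat\b', text)
# With r'', the backslash reaches the regex engine, so \b is a word boundary.
raw = re.findall(r'\bcat\b', text)
# Doubling the backslash by hand also works, but is easy to forget.
doubled = re.findall('\\bcat\\b', text)

print(plain)    # []
print(raw)      # ['cat']
print(doubled)  # ['cat']
```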


3 Comments

That worked superb! Thanks for the explanation and the raw string tip!
"Alternation does not reset the group numbers". Thank you for a clear and concise explanation.
"Alternation does not reset the group numbers". You can 'visualize' it: print(match.findall(myString))
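Following the last comment's suggestion, a sketch of what `findall` returns for the question's pattern: with more than one group it yields one 6-tuple per match, and groups from branches that did not participate appear as empty strings, which makes the (1)(2)|(3)(4)|(5)(6) numbering visible.

```python
import re

my_string = '-fruit apple -number    123 -animal  cat  -name     bob'
pattern = re.compile(r'(-fruit\s+)(\w+)|'
                     r'(-animal\s+)(cat)|'
                     r'(-name\s+)(bob)')

# One tuple per match, one slot per group across all branches.
matches = pattern.findall(my_string)
for groups in matches:
    print(groups)
# ('-fruit ', 'apple', '', '', '', '')
# ('', '', '-animal  ', 'cat', '', '')
# ('', '', '', '', '-name     ', 'bob')
```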
