String substitutions based on the matching object (Python)

Question

I am struggling to understand the group method of match objects in Python's regular expressions library. In this context, I am trying to do substitutions on a string depending on the match object.

That is, I want to replace each matched string (+ and \n in this example) with the corresponding string from the my_dict dictionary (rep1 and rep2 respectively).

As seen from this question and answer, I have tried this:

content = '''
Blah - blah \n blah * blah + blah.
'''

regex = r'[+\-*/]'

for mobj in re.finditer(regex, content):
    t = mobj.lastgroup
    v = mobj.group(t)

    new_content = re.sub(regex, repl_func(mobj), content)

def repl_func(mobj):
    my_dict = { '+': 'rep1', '\n': 'rep2'}
    try:
        match = mobj.group(0)
    except AttributeError:
        match = ''
    else:
        return my_dict.get(match, '')

print(new_content)

But I get None for t followed by an IndexError when computing v.

Any explanations and example code would be appreciated.

It's hard to guess what your code is supposed to do (there are many syntax errors, the indentation is broken, and the logic is unclear). It would be better to provide an example describing what you'd like to achieve.
– TomR8, Commented Nov 24, 2016 at 15:19
@TomR8 Apologies! I fixed all syntax issues & typos (hopefully).
– user6167676, Commented Nov 24, 2016 at 15:44

TomR8 · Accepted Answer · 2016-11-24 22:29:29Z

Despite Wiktor's truly pythonic answer, there's still the question of why the OP's original algorithm wouldn't work. Basically there are 2 issues:

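The call new_content = re.sub(regex, repl_func(mobj), content) evaluates repl_func(mobj) first and passes the resulting string to re.sub. Every match of regex is therefore replaced with the same value, namely the replacement for the match the loop is currently on (the very first match in the first iteration).

The correct call is new_content = re.sub(regex, repl_func, content). As documented for re.sub, a function passed as the replacement gets invoked for every match, with the current match object as its argument.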
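The difference can be sketched on a shorter string. This is only an illustration: the names replacements, first, wrong and right are made up for the example and do not appear in the question.

```python
import re

content = "A*B+C"
regex = r'[+*]'
replacements = {'+': 'rep1', '*': 'rep2'}  # illustrative mapping

def repl_func(mobj):
    # re.sub calls this once per match and passes that match object.
    return replacements.get(mobj.group(0), '')

first = re.search(regex, content)  # match object for '*'

# Calling the function ourselves yields one fixed string for every match.
wrong = re.sub(regex, repl_func(first), content)

# Passing the function itself lets re.sub look up each match separately.
right = re.sub(regex, repl_func, content)

print(wrong)  # Arep2Brep2C
print(right)  # Arep2Brep1C
```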

repl_func(mobj) does some unnecessary exception handling, which can be simplified:

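my_dict = {'\n': '', '+': 'rep1', '*': 'rep2', '/': 'rep3', '-': 'rep4'}

def repl_func(mobj):
    # Reading a module-level name needs no global statement.
    # Characters missing from my_dict are replaced with an empty string.
    return my_dict.get(mobj.group(0), '')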

This is equivalent to Wiktor's solution - he just got rid of the function definition itself by using a lambda expression.

With this modification, the for mobj in re.finditer(regex, content): loop has become superfluous, as it would only repeat the same substitution on every iteration.

Just for the sake of completeness here is a working solution using re.finditer(). It builds the result string from the matched slices of content:

import re

my_regx = r'[\n+*/-]'
my_dict = {'\n': '', '+': 'rep1', '*': 'rep2', '/': 'rep3', '-': 'rep4'}
content = "A*B+C-D/E"
res = ""
cbeg = 0
for mobj in re.finditer(my_regx, content):
    # get matched string and its slice indexes
    mstr = mobj.group(0)
    mbeg = mobj.start()
    mend = mobj.end()

    # look up the replacement for the matched string
    mrep = my_dict.get(mstr, '')

    # append non-matched part of content plus replacement
    res += content[cbeg:mbeg] + mrep

    # set new start index of remaining slice
    cbeg = mend

# finally add remaining non-matched slice
res += content[cbeg:]
print(res)
# => Arep2Brep1Crep4Drep3E
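As a quick cross-check, the manual loop and re.sub with a callable should give the same result. The sketch below wraps the loop in a helper, substitute_with_finditer, which is a hypothetical name introduced only for this comparison. It collects the pieces in a list and joins them at the end instead of concatenating strings.

```python
import re

my_regx = r'[\n+*/-]'
my_dict = {'\n': '', '+': 'rep1', '*': 'rep2', '/': 'rep3', '-': 'rep4'}

def substitute_with_finditer(text):
    # Hypothetical helper: gather unmatched slices and replacements, then join.
    parts = []
    cbeg = 0
    for mobj in re.finditer(my_regx, text):
        parts.append(text[cbeg:mobj.start()])          # text before the match
        parts.append(my_dict.get(mobj.group(0), ''))   # replacement for the match
        cbeg = mobj.end()
    parts.append(text[cbeg:])                          # trailing unmatched text
    return ''.join(parts)

content = "A*B+C-D/E"
via_loop = substitute_with_finditer(content)
via_sub = re.sub(my_regx, lambda m: my_dict.get(m.group(0), ''), content)

print(via_loop)             # Arep2Brep1Crep4Drep3E
print(via_loop == via_sub)  # True
```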

Wiktor Stribiżew · Accepted Answer · 2016-11-24 17:23:01Z

The r'[+\-*/]' regex does not match a newline, so your '\n': 'rep2' entry would never be used. To handle newlines as well, add \n to the character class: r'[\n+*/-]'.
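A small check of the two character classes, using a made-up sample string (the variable names are illustrative only). Placing the hyphen last in the class keeps it literal without an escape.

```python
import re

sample = "a - b\nc + d"  # contains one real newline character

# Original class: only the arithmetic operators are matched.
without_newline = re.findall(r'[+\-*/]', sample)

# Extended class: the newline is matched as well.
with_newline = re.findall(r'[\n+*/-]', sample)

print(without_newline)  # ['-', '+']
print(with_newline)     # ['-', '\n', '+']
```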

Next, you get None because your regex does not contain any named capturing groups, see re docs:

match.lastgroup
The name of the last matched capturing group, or None if the group didn’t have a name, or if no group was matched at all.
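A minimal sketch of both cases. The group names plus and minus are assumptions chosen for this illustration and are not part of the question's pattern.

```python
import re

# No capturing groups at all: lastgroup is None, and group(None) is an invalid lookup.
m = re.search(r'[+\-*/]', 'a + b')
unnamed_lastgroup = m.lastgroup
try:
    m.group(m.lastgroup)
    lookup_failed = False
except IndexError:
    lookup_failed = True

# Named groups: lastgroup holds the name of the group that matched.
named = re.search(r'(?P<plus>\+)|(?P<minus>-)', 'a + b')
named_lastgroup = named.lastgroup
named_text = named.group(named.lastgroup)

print(unnamed_lastgroup, lookup_failed)  # None True
print(named_lastgroup, named_text)       # plus +
```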

To replace using the match, you do not even need re.finditer; use re.sub with a lambda as the replacement:

import re
content = '''
Blah - blah \n blah * blah + blah.
'''

regex = r'[\n+*/-]'
my_dict = { '+': 'rep1', '\n': 'rep2'}
new_content = re.sub(regex, lambda m: my_dict.get(m.group(),""), content)
print(new_content)
# => rep2Blah  blah rep2 blah  blah rep1 blah.rep2


m.group() returns the whole match (it is the same as m.group(0)). If you had a pair of unescaped parentheses in the pattern, it would create a capturing group and you could access the first one with m.group(1), etc.
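For illustration, here is a pattern with three capturing groups applied to a made-up input string. Neither the pattern nor the input comes from the question.

```python
import re

# Three pairs of unescaped parentheses create groups 1, 2 and 3.
m = re.search(r'(\w+) ([+*]) (\w+)', 'x = a + b')

whole = m.group()      # same as m.group(0): the entire matched text
left = m.group(1)      # first capturing group
operator = m.group(2)  # second capturing group
all_groups = m.groups()

print(whole)       # a + b
print(left)        # a
print(operator)    # +
print(all_groups)  # ('a', '+', 'b')
```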
