What is the best way to sub captured groups back into a python regex?

Question

Given a regex like r'a (\w+) regex', I know I can capture the group, but given a captured group I want to then sub it back into the regex. I've included below a function I've built to do this, but because I'm no expert at regular expressions I'm wondering if there is a more standard implementation of such behavior, or what the "best practice" would be.

import re

def reverse_capture(regex_string, args, kwargs):
    regex_string = str(regex_string)
    if not args and not kwargs:
        raise ValueError("at least one of args or kwargs must be non-empty in reverse_capture")
    if kwargs:
        # substitute each keyword value for one named group, left to right
        for value in kwargs.values():
            regex_string = re.sub(r'(?:[^\\[]|[^\\](?:\\\\)+|[^\\](?:\\\\)*\\\[)\(\?P<.+>.+(?:[^\\[]|[^\\](?:\\\\)+|[^\\](?:\\\\)*\\\[)\)',
                                  value,
                                  regex_string,
                                  count=1)
    elif args:
        # substitute each positional value for one group, left to right
        for arg in args:
            regex_string = re.sub(r'(?:[^\\[]|[^\\](?:\\\\)+|[^\\](?:\\\\)*\\\[)\(.+(?:[^\\[]|[^\\](?:\\\\)+|[^\\](?:\\\\)*\\\[)\)',
                                  arg,
                                  regex_string,
                                  count=1)
    return regex_string

Note: the above function doesn't actually work yet, because I figured before I try covering every single case I should ask on this site.

EDIT:

I think I should clarify what I mean a bit. My goal is to write a python function such that, given a regex like r"ab(.+)c" and an argument like, "Some strinG", we can have the following:

>>> reverse_capture(r"ab(.+)c", "Some strinG")
"abSome strinGc"

That is to say, the argument will be substituted into the regex where the capture group is. There are definitely better ways to format strings; however, the regexes are given in my use case, so this is not an option.

For anyone who's curious, what I'm trying to do is create a Django package that will use a template tag to find the regex associated with some view function or named URL, optionally fill in some of its arguments, and then check whether the URL the template was accessed from matches the URL generated by the tag. This will solve some navigation problems. There's a simpler package which does something similar, but it doesn't serve my use case.

Examples:

If reverse_capture is the function I'm trying to write, then here are some examples of input/output (I pass in the regexes as raw strings), as well as the function call:

reverse_capture : regex string -> regex
input: a regex and a string
output: the regex obtained by replacing the first capture group of the regex with the argument string.

examples:

>>> reverse_capture(r'(.+)', 'TEST')
'TEST'
>>> reverse_capture(r'a longer (.+) regex', 'TEST')
'a longer TEST regex'
>>> reverse_capture(r'regex with two (.+) capture groups(.+)', 'TEST')
'regex with two TEST capture groups(.+)'

Maybe there is a better way to do this, but between making sure that the entire expression isn't in brackets, that the parentheses you find are escaped, that their escaping characters aren't themselves escaped, etc... you can imagine that this gets a little messy!
– Nick, Commented Jun 21, 2014 at 0:11
Make a smaller example that does part of what you want to do. Asking people to look at this insane escaping when your intention is not clear is likely to get ignored.
– msw, Commented Jun 21, 2014 at 0:50
Rather than trying to parse the regex to figure out where the capturing groups are, why not use string formatting to place text where the capturing groups need to go?
– user2357112, Commented Jun 21, 2014 at 1:20
Why exactly do you want to do this, anyway? Do you want to use the result as a regex, or do you just want to get the full text the regex matched? For a match object match, match.group() is the matched text.
– user2357112, Commented Jun 21, 2014 at 1:33
Hi @user2357112, I've added an update which I hope will clarify somewhat. I do indeed want to use the result as a regex, and I definitely agree that string formatting is nicer, but unfortunately that won't work here. What I'm trying to do (which is described a bit in my edit) is essentially pull URL regexes using a kind of reverse URL lookup in Django (not the actual reverse URL lookup), plug some arguments into those regexes, and then see if the URL a template is being rendered from matches the new regex. It's pretty tied in to how the framework works.
– Nick, Commented Jun 21, 2014 at 4:30

user2357112 · Accepted Answer · 2014-06-21 01:27:28Z

3

Parsing regexes can be kind of complicated. Rather than trying to parse the regex to figure out where you need to substitute the matches, why not build the regex from a format string with convenient places to string-format the matches right in?

Here's an example template:

>>> regex_template = r'{} lives at {} Baker Street.'

We insert capturing groups to build the regex:

>>> import re
>>> word_group = r'(\w+)'
>>> digit_group = r'(\d+)'
>>> regex = regex_template.format(word_group, digit_group)

Match it against a string:

>>> groups = re.match(regex, 'Alfred lives at 325 Baker Street.').groups()
>>> groups
('Alfred', '325')

And string-format the matches into place:

>>> regex_template.format(*groups)
'Alfred lives at 325 Baker Street.'
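
One caveat with the template approach is that the literal text of the template is used as regex source, so a character such as the final '.' matches any character. A minimal sketch of one way to guard against that, assuming auto-numbered '{}' fields only; the helper name template_to_regex is hypothetical and not part of the answer above:

```python
import re
from string import Formatter

def template_to_regex(template, groups):
    """Build a regex from a str.format-style template (hypothetical helper).

    Literal text is passed through re.escape so it only matches itself;
    each '{}' field is replaced by the next group pattern from `groups`.
    """
    groups = iter(groups)
    parts = []
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion)
    for literal, field, _spec, _conv in Formatter().parse(template):
        parts.append(re.escape(literal))
        if field is not None:  # None means no placeholder follows this literal
            parts.append(next(groups))
    return ''.join(parts)

template = '{} lives at {} Baker Street.'
regex = template_to_regex(template, [r'(\w+)', r'(\d+)'])
match = re.fullmatch(regex, 'Alfred lives at 325 Baker Street.')
```

Here match.groups() can be passed back to template.format(...) exactly as in the answer, and a string that ends in some other character than '.' no longer matches.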

answered Jun 21, 2014 at 1:27

user2357112

Nick Over a year ago

Thanks, I think this is a great solution for most use cases, but unfortunately I won't be constructing the regexes. While I could theoretically build the format strings in parallel with all my regexes, because I want to release this as a package I think it would save a lot of people time/code if I figure out how to just do it on the regex itself.

Community · Accepted Answer · 2017-04-13 12:48:30Z

0

For anyone coming across this question in the future, after I searched around, it appeared that there were no good library functions for substituting values into a regex's capture groups.

The easiest way to solve this problem (that is, to write your own function) is to build a DFA (deterministic finite automaton) that scans the regex for the capture group, which isn't very hard.

If you are determined on solving it using regexes, then you can convert your DFA into a regex using answers to this question, which is how I ended up implementing my own solution.
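
For illustration, here is a minimal sketch of the scanning idea as a small hand-written state machine. It is not the implementation referred to above. The name reverse_capture comes from the question; the helper is_capturing is hypothetical. It assumes Python re syntax, tracks only backslash escapes, [...] character classes, and parenthesis nesting, does not handle parentheses inside (?#...) comments or verbose-mode comments, and inserts the replacement text verbatim (wrap it in re.escape if it should match literally).

```python
def is_capturing(pattern, i):
    """Return True if the '(' at index i opens a capturing group.

    '(?P<name>' is a named capturing group; every other '(?' form
    (non-capturing, lookaround, flags, backreference) does not capture.
    """
    rest = pattern[i + 1:i + 4]
    return not rest.startswith('?') or rest.startswith('?P<')


def reverse_capture(pattern, replacement):
    """Replace the first capturing group in `pattern` with `replacement`."""
    i, n = 0, len(pattern)
    in_class = False  # True while inside a [...] character class
    start = None      # index of the '(' opening the group being replaced
    depth = 0         # parenthesis nesting depth inside that group
    while i < n:
        ch = pattern[i]
        if ch == '\\':
            i += 2    # skip the backslash and the character it escapes
            continue
        if in_class:
            if ch == ']':
                in_class = False
        elif ch == '[':
            in_class = True
            # a ']' directly after '[' or '[^' is a literal class member
            if pattern[i + 1:i + 2] == '^':
                i += 1
            if pattern[i + 1:i + 2] == ']':
                i += 1
        elif ch == '(':
            if start is None:
                if is_capturing(pattern, i):
                    start, depth = i, 1
            else:
                depth += 1
        elif ch == ')' and start is not None:
            depth -= 1
            if depth == 0:
                return pattern[:start] + replacement + pattern[i + 1:]
        i += 1
    return pattern    # no capturing group found; leave the pattern unchanged
```

With this sketch, reverse_capture(r'a longer (.+) regex', 'TEST') gives 'a longer TEST regex', and calling it repeatedly on its own output fills in later groups one at a time.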

edited Apr 13, 2017 at 12:48

CommunityBot

answered Oct 3, 2014 at 20:19

Nick
