
Given a regex like r'a (\w+) regex', I know I can capture the group, but given a captured group I want to then sub it back into the regex. I've included below a function I've built to do this, but because I'm no expert at regular expressions I'm wondering if there is a more standard implementation of such behavior, or what the "best practice" would be.

import re

def reverse_capture(regex_string, args=(), kwargs=None):
    regex_string = str(regex_string)
    kwargs = kwargs or {}
    if not args and not kwargs:
        raise ValueError("at least one of args or kwargs must be non-empty in reverse_capture")
    if kwargs:
        # Replace one named group (?P<name>...) per keyword value.
        for value in kwargs.values():
            regex_string = re.sub(r'(?:[^\\[]|[^\\](?:\\\\)+|[^\\](?:\\\\)*\\\[)\(\?P<.+>.+(?:[^\\[]|[^\\](?:\\\\)+|[^\\](?:\\\\)*\\\[)\)',
                                  lambda m: value,  # a function replacement avoids backslash processing
                                  regex_string,
                                  count=1)
    else:
        # Replace unnamed groups left to right, one positional argument each.
        for arg in args:
            regex_string = re.sub(r'(?:[^\\[]|[^\\](?:\\\\)+|[^\\](?:\\\\)*\\\[)\(.+(?:[^\\[]|[^\\](?:\\\\)+|[^\\](?:\\\\)*\\\[)\)',
                                  lambda m: arg,
                                  regex_string,
                                  count=1)
    return regex_string

Note: the above function doesn't actually work yet, because I figured I should ask on this site before trying to cover every single case.

EDIT:

I think I should clarify what I mean a bit. My goal is to write a Python function such that, given a regex like r"ab(.+)c" and an argument like "Some strinG", we get the following:

>>> reverse_capture(r"ab(.+)c", "Some strinG")
'abSome strinGc'

That is to say, the argument will be substituted into the regex where the capture group is. There are definitely better ways to format strings; however, the regexes are given in my use case, so this is not an option.

For anyone who's curious, what I'm trying to do is create a Django package with a template tag that finds the regex associated with a view function or named URL, optionally plugs in some arguments, and then checks whether the URL the template was accessed from matches the URL generated by the tag. This will solve some navigation problems. There's a simpler package that does something similar, but it doesn't serve my use case.

Examples:

If reverse_capture is the function I'm trying to write, then here are some examples of input/output (I pass in the regexes as raw strings), as well as the function call:

reverse_capture : regex string -> regex
input: a regex and a string
output: the regex obtained by replacing the first capture group of the regex with the argument string.

examples:

>>> reverse_capture(r'(.+)', 'TEST')
'TEST'
>>> reverse_capture(r'a longer (.+) regex', 'TEST')
'a longer TEST regex'
>>> reverse_capture(r'regex with two (.+) capture groups(.+)', 'TEST')
'regex with two TEST capture groups(.+)'
  • Maybe there is a better way to do this, but between making sure that the entire expression isn't in brackets, checking whether the parentheses you find are escaped, making sure their escaping characters aren't themselves escaped, etc., you can imagine that this gets a little messy! Commented Jun 21, 2014 at 0:11
  • Make a smaller example that does part of what you want to do. Asking people to look at this insane escaping when your intention is not clear is likely to get ignored. Commented Jun 21, 2014 at 0:50
  • Rather than trying to parse the regex to figure out where the capturing groups are, why not use string formatting to place text where the capturing groups need to go? Commented Jun 21, 2014 at 1:20
  • Why exactly do you want to do this, anyway? Do you want to use the result as a regex, or do you just want to get the full text the regex matched? For a match object match, match.group() is the matched text. Commented Jun 21, 2014 at 1:33
  • hi @user2357112, I've added an update which I hope will clarify things somewhat. I do indeed want to use the result as a regex, and I definitely agree that string formatting is nicer, but unfortunately that won't work here. What I'm trying to do (described a bit in my edit) is essentially pull URL regexes using a kind of reverse URL lookup on the Django platform (not the actual reverse URL lookup), plug some arguments into those regexes, and then see whether the URL a template is being rendered from matches the new regex. It's pretty tied in to how the framework works. Commented Jun 21, 2014 at 4:30
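To make the first comment's point concrete, here is a deliberately naive sketch that treats "the first `(` up to the next `)`" as the first group. The name `naive_reverse_capture` and the `NAIVE_GROUP` pattern are illustrative only; the sketch shows where this shortcut breaks, not a recommended fix:

```python
import re

# Deliberately naive: "the first '(' up to the next ')'".
NAIVE_GROUP = re.compile(r'\(.+?\)')

def naive_reverse_capture(regex_string, value):
    # A function replacement inserts `value` literally (no backslash processing).
    return NAIVE_GROUP.sub(lambda m: value, regex_string, count=1)

# Works on the simple case from the question...
print(naive_reverse_capture(r'a longer (.+) regex', 'TEST'))     # a longer TEST regex
# ...but an escaped literal parenthesis is mistaken for a group,
print(naive_reverse_capture(r'\(literal\) then (\d+)', 'TEST'))  # \TEST then (\d+)
# and so is a non-capturing group.
print(naive_reverse_capture(r'(?:www\.)?(\w+)\.com', 'TEST'))    # TEST?(\w+)\.com
# Python's own parser counts exactly one capturing group here.
print(re.compile(r'\(literal\) then (\d+)').groups)              # 1
```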

2 Answers


Parsing regexes can be kind of complicated. Rather than trying to parse the regex to figure out where you need to substitute the matches, why not build the regex from a format string with convenient places to string-format the matches right in?

Here's an example template:

>>> regex_template = r'{} lives at {} Baker Street.'

We insert capturing groups to build the regex:

>>> import re
>>> word_group = r'(\w+)'
>>> digit_group = r'(\d+)'
>>> regex = regex_template.format(word_group, digit_group)

Match it against a string:

>>> groups = re.match(regex, 'Alfred lives at 325 Baker Street.').groups()
>>> groups
('Alfred', '325')

And string-format the matches into place:

>>> regex_template.format(*groups)
'Alfred lives at 325 Baker Street.'

1 Comment

Thanks, I think this is a great solution for most use cases, but unfortunately I won't be the one constructing the regexes. While I could theoretically build the format strings in parallel with all my regexes, I want to release this as a package, so I think it would save a lot of people time and code if I figure out how to do it on the regex itself.

For anyone coming across this question in the future: after searching around, it appeared that there were no good library functions for substituting values into a regex's capture groups.

The easiest way to solve this problem and write your own function is to build a DFA (deterministic finite automaton), which isn't very hard.

If you are determined to solve it using regexes, you can convert your DFA into a regex using answers to this question, which is how I ended up implementing my own solution.
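The state-machine idea can be sketched in Python without any regex at all: a small hand-written scanner that walks the pattern once, with states for "normal", "just saw a backslash" and "inside a character class". Strictly speaking, the nesting counter used to find the group's closing parenthesis goes beyond a plain DFA. This is not the answerer's code; the `escape` parameter and the `_skip_class` helper are illustrative assumptions, and the sketch only understands Python `re` syntax for plain `(...)` and named `(?P<name>...)` groups, ignoring corner cases such as comments in verbose mode:

```python
import re


def _skip_class(pattern, i):
    """Return the index just past the character class that opens at pattern[i] == '['."""
    i += 1
    if i < len(pattern) and pattern[i] == '^':
        i += 1
    if i < len(pattern) and pattern[i] == ']':  # a leading ']' is a literal
        i += 1
    while i < len(pattern) and pattern[i] != ']':
        i += 2 if pattern[i] == '\\' else 1   # skip escaped characters such as \]
    return i + 1                              # step over the closing ']'


def reverse_capture(regex_string, value, escape=False):
    """Replace the first capturing group of regex_string with value.

    With escape=True the value is passed through re.escape, so it matches
    literally when the result is later used as a regex.
    """
    if escape:
        value = re.escape(value)
    start = None  # index of the first capturing '(' once found
    depth = 0     # parenthesis nesting depth inside that group
    i = 0
    while i < len(regex_string):
        c = regex_string[i]
        if c == '\\':
            i += 2    # an escaped character is never a group delimiter
            continue
        if c == '[':
            i = _skip_class(regex_string, i)  # parentheses inside [...] are literals
            continue
        if c == '(':
            if start is not None:
                depth += 1
            elif (not regex_string.startswith('?', i + 1)
                  or regex_string.startswith('(?P<', i)):
                start, depth = i, 1   # plain (...) or named (?P<name>...) group
        elif c == ')' and start is not None:
            depth -= 1
            if depth == 0:
                return regex_string[:start] + value + regex_string[i + 1:]
        i += 1
    raise ValueError(f"no complete capturing group in {regex_string!r}")


print(reverse_capture(r'regex with two (.+) capture groups(.+)', 'TEST'))
# regex with two TEST capture groups(.+)
print(reverse_capture(r'\(literal\) then (\d+)', 'TEST'))
# \(literal\) then TEST
```

Because the result is meant to be used as a regex again, `escape=True` is worth considering for arbitrary user values. It is off by default here only so that the question's examples such as `'Some strinG'` come out unchanged, since `re.escape` also escapes spaces.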
