I need to capture the text inside the \textbf{} command. \textbf can contain multiple nested braces, as in the examples below:

\textbf{adadasas}

\textbf{adadasas \textit{xxx} adasda {xxx}}

\textbf{adadasas {} {} {} dxxxx}

I want to capture the value inside \textbf{...}.

I tried the following regex in Python: {([^{}]*+(?:(?R)[^{}]*)*+)} (from: Recursive pattern in regex)

x = regex.findall(r'\\textbf{([^{}]*+(?:(?R)[^{}]*)*+)}',cnt)

I am not getting all the values. When I remove \\textbf from the regex, it captures all the occurrences.

Please suggest how to write a regex for this case.

  • What is the expected result for each of the examples? Commented Jul 8, 2022 at 10:24

1 Answer

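You can recurse the first capture group with (?1) instead of recursing the whole pattern with (?R), and capture what is inside the {} with group 2. With (?R), the recursion includes \\textbf, so a nested {...} that is not preceded by \textbf cannot match.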

\\textbf({([^{}]*+(?:(?1)[^{}]*)*+)})
  • \\textbf Match \textbf
  • ( Capture group 1
    • { Match a { char
    • ( Capture group 2
      • [^{}]*+ Optionally match any chars except { and }, using a possessive quantifier
      • (?: Non-capture group to repeat as a whole
        • (?1)[^{}]* Recurse the group 1 subpattern, then optionally match any chars except curly braces
      • )*+ Close the non-capture group and optionally repeat it, using a possessive quantifier
    • ) Close group 2
    • } Match a } char
  • ) Close group 1
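To make the difference between (?R) and (?1) concrete, here is a minimal sketch that runs both patterns on one nested input. It assumes the third-party regex module is installed; the names text, whole, and group1 are made up for this illustration.

```python
import regex  # third-party module (pip install regex), not the stdlib re

# Illustrative input: one \textbf with two nested brace pairs.
text = r"\textbf{a \textit{b} c {d}}"

# (?R) recurses the whole pattern, so each nested "{" would need its own
# \textbf prefix. The nested "{b}" has none, so the match fails.
whole = regex.compile(r"\\textbf{([^{}]*+(?:(?R)[^{}]*)*+)}")

# (?1) recurses only group 1, the balanced {...} part, so nested braces
# match without any prefix.
group1 = regex.compile(r"\\textbf({([^{}]*+(?:(?1)[^{}]*)*+)})")

whole_match = whole.search(text)
group1_match = group1.search(text)

print(whole_match)            # None
print(group1_match.group(2))  # a \textit{b} c {d}
```

The only change is the extra pair of parentheses around the braces, which gives the recursion a target that does not include \\textbf.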

Note that if you use regex.findall, you will get the values of all capture groups returned as tuples, and this pattern has 2 capture groups.
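As a small sketch of what that means in practice (assuming the third-party regex module, with a made-up sample string), findall returns one tuple per match, and the inner text is the second item of each tuple:

```python
import regex  # third-party module (pip install regex)

pattern = r"\\textbf({([^{}]*+(?:(?1)[^{}]*)*+)})"

# Illustrative input, not taken from the question.
text = r"\textbf{one} and \textbf{two {nested}}"

# With two capture groups, findall yields (group 1, group 2) tuples.
pairs = regex.findall(pattern, text)
print(pairs)  # [('{one}', 'one'), ('{two {nested}}', 'two {nested}')]

# Keep only group 2, the text inside the outer braces.
inner = [content for _, content in pairs]
print(inner)  # ['one', 'two {nested}']
```

So findall still works if you unpack the tuples and keep only the second element.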

You can use regex.finditer instead and get the group 2 value:

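import regex  # third-party module (pip install regex); the stdlib re module does not support recursion

pattern = r"\\textbf({([^{}]*+(?:(?1)[^{}]*)*+)})"

cnt = ("\\textbf{adadasas}\n"
       "\\textbf{adadasas \\textit{xxx} adasda {xxx}}\n"
       "\\textbf{adadasas {} {} {} dxxxx}\n"
       "{adadasas {} {} {} dxxxx}")  # no \textbf prefix, so this line is not matched

# Group 1 is the whole {...}; group 2 is the text inside the outer braces.
for match in regex.finditer(pattern, cnt):
    print(match.group(2))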

Output

adadasas
adadasas \textit{xxx} adasda {xxx}
adadasas {} {} {} dxxxx

2 Comments

OP needs to understand that an extra group around the curly braces in the pattern is required, it is not the first group in the original pattern that is recursed.
@WiktorStribiżew Yes, I have added a breakdown to show where the groups are.
