getting same regex groups inside a block of text

Question

I am trying to write a pattern that captures each CNPJ group inside this block of text, with the condition that the match must start after executados: and end with a CNPJ group. However, my pattern always returns only the last group, and I don't know how to make it work.

The answer to "getting specific groups of patterns inside a block text" does not work for this case!

pattern: (?:executados\:)[\p{L}\s\D\d]+CNPJ\W+(?P<cnpj>\d+\.\d+\.\d+\/\d+-\d+)

string to test:

Dados dos executados:
1. FOO TEST STRING LTDA., CNPJ: 88.888.888/8888-88,
2. ANOTHER TEST STRING LTDA LTDA LTDA - ME, CNPJ: 99.999.999/9999-99,
3. FOO TEST STRING LTDA., CPF: 999.999.999-99,
4. FOO TEST STRING LTDA., CPF: 999.999.999-99.
Como medida de economia e celeridade processuais, atribuo a

I would like to get the values {'cnpj': ['88.888.888/8888-88', '99.999.999/9999-99']}, but this way I get only the last one.

@WiktorStribiżew I saw it, but I need the condition to be respected: in this case, not simply every CNPJ group, but every CNPJ group that comes after executados:
– Daniel Bailo, Commented Nov 22, 2021 at 18:05
Yes, and you get only those! Did you notice text[text.index("executados:"):]?
– Wiktor Stribiżew, Commented Nov 22, 2021 at 18:10
Hmm, sorry, I see it now! But is it possible to specify that in the pattern instead of in code?
– Daniel Bailo, Commented Nov 22, 2021 at 18:16
Only as TheFourthBird showed, with the PyPI regex module. See this demo.
– Wiktor Stribiżew, Commented Nov 22, 2021 at 18:16
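
The slicing idea from the comments above can be sketched with the standard library re module alone. This is only a minimal illustration: the helper name find_cnpjs_after and its marker parameter are hypothetical, and it uses str.find instead of str.index so that a missing marker yields an empty list rather than a ValueError.

```python
import re

# Same CNPJ sub-pattern as in the question, without the "executados:" prefix.
CNPJ_RE = re.compile(r"CNPJ\W+(\d+\.\d+\.\d+/\d+-\d+)")


def find_cnpjs_after(text, marker="executados:"):
    """Return every CNPJ found after the first occurrence of marker.

    Hypothetical helper: the condition is enforced by slicing the text,
    not by the pattern itself.
    """
    start = text.find(marker)
    if start == -1:
        # Marker absent: the condition is not met, so nothing is returned.
        return []
    return CNPJ_RE.findall(text[start:])


text = """Dados dos executados:
1. FOO TEST STRING LTDA., CNPJ: 88.888.888/8888-88,
2. ANOTHER TEST STRING LTDA LTDA LTDA - ME, CNPJ: 99.999.999/9999-99,
3. FOO TEST STRING LTDA., CPF: 999.999.999-99,
4. FOO TEST STRING LTDA., CPF: 999.999.999-99.
Como medida de economia e celeridade processuais, atribuo a"""

result = {"cnpj": find_cnpjs_after(text)}
print(result)
```

With the sample text from the question, this should produce the dictionary shape asked for, with both CNPJ values in order.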

Wiktor Stribiżew · Accepted Answer · 2021-11-22 18:53:19Z


You can use the PyPI regex module with a regex like

(?s)(?<=executados:.*?)CNPJ\W+(\d+\.\d+\.\d+/\d+-\d+)

See the regex demo.

Here is the Python demo:

import regex
text = """Dados dos executados:
1. FOO TEST STRING LTDA., CNPJ: 99.999.999/9999-99,
2. ANOTHER TEST STRING LTDA LTDA LTDA - ME, CNPJ: 99.999.999/9999-99,
3. FOO TEST STRING LTDA., CPF: 999.999.999-99,
4. FOO TEST STRING LTDA., CPF: 999.999.999-99.
Como medida de economia e celeridade processuais, atribuo a"""
print(regex.findall(r'(?s)(?<=executados:.*?)CNPJ\W+(\d+\.\d+\.\d+/\d+-\d+)', text))

yielding

['99.999.999/9999-99', '99.999.999/9999-99']

The regex matches:

(?s) - regex.DOTALL, enables . to match line break chars
(?<=executados:.*?) - right before the current location, there must be executados: followed by any zero or more chars, as few as possible
CNPJ - a fixed string
\W+ - one or more non-word chars
(\d+\.\d+\.\d+/\d+-\d+) - Group 1 (the value returned by regex.findall): one or more digits followed by a ., twice, then one or more digits, /, one or more digits, - and one or more digits.
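
For contrast, here is a rough sketch of why a pattern that consumes the executados: prefix, like the one in the question, can return only one CNPJ. It uses the standard library re module, which does not support \p{L}, so [\s\S] stands in for the question's [\p{L}\s\D\d] class (an assumption made for illustration; both match any character). Each match has to start at executados:, which occurs only once, so there is at most one match whether the quantifier is greedy or lazy.

```python
import re

text = """Dados dos executados:
1. FOO TEST STRING LTDA., CNPJ: 88.888.888/8888-88,
2. ANOTHER TEST STRING LTDA LTDA LTDA - ME, CNPJ: 99.999.999/9999-99,
3. FOO TEST STRING LTDA., CPF: 999.999.999-99,
4. FOO TEST STRING LTDA., CPF: 999.999.999-99.
Como medida de economia e celeridade processuais, atribuo a"""

cnpj = r"CNPJ\W+(?P<cnpj>\d+\.\d+\.\d+/\d+-\d+)"

# Greedy: [\s\S]+ runs to the end of the text and backtracks to the LAST CNPJ.
greedy = [m.group("cnpj") for m in re.finditer(r"executados:[\s\S]+" + cnpj, text)]

# Lazy: [\s\S]+? stops at the FIRST CNPJ, but the prefix is consumed,
# so no second match can start.
lazy = [m.group("cnpj") for m in re.finditer(r"executados:[\s\S]+?" + cnpj, text)]

print(greedy)  # ['99.999.999/9999-99']
print(lazy)    # ['88.888.888/8888-88']
```

The lookbehind in the answer avoids this because it only checks that executados: occurs earlier without consuming it, so every CNPJ can be matched separately.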


1 Comment

Michael Lee Over a year ago

The regex module is great and definitely works in some situations. However, Python's official re module does not support variable-width lookbehind. It might be better to use a fixed-width lookbehind (i.e., ((?<=executados).)*), which works with the official re module. re is also likely to be more stable than its counterparts, since CPython has 40k+ stars while regex has merely dozens.