
I'm trying to extract the values 10.547.889/0001-85 and 00.219.460/0001-05 as separate groups, with the condition that the pattern must start at executada(s):. It can't be something like r' - CNPJ:? (?P<cnpj>\d+\.\d+\.\d+\/\d+-\d+)'. So the idea is to start at executada(s) and capture each of these groups.

Currently my pattern returns only one of the groups, and I don't know how to get all of them.

I'm using Python 3.8.5 and the regex library (not re).

import regex

text = """
Solicite-se ao BANCO CENTRAL, via protocolo digital - SISBACEN ,
o BLOQUEIO de créditos existentes até o limite de R$ 30.257,45 (trinta mil, duzentos e
cinquenta e sete reais e quarenta e cinco centavos) da(s) executada(s): J.HENRIQUE
GALVANI COMERCIO DE ROUPAS - ME - CNPJ 10.547.889/0001-85, Riane Confecções de
Roupas Ltda - ME - CNPJ: 00.219.460/0001-05, Jose Henrique Galvani - CPF: 234.846.406-34
e Heliane Leonel Raymundo Galvani - CPF: 813.460.347-53, porventura
existentes junto a instituições financeiras, incluindo cartões de crédito, agenciadores
de pagamento, administradores de consórcio."""

pattern = r'executad\w(?:\(s\))?\W+(?:[\p{L}\s\-\.]+CNPJ\W+(?P<cnpj>\d+\.\d+\.\d+\/\d+-\d+),)+'

for item in regex.finditer(pattern, text, flags=regex.I|regex.S):
    print(item.groupdict())

{'cnpj': '00.219.460/0001-05'}

I was expecting:

{'cnpj': '00.219.460/0001-05'}

{'cnpj': '10.547.889/0001-85'}

Can someone help me with this?

  • Note that flags=regex.S|regex.S is the same as flags=regex.S, so it is redundant in your regex. Commented Oct 19, 2021 at 20:48
  • Oops, I meant to write flags=regex.I|regex.S. Thanks for the heads-up. Commented Oct 20, 2021 at 13:18

2 Answers

Using the regex module, you could make use of the \G anchor:

(?:executad\w(?:\(s\))?\W+|\G(?!^)),?[\p{L}\s.-]+CNPJ\W+\K(?P<cnpj>\d+\.\d+\.\d+/\d+-\d+)

In parts, the pattern matches:

  • (?: Non-capturing group
    • executad\w Match executad followed by a word char (this could be a literal a if that is the only possibility)
    • (?:\(s\))?\W+ Optionally match (s), then 1+ non-word chars
    • | Or
    • \G(?!^) Assert the current position at the end of the previous match, but not at the start of the string
  • ) Close the non-capturing group
  • ,?[\p{L}\s.-]+ Match an optional , and 1+ times any letter, whitespace char, . or -
  • CNPJ\W+ Match CNPJ and 1+ times non word chars
  • \K Clear the match buffer to forget what is matched so far
  • (?P<cnpj>\d+\.\d+\.\d+/\d+-\d+) Named group cnpj, capture the desired format
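As background on why the original pattern printed a single value: a capture group inside a repeated (?:...)+ keeps only its last iteration, so the overall match consumed both CNPJs but the group held on to just one. A minimal sketch with the standard re module and made-up data (the ids: prefix and the sample string are assumptions for illustration, not part of the question):

```python
import re

# Illustrative data, not from the question: three comma-separated numbers.
sample = "ids: 10,20,30"

# The named group sits inside a repeated (?:...)+, so each iteration
# overwrites the previous capture; only the final one survives.
repeated = re.search(r"ids: (?:(?P<n>\d+),?)+", sample)
last_only = repeated.group("n")
print(last_only)

# Matching one item per call keeps every value.
all_values = re.findall(r"\d+", sample)
print(all_values)
```

This is why the pattern above matches one CNPJ per match and relies on \G to chain the matches, instead of repeating the group inside a single match.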

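For the example data, you can omit the regex.S flag: it only changes what an unescaped . matches, and this pattern has none outside a character class. Newlines are already covered because \s and \W match them.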

import regex

pattern = r"(?:executad\w(?:\(s\))?\W+|\G(?!^)),?[\p{L}\s.-]+CNPJ\W+\K(?P<cnpj>\d+\.\d+\.\d+/\d+-\d+)"

text = ("Solicite-se ao BANCO CENTRAL, via protocolo digital - SISBACEN ,\n"
    "o BLOQUEIO de créditos existentes até o limite de R$ 30.257,45 (trinta mil, duzentos e\n"
    "cinquenta e sete reais e quarenta e cinco centavos) da(s) executada(s): J.HENRIQUE\n"
    "GALVANI COMERCIO DE ROUPAS - ME - CNPJ 10.547.889/0001-85, Riane Confecções de\n"
    "Roupas Ltda - ME - CNPJ: 00.219.460/0001-05, Jose Henrique Galvani - CPF: 234.846.406-34\n"
    "e Heliane Leonel Raymundo Galvani - CPF: 813.460.347-53, porventura\n"
    "existentes junto a instituições financeiras, incluindo cartões de crédito, agenciadores\n"
    "de pagamento, administradores de consórcio.")

for item in regex.finditer(pattern, text):
    print(item.groupdict())

Output

{'cnpj': '10.547.889/0001-85'}
{'cnpj': '00.219.460/0001-05'}
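
If only the standard re module is available, which has neither \G nor \K, the same chaining can be approximated by hand: find the executada(s) anchor once, then keep calling match() at the position where the previous item ended. This is a rough sketch rather than a drop-in replacement; the names START, ITEM and chained_cnpjs are hypothetical, and [^\W\d_] stands in for \p{L}, which re does not support.

```python
import re

# Anchor: where the chain is allowed to start.
START = re.compile(r"executad\w(?:\(s\))?\W+", re.I)
# One "name ... CNPJ <number>" item. [^\W\d_] approximates \p{L} in stdlib re.
ITEM = re.compile(r",?(?:[^\W\d_]|[\s.-])+CNPJ\W+(?P<cnpj>\d+\.\d+\.\d+/\d+-\d+)")


def chained_cnpjs(text):
    """Collect CNPJs that follow the anchor contiguously, emulating \\G."""
    start = START.search(text)
    if start is None:
        return []
    found = []
    pos = start.end()
    while True:
        # match() with a pos argument only succeeds right at pos,
        # which plays the role of \G (end of the previous match).
        item = ITEM.match(text, pos)
        if item is None:
            break
        found.append(item.group("cnpj"))
        pos = item.end()
    return found


text = (
    "cinquenta e sete reais e quarenta e cinco centavos) da(s) executada(s): J.HENRIQUE\n"
    "GALVANI COMERCIO DE ROUPAS - ME - CNPJ 10.547.889/0001-85, Riane Confecções de\n"
    "Roupas Ltda - ME - CNPJ: 00.219.460/0001-05, Jose Henrique Galvani - CPF: 234.846.406-34\n"
    "e Heliane Leonel Raymundo Galvani - CPF: 813.460.347-53, porventura\n"
)

print(chained_cnpjs(text))
```

The loop stops at the first item that is not a CNPJ entry (here the CPF entry), which is the same behaviour the \G version gives.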

1 Comment

Perfect, exactly what I need! Thanks bro!

Check if this works for you:

import regex

text = """
Solicite-se ao BANCO CENTRAL, via protocolo digital - SISBACEN ,
o BLOQUEIO de créditos existentes até o limite de R$ 30.257,45 (trinta mil, duzentos e
cinquenta e sete reais e quarenta e cinco centavos) da(s) executada(s): J.HENRIQUE
GALVANI COMERCIO DE ROUPAS - ME - CNPJ 10.547.889/0001-85, Riane Confecções de
Roupas Ltda - ME - CNPJ: 00.219.460/0001-05, Jose Henrique Galvani - CPF: 234.846.406-34
e Heliane Leonel Raymundo Galvani - CPF: 813.460.347-53, porventura
existentes junto a instituições financeiras, incluindo cartões de crédito, agenciadores
de pagamento, administradores de consórcio."""

pattern = r'[0-9]{2}\.?[0-9]{3}\.?[0-9]{3}\/?[0-9]{4}\-?[0-9]{2}'

# cut text to start right after executada(s)
text = text.split("executada(s)")[1]

cnpjs = [{"cnpj": cnpj} for cnpj in regex.findall(pattern, text)]

print(cnpjs)
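
Two caveats with the snippet above: split(...)[1] raises IndexError when the marker is missing, and the pattern accepts any CNPJ-shaped number after the marker, labelled or not. A possible variation, assuming the numbers are always preceded by a CNPJ label as in the sample, is sketched below using the standard re module; the names CNPJ_AFTER_LABEL and cnpjs_after_marker are hypothetical. Unlike the \G approach, it does not require the entries to be contiguous.

```python
import re

# Require the "CNPJ" label and the fully punctuated format.
CNPJ_AFTER_LABEL = re.compile(r"CNPJ\W+(\d{2}\.\d{3}\.\d{3}/\d{4}-\d{2})")


def cnpjs_after_marker(text, marker="executada(s)"):
    # partition() reports whether the marker was found, whereas
    # split(marker)[1] would raise IndexError when it is absent.
    _, found, tail = text.partition(marker)
    if not found:
        return []
    return [{"cnpj": number} for number in CNPJ_AFTER_LABEL.findall(tail)]


sample = (
    "da(s) executada(s): J.HENRIQUE GALVANI COMERCIO DE ROUPAS - ME - "
    "CNPJ 10.547.889/0001-85, Riane Confecções de Roupas Ltda - ME - "
    "CNPJ: 00.219.460/0001-05, Jose Henrique Galvani - CPF: 234.846.406-34"
)

print(cnpjs_after_marker(sample))
```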
