Python: How to use regex to find a repetitive string

Question

I have some data that I want to extract and output when a keyword is found in a block of data. How can I retrieve all the data from the first '#' to the last ')' using a regular expression?

//Log_1.txt
# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE 
(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)

# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE 
(some_ID = [12345] failed::someMoreInfo called with invalid some ID)

CODE

import re

with open("Log_1.txt", 'r') as f:
    result = re.search('#(.*)#', f.read())

print(result.group(0))

This isn't all of my code, but if the keyword is "reportChange", the output should be:

# DON'T WANT #
  .
  .
  .
(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)

instead of

# DON'T WANT #
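The code above prints only the first line because, without flags, '.' does not match a newline. Adding re.DOTALL lets '.' cross lines, but the greedy '.*' then runs on to the last '#' in the whole text rather than stopping at the end of one block. A minimal sketch on a shortened, hypothetical copy of the log shows both behaviors:

```python
import re

# A shortened, hypothetical copy of Log_1.txt for illustration.
log = """# DON'T WANT #
{12345.54321}
(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)

# DON'T WANT #
{12345.54321}
(some_ID = [12345] failed::someMoreInfo called with invalid some ID)
"""

# Without flags, '.' stops at a newline, so the match ends on the first line.
first = re.search(r"#(.*)#", log)
print(first.group(0))  # → # DON'T WANT #

# With re.DOTALL, '.' also matches newlines and the greedy '.*'
# runs on to the LAST '#' in the text, spanning both blocks.
greedy = re.search(r"#(.*)#", log, flags=re.DOTALL)
print(greedy.group(0).count("\n"))  # → 4
```

Neither pattern alone stops at the ')' that ends a block, which is why the answers anchor the match on ')' as well.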

Yes, I want it right now, but I eventually plan not to have that. @depperm
– omgsyd28, Commented Jul 16, 2019 at 14:21
If the keyword is someMoreInfo, do you want the whole log or just the part from the nearest DON'T WANT comment?
– depperm, Commented Jul 16, 2019 at 14:32
Yes, I want the entire block of data that the keyword is in, separated by the empty new line. @depperm
– omgsyd28, Commented Jul 16, 2019 at 14:36

depperm · Accepted Answer · 2019-07-16 14:34:45Z

2

Assuming you want everything from the latest # DON'T WANT #, you can use the regex #(.*)#[^)]+yourKeyWordHere[^)]+\). In Python you can use string formatting, with {} in place of the keyword, to substitute whatever word you want.

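import re

keyword = 'reportChange'

with open("Log_1.txt", 'r') as f:
    # Raw string keeps '\)' a regex escape; re.escape() makes the keyword match literally
    result = re.search(r'#(.*)#[^)]+{}[^)]+\)'.format(re.escape(keyword)), f.read())

if result:  # re.search() returns None when nothing matches
    print(result.group(0))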
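re.search() stops at the first match, which is why the comment on this answer reports getting only one block. A minimal sketch, assuming the whole log fits in memory, that keeps the same pattern but switches to re.finditer() so every non-overlapping match is returned. The helper name find_blocks and the sample text are hypothetical:

```python
import re


def find_blocks(text, keyword):
    """Return every block running from a '#...#' line to the next ')' that contains keyword."""
    # Concatenation avoids mixing str.format braces with regex syntax;
    # re.escape() keeps characters such as '.' or '(' in the keyword literal.
    pattern = r"#(.*)#[^)]+" + re.escape(keyword) + r"[^)]+\)"
    # finditer() yields every non-overlapping match, not just the first one.
    return [m.group(0) for m in re.finditer(pattern, text)]


# Hypothetical sample shaped like Log_1.txt.
log = """# DON'T WANT #
{12345.54321}
< SOME_TYPE
(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)

# DON'T WANT #
{12345.54321}
< SOME_TYPE
(some_ID = [12345] failed::someMoreInfo called with invalid some ID)

# DON'T WANT #
{12345.54321}
< SOME_TYPE
(some_ID = [12345xxx] reportChange::someMoreInfo called with invalid some ID)
"""

for block in find_blocks(log, "reportChange"):
    print(block)
    print()
```

Because [^)]+ cannot cross a ')', a block without the keyword fails to match and the search moves on to the next '#' line instead of swallowing neighboring blocks.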

answered Jul 16, 2019 at 14:34

depperm



1 Comment

omgsyd28 Over a year ago

Thanks, that solution worked, but I need it to output the block of data each time the keyword is found, not just the first or the last instance.

Patrick · Accepted Answer · 2019-07-16 14:36:57Z

1

As a regular expression, you have to use a negative lookahead as well as a negative lookbehind.

Try this regex: (?!#).*(?<![)]). It should output everything between # and ).

For the future, use regex101.com to test your regular expressions.
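For comparison, here is a sketch using positive lookarounds, a different technique from the negative lookarounds named in this answer and not its pattern. (?<=#) asserts that a '#' sits just before the match and (?=\)) asserts that a ')' sits just after it, so neither delimiter becomes part of the result. The sample block is hypothetical:

```python
import re

block = """# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)"""

# re.DOTALL lets '.' cross newlines; the lazy '.*?' stops at the first ')'
# instead of running to the last one in the text.
inner = re.search(r"(?<=#).*?(?=\))", block, flags=re.DOTALL)
print(inner.group(0))
```

Here the match is everything after the first '#' up to, but not including, the closing ')'.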

answered Jul 16, 2019 at 14:36

Patrick


Andrej Kesely · Accepted Answer · 2019-07-16 14:41:31Z

This code prints only the blocks of data that contain reportChange::someMoreInfo called with invalid some ID:

data = '''//Log_1.txt
# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)

# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] failed::someMoreInfo called with invalid some ID)

# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345xxx] reportChange::someMoreInfo called with invalid some ID)
'''

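import re

# Blocks in the log are separated by an empty line
for d in re.split(r'\n\n', data):
    # re.M makes ^ and $ match at line boundaries; re.DOTALL lets '.' span lines
    g = re.findall(r"^# DON'T WANT #.*reportChange::someMoreInfo called with invalid some ID\)$", d, flags=re.M | re.DOTALL)
    if g:
        print(g[0])
        print()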

Prints:

# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)

# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345xxx] reportChange::someMoreInfo called with invalid some ID)
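The same split-then-filter idea can be generalized to any keyword. A minimal sketch, assuming blocks are separated by blank lines (possibly containing stray spaces) and that a plain substring test is enough to pick the blocks; the function name blocks_with_keyword and the sample data are hypothetical:

```python
import re


def blocks_with_keyword(text, keyword):
    """Return the '#'...')' part of each blank-line-separated block containing keyword."""
    found = []
    # r"\n\s*\n" splits on blank lines, including ones that hold stray spaces.
    for block in re.split(r"\n\s*\n", text):
        if keyword not in block:  # plain substring test, no regex needed here
            continue
        # re.M anchors '^' and '$' at line boundaries; re.DOTALL lets the greedy
        # '.*' run from the first line starting with '#' to the last line ending in ')'.
        m = re.search(r"^#.*\)$", block, flags=re.M | re.DOTALL)
        if m:
            found.append(m.group(0))
    return found


data = """//Log_1.txt
# DON'T WANT #
< SOME_TYPE
(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)

# DON'T WANT #
< SOME_TYPE
(some_ID = [12345] failed::someMoreInfo called with invalid some ID)
"""

for block in blocks_with_keyword(data, "reportChange"):
    print(block)
    print()
```

Splitting first keeps each match confined to one block, so the keyword test no longer has to be embedded in the regex.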
