0

I have some data that I want to extract and output whenever a keyword is found in a block of that data. How can I retrieve everything from the block's first '#' to its last ')' using a regular expression?

//Log_1.txt
# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE 
(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)

# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE 
(some_ID = [12345] failed::someMoreInfo called with invalid some ID)

CODE

import re

with open("Log_1.txt", 'r') as f:
    result = re.search('#(.*)#', f.read())

print(result.group(0))

This isn't all of my code, but if the keyword is "reportChange", the output should be:

# DON'T WANT #
  .
  .
  .
(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)

instead of

# DON'T WANT #

4 Comments
  • so you want DON'T WANT .... seems like a bad comment Commented Jul 16, 2019 at 14:20
  • Yes, I want it right now but eventually plan to not have that. @depperm Commented Jul 16, 2019 at 14:21
  • if the keyword is someMoreInfo do you want the whole log or just from the nearest comment DON'T WANT Commented Jul 16, 2019 at 14:32
  • Yes I want the entire block of data the keyword is in and separated by the empty new line @depperm Commented Jul 16, 2019 at 14:36

3 Answers

2

Assuming you want the match to start at the nearest preceding # DON'T WANT #, you can use the regex #(.*)#[^)]+yourKeyWordHere[^)]+\). In Python you can use string formatting, with {} in place of the keyword, to substitute whatever word you want.

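import re

keyword = 'reportChange'

with open("Log_1.txt", 'r') as f:
    # Raw string so that \) reaches the regex engine unchanged;
    # re.escape guards against regex metacharacters in the keyword.
    result = re.search(r'#(.*)#[^)]+{}[^)]+\)'.format(re.escape(keyword)), f.read())

# re.search returns None when nothing matches
if result:
    print(result.group(0))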

1 Comment

Thanks, that solution worked, but I need it to output the block of data each time the keyword is found. Not just the first instance or the last instance.
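To get every block that contains the keyword rather than only the first, one option is to swap re.search for re.finditer. The following is a minimal sketch, not the answer author's code: the helper name find_blocks and the tightened header pattern #[^#\n]*# are assumptions made for illustration, and it relies on each block containing a single ) at its very end, as in the sample log.

```python
import re


def find_blocks(text, keyword):
    """Return every block, from its '#' header to its closing ')', that contains keyword.

    Hypothetical helper; assumes the only ')' in a block is the one that ends it.
    """
    pattern = re.compile(
        r"#[^#\n]*#"            # header line such as "# DON'T WANT #"
        r"[^)]*"                # anything up to the keyword, but never past a ')'
        + re.escape(keyword)    # the keyword, taken literally
        + r"[^)]*\)"            # rest of the block up to its closing ')'
    )
    return [m.group(0) for m in pattern.finditer(text)]


sample = """# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)

# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] failed::someMoreInfo called with invalid some ID)

# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345xxx] reportChange::someMoreInfo called with invalid some ID)
"""

for block in find_blocks(sample, "reportChange"):
    print(block)
    print()
```

Because [^)]* cannot cross a closing parenthesis, a match cannot spill from one block into the next, which is what keeps the "failed" block out of the results here.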
1

For the regular expression, you need a negative lookahead as well as a negative lookbehind.

Try this regex: (?!#).*(?<![)]). It is meant to output everything between # and ).

For the future: Use regex101.com to test your regular expressions.
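Before relying on a pattern like this, it can be worth running it against the sample block in Python as well. The sketch below is an illustration rather than part of the answer: on the first block of Log_1.txt, the lookaround pattern passed to re.search without flags matches only the remainder of the header line, because the lookahead skips the leading # and . stops at a newline. The alternative pattern #[^)]*\) is an assumption of this sketch and only holds while the block contains no ) before its final one.

```python
import re

block = (
    "# DON'T WANT #\n"
    "{12345.54321}\n"
    "[Tues Jul 2 01:23:45 2019]\n"
    "< SOME_TYPE\n"
    "(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)"
)

# Pattern from the answer, run without any flags.
lookaround = re.search(r"(?!#).*(?<![)])", block)
print(repr(lookaround.group(0)))

# Alternative (an assumption of this sketch): '#', then anything that is not ')', then ')'.
# A negated character class also matches newlines, so no re.DOTALL is needed.
span = re.search(r"#[^)]*\)", block)
print(span.group(0))
```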

1

This code prints only the blocks of data that contain reportChange::someMoreInfo called with invalid some ID:

data = '''//Log_1.txt
# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)

# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] failed::someMoreInfo called with invalid some ID)

# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345xxx] reportChange::someMoreInfo called with invalid some ID)
'''

import re

for d in re.split(r'\n\n', data):
    g = re.findall(r'^# DON\'T WANT #.*reportChange::someMoreInfo called with invalid some ID\)$', d, flags=re.M|re.DOTALL)
    if g:
        print(g[0])
        print()

Prints:

# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)

# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345xxx] reportChange::someMoreInfo called with invalid some ID)
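The same split-then-filter idea can be written with the keyword as a parameter. This is a minimal sketch under the assumption, stated in the question's comments, that blocks are separated by an empty line; the function name blocks_with_keyword is hypothetical, and a plain substring test stands in for the regular expression.

```python
import re


def blocks_with_keyword(text, keyword):
    """Return the blank-line-separated blocks of text that contain keyword."""
    # Split on one or more blank lines (lines that are empty or whitespace only).
    blocks = re.split(r"\n\s*\n", text.strip())
    return [block for block in blocks if keyword in block]


log = """# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)

# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] failed::someMoreInfo called with invalid some ID)

# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345xxx] reportChange::someMoreInfo called with invalid some ID)
"""

for block in blocks_with_keyword(log, "reportChange"):
    print(block)
    print()
```

Since the block boundary comes from the blank line rather than from # or ), this variant keeps working if the header line is later removed, as the asker mentions planning to do.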
