1

I am trying to capture this multiline log style in python, similar to that of a log parser. Here is a log sample:

> [2019-11-21T00:58:47.922Z] This is a single log line
> [2019-11-21T00:59:02.781Z] This is a multiline log This is a multiline
> log This is a multiline log This is a multiline log
> [2019-11-21T00:58:47.922Z] This is a single log line
> [2019-11-21T00:59:02.781Z] This is a multiline log This is a multiline
> log This is a multiline log This is a multiline log

Unfortunately, the newline characters are messing me up. I've tried negative lookaheads, behinds, etc. I can never capture more than a single log line. When I try to include the newlines, I end up capturing the entire log.

What python regex can I use to capture each message indiviudally?

I've tried stuff like:

regex = re.compile(r"^\[20.*Z\][\s\S]+", re.MULTILINE)

:(

0

2 Answers 2

2

As an alternative you could match the pattern that marks the start of the log using the square brackets and repeat matching all following lines that do not start with an opening square bracket

^\[20[^\]]+Z\].*(?:\r?\n(?!\[).*)*

In parts that will match

  • ^ Start of the string
  • \[20[^\]]+Z\] Match [20, then any char except ] and then Z]
  • .* Match any char except a newline 0+ times
  • (?: Non capturing group
    • \r?\n(?!\[) Match a newline and assert that it does not start with [
    • .* Match any char except a newline 0+ times
  • )* Close non capturing group and repleat 0+ times

Regex demo

Sign up to request clarification or add additional context in comments.

Comments

2

You may use this regex in python with a lookahead:

^\[20[^]]*Z\][\s\S]+?$(?=\n\[|\Z)

RegEx Demo

RegEx Details:

  • ^: Start
  • \[20[^]]*Z\]: Match date-time string wrapped as [...Z]
  • [\s\S]+?$`: Match 1 or of any character including line breaks (non-greedy)
  • (?=\n\[|\Z): Positive lookahead condition to assert that we have a newline and start of date-time stamp [ at next position or it is end of input

Here is an alternate non-lookeahead solution that is more efficient:

^\[20[^]]*Z\].*(?:\n[^[].*)*

RegEx Demo 2

RegEx Details:

  • ^: Start
  • \[20[^]]*Z\]: Match date-time string wrapped as [...Z]
  • .*: Match rest of line (without line breaks)
  • (?:\n[^[].*)*: Match remaining part of message that is a line-break followed by a non [ character at start

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.