Find pattern on multiple lines within BIG log files

Question

While investigating logs, I am trying to find the very first time a vulnerability in a workflow was exploited.

The pattern spans multiple lines.

The pattern would be:

AAAAAAAAA
BBBBBBBBB
CCCCCCCCC

The problem is that

AAAAAAAAA

or

BBBBBBBBB

or

CCCCCCCCC

can each be found anywhere individually in the log without indicating the vulnerability; only the exact pattern, in this exact order, will help me.

For example

grep -Ei "AAAAAAAAA|BBBBBBBBB|CCCCCCCCC" logfile does not help me, since it prints every line containing an individual occurrence of AAAAAAAAA, BBBBBBBBB, or CCCCCCCCC.

How can I solve this?

There are quite a number of "multiline match" questions; this one, for example, looks close to yours: Multiline Regexp (grep, sed, awk, perl)
– steeldriver, Commented Apr 3, 2021 at 21:20
I don't think a multi-line regular expression is going to work in this context, since it would need to be AAAAA.*BBBBB.*CCCCC and each candidate AAAAA would force grep to span the rest of the "Big" file.
– Philip Couling, Commented Apr 3, 2021 at 23:16
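
Because the three lines in the question are consecutive, the pattern can join them with explicit newlines instead of .*, so a candidate never has to extend past three lines. A small Python sketch of the difference, using made-up sample strings (the variable names are illustrative only):

```python
import re

# Lines A, B and C are present but separated by unrelated lines.
scattered = "AAAAAAAAA\nnoise\nBBBBBBBBB\nnoise\nCCCCCCCCC\n"
# The same three lines directly after one another.
adjacent = "noise\nAAAAAAAAA\nBBBBBBBBB\nCCCCCCCCC\n"

# With DOTALL, ".*" may cross any number of lines.
spanning = re.compile(r"AAAAAAAAA.*BBBBBBBBB.*CCCCCCCCC", re.DOTALL)
# Explicit newlines only allow three consecutive lines.
consecutive = re.compile(r"^AAAAAAAAA\nBBBBBBBBB\nCCCCCCCCC$", re.MULTILINE)

spanning_hits_scattered = spanning.search(scattered) is not None
consecutive_hits_scattered = consecutive.search(scattered) is not None
consecutive_hits_adjacent = consecutive.search(adjacent) is not None

# Prints: True False True
print(spanning_hits_scattered, consecutive_hits_scattered, consecutive_hits_adjacent)
```

The spanning pattern reports the scattered lines as a hit, which is exactly the false positive the question wants to avoid.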

EWJ00 · Accepted Answer · 2021-04-04 06:36:04Z

Here's a way to do it in Python. I extended your example a bit to show that you still get the matches you want even when single lines of AAAAAAAAA, BBBBBBBBB, or CCCCCCCCC are scattered throughout the logfile.

Below are the contents of find_log_vulns.py:

#! /usr/bin/python3

import re

test_string = """1234324
AAAAAAAAA
BBBBBBBBB
CCCCCCCCC
absdfjv4er4
AAAAAAAAA
BBBBBBBBB
CCCCCCCCC
123466666
AAAAAAAAA
ghrhvhhhfh
BBBBBBBBB
fjwjefjsjfjwjf
CCCCCCCCC
24wfsgggg
AAAAAAAAA
BBBBBBBBB
CCCCCCCCC
zzzz"""

# The pattern uses no ^ or $ anchors, so the re.MULTILINE flag is not needed.
matches = re.findall(r'AAAAAAAAA\nBBBBBBBBB\nCCCCCCCCC\n', test_string)

print(matches)

The result I get from running the above:

$ ./find_log_vulns.py
['AAAAAAAAA\nBBBBBBBBB\nCCCCCCCCC\n', 'AAAAAAAAA\nBBBBBBBBB\nCCCCCCCCC\n', 'AAAAAAAAA\nBBBBBBBBB\nCCCCCCCCC\n']

As shown above, each match will be returned as an element in a list.
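
The script above searches a string held in memory. For a large log where only the first occurrence matters, one possible variation is to memory-map the file and stop at the first match. This is a minimal sketch, not part of the original answer: the function name first_match_line is hypothetical, the ^ and $ anchors (stricter than the pattern above) and the optional \r for Windows line endings are assumptions.

```python
import mmap
import os
import re

# Anchored so that each of the three lines must match in full (an assumption;
# the strings themselves are the placeholders from the question).
PATTERN = re.compile(rb"^AAAAAAAAA\r?\nBBBBBBBBB\r?\nCCCCCCCCC\r?$", re.MULTILINE)


def first_match_line(path, pattern=PATTERN, chunk_size=1 << 20):
    """Return the 1-based line number where the first match starts, or None."""
    if os.path.getsize(path) == 0:
        return None  # an empty file cannot be memory-mapped
    with open(path, "rb") as handle:
        # Map the file read-only instead of reading it into a Python string.
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as data:
            match = pattern.search(data)
            if match is None:
                return None
            # Count the newlines before the match, one chunk at a time.
            newlines = 0
            for start in range(0, match.start(), chunk_size):
                stop = min(start + chunk_size, match.start())
                newlines += data[start:stop].count(b"\n")
            return newlines + 1
```

Calling first_match_line("logfile") would then give the line on which the first AAAAAAAAA of a complete three-line block sits, or None if the block never appears.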

user339730 · Accepted Answer · 2021-04-09 13:09:00Z

Using ripgrep:

rg -U 'A+\nB+\nC+' in
2:AAAAAAAAA
3:BBBBBBBBB
4:CCCCCCCCC
6:AAAAAAAAA
7:BBBBBBBBB
8:CCCCCCCCC
16:AAAAAAAAA
17:BBBBBBBBB
18:CCCCCCCCC

You can get rid of the line numbers (for example with -N), and so on. If you need separators between the matches, you can do this:

rg -U 'A+\nB+\nC+' in | rg --passthru -e '(^A)' -r $'\n'A

AAAAAAAAA
BBBBBBBBB
CCCCCCCCC

AAAAAAAAA
BBBBBBBBB
CCCCCCCCC

AAAAAAAAA
BBBBBBBBB
CCCCCCCCC


αғsнιη · Accepted Answer · 2021-04-09 15:50:03Z

Using awk:

awk -v ptrn="AAAAAAAAA\0BBBBBBBBB\0CCCCCCCCC\0" '
BEGIN{ split(ptrn, tmp, "\0"); lngth=gsub("\0", "", ptrn ) }
$0 ~ tmp[++fieldNr]{ buf=(buf==""?"": buf OFS) NR":"$0 ;
                     if ( fieldNr == lngth ) { print buf; exit }
                     next
                   }
{ fieldNr=0; buf="" }' infile

This prints the line number followed by the matched line content, and exits after the first complete match. Each pattern in "ptrn" is applied to the lines as a "Partial Regexp Match"; see How do I find the text that matches a pattern? for other matching options.

The NUL character \0 is used to separate the patterns.

Sample input:

AAAAAAAAA
BBBBBBBBB

CCCCCCCCC
AAAAAAAAA
BBBBBBBBB
ccccccccc
123AAAAAAAAA
BBBBBBBBB123
123CCCCCCCCC3

Output:

8:123AAAAAAAAA 9:BBBBBBBBB123 10:123CCCCCCCCC3
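
The same idea (a partial regexp match per line, stopping at the first complete run and reporting line numbers) could be sketched in Python roughly as follows. This is an illustration rather than a translation of the awk command: the function name first_consecutive_match is hypothetical, and the sliding window over the last few lines is an assumption chosen so that a line that breaks one candidate run can still begin the next.

```python
import re
from collections import deque


def first_consecutive_match(lines, patterns):
    """Return [(line_number, text), ...] for the first run of consecutive
    lines that match the patterns in order, or None if there is no such run."""
    compiled = [re.compile(p) for p in patterns]
    # Keep only as many recent lines as there are patterns.
    window = deque(maxlen=len(compiled))
    for number, raw in enumerate(lines, start=1):
        window.append((number, raw.rstrip("\r\n")))
        # search() gives a partial match: the pattern may sit anywhere in the line.
        if len(window) == len(compiled) and all(
            regex.search(text) for regex, (_, text) in zip(compiled, window)
        ):
            return list(window)  # stop at the first complete run
    return None
```

Because lines can be any iterable of strings, an open file object can be passed directly, so the log is read one line at a time. On the sample input above, with the three patterns from the question, it returns lines 8, 9 and 10, the same lines the awk command prints.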

bu5hman · Accepted Answer · 2021-04-10 06:15:06Z

Just for fun with good old awk

cat file | wc -l
21287021

with > 3,000,000 matches

time awk 'BEGIN{getline; a=$0; getline; b=$0}
       $0~/^C+$/ && a~/^A+$/ && b~/^B+$/{print "match starting on line "NR-2 }{a=b;b=$0}' file

real    0m12.644s
user    0m7.149s
sys     0m4.314s

Compared with rg on my machine:

time rg -U 'A+\nB+\nC+' file
real    0m40.322s
user    0m16.503s
sys     0m17.246s

edited Apr 10, 2021 at 6:15

answered Apr 9, 2021 at 16:15

