Regex to find a multi line string that includes another string between lines

Question

This is my first question here.

I have a log file that contains many similarly structured entries ("hits"):

Region: AR
OnlineID: Atl_Tuc
---Start---
FIFA 18 Legacy Edition
---END---

Region: FR
OnlineID: jubtrrzz
---Start---
FIFA 19
Undertale
Pro Evolution Soccer™ 2018
---END---

Region: US
OnlineID: Cu128yi
---Start---
KINGDOM HEARTS HD 1.5 +2.5 ReMIX
---END---

Region: RO
OnlineID: Se116
---Start---
Real Farm
EA SPORTS™ FIFA 20
LittleBigPlanet™ 3
---END---

Region: US
OnlineID: CAJ5Y
---Start---
Madden NFL 18: G.O.A.T. Super Bowl Edition
---END---

I want to find every hit that contains fifa as a string. fifa is only an example; I need to be able to find all hits that contain some given string.

The closest I have come is this regex: (?s)(?=^\r\n)(.*?)(fifa)(.*?)(?=\r\n\r\n)

However, when I use it, the match starts at a hit without fifa and keeps going until it reaches a hit that does contain fifa, so a single match sometimes covers more than one hit.

The second problem is that I can't use .* inside the (fifa) group, because it causes the wrong text to be selected.

What can I do now?

The expected output is:

Region: AR
OnlineID: Atl_Tuc
---Start---
FIFA 18 Legacy Edition
---END---

Region: FR
OnlineID: jubtrrzz
---Start---
FIFA 19
Undertale
Pro Evolution Soccer™ 2018
---END---

Region: RO
OnlineID: Se116
---Start---
Real Farm
EA SPORTS™ FIFA 20
LittleBigPlanet™ 3
---END---

Maybe (?si)(?:^(?<!.)|\R{2})\K(?:(?!\R{2}).)*?\bfifa\b.*?(?=\R{2}|\z) will do?
– Wiktor Stribiżew, Commented Nov 15, 2020 at 0:52
So, you want to remove all paragraphs not containing fifa?
– Wiktor Stribiżew, Commented Nov 15, 2020 at 0:54
.* would match anything, you need to use (?-s:\bfifa\b.*\b20\b)
– Wiktor Stribiżew, Commented Nov 15, 2020 at 1:10

Wiktor Stribiżew · Accepted Answer · 2020-11-15 01:09:22Z

3

You can use

(?si)(?:^(?<!.)|\R{2})\K(?:(?!\R{2}).)*?\bfifa\b.*?(?=\R{2}|\z)

See the regex demo

Details

(?si) - s makes . match line break chars (same as . matches newline ON) and i turns case insensitive matching ON
(?:^(?<!.)|\R{2}) - matches start of a file or two line break sequences
\K - omits the matched line breaks
(?:(?!\R{2}).)*? - any char, 0 or more occurrences but as few as possible, not starting a double line break sequence
\bfifa\b - whole word fifa
.*? - any 0+ chars as few as possible
(?=\R{2}|\z) - up to the double line break or end of file.
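
The pattern above relies on \K and \R, which Python's built-in re module does not support. As a rough sketch only, the same idea can be approximated there with a lookbehind in place of \K and a literal \n\n in place of \R{2}. The sample text, the PATTERN name, and the assumption of \n line endings are illustrative and not part of the original answer.

```python
import re

# Hypothetical sample in the same shape as the log in the question.
log_text = (
    "Region: AR\nOnlineID: Atl_Tuc\n---Start---\nFIFA 18 Legacy Edition\n---END---\n"
    "\n"
    "Region: US\nOnlineID: Cu128yi\n---Start---\nKINGDOM HEARTS HD 1.5 +2.5 ReMIX\n---END---\n"
    "\n"
    "Region: RO\nOnlineID: Se116\n---Start---\nReal Farm\nEA SPORTS™ FIFA 20\n---END---"
)

PATTERN = re.compile(
    r"(?si)"              # s: . matches line breaks, i: case insensitive
    r"(?:\A|(?<=\n\n))"   # start of text, or right after a blank line
    r"(?:(?!\n\n).)*?"    # any chars, as few as possible, never crossing a blank line
    r"\bfifa\b"           # whole word fifa
    r".*?(?=\n\n|\Z)"     # rest of the entry, up to a blank line or end of text
)

matches = [m.group(0) for m in PATTERN.finditer(log_text)]
for entry in matches:
    print(entry, end="\n\n")
```

With this sample, only the AR and RO entries are printed; the US entry has no fifa and is skipped because the tempered part of the pattern cannot cross a blank line.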

Now, if you want to match a paragraph that has fifa followed by 20 on the same line, use

(?si)(?:^(?<!.)|\R{2})\K(?:(?!\R{2}).)*?(?-s:\bfifa\b.*\b20\b).*?(?=\R{2}|\z)

The (?-s:\bfifa\b.*\b20\b) is a modifier group where . stops matching line breaks, and it matches a whole word fifa, then any 0+ chars other than line break chars, as many as possible, and then a 20 as a whole word.

See this regex demo.
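
Python's re module (3.6 and later) also accepts scoped flag groups such as (?-s:...), so the effect of the modifier group described above can be sketched there as well. This is an illustration under assumptions, not the answer's own code: a lookbehind and \n\n again stand in for \K and \R{2}, and the sample text and the HEAD, TAIL, scoped and unscoped names are hypothetical.

```python
import re

# Hypothetical two-entry sample: only the RO entry has fifa and 20 on one line.
log_text = (
    "Region: FR\nOnlineID: jubtrrzz\n---Start---\nFIFA 19\nPro Evolution Soccer™ 2018\n---END---\n"
    "\n"
    "Region: RO\nOnlineID: Se116\n---Start---\nReal Farm\nEA SPORTS™ FIFA 20\n---END---"
)

HEAD = r"(?si)(?:\A|(?<=\n\n))(?:(?!\n\n).)*?"  # entry start, then stay inside the entry
TAIL = r".*?(?=\n\n|\Z)"                        # rest of the entry

# Inside (?-s:...) the dot stops matching line breaks, so fifa and 20 must share a line.
scoped = [m.group(0) for m in re.finditer(HEAD + r"(?-s:\bfifa\b.*\b20\b)" + TAIL, log_text)]

# Without the group, the greedy .* may run across line breaks and blank lines.
unscoped = [m.group(0) for m in re.finditer(HEAD + r"\bfifa\b.*\b20\b" + TAIL, log_text)]

print(len(scoped), scoped[0].splitlines()[0])
print(len(unscoped), unscoped[0].splitlines()[0])
```

On this sample the scoped pattern returns only the RO entry, while the unscoped one returns a single match that begins in the FR entry and runs into the RO entry, which is the kind of wrong selection the question describes.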

answered Nov 15, 2020 at 1:09

Wiktor Stribiżew


11 Comments

morez890 Over a year ago

It is exactly what I was searching for. And now can I use another start or end for this regex? I mean using a string instead of two line break sequences for start or end of the selection ( (?:^(?<!.)|\R{2}) )

Wiktor Stribiżew Over a year ago

@morez890 Suit yourself. I do not have access to your data.

morez890 Over a year ago

Thanks for the solution, I was searching for this like 1 month on the internet.

Wiktor Stribiżew Over a year ago

^Language:(?-s:.*)\R

Wiktor Stribiżew Over a year ago

@morez890 Not sure what you mean, but you can work around some things in bat files.


mareoraft · Answer · 2020-11-15 01:28:47Z

1

It would be better not to use regex for this entire problem. I would use something simpler to cut the log file into pieces, 1 piece per paragraph.

Then use a regex to see if each paragraph is a "hit" or not.

Here is some Python code:

# read the file contents into a string (the with block closes the file afterwards)
with open('/input/log/file/path/here', 'r') as log_file:
    log_text = log_file.read().strip()

# split the string into separate paragraphs (entries are separated by a blank line)
paragraphs = log_text.split('\n\n')

# filter the paragraphs to the ones you want (is_wanted must already be defined)
filtered_paragraphs = filter(is_wanted, paragraphs)

# recombine the filtered paragraphs into a new log string
new_log_text = '\n\n'.join(filtered_paragraphs)

# output new log text into new file
with open('/output/log/file/path/here', 'w') as out_file:
    out_file.write(new_log_text)

and of course you will need to define the is_wanted function:

import re

def is_wanted(paragraph):
    # discard first three and last line to get paragraph content
    p_content = '\n'.join(paragraph.split('\n')[3:-1])
    # input any regex pattern here, such as 'FIFA'.  You can pass it into the function as a variable if you need it to be customizable
    return bool(re.search(r'FIFA', p_content))
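
The comment in is_wanted mentions passing the pattern in as a variable. One way that might look is sketched below. The filter_entries name and its parameters are hypothetical and not from the answer, and the sketch additionally assumes that case-insensitive matching is wanted and that entries are separated by one or more blank lines with either \n or \r\n line endings.

```python
import re

def filter_entries(log_text, pattern):
    """Return the entries whose content lines match pattern, ignoring case."""
    wanted = re.compile(pattern, re.IGNORECASE)
    # Split on blank lines; \r?\n tolerates Windows line endings.
    entries = re.split(r"(?:\r?\n){2,}", log_text.strip())
    kept = []
    for entry in entries:
        # Drop the Region, OnlineID and ---Start--- lines and the ---END--- line.
        content = "\n".join(entry.splitlines()[3:-1])
        if wanted.search(content):
            kept.append(entry)
    return kept

# Hypothetical sample; the second entry has fifa only in its OnlineID line.
sample = (
    "Region: AR\nOnlineID: Atl_Tuc\n---Start---\nFIFA 18 Legacy Edition\n---END---\n\n"
    "Region: US\nOnlineID: fifa_fan\n---Start---\nUndertale\n---END---\n\n"
    "Region: RO\nOnlineID: Se116\n---Start---\nReal Farm\nEA SPORTS™ FIFA 20\n---END---\n"
)

result = filter_entries(sample, r"\bfifa\b")
print("\n\n".join(result))
```

Because the header lines are discarded before searching, the second entry is not kept even though its OnlineID contains fifa.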

edited Nov 15, 2020 at 1:28

answered Nov 15, 2020 at 1:17

mareoraft


2 Comments

morez890 Over a year ago

Thanks, I don't know if it works or not, but I wanna use regex in batch file, that's why I asked for regex.

mareoraft Over a year ago

No worries! I have verified that it works, but I understand that you may not want to use Python for your particular situation. Have a great day and welcome to Stack Overflow!
