This is my first question here.

I have a log file that contains multiple similar entries ("hits"):

Region: AR
OnlineID: Atl_Tuc
---Start---
FIFA 18 Legacy Edition
---END---

Region: FR
OnlineID: jubtrrzz
---Start---
FIFA 19
Undertale
Pro Evolution Soccer™ 2018
---END---

Region: US
OnlineID: Cu128yi
---Start---
KINGDOM HEARTS HD 1.5 +2.5 ReMIX
---END---

Region: RO
OnlineID: Se116
---Start---
Real Farm
EA SPORTS™ FIFA 20
LittleBigPlanet™ 3
---END---

Region: US
OnlineID: CAJ5Y
---Start---
Madden NFL 18: G.O.A.T. Super Bowl Edition
---END---

I want to find all hits that contain fifa (as a string). Fifa is just an example; I need to find all hits that contain a given string.

The closest I have gotten is this regex: (?s)(?=^\r\n)(.*?)(fifa)(.*?)(?=\r\n\r\n)

But when I use it, it selects every hit, including hits without fifa, until it reaches a hit that contains fifa, so it sometimes selects more than one hit.

The second problem is that I can't use .* inside (fifa), because it causes a wrong selection.

What can I do now?

The right output should be like this:

Region: AR
OnlineID: Atl_Tuc
---Start---
FIFA 18 Legacy Edition
---END---

Region: FR
OnlineID: jubtrrzz
---Start---
FIFA 19
Undertale
Pro Evolution Soccer™ 2018
---END---

Region: RO
OnlineID: Se116
---Start---
Real Farm
EA SPORTS™ FIFA 20
LittleBigPlanet™ 3
---END---
  • Please add desired output. What is the 'right' selection? Commented Nov 15, 2020 at 0:52
  • Maybe (?si)(?:^(?<!.)|\R{2})\K(?:(?!\R{2}).)*?\bfifa\b.*?(?=\R{2}|\z) will do? Commented Nov 15, 2020 at 0:52
  • @dawg post edited, check now Commented Nov 15, 2020 at 0:54
  • So, you want to remove all paragraphs not containing fifa? Commented Nov 15, 2020 at 0:54
  • .* would match anything, you need to use (?-s:\bfifa\b.*\b20\b) Commented Nov 15, 2020 at 1:10

2 Answers

You can use

(?si)(?:^(?<!.)|\R{2})\K(?:(?!\R{2}).)*?\bfifa\b.*?(?=\R{2}|\z)

See the regex demo

Details

  • (?si) - s makes . match line break chars (same as ". matches newline" ON) and i turns case insensitive matching ON
  • (?:^(?<!.)|\R{2}) - matches start of a file or two line break sequences
  • \K - omits the matched line breaks
  • (?:(?!\R{2}).)*? - any char, 0 or more occurrences but as few as possible, not starting a double line break sequence
  • \bfifa\b - whole word fifa
  • .*? - any 0+ chars as few as possible
  • (?=\R{2}|\z) - up to the double line break or end of file.
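
For readers who want to try the idea outside an editor, here is a rough Python sketch of the same approach. It is not the regex above verbatim: Python's built-in re module does not support \R or \K, so this version assumes "\n" line endings, uses a lookbehind in place of \K, and uses \Z for the end of the text. The sample string is a shortened copy of the log from the question.

```python
import re

# Assumption: the log uses "\n" line endings, so "\n\n" separates paragraphs.
PARAGRAPH_WITH_FIFA = re.compile(
    r"(?:\A|(?<=\n\n))"   # start of text, or right after a blank line
    r"(?:(?!\n\n).)*?"    # any chars, lazily, never crossing a blank line
    r"\bfifa\b"           # whole word "fifa"
    r".*?"                # the rest of the paragraph, lazily
    r"(?=\n\n|\Z)",       # stop before the next blank line or at end of text
    re.DOTALL | re.IGNORECASE,
)

sample = (
    "Region: AR\nOnlineID: Atl_Tuc\n---Start---\n"
    "FIFA 18 Legacy Edition\n---END---\n\n"
    "Region: US\nOnlineID: Cu128yi\n---Start---\n"
    "KINGDOM HEARTS HD 1.5 +2.5 ReMIX\n---END---\n\n"
    "Region: RO\nOnlineID: Se116\n---Start---\n"
    "Real Farm\nEA SPORTS™ FIFA 20\n---END---"
)

# Each match is one whole paragraph that contains "fifa".
hits = [m.group(0) for m in PARAGRAPH_WITH_FIFA.finditer(sample)]
print("\n\n".join(hits))
```

With this sample, only the AR and RO paragraphs are kept; the US paragraph is skipped because the tempered part of the pattern cannot cross the blank line to reach a later fifa.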

Now, if you want to match a paragraph that has fifa followed by 20 on one of its lines, use

(?si)(?:^(?<!.)|\R{2})\K(?:(?!\R{2}).)*?(?-s:\bfifa\b.*\b20\b).*?(?=\R{2}|\z)

The (?-s:\bfifa\b.*\b20\b) is a modifier group where . stops matching line breaks, and it matches a whole word fifa, then any 0+ chars other than line break chars, as many as possible, and then a 20 as a whole word.

See this regex demo.
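
The scoped modifier can be tried in Python as well, since its re module accepts (?-s:...) groups (Python 3.6 and later). The sketch below again assumes "\n" line endings and swaps \R, \K and \z for Python equivalents, so treat it as an approximation of the pattern above rather than the same regex.

```python
import re

# Assumption: "\n" line endings; "\n\n" separates paragraphs.
FIFA_THEN_20 = re.compile(
    r"(?:\A|(?<=\n\n))"          # start of text, or right after a blank line
    r"(?:(?!\n\n).)*?"           # stay inside the current paragraph
    r"(?-s:\bfifa\b.*\b20\b)"    # here "." stops at line breaks: same line only
    r".*?"                       # the rest of the paragraph
    r"(?=\n\n|\Z)",              # up to the next blank line or end of text
    re.DOTALL | re.IGNORECASE,
)

sample = (
    "Region: AR\nOnlineID: Atl_Tuc\n---Start---\n"
    "FIFA 18 Legacy Edition\n---END---\n\n"
    "Region: FR\nOnlineID: jubtrrzz\n---Start---\n"
    "FIFA 19\nUndertale\nPro Evolution Soccer™ 2018\n---END---\n\n"
    "Region: RO\nOnlineID: Se116\n---Start---\n"
    "Real Farm\nEA SPORTS™ FIFA 20\nLittleBigPlanet™ 3\n---END---"
)

hits_20 = [m.group(0) for m in FIFA_THEN_20.finditer(sample)]
print("\n\n".join(hits_20))
```

Only the RO paragraph matches here. The FR paragraph is rejected because 2018 does not contain 20 as a whole word, and because the . inside the scoped group cannot run from the FIFA 19 line onto a later line.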

11 Comments

It is exactly what I was searching for. And now can I use another start or end for this regex? I mean using a string instead of two line break sequences for start or end of the selection ( (?:^(?<!.)|\R{2}) )
@morez890 Suit yourself. I do not have access to your data.
Thanks for the solution, I was searching for this like 1 month on the internet.
^Language:(?-s:.*)\R
@morez890 Not sure what you mean, but you can work around some things in bat files.

It would be better not to use regex for this entire problem. I would use something simpler to cut the log file into pieces, 1 piece per paragraph.

Then use a regex to see if each paragraph is a "hit" or not.

Here is some Python code:

# read the file contents into a string (the with block closes the file afterwards)
with open('/input/log/file/path/here', 'r') as log_file:
    log_text = log_file.read().strip()

# split the string into separate paragraphs
paragraphs = log_text.split('\n\n')

# filter the paragraphs to the ones you want
filtered_paragraphs = filter(is_wanted, paragraphs)

# recombine the filtered paragraphs into a new log string
new_log_text = '\n\n'.join(filtered_paragraphs)

# output new log text into new file (the with block flushes and closes it)
with open('/output/log/file/path/here', 'w') as out_file:
    out_file.write(new_log_text)

and of course you will need to define the is_wanted function:

import re

def is_wanted(paragraph):
    # discard first three and last line to get paragraph content
    p_content = '\n'.join(paragraph.split('\n')[3:-1])
    # input any regex pattern here, such as 'FIFA'.  You can pass it into the function as a variable if you need it to be customizable
    return bool(re.search(r'FIFA', p_content))
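
As a hedged illustration of the "pass it into the function as a variable" remark, the pattern could be made a parameter and bound with functools.partial before filtering. The extra pattern argument, the case-insensitive flag, and the in-memory sample paragraphs below are assumptions made for this sketch, not part of the code above.

```python
import re
from functools import partial

def is_wanted(paragraph, pattern):
    # discard the three header lines and the closing ---END--- line
    p_content = '\n'.join(paragraph.split('\n')[3:-1])
    # assumption: matching should ignore case, as in the question's "fifa"
    return bool(re.search(pattern, p_content, re.IGNORECASE))

# sample paragraphs, already split on blank lines
paragraphs = [
    "Region: AR\nOnlineID: Atl_Tuc\n---Start---\nFIFA 18 Legacy Edition\n---END---",
    "Region: US\nOnlineID: CAJ5Y\n---Start---\n"
    "Madden NFL 18: G.O.A.T. Super Bowl Edition\n---END---",
]

# bind the search pattern once, then reuse the predicate with filter()
contains_fifa = partial(is_wanted, pattern=r'\bfifa\b')
filtered_paragraphs = list(filter(contains_fifa, paragraphs))
print('\n\n'.join(filtered_paragraphs))
```

Binding the pattern this way keeps the filter call unchanged while letting the search term vary per run.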

2 Comments

Thanks, I don't know if it works or not, but I wanna use regex in batch file, that's why I asked for regex.
No worries! I have verified that it works, but I understand that you may not want to use Python for your particular situation. Have a great day and welcome to Stack Overflow!
