Python condition with regex

Question

I have a file with this data:

PAS_BEGIN_0009999
    T71_MANUFACTURER_4=98
    T71_COLOR_ID_7=000
    T71_OS_7=08
PAS_END_0009999

PAS_BEGIN_0009996
    T72_VAS_SERVICE_IDENTIFIER_6=
    T72_ORDER_NB_7=0003
    T72_TECHNOLOGY_7=01
PAS_END_0009996

TPV_BEGIN
    PAS_20819001=3600000 
TPV_END

How can I simply isolate the content between PAS_BEGIN_0009996 and PAS_END_0009996?

Here is a link to the example: https://regexr.com/3vmeq

The regex finds something there, but my Python code doesn't find anything.

if re.match("PAS_BEGIN_0009999([\S\s]*)PAS_END_0009999", line):
    data.append(line)
    print(line)

Can anyone help me with this? Thanks.

re.match only matches at the start of the string. Use re.search. Also, change [\S\s]* to [\S\s]*?
– Wiktor Stribiżew, Commented Sep 18, 2018 at 14:18
I don't think my problem is only with search or match. Changing to search doesn't change the result.
– vieroli, Commented Sep 18, 2018 at 14:26
Ah, yes, a common issue: you are reading line by line. Use with open(filepath, "r") as f: contents = f.read(). Search inside contents.
– Wiktor Stribiżew, Commented Sep 18, 2018 at 14:27
I did this: if re.search("PAS_BEGIN_0009999[\S\s]*PAS_END_0009999", contents): data.append(contents) but it adds my whole file; it looks like it doesn't understand my regex.
– vieroli, Commented Sep 18, 2018 at 14:44
No, m = re.search(....), then if m:, then data.append(m.group()). Or use data = re.findall(regex, contents)
– Wiktor Stribiżew, Commented Sep 18, 2018 at 15:56

Wiktor Stribiżew · Accepted Answer · 2018-09-18 19:30:02Z

You are reading the text file line by line, but the match you expect spans several lines. Read the whole file into a variable, then run a regex like yours or, better, a pattern like a.*?b with the re.DOTALL option so that . can also match line break characters.

So, you may use something like

import re

filepath = 'your_file_path.txt'
data = ''
pattern = r'PAS_BEGIN_0009999(.*?)PAS_END_0009999'
with open(filepath, "r") as f:
    contents = f.read()  # read the whole file, not line by line
m = re.search(pattern, contents, re.DOTALL)  # re.DOTALL lets . match line breaks
if m:
    data = m.group(1)  # or m.group() to include PAS_BEGIN_0009999 and PAS_END_0009999
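
To see why the line-by-line approach finds nothing, here is a minimal sketch that uses an in-memory string in place of the file. The `sample` text is copied from the question; the variable names `line_hits`, `no_dotall`, and `block` are only illustrative.

```python
import re

# Sample text copied from the question, standing in for the file contents.
sample = """PAS_BEGIN_0009999
    T71_MANUFACTURER_4=98
    T71_COLOR_ID_7=000
    T71_OS_7=08
PAS_END_0009999

PAS_BEGIN_0009996
    T72_VAS_SERVICE_IDENTIFIER_6=
    T72_ORDER_NB_7=0003
    T72_TECHNOLOGY_7=01
PAS_END_0009996
"""

pattern = r'PAS_BEGIN_0009999(.*?)PAS_END_0009999'

# Line by line: no single line contains both markers, so nothing matches.
line_hits = [ln for ln in sample.splitlines() if re.search(pattern, ln, re.DOTALL)]

# Whole text without re.DOTALL: "." stops at line breaks, so still no match.
no_dotall = re.search(pattern, sample)

# Whole text with re.DOTALL: "." also matches "\n", so the block is captured.
m = re.search(pattern, sample, re.DOTALL)
block = m.group(1).strip() if m else ''
print(block)
```

The `.strip()` call only trims the line break and indentation around the captured block; drop it if you need the text exactly as it appears in the file.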

If you need to find multiple occurrences, replace the re.search part (all lines after contents) with

data = re.findall(pattern, contents, re.DOTALL)
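
If the goal is to collect every PAS block rather than one hard-coded ID, one possible variation (not part of the original answer) is to capture the ID and reuse it with a backreference. This is a sketch under the assumption that each block is closed by a PAS_END_ marker carrying the same digits; `block_re` and `blocks` are hypothetical names, and `sample` is a shortened version of the data from the question.

```python
import re

# Shortened sample standing in for the file contents.
sample = (
    "PAS_BEGIN_0009999\n"
    "    T71_OS_7=08\n"
    "PAS_END_0009999\n"
    "\n"
    "PAS_BEGIN_0009996\n"
    "    T72_ORDER_NB_7=0003\n"
    "PAS_END_0009996\n"
)

# (\d+) captures the block ID; \1 requires the same ID after PAS_END_.
block_re = re.compile(r'PAS_BEGIN_(\d+)(.*?)PAS_END_\1', re.DOTALL)

# With two capture groups, findall returns a list of (id, body) tuples.
blocks = {block_id: body.strip() for block_id, body in block_re.findall(sample)}
print(blocks)
```

Keying the result by ID makes it easy to look up a single block afterwards, for example `blocks['0009996']`.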

See the regex demo


1 Comment

vieroli Over a year ago

Thanks, with the comment from my post, you solved a part of my problem ;)
