
I have a file with this data:

PAS_BEGIN_0009999
    T71_MANUFACTURER_4=98
    T71_COLOR_ID_7=000
    T71_OS_7=08
PAS_END_0009999

PAS_BEGIN_0009996
    T72_VAS_SERVICE_IDENTIFIER_6=
    T72_ORDER_NB_7=0003
    T72_TECHNOLOGY_7=01
PAS_END_0009996

TPV_BEGIN
    PAS_20819001=3600000 
TPV_END

How can I simply isolate the content between PAS_BEGIN_0009996 and PAS_END_0009996?

Here is a link for the example : https://regexr.com/3vmeq

The regex finds a match there, but my Python code doesn't find anything.

if re.match("PAS_BEGIN_0009999([\S\s]*)PAS_END_0009999", line):
    data.append(line)
    print(line)

Can anyone help me with this? Thanks

5 Comments
  • re.match only searches at the start of the string. Use re.search. Also, change [\S\s]* to [\S\s]*? Commented Sep 18, 2018 at 14:18
  • I don't think my problem is only with search or match. Changing to search doesn't change the result. Commented Sep 18, 2018 at 14:26
  • Ah, yes, a common issue: you are reading line by line. Use with open(filepath, "r") as f: contents = f.read(). Search inside contents. Commented Sep 18, 2018 at 14:27
  • I did this: if re.search("PAS_BEGIN_0009999[\S\s]*PAS_END_0009999", contents): data.append(contents) but it adds my whole file; it looks like it doesn't understand my regex. Commented Sep 18, 2018 at 14:44
  • No, m = re.search(....), then if m:, then data.append(m.group()). Or use data = re.findall(regex, contents) Commented Sep 18, 2018 at 15:56

1 Answer

You are reading the text file line by line, but the text you want to match spans several lines. Read the whole file into a variable, then run a regex like yours on it or, better, a pattern like a.*?b with the re.DOTALL option so that . also matches line break characters.

So, you may use something like

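import re

filepath = 'your_file_path.txt'
data = ''
# Lazy .*? stops at the first PAS_END_0009999
pattern = r'PAS_BEGIN_0009999(.*?)PAS_END_0009999'
with open(filepath, "r") as f:
    contents = f.read()  # read the whole file, not line by line
# re.DOTALL lets . match line breaks, so the match can span several lines
m = re.search(pattern, contents, re.DOTALL)
if m:
    data = m.group(1)  # or .group() if you need to include PAS_BEGIN_0009999 and PAS_END_0009999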
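
To make the difference concrete, here is a minimal sketch that runs the same pattern three ways on the sample data from the question. The text is held in a string instead of being read from a file, and the names line_hits, no_dotall and block are only illustrative.

```python
import re

# Sample text copied from the question; in practice this would come from f.read().
contents = """PAS_BEGIN_0009999
    T71_MANUFACTURER_4=98
    T71_COLOR_ID_7=000
    T71_OS_7=08
PAS_END_0009999

PAS_BEGIN_0009996
    T72_VAS_SERVICE_IDENTIFIER_6=
    T72_ORDER_NB_7=0003
    T72_TECHNOLOGY_7=01
PAS_END_0009996
"""

pattern = r"PAS_BEGIN_0009999(.*?)PAS_END_0009999"

# Line by line: no single line contains both markers, so nothing matches.
line_hits = [line for line in contents.splitlines()
             if re.search(pattern, line, re.DOTALL)]

# Whole text without re.DOTALL: "." does not cross a newline, so still no match.
no_dotall = re.search(pattern, contents)

# Whole text with re.DOTALL: "." also matches newlines, so the block is found.
m = re.search(pattern, contents, re.DOTALL)
block = m.group(1).strip() if m else ""
print(block)
```

Only the last call finds the block, which is why both reading the whole file and passing re.DOTALL matter here.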

If you need to find multiple occurrences, replace the re.search part (all lines after contents = f.read()) with

data = re.findall(pattern, contents, re.DOTALL)

See the regex demo
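
If you want every PAS block rather than one fixed id, one possible variation (not part of the original answer) is to capture the id and reuse it with a backreference. This is a sketch under the assumption that each block is delimited by PAS_BEGIN_<digits> and PAS_END_<same digits>; block_re and blocks are illustrative names.

```python
import re

# Sample text copied from the question.
contents = """PAS_BEGIN_0009999
    T71_MANUFACTURER_4=98
    T71_COLOR_ID_7=000
    T71_OS_7=08
PAS_END_0009999

PAS_BEGIN_0009996
    T72_VAS_SERVICE_IDENTIFIER_6=
    T72_ORDER_NB_7=0003
    T72_TECHNOLOGY_7=01
PAS_END_0009996

TPV_BEGIN
    PAS_20819001=3600000
TPV_END
"""

# (\d+) captures the block id; \1 requires the END marker to carry the same id.
block_re = re.compile(r"PAS_BEGIN_(\d+)(.*?)PAS_END_\1", re.DOTALL)

# With two groups, findall returns (id, body) tuples; map each id to its body.
blocks = {block_id: body.strip() for block_id, body in block_re.findall(contents)}

print(blocks["0009996"])
```

Looking up blocks["0009996"] then gives only the content of that block, and the TPV section is ignored because it has no PAS_BEGIN marker.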

1 Comment

Thanks, with the comment from my post, you solved a part of my problem ;)