File reading and RE parsing

Question

I'm seeing some strange behaviour that I don't understand.

If I open my file and search for one byte sequence at a time, I find my bytes:

f = open('d:\BB.ki', "rb")
f10 = re.findall( b'\x03\x00\x00\x10''(.*?)''\xF7\x00\xF0', f.read() )
print f10
['1BBBAAAABBBBAAAABBBBAAAABBBBAAAA\x00']

f = open('d:\BB.ki', "rb")
f11 = re.findall( b'\x03\x00\x00\x11''(.*?)''\xF7\x00\xF0', f.read() )
print f11
['2AAABBBBAAAABBBBAAAA\x00']

But if I open the file once and search for several byte sequences, only the first search returns anything (f11 is empty):

f = open('d:\BB.ki', "rb")
f10 = re.findall( b'\x03\x00\x00\x10''(.*?)''\xF7\x00\xF0', f.read() )
f11 = re.findall( b'\x03\x00\x00\x11''(.*?)''\xF7\x00\xF0', f.read() )
print f10,f11
['1BBBAAAABBBBAAAABBBBAAAABBBBAAAA\x00'] []

Should I use a loop, or something similar?

Thanks

In addition to the answers below, you can always do f.seek(0) to reset the file stream pointer to the beginning of the file, and then the second read() will work :)
– Tisho, Commented Jul 5, 2012 at 15:38
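The f.seek(0) suggestion in this comment can be sketched as follows. This is only an illustration, written for Python 3: an in-memory `io.BytesIO` object stands in for the real file, and the sample bytes are made up for the example rather than taken from BB.ki.

```python
import io
import re

# Hypothetical in-memory stand-in for the file, so the sketch runs anywhere.
data = (b'\x03\x00\x00\x10' b'1BBBAAAA\x00' b'\xF7\x00\xF0'
        b'\x03\x00\x00\x11' b'2AAABBBB\x00' b'\xF7\x00\xF0')
f = io.BytesIO(data)

f10 = re.findall(b'\x03\x00\x00\x10(.*?)\xF7\x00\xF0', f.read())
f.seek(0)  # rewind to the start; without this the next read() returns b''
f11 = re.findall(b'\x03\x00\x00\x11(.*?)\xF7\x00\xF0', f.read())

print(f10, f11)
```

Rewinding works, but it reads the same data twice; keeping the result of a single read() in a variable avoids the second pass.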

Mark Byers · Accepted Answer · 2012-07-05 13:35:15Z

1

After you call f.read(), there are no more bytes left to read, so a second call to f.read() returns an empty string. Store the result of f.read() instead of reading twice:

s = f.read()
f10 = re.findall(b'\x03\x00\x00\x10(.*?)\xF7\x00\xF0', s)
f11 = re.findall(b'\x03\x00\x00\x11(.*?)\xF7\x00\xF0', s)
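To see the effect in isolation, here is a small self-contained sketch, assuming Python 3. The `io.BytesIO` object and its contents are hypothetical stand-ins for the real file; the point is only that the second read() comes back empty while the stored bytes can be searched as often as needed.

```python
import io
import re

# Hypothetical stand-in for the file: two records, tagged \x10 and \x11.
f = io.BytesIO(b'\x03\x00\x00\x10one\xF7\x00\xF0'
               b'\x03\x00\x00\x11two\xF7\x00\xF0')

s = f.read()          # everything, in one go
leftover = f.read()   # the stream is exhausted, so this is b''

f10 = re.findall(b'\x03\x00\x00\x10(.*?)\xF7\x00\xF0', s)
f11 = re.findall(b'\x03\x00\x00\x11(.*?)\xF7\x00\xF0', s)
missed = re.findall(b'\x03\x00\x00\x11(.*?)\xF7\x00\xF0', leftover)

print(f10, f11, missed)
```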

You may also want to scan the data just a single time, finding both expressions:

matches = re.findall(b'\x03\x00\x00[\x10\x11](.*?)\xF7\x00\xF0', s)
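One thing the combined pattern does not tell you is which marker each match came from. A possible variation, not part of the original answer, is to capture the marker byte in a group of its own and sort the payloads by it. The sketch below assumes Python 3, and the sample data and the name `by_tag` are made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical sample data: three records with alternating marker bytes.
data = (b'\x03\x00\x00\x10' b'first' b'\xF7\x00\xF0'
        b'\x03\x00\x00\x11' b'second' b'\xF7\x00\xF0'
        b'\x03\x00\x00\x10' b'third' b'\xF7\x00\xF0')

by_tag = defaultdict(list)
# With two groups, findall returns (marker, payload) tuples.
for tag, payload in re.findall(b'\x03\x00\x00([\x10\x11])(.*?)\xF7\x00\xF0', data):
    by_tag[tag].append(payload)

f10 = by_tag[b'\x10']
f11 = by_tag[b'\x11']
```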

Note that the two approaches can differ. If your file contains the bytes '\x03\x00\x00\x10\x03\x00\x00\x11_\xF7\x00\xF0', the method you proposed will find two overlapping matches (\x03\x00\x00\x11_ and _), whereas the single-scan approach finds only one match.
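The difference can be checked directly with the byte string from the paragraph above. This sketch assumes Python 3 and uses the bytes as an in-memory value rather than reading them from a file.

```python
import re

data = b'\x03\x00\x00\x10\x03\x00\x00\x11_\xF7\x00\xF0'

# Two separate scans: each pattern starts from the beginning of the data,
# so the second one can match inside what the first one already captured.
f10 = re.findall(b'\x03\x00\x00\x10(.*?)\xF7\x00\xF0', data)
f11 = re.findall(b'\x03\x00\x00\x11(.*?)\xF7\x00\xF0', data)

# One scan: findall resumes after the end of each match, so the inner
# \x03\x00\x00\x11 marker is consumed as payload and never matched itself.
both = re.findall(b'\x03\x00\x00[\x10\x11](.*?)\xF7\x00\xF0', data)

print(f10)   # [b'\x03\x00\x00\x11_']
print(f11)   # [b'_']
print(both)  # [b'\x03\x00\x00\x11_']
```

Which behaviour is right depends on whether a marker sequence can legitimately appear inside a payload in your file format.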

edited Jul 5, 2012 at 13:35

answered Jul 5, 2012 at 13:30

Mark Byers

Waraba Over a year ago

Great! The first solution is what I needed, thanks :)

corn3lius · Accepted Answer · 2012-07-05 13:27:53Z

0

f.read() consumes the entire file, so the second f.read() returns nothing and only f10 gets any matches.

Try this, maybe:

f10, f11 = [], []
for line in open(r'd:\BB.ki', "rb"):
    # Accumulate matches; plain assignment would keep only the last line's result.
    f10.extend(re.findall(b'\x03\x00\x00\x10(.*?)\xF7\x00\xF0', line))
    f11.extend(re.findall(b'\x03\x00\x00\x11(.*?)\xF7\x00\xF0', line))
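One caveat with going line by line through a binary file: lines are split at every 0x0A byte, so a record whose payload happens to contain that byte is cut in two and never matches. The sketch below illustrates this, assuming Python 3; the sample record is hypothetical and `io.BytesIO` stands in for the file. It also uses `re.DOTALL` for the whole-data search, because `.` does not match a newline byte by default.

```python
import io
import re

# Hypothetical record whose payload contains a newline byte (0x0A).
data = b'\x03\x00\x00\x10AB\nCD\xF7\x00\xF0'
pattern = b'\x03\x00\x00\x10(.*?)\xF7\x00\xF0'

# Line by line: the record is split into two "lines", neither of which matches.
per_line = []
for line in io.BytesIO(data):
    per_line.extend(re.findall(pattern, line))

# Whole data at once, with DOTALL so that '.' also matches b'\n'.
whole = re.findall(pattern, data, re.DOTALL)

print(per_line)  # []
print(whole)     # [b'AB\nCD']
```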

answered Jul 5, 2012 at 13:27

corn3lius
