Ignore newline character in binary file with Python?

Question

I open my file like so:

f = open("filename.ext", "rb") # ensure binary reading with b

My first line of data looks like this (when using f.readline()):

'\x04\x00\x00\x00\x12\x00\x00\x00\x04\x00\x00\x00\xb4\x00\x00\x00\x01\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x18\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00\x05\x00\x00\x00\x06\x00\x00\x00:\x00\x00\x00;\x00\x00\x00<\x00\x00\x007\x00\x00\x008\x00\x00\x009\x00\x00\x00\x07\x00\x00\x00\x08\x00\x00\x00\t\x00\x00\x00\n'

The thing is, I want to read this data 4 bytes at a time (f.read(4)). While debugging, I realized that when it gets to the end of the first line, it still takes in the newline character \n, which then becomes the first byte of the next int I read. I don't want to simply use .splitlines() because some data could contain a \n byte and I don't want to corrupt it. I'm using Python 2.7.10, by the way. I also read that opening a binary file with the b flag "takes care" of the newline/end-of-line characters; why is that not the case for me?

This is what happens in the console when the file's position is right before the newline character:

>>> d = f.read(4)
>>> d
'\n\x00\x00\x00'
>>> s = struct.unpack("i", d)
>>> s
(10,)

What is the format of the file really? Are the newlines intended? Is the newline character just an accident, when the real information is bytes? Then, what's the problem with reading it?
– Amit Gold, Commented Apr 27, 2016 at 17:22
Well, as in any file, there are newline characters at the end of each line. Each line has a number of integers, but reading in groups of 4 bytes gives erroneous results, because when the file's position is at the end of the first line, just before \n, the next f.read(4) takes \n as the first byte, and the next three bytes are the first three bytes of the second line (the next line).
– user3180077, Commented Apr 27, 2016 at 17:26
So basically the newlines are there for formatting/separation and shouldn't be read as "data"?
– Amit Gold, Commented Apr 27, 2016 at 17:30
If it's a binary file, then it probably shouldn't have "lines", and if it does, then the "newlines" are just data that is accidentally interpreted as a newline.
– Amit Gold, Commented Apr 27, 2016 at 17:34
readline() tries to interpret each byte as a character, but sometimes it's just data, and it isn't meant to have a meaning as a character (see asciitable.com for the meanings of values). If you stored ONLY data, you want to read it as ONLY data, even if it has a "meaning" of newline.
– Amit Gold, Commented Apr 27, 2016 at 17:54

Amit Gold · Accepted Answer · 2016-04-27 18:21:00Z

1

(Followed from discussion with OP in chat)

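It seems the file is in a binary format and the "newlines" are just misinterpreted values. This happens, for example, when the integer 10 is written to the file: its first byte is 0x0A, which Python prints as \n. In the sample line from the question, the values just before the \n are 7, 8 and 9 (\t is byte 9), so the \n is most likely the start of the integer 10 and not a line ending.

This doesn't mean that a newline was intended, and it probably wasn't. You can ignore the fact that it is printed as \n and simply use it as data.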
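As a rough illustration of the point above, here is a minimal sketch that reads the file as fixed-size records and never calls readline(). The helper name read_ints and the little-endian format "<i" are assumptions made for this example; the question used the native "i" format, and the real layout of the file is not known.

```python
import io
import struct

# The integer 10 packed as a 4-byte little-endian int begins with byte 0x0A,
# which Python displays as '\n'.
packed_ten = struct.pack("<i", 10)


def read_ints(fileobj, fmt="<i"):
    """Yield integers from a binary file object, one fixed-size record at a time."""
    size = struct.calcsize(fmt)
    while True:
        chunk = fileobj.read(size)
        if len(chunk) < size:  # end of file, or a trailing partial record
            break
        yield struct.unpack(fmt, chunk)[0]


# In-memory stand-in for the real file: the ints 8, 9, 10, 11.
sample = io.BytesIO(struct.pack("<4i", 8, 9, 10, 11))
values = list(read_ints(sample))
```

With a real file, the same generator could be fed the object returned by open("filename.ext", "rb"). Because every read is exactly 4 bytes, a 0x0A byte is treated like any other byte of data.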

answered Apr 27, 2016 at 18:21


justengel · Accepted Answer · 2016-04-27 17:50:49Z

0

You should just be able to replace the bytes that indicate it is a newline.

>>> d = f.read(4).replace(b'\x0d\x0a', b'') #\r\n should be bytes b'\x0d\x0a'
>>> diff = 4 - len(d)
>>> while diff > 0: # You can probably make this more sophisticated
...     d += f.read(diff).replace(b'\x0d\x0a', b'') #\r\n should be bytes b'\x0d\x0a'
...     diff = 4 - len(d)
>>> 
>>> s = struct.unpack("i", d)

This should give you an idea of how it will work. Be aware that this approach can throw off your data's byte alignment: any \x0d\x0a pair that is genuine data gets stripped as well.

If you really are seeing "\n" in your print of d, then try .replace(b"\n", b"") instead.
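To make the alignment caveat concrete, here is a small sketch on made-up data (the values 9, 10, 11 and the little-endian format "<i" are assumptions for illustration, not taken from the asker's file). It shows what stripping newline bytes does when one of them is really part of an integer.

```python
import struct

# Three little-endian ints; the 10 contributes a single 0x0A byte.
original = struct.pack("<3i", 9, 10, 11)   # 12 bytes
stripped = original.replace(b"\n", b"")    # 11 bytes, no longer a multiple of 4

intact = struct.unpack("<3i", original)

# After stripping, the bytes that followed the 0x0A shift left by one,
# so the second value no longer decodes as 10.
shifted = struct.unpack("<2i", stripped[:8])
```

Here intact is (9, 10, 11), while shifted comes out as (9, 184549376): the 10 is lost and the bytes of the following value are misread.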

edited Apr 27, 2016 at 17:50

answered Apr 27, 2016 at 17:43

