I open my file like so:

f = open("filename.ext", "rb") # ensure binary reading with b

My first line of data looks like this (when using f.readline()):

'\x04\x00\x00\x00\x12\x00\x00\x00\x04\x00\x00\x00\xb4\x00\x00\x00\x01\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x18\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00\x05\x00\x00\x00\x06\x00\x00\x00:\x00\x00\x00;\x00\x00\x00<\x00\x00\x007\x00\x00\x008\x00\x00\x009\x00\x00\x00\x07\x00\x00\x00\x08\x00\x00\x00\t\x00\x00\x00\n'

Thing is, I want to read this data 4 bytes at a time (f.read(4)). While debugging, I realized that when it reaches the end of the first line, it still reads the newline character \n, which is then used as the first byte of the next int I read. I don't want to simply use .splitlines() because some data could contain a \n and I don't want to corrupt it. I'm using Python 2.7.10, by the way. I also read that opening a binary file with the b flag "takes care" of the newline/end-of-line characters; why is that not the case for me?

This is what happens in the console when the file's position is right before the newline character:

>>> d = f.read(4)
>>> d
'\n\x00\x00\x00'
>>> s = struct.unpack("i", d)
>>> s
(10,)
  • What is the format of the file really? Are the newlines intended? Is the newline character just an accident, when the real information is bytes? Then, what's the problem with reading it? Commented Apr 27, 2016 at 17:22
  • Well, as in any file, there are newline characters at the end of each line. Each line has a number of integers, but reading in groups of 4 bytes gives erroneous results, because when the file's position is at the end of the first line, just before \n, the next f.read(4) takes \n as the first byte, and the next three bytes are the first three bytes of the second line (the next line). Commented Apr 27, 2016 at 17:26
  • So basically the newlines are there for formatting/separation and shouldn't be read as "data"? Commented Apr 27, 2016 at 17:30
  • If it's a binary file, then it probably shouldn't have "lines", and if it does, then the "newlines" are just data that is accidentally interpreted as a newline. Commented Apr 27, 2016 at 17:34
  • readline() tries to interpret each byte as a character, but sometimes it's just data and isn't meant to have a meaning as a character (see asciitable.com for the meanings of values). If you stored ONLY data, you want to read it as ONLY data, even if a byte happens to have the "meaning" of a newline. Commented Apr 27, 2016 at 17:54

2 Answers

(Following up on a discussion with the OP in chat)

It seems the file is in a binary format and the "newlines" are just misinterpreted values. This happens, for example, when the integer 10 is written to the file: its low byte is 0x0a, the same byte that represents \n.

This doesn't mean that a newline was intended, and it probably was not. You can ignore the fact that the byte prints as \n and simply treat it as data.
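As a rough illustration of treating every byte as data, here is a minimal sketch that reads fixed 4-byte groups and never calls readline(). The helper name read_int32s and the in-memory sample bytes are hypothetical, and the "<i" format assumes the file stores little-endian 32-bit signed integers (the question uses the native "i" format, which is equivalent on typical x86 machines).

```python
import io
import struct

def read_int32s(stream):
    """Yield 32-bit signed ints from a binary stream, 4 bytes at a time.

    Assumes little-endian data ("<i"); this is an assumption, not something
    the file format in the question confirms.
    """
    while True:
        chunk = stream.read(4)
        if len(chunk) < 4:  # end of data, or a trailing partial record
            break
        yield struct.unpack("<i", chunk)[0]

# Hypothetical sample: the ints 9, 10, 11. The byte b"\x0a" is the same byte as b"\n".
sample = io.BytesIO(b"\x09\x00\x00\x00\x0a\x00\x00\x00\x0b\x00\x00\x00")
values = list(read_int32s(sample))
print(values)
```

With a real file, the stream would be the object returned by open("filename.ext", "rb"). Read this way, the '\n\x00\x00\x00' group from the question decodes to 10, which fits the 7, 8, 9 that precede it in the first "line".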

You should be able to strip out the bytes that encode the newline.

>>> d = f.read(4).replace(b'\x0d\x0a', b'') #\r\n should be bytes b'\x0d\x0a'
>>> diff = 4 - len(d)
>>> while diff > 0: # You can probably make this more sophisticated
...     d += f.read(diff).replace(b'\x0d\x0a', b'') #\r\n should be bytes b'\x0d\x0a'
...     diff = 4 - len(d)
>>> 
>>> s = struct.unpack("i", d)

This should give you an idea of how it will work. This approach could mess with your data's byte alignment.

If you really are seeing "\n" when you print d, then try .replace(b"\n", b"") instead.
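The alignment caveat mentioned above can be seen in a small sketch. The sample values and the decode helper are hypothetical, and little-endian 32-bit integers ("<i") are assumed. If a newline byte is actually part of an integer, stripping it shifts every byte that follows.

```python
import struct

# Hypothetical sample: the ints 9, 10, 11 packed as 32-bit little-endian values.
original = struct.pack("<3i", 9, 10, 11)

# Stripping every 0x0a byte also removes the low byte of the integer 10.
stripped = original.replace(b"\n", b"")

def decode(data):
    """Decode as many complete 4-byte little-endian ints as the data holds."""
    usable = len(data) - len(data) % 4
    return [struct.unpack("<i", data[i:i + 4])[0] for i in range(0, usable, 4)]

print(decode(original))  # the values as written
print(decode(stripped))  # one byte short, so later values are misaligned
```

So stripping is only safe when the newline bytes are known to be separators rather than part of the data.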
