
I have a binary file that contains records of a plane's position. Each record looks like this:

0x00: Time, float32
0x04: X, float32 // X axis position
0x08: Y, float32 // Y axis position
0x0C: Elevation, float32
0x10: float32*4 = Quaternion (x,y,z axis and w scalar)
0x20: Distance, float32 (unused)

So each record is 36 bytes long (nine float32 fields; the last one starts at 0x20).
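Adding up the field sizes in the layout is a quick way to confirm the record size. A minimal sketch with the standard `struct` module, assuming the data is little-endian (the byte order isn't stated): nine float32 values take 36 bytes, so a record ends at 0x24.

```python
import struct

# Nine float32 values: time, X, Y, elevation, Qx, Qy, Qz, Qw, distance.
# The "<" (little-endian, no padding) prefix is an assumption about the file.
RECORD_FORMAT = "<9f"
record_size = struct.calcsize(RECORD_FORMAT)
print(record_size)  # 36

# Pack one synthetic record and unpack it again to see the field order.
sample = struct.pack(RECORD_FORMAT, 0.5, 10.0, 20.0, 300.0, 0.0, 0.0, 0.0, 1.0, 0.0)
t, x, y, elevation, qx, qy, qz, qw, dist = struct.unpack(RECORD_FORMAT, sample)
print(len(sample), t, qw)  # 36 0.5 1.0
```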

I would like to read these records into a NumPy array.

At offset 1859 there is a 32-bit unsigned integer (4 bytes) that gives the number of records in the array: 12019 in my case.

I don't care (for now) about the header data before offset 1859.

The array itself starts at offset 1863 (= 1859 + 4).
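Reading the record count is a seek to 1859 followed by a 4-byte read, which leaves the file positioned at 1863. A sketch assuming the count is stored little-endian; the file name `demo.bin` and the synthetic header are made up so the example runs on its own:

```python
import os
import struct
import tempfile

COUNT_OFFSET = 1859               # position of the uint32 record count
DATA_OFFSET = COUNT_OFFSET + 4    # records start right after the count: 1863

# Synthetic file: 1859 header bytes, then the count as a little-endian uint32
# (the byte order is an assumption).
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * COUNT_OFFSET)
    f.write(struct.pack("<I", 12019))

with open(path, "rb") as f:
    f.seek(COUNT_OFFSET)
    (n_records,) = struct.unpack("<I", f.read(4))
    pos = f.tell()  # now at the first record

print(n_records, pos)  # 12019 1863
```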

I defined my own NumPy dtype like this:

dtype = np.dtype([
    ("time", np.float32),
    ("PosX", np.float32),
    ("PosY", np.float32),
    ("Alt", np.float32),
    ("Qx", np.float32),
    ("Qy", np.float32),
    ("Qz", np.float32),
    ("Qw", np.float32),
    ("dist", np.float32),
])
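With nine float32 fields this dtype has an itemsize of 36 bytes, which is the stride fromfile uses between records. A sketch that checks the itemsize and field offsets against the layout; the explicit `"<f4"` (little-endian float32) is an assumption, since plain `np.float32` uses the machine's native byte order:

```python
import numpy as np

names = ["time", "PosX", "PosY", "Alt", "Qx", "Qy", "Qz", "Qw", "dist"]
# "<f4" = little-endian float32; an assumption about the file, not stated in the question.
dtype = np.dtype([(name, "<f4") for name in names])

print(dtype.itemsize)           # 36
print(dtype.fields["Qx"][1])    # 16, i.e. 0x10
print(dtype.fields["dist"][1])  # 32, i.e. 0x20
```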

And I'm reading the file using fromfile:

a_bytes = np.fromfile(filename, dtype=dtype)

But I don't see any fromfile parameter for passing an offset.
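For reference, NumPy 1.17 and later do give fromfile an `offset` argument (bytes to skip from the file's current position, binary files only), and `count` limits how many items are read. A sketch on a synthetic file; the file name and values are made up, and little-endian data is assumed:

```python
import os
import tempfile

import numpy as np

names = ["time", "PosX", "PosY", "Alt", "Qx", "Qy", "Qz", "Qw", "dist"]
dtype = np.dtype([(name, "<f4") for name in names])

# Synthetic file: 1859-byte header, uint32 count, then the records.
records = np.zeros(3, dtype=dtype)
records["time"] = [0.0, 0.5, 1.0]
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 1859)
    f.write(np.array([len(records)], dtype="<u4").tobytes())
    f.write(records.tobytes())

# Read the count, then the records, each with an explicit byte offset (NumPy >= 1.17).
n = int(np.fromfile(path, dtype="<u4", count=1, offset=1859)[0])
data = np.fromfile(path, dtype=dtype, count=n, offset=1863)
print(n, data["time"].tolist())  # 3 [0.0, 0.5, 1.0]
```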

3 Answers


You can open the file with Python's built-in open(), seek past the header, and then pass the file object to fromfile. Something like this:

import numpy as np
import os

dtype = np.dtype([
    ("time", np.float32),
    ("PosX", np.float32),
    ("PosY", np.float32),
    ("Alt", np.float32),
    ("Qx", np.float32),
    ("Qy", np.float32),
    ("Qz", np.float32),
    ("Qw", np.float32),
    ("dist", np.float32),
])

# Open in binary mode, skip the 1859-byte header and the 4-byte count, then read the records
with open("myfile", "rb") as f:
    f.seek(1863, os.SEEK_SET)
    data = np.fromfile(f, dtype=dtype)

print(data)
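Because the record count sits just before the data, the same seek-then-fromfile approach can also read it and pass it as `count`, so any bytes after the last record are not read. A sketch on a synthetic file; the trailing junk bytes and the little-endian byte order are assumptions made for illustration:

```python
import os
import tempfile

import numpy as np

names = ["time", "PosX", "PosY", "Alt", "Qx", "Qy", "Qz", "Qw", "dist"]
dtype = np.dtype([(name, "<f4") for name in names])

# Synthetic file: header, count, two records, then 5 bytes of trailing junk.
records = np.zeros(2, dtype=dtype)
records["Qw"] = 1.0
path = os.path.join(tempfile.mkdtemp(), "myfile")
with open(path, "wb") as f:
    f.write(b"\x00" * 1859)
    f.write(np.array([2], dtype="<u4").tobytes())
    f.write(records.tobytes())
    f.write(b"junk!")

with open(path, "rb") as f:
    f.seek(1859, os.SEEK_SET)
    n = int(np.fromfile(f, dtype="<u4", count=1)[0])  # file position is now 1863
    data = np.fromfile(f, dtype=dtype, count=n)

print(len(data), data["Qw"].tolist())  # 2 [1.0, 1.0]
```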

2 Comments

Thanks, that solved my problem. I also found that np.memmap works: data = np.memmap(filename, dtype=dtype, mode='r', offset=offset_array, shape=N)
Right on; if it's a large file, memmap may be the way to go.
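The memmap route from the comment can be sketched the same way: read the count first, then map exactly that many records starting at byte 1863. Data is only loaded from disk when elements are accessed, which is why it suits large files. Synthetic file with a made-up name; little-endian data is assumed:

```python
import os
import tempfile

import numpy as np

names = ["time", "PosX", "PosY", "Alt", "Qx", "Qy", "Qz", "Qw", "dist"]
dtype = np.dtype([(name, "<f4") for name in names])

records = np.zeros(4, dtype=dtype)
records["PosX"] = [1.0, 2.0, 3.0, 4.0]
path = os.path.join(tempfile.mkdtemp(), "track.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 1859)
    f.write(np.array([len(records)], dtype="<u4").tobytes())
    f.write(records.tobytes())

# Read the uint32 count at offset 1859 through a one-element map.
n = int(np.memmap(path, dtype="<u4", mode="r", offset=1859, shape=(1,))[0])
# Map the n records that start right after the count.
data = np.memmap(path, dtype=dtype, mode="r", offset=1863, shape=(n,))
print(n, data["PosX"].tolist())  # 4 [1.0, 2.0, 3.0, 4.0]
```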

I faced a similar problem, but none of the answers above satisfied me. I needed to implement something like a virtual table with a very large number of binary records, potentially occupying more memory than I can afford in one NumPy array. So my question was how to read and write a small set of integers from/to a binary file, i.e. a subset of the file into a subset of a NumPy array.

This is a solution that worked for me:

import numpy as np

recordLen = 10  # number of int64's per record
recordSize = recordLen * 8  # size of a record in bytes
memArray = np.zeros(recordLen, dtype=np.int64)  # a buffer for 1 record

# Create a binary file and open it for write+read ('w+b' truncates an existing file)
with open('BinaryFile.dat', 'w+b') as file:
    # Write the array into the file as record number recordNo
    recordNo = 200  # the index of the target record in the file
    file.seek(recordSize * recordNo)
    buf = memArray.tobytes()
    file.write(buf)

    # Read record recordNo from the file back into memArray
    file.seek(recordSize * recordNo)
    buf = file.read(recordSize)
    # copy() makes memArray writable; frombuffer on bytes returns a read-only view
    memArray = np.frombuffer(buf, dtype=np.int64).copy()
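The same seek arithmetic extends from one record to a contiguous block. A sketch of a hypothetical helper `read_records(file, first, count)` that reads records `first` to `first + count - 1` into a fresh, writable 2-D array; it keeps the answer's layout of 10 int64 values per record, and the file contents are synthetic:

```python
import os
import tempfile

import numpy as np

RECORD_LEN = 10                                         # int64 values per record
RECORD_SIZE = RECORD_LEN * np.dtype(np.int64).itemsize  # 80 bytes

def read_records(file, first, count):
    """Hypothetical helper: read `count` records starting at record index `first`."""
    file.seek(RECORD_SIZE * first)
    buf = file.read(RECORD_SIZE * count)
    # frombuffer gives a read-only view of `buf`; copy() makes it writable.
    return np.frombuffer(buf, dtype=np.int64).reshape(-1, RECORD_LEN).copy()

path = os.path.join(tempfile.mkdtemp(), "BinaryFile.dat")
with open(path, "w+b") as file:
    # Write 300 records where every value equals its record index.
    table = np.repeat(np.arange(300, dtype=np.int64), RECORD_LEN).reshape(300, RECORD_LEN)
    file.write(table.tobytes())
    block = read_records(file, 200, 3)

print(block.shape, block[:, 0].tolist())  # (3, 10) [200, 201, 202]
```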


I suggest using NumPy's frombuffer:

import numpy as np

with open(file_path, 'rb') as file_obj:
    file_obj.seek(seek_to_position)
    data_ro = np.frombuffer(file_obj.read(total_num_bytes), dtype=your_dtype_here)
    data_rw = data_ro.copy()  # without copy(), the result is read-only
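Applied to the question's file, `total_num_bytes` is the record count times `dtype.itemsize`. A sketch on a synthetic file; the file name is made up and little-endian data is assumed:

```python
import os
import tempfile

import numpy as np

names = ["time", "PosX", "PosY", "Alt", "Qx", "Qy", "Qz", "Qw", "dist"]
dtype = np.dtype([(name, "<f4") for name in names])

records = np.zeros(5, dtype=dtype)
records["Alt"] = np.arange(5, dtype=np.float32) * 100
path = os.path.join(tempfile.mkdtemp(), "flight.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 1859)
    f.write(np.array([len(records)], dtype="<u4").tobytes())
    f.write(records.tobytes())

with open(path, "rb") as file_obj:
    file_obj.seek(1859)
    n = int(np.frombuffer(file_obj.read(4), dtype="<u4")[0])
    total_num_bytes = n * dtype.itemsize  # 5 * 36 = 180
    data_ro = np.frombuffer(file_obj.read(total_num_bytes), dtype=dtype)
    data_rw = data_ro.copy()              # writable copy

print(n, total_num_bytes, data_ro.flags.writeable, data_rw["Alt"].tolist())
# 5 180 False [0.0, 100.0, 200.0, 300.0, 400.0]
```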
