I created a parser for some complex binary files using numpy.fromfile, defining the dtypes needed to read each portion of the file. The resulting numpy array is then placed into a pandas dataframe, and the same dtype used to convert the binary data into the numpy array is reused to define the column names of the dataframe.
I was hoping to replicate this process using Python's struct module but ran into an issue. If part of my structure requires a value to be a group of 3 ints, I can define the dtype as numpy.dtype([('NameOfField', '>i4', 3)]) and the value returned from the binary file is [int, int, int]. Can this be replicated using struct, or do I need to regroup the values in the returned tuple based on the dtype before ingesting them into my pandas dataframe? I have read the Python struct documentation and have not found any examples of this.
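To illustrate the numpy behaviour I mean, here is a minimal sketch that uses made-up in-memory bytes instead of my real file (the sample values 1, 2, 3 are assumptions for demonstration only):

```python
import numpy as np

# Made-up sample data: three big-endian 32-bit ints packed back to back
raw = np.array([1, 2, 3], dtype='>i4').tobytes()

# One field that holds a sub-array of 3 ints
dt = np.dtype([('NameOfField', '>i4', 3)])

# Interpret the 12 bytes as a single record of that dtype
rec = np.frombuffer(raw, dtype=dt)

# The field comes back as one grouped value, not three separate ints
grouped = rec['NameOfField'][0].tolist()
print(grouped)  # [1, 2, 3]
```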
Using a struct format of >3i returns int, int, int instead of the [int, int, int] that I need.
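A minimal sketch of what I see with struct, again on made-up sample bytes. The repeat count in the format string only repeats the type code, so the unpacked tuple is flat, and the slice at the end is just one manual way I could regroup it:

```python
import struct

# Made-up sample data: three big-endian 32-bit ints
raw = struct.pack('>3i', 1, 2, 3)

# '>3i' is equivalent to '>iii', so the result is a flat tuple
flat = struct.unpack('>3i', raw)
print(flat)  # (1, 2, 3)

# Manual regrouping: wrap the three values in a list so they form one entry
row = [list(flat[0:3])]
print(row)  # [[1, 2, 3]]
```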
Edit ...
Below is a generic example. This method using numpy.fromfile works perfectly but is slow on my huge binary files, so I am trying to implement it using struct.
import numpy as np
import pandas as pd
def example_structure():
    dt = np.dtype([
        ('ExampleFieldName', '>i4', 3)
    ])
    return dt
# filename of binary file
file_name = 'example_binary_file'
# define the dtype for this chunk of binary data
d_type = example_structure()
# define initial index for the file in memory
start_ind = 0
end_ind = 0
# read in the entire file generically
x = np.fromfile(file_name, dtype='u1')
# based on the dtype find the chunk size
chunk_size = d_type.itemsize
# define the start and end index based on the chunk size
start_ind = end_ind
end_ind = chunk_size + start_ind
# extract just the first chunk
temp = x[start_ind:end_ind]
# cast the chunk as the defined dtype
temp.dtype = d_type
# store the chunk in its own pandas dataframe
example_df = pd.DataFrame(temp.tolist(), columns=temp.dtype.names)
This returns a temp[0] value of [int, int, int], which is then read into the pandas dataframe as a single entry under the column ExampleFieldName. If I attempt to replicate this using struct, the temp[0] value is int, int, int, which is not read properly into pandas. Is there a way to make struct group values like I can with numpy?
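As far as I can tell, struct has no grouping syntax, so the fallback I am considering is to regroup the flat tuple myself from a field description. The sketch below is only that, under assumptions: FIELDS, build_format and regroup are hypothetical names of my own, OtherField is an invented second field, and the sample bytes are made up. It only handles numeric type codes, since for 's' the count means string length rather than repetition.

```python
import struct
import pandas as pd

# Hypothetical field layout: (column name, struct type code, element count)
FIELDS = [('ExampleFieldName', 'i', 3), ('OtherField', 'h', 1)]

def build_format(fields, byte_order='>'):
    """Build a struct format string such as '>3i1h' from the field layout."""
    return byte_order + ''.join(f'{count}{code}' for _, code, count in fields)

def regroup(flat, fields):
    """Turn the flat tuple from struct into one value per field."""
    row, pos = [], 0
    for _, _, count in fields:
        values = flat[pos:pos + count]
        # Multi-element fields become a list, single elements stay scalar
        row.append(list(values) if count > 1 else values[0])
        pos += count
    return row

fmt = build_format(FIELDS)
record = struct.Struct(fmt)

# Made-up sample bytes standing in for one chunk of the binary file
raw = record.pack(1, 2, 3, 7)

row = regroup(record.unpack(raw), FIELDS)
example_df = pd.DataFrame([row], columns=[name for name, _, _ in FIELDS])
print(row)  # [[1, 2, 3], 7]
```

With this, the ExampleFieldName column holds the list [1, 2, 3] as a single entry, which matches what I get from the numpy version.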
[int, int, int] looks like a list, or is it a 1d array? int, int, int is that a tuple, (int, int, int), or something else?
I have not used struct much, and not with numpy. However, this stackoverflow.com/questions/30035287/… cython question shows that there is a certain relatedness between C struct, Python struct and numpy compound dtypes.