Ensuring Python equivalence of MATLAB's `fread`

Question

I have a binary test file, available at http://jmp.sh/VpTZxgQ, and I am trying to rewrite some MATLAB code that reads this file in Python.

I have realised that MATLAB's fread keeps track of how much of the file has already been read, so each call continues from where the previous one stopped. How do I get the same behaviour in Python?

Matlab Code:

clear all; close all;

path = pwd;
ext = 'bin';
stem = 'test';
filename = [stem,'.',ext];
filename = fullfile(path,filename);
fid = fopen(filename,'r');

fread(fid,2,'int16')
fread(fid,32,'char')
fread(fid,2,'int16')

Python Code:

import numpy as np  

def fread(filename, n, precision):
     with open(filename, 'rb') as fid:
         data_array = np.fromfile(fid, precision).reshape((-1, 1)).T

     return data_array[0,0:n]

print fread('test.bin', 2, np.int16)
print fread('test.bin', 32, np.str)
print fread('test.bin', 2, np.int16)

Ideally the output of the two versions would be the same, but it is not. In fact, Python raises a ValueError when I set precision to np.str.

As a bonus question: I assume that making sense of a binary file requires knowing how the data was formatted when it was written. Is this true?

Side note: you are not using precision in your fread definition.
– sco1, Commented Dec 1, 2015 at 17:36
As for your bonus question: correct. A binary file is just a stream of bits; you have to make sense of it by reading it into the appropriate number and type of variables. What's worse, you have to consider things like endianness even if you get the size and type of the variables right.
– Andras Deak -- Слава Україні, Commented Dec 1, 2015 at 17:38
You can use fid.seek to navigate to the correct position in your file. This answer may be useful.
– sco1, Commented Dec 1, 2015 at 17:45
Jack, I wouldn't be surprised if the Python version did the same. Any reasonable file-processing function should continue where it left off by default, and resetting or relocating should be the non-trivial task. So I'd first try subsequent fromfile calls (with proper input) without any tricks, and only start worrying about relocation after each call if that fails to give you the same result for the 3 calls :)
– Andras Deak -- Слава Україні, Commented Dec 1, 2015 at 17:53

Nick · Accepted Answer · 2015-12-01 19:21:53Z

6

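As the comments suggest, you need to open the file once and pass the same open file handle to every read, which is what the MATLAB code does with fid. The handle keeps track of the current position, so each read continues where the previous one stopped: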

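import numpy as np

def fread(fid, nelements, dtype):
    """Read nelements items of dtype from the open file fid as a column vector."""
    # np.str was an alias for the built-in str and was removed in NumPy 1.24,
    # so the text case is matched against str here.
    if dtype is str:
        dt = np.uint8  # WARNING: assumes 8-bit ASCII characters!
    else:
        dt = dtype

    # fromfile reads from the current position of fid and advances it.
    data_array = np.fromfile(fid, dtype=dt, count=nelements)
    return data_array.reshape(-1, 1)

# Open the file once so that successive reads continue where the last one stopped.
with open('test.bin', 'rb') as fid:
    print(fread(fid, 2, np.int16))
    print(fread(fid, 32, str))
    print(fread(fid, 2, np.int16))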
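To see the position tracking directly, here is a minimal sketch that prints nothing but records fid.tell() after each read. Since test.bin is not reproduced here, it first writes a small stand-in file; the name demo.bin and its contents are assumptions chosen to mirror the layout read above (2 int16 values, 32 bytes of text, 2 int16 values).

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for test.bin: 2 int16, 32 text bytes, 2 int16.
header = np.array([32, 0], dtype=np.int16)
text = b"# af042_50perc_span".ljust(32)  # padded with spaces to 32 bytes

path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(path, "wb") as out:
    out.write(header.tobytes())
    out.write(text)
    out.write(header.tobytes())

positions = []
with open(path, "rb") as fid:
    first = np.fromfile(fid, dtype=np.int16, count=2)
    positions.append(fid.tell())  # 2 items * 2 bytes consumed

    chars = np.fromfile(fid, dtype=np.uint8, count=32)
    positions.append(fid.tell())  # 32 more bytes consumed

    last = np.fromfile(fid, dtype=np.int16, count=2)
    positions.append(fid.tell())  # end of the file

    # seek is only needed to jump somewhere else, e.g. back to the start.
    fid.seek(0)
    again = np.fromfile(fid, dtype=np.int16, count=2)
```

After this runs, positions holds the byte offsets 4, 36 and 40, and again repeats the first read because of the explicit seek(0).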

Reading and writing binary data requires the reader and the writer to agree on a format. As the commenters point out, endianness can become an issue if you write the file on one machine and read it on another with a different byte order. If the data is always written and read on the same CPU architecture, you won't run into this issue.
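If the byte order of the file is known, one way to avoid depending on the reading machine is to spell it out in the dtype. The sketch below interprets the same two bytes both ways; the byte values are made up for illustration, and the same dtype strings can be passed to np.fromfile.

```python
import numpy as np

raw = bytes([0x20, 0x00])  # two bytes as they might sit in a file

little = np.frombuffer(raw, dtype="<i2")[0]  # little-endian int16
big = np.frombuffer(raw, dtype=">i2")[0]     # big-endian int16

print(little, big)  # 32 8192
```

With a plain np.int16 the native byte order of the machine is used, which is why the same file can read differently on different hardware.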

Output for the test.bin:

MATLAB Output             Python+Numpy Output
------------------------------------------------------
ans =                     

    32                    [[32]
     0                     [ 0]]

ans =                   

    35                    [[ 35]
    32                     [ 32]
    97                     [ 97]
   102                     [102]
    48                     [ 48]
    52                     [ 52]
    50                     [ 50]
    95                     [ 95]
    53                     [ 53]
    48                     [ 48]
   112                     [112]
   101                     [101]
   114                     [114]
    99                     [ 99]
    95                     [ 95]
   115                     [115]
   112                     [112]
    97                     [ 97]
   110                     [110]
    32                     [ 32]
    32                     [ 32]
    32                     [ 32]
    32                     [ 32]
    32                     [ 32]
    32                     [ 32]
    32                     [ 32]
    32                     [ 32]
    32                     [ 32]
    32                     [ 32]
    32                     [ 32]
    32                     [ 32]
    32                     [ 32]]

ans =

    32                     [[32]
     0                      [ 0]]
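The 32 values in the middle block are character codes. Assuming they are ASCII, as the code above does, they can be turned back into text like this (the codes are copied from the output above rather than read from the file):

```python
import numpy as np

# Character codes from the second read: 19 visible characters plus 13 spaces.
codes = np.array(
    [35, 32, 97, 102, 48, 52, 50, 95, 53, 48,
     112, 101, 114, 99, 95, 115, 112, 97, 110] + [32] * 13,
    dtype=np.uint8,
)

text = codes.tobytes().decode("ascii")  # fixed-width, space-padded field
label = text.rstrip()                   # drop the trailing padding

print(repr(label))  # '# af042_50perc_span'
```

Applied to the fread function above, the same conversion would be fread(fid, 32, str).tobytes().decode("ascii").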

edited Dec 1, 2015 at 19:21

answered Dec 1, 2015 at 18:28

Nick


4 Comments

Andras Deak -- Слава Україні Over a year ago

As I also commented, passing that precision to fromfile in that way will not work.

Andras Deak -- Слава Україні Over a year ago

I think passing it to fromfile as fromfile(fid,count=n,dtype=precision) should work.

Nick Over a year ago

Yes, that worked, but I ran into trouble with np.str, which has an element size of 0, so the code assumes 8-bit ASCII characters (np.uint8).

Andras Deak -- Слава Україні Over a year ago

Yeah I can see how you'd have to treat that separately. But anyway: a string is not a portable data type. Is it?
