6

A measurement system (in our lab) produces data of 12 bits per sample in a packed format, i.e. 2 samples of 12 bits each are packed into 3 bytes:

   buf[l + 2]  |   buf[l + 1]  |   buf[l + 0]  
7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0
-----------------------|-----------------------
B A 9 8 7 6 5 4 3 2 1 0|B A 9 8 7 6 5 4 3 2 1 0
    sample[2*i + 1]    |    sample[2*i + 0]     
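
As a quick sanity check of this layout, here is a minimal pure-Python sketch that packs two 12-bit values the way the schematic describes. The helper name `pack_pair` is made up for illustration and is not part of the measurement system's software.

```python
def pack_pair(s0, s1):
    """Pack two unsigned 12-bit values into 3 bytes, low sample first."""
    if not (0 <= s0 < 4096 and 0 <= s1 < 4096):
        raise ValueError("samples must fit in 12 bits")
    # 24-bit word: s1 occupies bits 23..12, s0 occupies bits 11..0.
    word = (s1 << 12) | s0
    # Little-endian: buf[l + 0] carries the lowest 8 bits of the word.
    return word.to_bytes(3, "little")

packed = pack_pair(0x019, 0x055)
print(packed.hex())  # 195005
```

The three bytes `19 50 05` are also the first three bytes of the example data shown further down, so the first two samples there are 0x019 and 0x055.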

For NumPy I created the following unpacking function, which takes a Python byte buffer, applies some stride tricks and bit manipulations to it, and returns the desired float32 array:

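def unpack_s12p_to_f32(buf):
    import numpy
    import numpy.lib.stride_tricks as npst
    # View the buffer as int32 words (its length must be a multiple of 4 bytes).
    s12p = numpy.frombuffer(buf, dtype=numpy.int32)
    # Overlapping 4-byte reads every 3 bytes; the 0 stride duplicates each read
    # into two columns after the transpose, one per sample of the pair.
    # Note: the final 4-byte read extends one byte past the buffer; that byte
    # lands in bits 24..31 and is masked off below.
    s12p_sv = numpy.copy( numpy.transpose(
        npst.as_strided(s12p,
            shape=(2, int((s12p.size*4)/3)),
            strides=(0,3), writeable=False) ))
    m12b = (1<<12)-1
    # Column 0: low 12 bits, moved to the top of the int32 so bit 11 becomes the sign.
    s12p_sv[:,0] &= m12b
    s12p_sv[:,0] <<= 20
    # Column 1: bits 12..23, treated the same way.
    s12p_sv[:,1] >>= 12
    s12p_sv[:,1] &= m12b
    s12p_sv[:,1] <<= 20
    # Interleave the two columns and scale to [-1, 1).
    return s12p_sv.reshape(-1).astype(numpy.float32) * (2.**-31)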

We are now looking for a way to replicate this function in Matlab. However, I have not been able to find equivalent functions that would let me manipulate Matlab array objects in the same way.


Example data and conversion result

From one of our datasets I extracted 200 samples. Written as a Python buffer literal, passed through the above unpacking function and plotted using Matplotlib, it looks like this:

from matplotlib.pyplot import plot,show

data = \
b'\x19P\x05g\x90\x05I\x10\x01\xcf_\xfa\x87\x7f\xf5a\xbf\xf7\xb7\xff\xff9\xf0\x04]P' \
b'\x04)\x90\xfe\xad\xdf\xf6M\x1f\xf4s\x7f\xfb\xf5\x7f\x02=\xd0\x04W\xf0\x01\xfb\x7f' \
b'\xfb\x81\xff\xf6\x85\x7f\xf7\x8f\xbf\xfb\x05p\x03O\x90\x045\x90\x02\xf7\x7f\xfb' \
b'\x7f\xff\xf6q_\xf7\xb7\xff\xff!p\x03G\x90\x02\xdd\x7f\xfb\xc3\xdf\xf9s\xff\xf6' \
b'\x91?\xfb\xeb?\x011\xf0\x035\xb0\xff\xe1\xff\xfa\x81\xbf\xf6\x89\x9f\xf9\xc7\x9f' \
b'\xfe\r\x90\x02=\xf0\x02\x19\x90\xfe\xc3?\xf9\xa3\x1f\xfa\x8d\x7f\xf7\x99\xbf\xfc' \
b'\t\x10\x03=0\x02\x13\xb0\xfe\xb3?\xf8\x8b\xff\xf7\x83\xbf\xf9\xcd_\xfe\x050\x01' \
b'\x130\xff\xd3?\xfc\xb9\xbf\xfa\xa5\xdf\xf9\xa5_\xfc\xe9\xdf\xfe\xdb\xff\xfd\xf7' \
b'\x9f\xff\xe7\x9f\xfb\xcb?\xfc\xbd\xff\xf9\xab\xff\xfc\xfd\xdf\xff\xf2O\xfe\xe2\xcf' \
b'\xfc\xdc\xcf\xfc\xd4\xaf\xfd\xe8\xcf\xfd\xdc\xcf\xfd\xfc\x8f\x00\xfa\xaf\xfe\xec' \
b'\x0f\xfd\xc6/\xfc\xde\x0f\xff\xf2\xcf\xfd\xe2\x8f\xfe\xe8/\xff\xf4o\xfc\xce\xcf\xff' \
b'\x08@\xff\xf8\xef\xfd\xe8o\x00\x10`\xff\xe6/\xfd\xeco\xff\x06`\x00\xfe\x8f\xfe\xfa' \
b'\xef\xff\xe2\xaf\xfc\xdao\xfe\x00\x00\x00\xf8\x8f\xfd\xe0\xef\xfe\x00\xc0\xfe\xeao\xfe'

plot( unpack_s12p_to_f32(data) )
show()

producing the following output:

[Figure: 200 samples from an example 12 bit/sample dataset, unpacked with the shown function and plotted using Matplotlib]

2 Comments
  • Is the original data in a binary file? Maybe you can just use fread( ) with the 'ubit12' option. Commented Aug 21 at 1:38
  • The original data is in a store-mode (i.e. uncompressed) ZIP file holding measurement parameters, metadata and the actual acquisitions; each measurement trigger event produces its own entry in the ZIP file. Commented Aug 21 at 8:00

4 Answers

5

Your code does not match your layout schematic. Your code assumes the following:

   buf[l + 2]  |   buf[l + 1]  |   buf[l + 0]  
7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0
-----------------------|-----------------------
B A 9 8 7 6 5 4 3 2 1 0|B A 9 8 7 6 5 4 3 2 1 0
    sample[2*i + 1]    |    sample[2*i + 0]    

The bytes are reversed: you are on a little-endian system, so the first byte in the buffer contains the lowest-valued bits of your 24-bit value.
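
To make the byte order concrete, here is a small pure-Python sketch that decodes one 3-byte group as a little-endian 24-bit word and splits it. The function name `unpack_triplet` is hypothetical, and the input is the first three bytes of the example data.

```python
def unpack_triplet(b):
    """Split 3 packed bytes into (sample0, sample1), both unsigned 12-bit."""
    # Little-endian: b[0] supplies bits 7..0 of the 24-bit word.
    word = int.from_bytes(b, "little")
    return word & 0xFFF, word >> 12

print(unpack_triplet(b"\x19\x50\x05"))  # (25, 85)

# Reading the same bytes big-endian yields a different low sample,
# which is what the original (uncorrected) schematic implied.
print(int.from_bytes(b"\x19\x50\x05", "big") & 0xFFF)  # 5
```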

I converted your data to decimal values, and imported it into MATLAB thus:

data = [
    25    80     5   103   144     5    73    16     1   207    95   250   135   127   245    97   191   247   183   255   255 ...
    57   240     4    93    80     4    41   144   254   173   223   246    77    31   244   115   127   251   245   127     2 ...
    61   208     4    87   240     1   251   127   251   129   255   246   133   127   247   143   191   251     5   112     3 ...
    79   144     4    53   144     2   247   127   251   127   255   246   113    95   247   183   255   255    33   112     3 ...
    71   144     2   221   127   251   195   223   249   115   255   246   145    63   251   235    63     1    49   240     3 ...
    53   176   255   225   255   250   129   191   246   137   159   249   199   159   254    13   144     2    61   240     2 ...
    25   144   254   195    63   249   163    31   250   141   127   247   153   191   252     9    16     3    61    48     2 ...
    19   176   254   179    63   248   139   255   247   131   191   249   205    95   254     5    48     1    19    48   255 ...
   211    63   252   185   191   250   165   223   249   165    95   252   233   223   254   219   255   253   247   159   255 ...
   231   159   251   203    63   252   189   255   249   171   255   252   253   223   255   242    79   254   226   207   252 ...
   220   207   252   212   175   253   232   207   253   220   207   253   252   143     0   250   175   254   236    15   253 ...
   198    47   252   222    15   255   242   207   253   226   143   254   232    47   255   244   111   252   206   207   255 ...
     8    64   255   248   239   253   232   111     0    16    96   255   230    47   253   236   111   255     6    96     0 ...
   254   143   254   250   239   255   226   175   252   218   111   254     0     0     0   248   143   253   224   239   254 ...
     0   192   254   234   111   254];

This code reproduces your plot:

data = int32(reshape(data, 3, []));
values = zeros(2, size(data, 2), 'int32');
values(2, :) = bitshift(data(3, :), 4) + bitshift(data(2, :), -4);
% The `bitand` is not strictly necessary: the later left shift by 20 drops the top 4 bits anyway.
values(1, :) = bitshift(bitand(data(2, :), 15), 8) + data(1, :);
values = single(bitshift(values, 20)) * 2^-31;
values = reshape(values, 1, []);
plot(values)

We're converting the individual values to int32, because your bit shift operations work on that bit length. We then combine the bytes by shifting according to the schematic, shift left by 20 bits (so that values whose top bit is 1 become negative), and multiply by 2^-31 for correct scaling.
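
The shift-by-20 step is what turns an unsigned 12-bit code into a signed value. Here is a minimal pure-Python sketch of that one step; the name `s12_to_float` is made up for illustration, and Python integers are unbounded, so the 32-bit wraparound is emulated explicitly.

```python
def s12_to_float(v):
    """Map an unsigned 12-bit code to a float in [-1, 1) via a left shift by 20."""
    shifted = (v << 20) & 0xFFFFFFFF   # keep only 32 bits, as an int32 would
    if shifted & 0x80000000:           # bit 11 of v is now the int32 sign bit
        shifted -= 1 << 32             # reinterpret the pattern as negative
    return shifted * 2.0 ** -31

print(s12_to_float(25))     # 0.01220703125
print(s12_to_float(0xFCF))  # -0.02392578125
```

The net effect is the same as interpreting the 12 bits as a two's-complement number and dividing by 2048: 0x800 maps to -1.0 and 0x7FF to 2047/2048.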

To make the indexing easy, we're reshaping the array of 300 values to 3x100, and writing the result in an array 2x100. We could also have used simple indexing instead:

data = int32(data);
values = zeros(1, numel(data) / 3 * 2, 'int32');
values(2:2:end) = bitshift(data(3:3:end), 4) + bitshift(data(2:3:end), -4);
values(1:2:end) = bitshift(data(2:3:end), 8) + data(1:3:end);
values = single(bitshift(values, 20)) * 2^-31;
plot(values)
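
For comparison, the same index-based approach can be sketched in NumPy with ordinary strided slices on a uint8 view. This is only an illustration under the little-endian layout discussed above, not the question's production function; the name `unpack_s12p_sliced` is hypothetical. It avoids `as_strided` and never reads outside the buffer.

```python
import numpy as np

def unpack_s12p_sliced(buf):
    """Unpack little-endian packed 12-bit samples using plain strided slices."""
    b = np.frombuffer(buf, dtype=np.uint8).astype(np.uint32)
    if b.size % 3:
        raise ValueError("buffer length must be a multiple of 3")
    out = np.empty(b.size // 3 * 2, dtype=np.uint32)
    # Even samples: low nibble of byte 1 on top of byte 0.
    out[0::2] = ((b[1::3] & 0x0F) << 8) | b[0::3]
    # Odd samples: byte 2 on top of the high nibble of byte 1.
    out[1::2] = (b[2::3] << 4) | (b[1::3] >> 4)
    # Move bit 11 into the int32 sign bit, then scale to [-1, 1).
    return (out << 20).view(np.int32).astype(np.float32) * np.float32(2.0 ** -31)

samples = unpack_s12p_sliced(b"\x19\x50\x05\xcf\x5f\xfa")
print((samples * 2048).tolist())  # [25.0, 85.0, -49.0, -91.0]
```

The slices `b[0::3]`, `b[1::3]` and `b[2::3]` play the role of `data(1:3:end)`, `data(2:3:end)` and `data(3:3:end)` in the MATLAB version.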

2 Comments

Then I wrote the layout wrong in the question. The code I've posted works well, and we have had it in production for years now.
I corrected the layout given in the question. Yes the whole thing is little endian.
3

Given that you already have an algorithm working, tested, and fast enough in python, I would use this to convert the data to a matlab .mat file. Then load this into matlab and continue to work in matlab from that point onwards.

python script:

from scipy.io import savemat
dat = unpack_s12p_to_f32(data)
savemat("matlab_matrix.mat", {"data": dat, "label": "data imported and converted to matlab using python"})

then in matlab:

load('matlab_matrix.mat')
plot(data)

Later to streamline the workflow, this could be done by a simple python script called from the command line when you get new data.

Is there a reason you want to import this directly to matlab with no interpretation step?

2 Comments

Well, this is exactly what I suggested to my coworker, who insists on using Matlab. I even gave them the very exact script you suggested. They don't like that approach. But yes, this is what I would do as well. This is a mountain and prophet kind of situation…
Would it help convince them if you invoked it from inside MATLAB using pyrunfile?
2

If I understand correctly, the MATLAB code below might fit your purpose, where bitshift and bitand can be used.

clc;
clearvars;

rng(0);
x = int32(rand*2^24)
buf = dec2bin(x,24)

sample0 = buf(1:12);
sample1 = buf(13:end);

% for sample 0
x_sample0 = bin2dec(sample0)
x_sample0_unpack = bitshift(x,-12)

% for sample 1
x_sample1 = bin2dec(sample1)
x_sample1_unpack = bitand(x, 2^12-1)

and you will see

x =

  int32

   13668795


buf =

    '110100001001000110111011'


x_sample0 =

        3337


x_sample0_unpack =

  int32

   3337


x_sample1 =

   443


x_sample1_unpack =

  int32

   443

5 Comments

Doing it sample by sample is too slow: The amount of data this needs to operate on can be easily several dozen GiB in size, sliced into acquisition records on the order of ~100 MiB. With NumPy and stride tricks you can broadcast the operations over the whole buffer and take advantage of the optimized close-to-metal implementation.
You can exactly replicate your NumPy code using the functions in this answer. All the bit… functions work element-wise on arrays. Also, a loop over the elements of an array is not slow in MATLAB. Try it first; it might be fast enough for you. Put in the effort of vectorizing only if you need it to be a bit faster.
How long does your Python implementation take on each record?
@datenwolf That can perhaps be done in Matlab too, but it would help if you provided a small input / desired output example in your question
@LuisMendo I've added a small data example to my question text
1

I think you can process the data in MATLAB, but you need some pre-processing on the buffer data since MATLAB has no equivalent to np.frombuffer as in Python. In this sense, you can call Python to parse the data first, and then hand it to MATLAB to unpack the bytes.


With your data, you can probably do it in MATLAB like this (np.frombuffer is run in the get_data.py script to parse the data):

clc;
clearvars;

% run Python script to parse and save data
!python get_data.py

% load data to MATLAB
fid = fopen("dat.txt","r");
dat = int32(fscanf(fid, "%lu"));
fclose(fid);

% conversion
x = int32(sum(int32(2.^(0:8:16))'.*reshape(dat,3, []),1));
s = [bitand(x, 2^12-1); bitshift(x,-12)];

% plot
plot(single(bitshift(s(:),20))/2^31)

and you will see the resulting plot, where the get_data.py file is given by

import numpy as np

data = \
b'\x19P\x05g\x90\x05I\x10\x01\xcf_\xfa\x87\x7f\xf5a\xbf\xf7\xb7\xff\xff9\xf0\x04]P' \
b'\x04)\x90\xfe\xad\xdf\xf6M\x1f\xf4s\x7f\xfb\xf5\x7f\x02=\xd0\x04W\xf0\x01\xfb\x7f' \
b'\xfb\x81\xff\xf6\x85\x7f\xf7\x8f\xbf\xfb\x05p\x03O\x90\x045\x90\x02\xf7\x7f\xfb' \
b'\x7f\xff\xf6q_\xf7\xb7\xff\xff!p\x03G\x90\x02\xdd\x7f\xfb\xc3\xdf\xf9s\xff\xf6' \
b'\x91?\xfb\xeb?\x011\xf0\x035\xb0\xff\xe1\xff\xfa\x81\xbf\xf6\x89\x9f\xf9\xc7\x9f' \
b'\xfe\r\x90\x02=\xf0\x02\x19\x90\xfe\xc3?\xf9\xa3\x1f\xfa\x8d\x7f\xf7\x99\xbf\xfc' \
b'\t\x10\x03=0\x02\x13\xb0\xfe\xb3?\xf8\x8b\xff\xf7\x83\xbf\xf9\xcd_\xfe\x050\x01' \
b'\x130\xff\xd3?\xfc\xb9\xbf\xfa\xa5\xdf\xf9\xa5_\xfc\xe9\xdf\xfe\xdb\xff\xfd\xf7' \
b'\x9f\xff\xe7\x9f\xfb\xcb?\xfc\xbd\xff\xf9\xab\xff\xfc\xfd\xdf\xff\xf2O\xfe\xe2\xcf' \
b'\xfc\xdc\xcf\xfc\xd4\xaf\xfd\xe8\xcf\xfd\xdc\xcf\xfd\xfc\x8f\x00\xfa\xaf\xfe\xec' \
b'\x0f\xfd\xc6/\xfc\xde\x0f\xff\xf2\xcf\xfd\xe2\x8f\xfe\xe8/\xff\xf4o\xfc\xce\xcf\xff' \
b'\x08@\xff\xf8\xef\xfd\xe8o\x00\x10`\xff\xe6/\xfd\xeco\xff\x06`\x00\xfe\x8f\xfe\xfa' \
b'\xef\xff\xe2\xaf\xfc\xdao\xfe\x00\x00\x00\xf8\x8f\xfd\xe0\xef\xfe\x00\xc0\xfe\xeao\xfe'


s12p = np.frombuffer(data, dtype=np.uint8)
np.savetxt('dat.txt', s12p, fmt = "%i")

6 Comments

"MATLAB has no equivalent to np.frombuffer" because when you import the data into MATLAB it already is a matrix, so there is no need to convert to a matrix. Of course you can read raw data in MATLAB, it just ends up in a matrix form right away. You have typecast to convert a uint8 matrix (raw bytes) to int32 or whatever.
I like your way of converting bytes to numbers in groups of 24 bits.
@CrisLuengo thank you for your comment! Could you shed some light on how the raw data (as provided by OP in the example) can be imported into MATLAB? I have no idea about that. Much appreciated!
OP provided the data as a Python string, it’s not “raw data” in that sense, OP likely converted to string in Python to post here. MATLAB doesn’t parse strings in the same way. MATLAB can read raw data stored on disk in a binary file, or read as bytes through some network connection or a serial port or whatever. In all those cases you immediately end up with an array (either chars or uint8 or some other bit depth value; and unfortunately often by default, converted to doubles).
@ThomasIsCoding the measurement system (that actually I designed and built) is writing the digitized data into store-mode (i.e. no compression) ZIP archive files, with each trigger event constituting a separate entry that basically is a raw dump of the data as transferred by the digitizer into the record buffers. A few tricks with ZIP entry comments are done to pad the start of each entry payload to align with page/block boundaries, but that's a "transparent" detail.