How can I malloc a dynamic buffer from byte data with ctypes?

Question

Every reference I find for creating a buffer in ctypes creates one of static length.
I'm dealing with data read from a file, handled by ctypes, where a struct defines inline buffers whose length is unknown until it is read.

import ctypes

class Buffer16(ctypes.Structure):
    _fields_ = [
        ('length', ctypes.c_ushort.__ctype_be__ ),
        ('data', ctypes.c_ubyte*0 ) # to be resized via malloc
    ]

    def __new__(cls): # not executed for some reason
        b16 = ctypes.Structure.__new__(cls) # wish I could interrupt before reading the 0-length array...
        # some unknown magic here to malloc b16.data
        return b16

class Test(ctypes.Structure):
    _fields_ = [
        ('data', ctypes.c_uint.__ctype_be__ ),
        ('buf1', Buffer16 ),
        ('buf2', Buffer16 )
    ]

I can easily define the data as a c_ubyte array as read from the file, and initialize the struct with Structure.from_address(ctypes.addressof(bytedata))...
But the problem here is __new__ and __init__ don't get executed, so the buffers aren't sized appropriately.

Here's some test data as an example:

>>> bytedata = (ctypes.c_ubyte*19)(*b'\x00\x04\x18\x80\x00\x04test\x00\x07testing')
>>> 
>>> testinstance = Test.from_address(ctypes.addressof(bytedata))
>>> testinstance.data # just some dummy data which is correct
268416
>>> testinstance.buf1.length # this is correct
4
>>> testinstance.buf1.data # this should be __len__ == 4
<__main__.c_ubyte_Array_0 object at 0x...>
>>> testinstance.buf2.length # this is wrong (0x7465 from b'te'), it should be 7
29797

Is there a better way than from_address to allocate these inline buffers?
(Casting is no different from from_address, other than needing testinstance[0].)

Mark Tolonen · Accepted Answer · 2021-07-13 02:42:19Z

You've got variable-sized data in your structure. How would you create this structure in C? Typically only the last member of a structure can be a variable-length array (C lets you index beyond the nominal end of the structure in that case), but here you have two variable-length members.
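A quick way to see why the fixed layout in the question misreads buf2 is to print the offsets that ctypes computes. This is a minimal sketch that reuses the question's class definitions; the variable name raw is introduced here only for illustration.

```python
import ctypes

class Buffer16(ctypes.Structure):
    _fields_ = [('length', ctypes.c_ushort.__ctype_be__),
                ('data', ctypes.c_ubyte * 0)]   # zero-length array adds no size

class Test(ctypes.Structure):
    _fields_ = [('data', ctypes.c_uint.__ctype_be__),
                ('buf1', Buffer16),
                ('buf2', Buffer16)]

raw = b'\x00\x04\x18\x80\x00\x04test\x00\x07testing'

# The layout is fixed when the class is created, before any data is seen.
print(ctypes.sizeof(Buffer16))                # 2
print(Test.buf1.offset, Test.buf2.offset)     # 4 6

# buf2.length is therefore read from the first two bytes of buf1's payload.
start = Test.buf2.offset
print(raw[start:start + 2])                   # b'te'
```

Because buf2 always sits at offset 6, its length field overlaps the 'te' of 'test', which is where the 0x7465 in the question comes from.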

Although it can be done in ctypes, I'd first suggest unpacking the data as you go with the struct module. If you are reading from a file, all you really need are the data value and the buffers. They don't have to be in ctypes format, and the lengths serve no purpose once the buffers have been read:

import struct
import io

# create a file-like byte stream
filedata = io.BytesIO(b'\x00\x04\x18\x80\x00\x04test\x00\x07testing')

data,len1 = struct.unpack('>LH',filedata.read(6))
data1 = filedata.read(len1)
len2, = struct.unpack('>H',filedata.read(2))
data2 = filedata.read(len2)
print(hex(data),data1,data2)

Output:

0x41880 b'test' b'testing'
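If a record holds several of these length-prefixed buffers, the same pattern could be wrapped in a helper. The following is only a sketch: the function name read_prefixed and the truncation checks are assumptions added for illustration, not part of the original answer.

```python
import io
import struct

def read_prefixed(stream):
    """Read a big-endian 16-bit length, then that many payload bytes."""
    header = stream.read(2)
    if len(header) != 2:
        raise EOFError('truncated length prefix')
    (length,) = struct.unpack('>H', header)
    payload = stream.read(length)
    if len(payload) != length:
        raise EOFError('truncated buffer')
    return payload

# Same test bytes as in the question.
stream = io.BytesIO(b'\x00\x04\x18\x80\x00\x04test\x00\x07testing')

(data,) = struct.unpack('>L', stream.read(4))
buffers = [read_prefixed(stream) for _ in range(2)]
print(hex(data), buffers)   # 0x41880 [b'test', b'testing']
```

Changing range(2) to the number of buffers per record is all that is needed to handle more of them.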

Here's a way to do it in ctypes by creating a custom class definition for each structure, but is the data really needed in a ctypes format?

import struct
import ctypes
import io

# Read a variable-sized Buffer16 object from the file.
# Once the length is read, declare a custom class with data of that length.
def read_Buffer16(filedata):
    length, = struct.unpack('>H',filedata.read(2))
    class Buffer16(ctypes.BigEndianStructure):
        _fields_ = (('length', ctypes.c_ushort),
                    ('data', ctypes.c_char * length))
        def __repr__(self):
            return f'Buffer16({self.length}, {self.data})'
    return Buffer16(length,filedata.read(length))

# Read a variable-sized Test object from the file.
# Once the buffers are read, declare a custom class of their exact type.
def read_Test(filedata):
    data, = struct.unpack('>L',filedata.read(4))
    b1 = read_Buffer16(filedata)
    b2 = read_Buffer16(filedata)
    class Test(ctypes.BigEndianStructure):
        _fields_ = (('data', ctypes.c_uint),
                    ('buf1', type(b1)),
                    ('buf2', type(b2)))
        def __repr__(self):
            return f'Test({self.data:#x}, {self.buf1}, {self.buf2})'
    return Test(data,b1,b2)

# create a file-like byte stream
filedata = io.BytesIO(b'\x00\x04\x18\x80\x00\x04test\x00\x07testing')

t = read_Test(filedata)
print(t)

Output:

Test(0x41880, Buffer16(4, b'test'), Buffer16(7, b'testing'))
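One drawback of this approach is that a new class is declared on every read. A possible mitigation, sketched below, is to cache one class per length. The factory name buffer16_type and the use of functools.lru_cache are assumptions for illustration rather than something the answer above does.

```python
import ctypes
import functools

@functools.lru_cache(maxsize=None)
def buffer16_type(length):
    """Return a Buffer16 class for the given payload length (one per length)."""
    class Buffer16(ctypes.BigEndianStructure):
        _fields_ = (('length', ctypes.c_ushort),
                    ('data', ctypes.c_char * length))
        def __repr__(self):
            return f'Buffer16({self.length}, {self.data})'
    return Buffer16

b1 = buffer16_type(4)(4, b'test')
b2 = buffer16_type(7)(7, b'testing')
print(b1, b2)

# Repeated requests for the same length reuse the cached class.
print(buffer16_type(4) is type(b1))
```

Instances of equal length then share a type, which also makes isinstance checks between them meaningful.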

Edit per comment

This might be how you'd store this file data in a C-like structure. Each variable-length buffer is read into its own array (similar to C malloc), and its length and address are stored in the structure. The class methods know how to read a particular structure from the file stream and return the appropriate object. Note, however, that just like in C you can read past the end of a pointer and risk exceptions or undefined behavior.

import struct
import ctypes
import io

class Buffer16(ctypes.Structure):
    _fields_ = (('length', ctypes.c_ushort),
                ('data', ctypes.POINTER(ctypes.c_char)))

    @classmethod
    def read(cls,file):
        length, = struct.unpack('>H',file.read(2))
        data = (ctypes.c_char * length)(*file.read(length))
        return cls(length,data)

    def __repr__(self):
        return f'Buffer16({self.data[:self.length]})'

class Test(ctypes.Structure):
    _fields_ = (('data', ctypes.c_uint),
                ('buf1', Buffer16),
                ('buf2', Buffer16))

    @classmethod
    def read(cls,file):
        data, = struct.unpack('>L',file.read(4))
        b1 = Buffer16.read(file)
        b2 = Buffer16.read(file)
        return cls(data,b1,b2)

    def __repr__(self):
        return f'Test({self.data:#x}, {self.buf1}, {self.buf2})'

# create a file-like byte stream
file = io.BytesIO(b'\x00\x04\x18\x80\x00\x04test\x00\x07testing')

t = Test.read(file)
print(t)
print(t.buf1.length)
print(t.buf1.data[:10]) # Just like in C, you can read beyond the end of the pointer

Output:

Test(0x41880, Buffer16(b'test'), Buffer16(b'testing'))
4
b'test\x00\x00\x00\x00\x00\x00'
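The bytes after 'test' in that last line are whatever happens to follow the buffer in memory, so they should not be relied on. To stay in bounds, limit every read to the stored length. A minimal sketch, using standalone variables (backing, ptr, length) in place of the structure fields:

```python
import ctypes

payload = b'test'
length = len(payload)

# Backing array of exactly `length` bytes; keep a reference to it alive
# for as long as the pointer is in use.
backing = (ctypes.c_char * length).from_buffer_copy(payload)
ptr = ctypes.cast(backing, ctypes.POINTER(ctypes.c_char))

# Either form reads exactly `length` bytes and no more.
print(ptr[:length])
print(ctypes.string_at(ptr, length))
```

With the classes above, the equivalent bounded read would be t.buf1.data[:t.buf1.length], which is what their __repr__ already does.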

edited Jul 13, 2021 at 2:42

answered Jul 11, 2021 at 22:53

4 Comments

Tcll Over a year ago

just for a fun fact, the data I'm actually working with contains 5 inline buffers per struct in an array, so cast() would actually work in my favor here (while combined struct size < data size) if that could be made to work. Also no it doesn't need to be done with ctypes per-se, I could do it with array.array just fine, but since I'm working on a package module around _ctypes, I was hoping I could keep it consistent if possible. (I find it rather annoying the solution is to define multiple classes for each buffer length)

Mark Tolonen Over a year ago

@Tcll It’s an odd structure. ctypes parallels C structures. To read a file in this format you’d more likely malloc the strings and store pointers in a C structure, not arrays. You’d be better off reading the data with struct and storing it in a regular Python object.

Tcll Over a year ago

"you’d more likely malloc the strings and store pointers in a C structure, not arrays." This was actually initially what I was trying to do before the cobbled patch-job in the question... could I get an answer showing how to do things that way since it's more proper :+1:

Tcll Over a year ago

excellent and interesting update, definitely a workaround, but certainly not one that feels slimy. :+1: hopefully this'll inspire others who come across this as well.

Tcll · Accepted Answer · 2021-07-27 05:08:51Z

With credit to and inspiration from Mark Tolonen's answer, I realized his approach uses a mechanic similar to the ctypes.Structure.from_address() method.

Here's my answer and tests with my updates to his:

from ctypes import Structure, c_char, c_ushort, c_uint, POINTER, addressof

c_bushort = c_ushort.__ctype_be__
c_buint = c_uint.__ctype_be__

class Buffer16(Structure):
    _fields_ = (
        ('length', c_bushort),
        ('data', POINTER( c_char ))
    )

    @classmethod
    def from_address(cls, addr):
        length = c_bushort.from_address( addr ).value
        data   = ( c_char*length ).from_address( addr+2 )
        return cls( length, data )

class Test(Structure):
    _fields_ = (
        ('data', c_buint),
        ('buf1', Buffer16),
        ('buf2', Buffer16)
    )

    @classmethod
    def from_address(cls, addr):
        data = c_buint.from_address( addr )
        b1   = Buffer16.from_address( addr+4 )
        b2   = Buffer16.from_address( addr+6+b1.length )
        return cls( data, b1, b2 )

bytedata = ( c_char*19 )( *b'\x00\x04\x18\x80\x00\x04test\x00\x07testing' )
t = Test.from_address( addressof( bytedata ) )

print( t.data )
print( t.buf1.data[:t.buf1.length] )
print( t.buf2.data[:t.buf2.length] )

and the results:

>>>
268416
b'test'
b'testing'

Also a minor note about the enforcement of .__ctype_be__ on ctypes.c_uint and ctypes.c_ushort...

Not all systems use the same native byte order when reading data.

My systems in particular are little-endian, so b'\x00\x04\x18\x80' returns 2149057536 when processed with a plain ctypes.c_uint, rather than the expected 268416.
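The difference can be checked on any machine by decoding the same four bytes with both byte orders. This is a small sketch; the variable names raw, be and le are introduced here for illustration.

```python
import sys
import ctypes

raw = b'\x00\x04\x18\x80'

# Native byte order of the machine running the script.
print(sys.byteorder)

# Explicit byte orders give the same result on every machine.
be = ctypes.c_uint.__ctype_be__.from_buffer_copy(raw).value
le = ctypes.c_uint.__ctype_le__.from_buffer_copy(raw).value
print(be)   # 268416
print(le)   # 2149057536

# The pure-Python equivalents agree.
print(int.from_bytes(raw, 'big'), int.from_bytes(raw, 'little'))
```

A plain ctypes.c_uint matches whichever of the two corresponds to sys.byteorder, which is why pinning the byte order with __ctype_be__ keeps the parsing portable.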
