Create PyString from c character array without copying

Question

I have a large buffer of strings (roughly 12 GB) from a C application.

I would like to create PyString objects in C for an embedded Python interpreter without copying the strings. Is this possible?

Anything is possible in computing, given enough time, money and computing resources. Is that really your question?
– Robert Harvey, Commented Jul 31, 2014 at 19:54
@RobertHarvey No, that example uses a copy. See docs.python.org/2/c-api/string.html#PyString_FromStringAndSize
– aterrel, Commented Jul 31, 2014 at 20:12
The buffer protocol and NumPy work this way: you just give them the C pointer. I was hoping there is a way to do this with strings.
– aterrel, Commented Jul 31, 2014 at 20:15
@Santa Do you have an example of calling ctypes from C to an embedded Python interpreter?
– aterrel, Commented Jul 31, 2014 at 20:34

Travis Oliphant · Accepted Answer · 2014-07-31 21:32:40Z

7

I don't think that is possible for the basic reason that Python String objects are embedded into the PyObject structure. In other words, the Python string object is the PyObject_HEAD followed by the bytes of the string. You would have to have room in memory to put the PyObject_HEAD information around the existing bytes.
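
The inline layout described above can be observed from Python itself. The following is a minimal sketch, assuming CPython and using the Python 3 bytes type as the counterpart of the Python 2 PyString; the variable names are illustrative only. It shows that the object's size grows byte for byte with its payload, which is consistent with the character data living inside the object rather than behind a pointer.

```python
import sys

# bytes is the Python 3 counterpart of the Python 2 PyString type.
# On CPython the character data is stored inline, right after the object header.
empty_size = sys.getsizeof(b"")          # header plus trailing NUL, platform dependent
payload = b"x" * 100
inline_growth = sys.getsizeof(payload) - empty_size

# Each payload byte adds exactly one byte to the object itself.
print(empty_size, inline_growth)
```

Because the header and the data form one allocation, an existing C buffer cannot be adopted as the data part of such an object without moving it.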


1 Comment

aterrel Over a year ago

Can I just use numpy.str_? It seems these have problems comparing to other PyStrings though.

aterrel · Accepted Answer · 2014-08-06 03:22:44Z

One can't use PyString without a copy, but one can use ctypes. Turns out that ctypes.c_char_p works basically like a string. For example with the following C code:

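/* Array of C strings to expose to Python without copying the array. */
static char* names[7] = {"a", "b", "c", "d", "e", "f", "g"};
PyObject *pFunc, *pArgs, *pValue;
/* td_py_get_callable is the application's own lookup helper, not a Python C API call. */
pFunc = td_py_get_callable("my_func");
pArgs = PyTuple_New(2);
/* First argument: the address of the array, passed as a plain integer. */
pValue = PyLong_FromSize_t((size_t) names);
PyTuple_SetItem(pArgs, 0, pValue);  /* steals the reference to pValue */
/* Second argument: the number of strings in the array. */
pValue = PyLong_FromLong(7);
PyTuple_SetItem(pArgs, 1, pValue);
pValue = PyObject_CallObject(pFunc, pArgs);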

This passes the address and the number of strings to the following Python function my_func:

import ctypes

def my_func(names_addr, num_strs):
    type_char_p = ctypes.POINTER(ctypes.c_char_p)
    # Reinterpret the integer address as a char** pointing at the C array.
    names = ctypes.cast(names_addr, type_char_p)
    for idx in range(num_strs):
        print(names[idx])
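
To try the idea without an embedding C application, here is a self-contained sketch that simulates the C side inside Python. The array c_names and the function read_names are hypothetical stand-ins, not part of the original program. It reinterprets an integer address as a char** with ctypes.cast and indexes it. Note that only the array of pointers is shared: each indexed access to a c_char_p element builds a new Python bytes object for that one string.

```python
import ctypes

# Stand-in for the C side: an array of char* built with ctypes (hypothetical data).
c_names = (ctypes.c_char_p * 3)(b"a", b"b", b"c")
names_addr = ctypes.addressof(c_names)  # integer address, as the C code would pass it


def read_names(names_addr, num_strs):
    """Return the strings found in a C array of char* located at names_addr."""
    # Reinterpret the integer as a pointer to the first char* of the array.
    names = ctypes.cast(names_addr, ctypes.POINTER(ctypes.c_char_p))
    # Indexing dereferences one char* and materialises it as a bytes object.
    return [names[i] for i in range(num_strs)]


result = read_names(names_addr, len(c_names))
print(result)
```

The ctypes array must stay alive for as long as the address is in use, just as the C buffer must outlive any Python code that reads from it.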

Of course, nobody really wants to pass around an address and a length in Python. We can put the addresses in a NumPy array, pass that around, and cast back when we need the strings:

import ctypes
import numpy.ctypeslib

def my_func(names_addr, num_strs):
    type_char_p = ctypes.POINTER(ctypes.c_char_p)
    names = ctypes.cast(names_addr, type_char_p)
    # Cast to size_t pointers so NumPy can hold the addresses
    p = ctypes.cast(names, ctypes.POINTER(ctypes.c_size_t))
    name_addrs = numpy.ctypeslib.as_array(p, shape=(num_strs,))
    # Pass to some NumPy functions
    my_numpy_func(name_addrs)

The catch is that indexing the NumPy array yields only an integer address, although the array shares its memory with the original C pointer array. To read the strings, we can cast back to ctypes.POINTER(ctypes.c_char_p):

import ctypes

def my_numpy_func(name_addrs):
    # Reinterpret the array's data buffer as char** again
    names = name_addrs.ctypes.data_as(ctypes.POINTER(ctypes.c_char_p))
    for i in range(len(name_addrs)):
        print(names[i])

It's not perfect, as I can't use things like numpy.searchsorted to do a binary search at the NumPy level, but it passes char* around without a copy well enough.
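
One possible workaround for the binary search limitation, sketched here under the assumption that the C strings are already sorted, is to wrap the pointer in a small read-only sequence and use the standard library's bisect module instead of numpy.searchsorted. The class CStringArray and the sample data are hypothetical and not part of the original program. Each comparison dereferences a single char*, so a lookup touches only about log2(n) strings.

```python
import bisect
import ctypes


class CStringArray:
    """Read-only sequence view over a C array of char* at a given address."""

    def __init__(self, addr, count):
        self._ptr = ctypes.cast(addr, ctypes.POINTER(ctypes.c_char_p))
        self._count = count

    def __len__(self):
        return self._count

    def __getitem__(self, idx):
        if not 0 <= idx < self._count:
            raise IndexError(idx)
        # Builds one bytes object for the requested string only.
        return self._ptr[idx]


# Stand-in for the sorted C array from the example above.
c_names = (ctypes.c_char_p * 7)(b"a", b"b", b"c", b"d", b"e", b"f", b"g")
view = CStringArray(ctypes.addressof(c_names), len(c_names))

# Binary search over the C strings without building a Python list of them.
pos = bisect.bisect_left(view, b"e")
print(pos, view[pos])
```

This runs at Python speed rather than NumPy speed, so it trades throughput for not having to copy the whole buffer.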
