
I have a library function (written in C) that generates text by writing the output to FILE *. I want to wrap this in Python (2.7.x) with code that creates a temp file or pipe, passes it into the function, reads the result from the file, and returns it as a Python string.

Here's a simplified example to illustrate what I'm after:

/* Library function */
void write_numbers(FILE * f, int arg1, int arg2)
{
   fprintf(f, "%d %d\n", arg1, arg2);
}

Python wrapper:

import os
from ctypes import *
mylib = CDLL('mylib.so')


def write_numbers( a, b ):
   rd, wr = os.pipe()

   write_fp = MAGIC_HERE(wr)
   mylib.write_numbers(write_fp, a, b)
   os.close(wr)

   read_file = os.fdopen(rd)
   res = read_file.read()
   read_file.close()

   return res

#Should result in '1 2\n' being printed.
print write_numbers(1,2)

I'm wondering what my best bet is for MAGIC_HERE().

I'm tempted to just use ctypes and create a libc.fdopen() wrapper that returns a Python c_void_p, then pass that into the library function. It seems like that should be safe in theory--just wondering if there are issues with that approach or an existing Python-ism to solve this problem.

Also, this will go in a long-running process (let's just assume "forever"), so any leaked file descriptors are going to be problematic.

  • os.popen() is incorrect. It requires at least one argument, the command line to invoke and get pipes to. Besides, it's deprecated in favour of subprocess, as the docs say. Commented Oct 23, 2015 at 20:26
  • Sorry, I meant os.pipe(). Updated. Commented Oct 23, 2015 at 20:30
  • Unless you're also planning to run this on Windows, which has the problem of potentially mismatched C runtime libraries, then I don't think you'll have any problem calling libc.fdopen and passing the resulting FILE pointer. But instead of using c_void_p, I'd create an opaque class FILE(Structure): pass and set libc.fdopen.restype = POINTER(FILE). This won't be converted to an integer result. OTOH, c_void_p as the restype gets converted to an integer, so you'd have to make sure that mylib.write_numbers.argtypes is also set to prevent truncating a 64-bit pointer value. Commented Oct 23, 2015 at 20:43
  • Did you consider using fmemopen? If the amount of data that will ever be written by a single write_numbers call is bounded by a reasonably small fixed constant, it could provide a good alternative to using a pipe. Commented Oct 24, 2015 at 1:46
  • @BrianMcFarland You don't have to (and I'm not sure you even can) read the FILE * back in. But you can simply read the char[] array that you passed to fmemopen. Commented Oct 27, 2015 at 6:28
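
To make the fmemopen suggestion from the comments concrete, here is a minimal sketch. It assumes a POSIX libc that exports fmemopen (glibc does), and it uses libc's own fputs as a stand-in for the library function. The names capture_in_memory and max_size are hypothetical, chosen only for illustration.

```python
import ctypes
import ctypes.util


class FILE(ctypes.Structure):
    """Opaque stand-in for the C FILE type; it is never inspected from Python."""


# Assumption: the default libc of this process exports fmemopen (POSIX.1-2008).
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.fmemopen.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_char_p]
libc.fmemopen.restype = ctypes.POINTER(FILE)
libc.fclose.argtypes = [ctypes.POINTER(FILE)]
libc.fclose.restype = ctypes.c_int
libc.fputs.argtypes = [ctypes.c_char_p, ctypes.POINTER(FILE)]
libc.fputs.restype = ctypes.c_int


def capture_in_memory(write_func, max_size=4096):
    """Call write_func(fp) with a FILE* backed by a fixed-size char buffer."""
    buf = ctypes.create_string_buffer(max_size)  # zero-filled char[max_size]
    fp = libc.fmemopen(buf, max_size, b"w")
    if not fp:  # a NULL pointer is falsy
        err = ctypes.get_errno()
        raise OSError(err, "fmemopen failed")
    try:
        write_func(fp)
    finally:
        libc.fclose(fp)  # flushes pending output into buf
    return buf.value  # bytes up to the first NUL


# Stand-in for mylib.write_numbers(fp, 1, 2)
result = capture_in_memory(lambda fp: libc.fputs(b"1 2\n", fp))
```

No file descriptor is involved here, so nothing can leak, but output longer than max_size will not fit and text containing NUL bytes is cut short by buf.value.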

1 Answer

First, note that FILE* is a stdio-specific entity. It doesn't exist at the system level. The things that exist at the system level are descriptors (retrieved with file.fileno()) in UNIX (os.pipe() already returns plain descriptors) and handles (retrieved with msvcrt.get_osfhandle()) in Windows. Thus FILE* is a poor choice as an inter-library exchange format if there can be more than one C runtime in action. You'll be in trouble if your library is compiled against a different C runtime than your copy of Python: 1) binary layouts of the structure may differ (e.g. due to alignment, additional members for debugging purposes, or even different type sizes); 2) in Windows, the file descriptors that the structure links to are C-specific entities as well, and their table is maintained internally by a C runtime.

Moreover, in Python 3, I/O was overhauled in order to untangle it from stdio. So, FILE* is alien to that Python flavor (and likely, most non-C flavors, too).

Now, what you need is to

  • somehow guess which C runtime you need, and
  • call its fdopen() (or equivalent).

(One of Python's mottoes is "make the right thing easy and the wrong thing hard", after all)


The cleanest method is to use the precise C runtime instance that the library is linked to (do pray that it's linked to it dynamically, or there'll be no exported symbol to call).

For the 1st item, I couldn't find any Python modules that can analyze a loaded dynamic module's metadata to find out which DLLs/so's it has been linked with (just a name or even name+version isn't enough, you know, due to possible multiple instances of the library on the system). It's definitely possible, though, since information about the format is widely available.

For the 2nd item, it's a trivial ctypes.CDLL('path').fdopen (_fdopen for MSVCRT).
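
A minimal sketch of that fdopen call, under the assumption that the library uses the process's default libc (the usual case on Linux). It declares FILE as an opaque structure, as suggested in the comments on the question, and uses libc's fputs as a stand-in for mylib.write_numbers. The helper name capture is hypothetical.

```python
import ctypes
import ctypes.util
import os


class FILE(ctypes.Structure):
    """Opaque stand-in for the C FILE type; only pointers to it are passed around."""


# Assumption: the target library is linked against this same libc instance.
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.fdopen.argtypes = [ctypes.c_int, ctypes.c_char_p]
libc.fdopen.restype = ctypes.POINTER(FILE)  # avoids conversion to a Python int
libc.fclose.argtypes = [ctypes.POINTER(FILE)]
libc.fclose.restype = ctypes.c_int
libc.fputs.argtypes = [ctypes.c_char_p, ctypes.POINTER(FILE)]
libc.fputs.restype = ctypes.c_int


def capture(write_func):
    """Call write_func(fp) with a FILE* on a pipe's write end; return the bytes written."""
    rd, wr = os.pipe()
    fp = libc.fdopen(wr, b"w")
    if not fp:  # NULL: fdopen failed, so both descriptors are still ours to close
        err = ctypes.get_errno()
        os.close(rd)
        os.close(wr)
        raise OSError(err, os.strerror(err))
    try:
        write_func(fp)
    finally:
        libc.fclose(fp)  # flushes stdio's buffer and closes wr; no separate os.close(wr)
    with os.fdopen(rd, "rb") as reader:  # closes rd when the block exits
        return reader.read()


# Stand-in for mylib.write_numbers(fp, 1, 2)
output = capture(lambda fp: libc.fputs(b"1 2\n", fp))
```

Both ends of the pipe are closed on every path, which matters in a long-running process. Because the writer finishes before the reader starts, this only works while the output fits in the pipe buffer; larger output would need a reader thread or a temporary file instead.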


Second, you can do a small helper module that would be compiled against the same (or guaranteed compatible) runtime as the library and would do the conversion from the aforementioned descriptor/handle for you. This is effectively a workaround to editing the library proper.


Finally, there's the simplest (and the dirtiest) method using Python's C runtime instance (so all the above warnings apply in full) through Python C API available via ctypes.pythonapi. It takes advantage of

  • the fact that Python 2's file-like objects are wrappers over stdio's FILE* (Python 3's are not)
  • PyFile_AsFile API that returns the wrapped FILE* (note that it's missing from Python 3)
    • for a standalone fd, you need to construct a file-like object first (so that there would be a FILE* to return ;) )
  • the fact that id() of an object is its memory address (CPython-specific)

    >>> open("test.txt")
    <open file 'test.txt', mode 'r' at 0x017F8F40>
    >>> f=_
    >>> f.fileno()
    3
    >>> ctypes.pythonapi
    <PyDLL 'python dll', handle 1e000000 at 12808b0>
    >>> api=_
    >>> api.PyFile_AsFile
    <_FuncPtr object at 0x018557B0>
    >>> api.PyFile_AsFile.restype=ctypes.c_void_p   #as per ctypes docs,
                                             # pythonapi assumes all fns
                                             # to return int by default
    >>> api.PyFile_AsFile.argtypes=(ctypes.c_void_p,) # as of 2.7.10, long integers are
                    #silently truncated to ints, see http://bugs.python.org/issue24747
    >>> api.PyFile_AsFile(id(f))
    2019259400
    

Do keep in mind that with fds and C pointers, you need to ensure proper object lifetimes by hand!

  • file-like objects returned by os.fdopen() do close the descriptor on .close()
    • so duplicate descriptors with os.dup() if you need them after a file object is closed/garbage collected
  • while working with the C structure, adjust the corresponding object's reference count with PyFile_IncUseCount()/PyFile_DecUseCount().
  • ensure no other I/O on the descriptors/file objects since it would screw up the data (e.g. ever since calling iter(f)/for l in f, internal caching is done that's independent from stdio's caching)
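
The first point can be illustrated with a short sketch using only os calls; the variable names are arbitrary. Closing the file object closes its descriptor, while a duplicate made beforehand with os.dup() stays usable.

```python
import os

rd, wr = os.pipe()
wr_copy = os.dup(wr)            # second descriptor for the same pipe end

f = os.fdopen(wr, "wb")
f.write(b"first\n")
f.close()                       # flushes, then closes wr itself

os.write(wr_copy, b"second\n")  # the duplicate is still open
os.close(wr_copy)               # last write end closed, so the reader sees EOF

with os.fdopen(rd, "rb") as reader:
    data = reader.read()
```

Every descriptor created here is closed exactly once; closing wr a second time with os.close() after f.close() would raise OSError or, worse, close an unrelated descriptor that has reused the number.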

24 Comments

If you're worried about the library using a different C runtime (mostly a Windows problem), then using PyFile_AsFile solves nothing, and limits the code to Python 2 for no good reason. Why bring Cython into the discussion? That's a random segue.
Also, never pass id(f) as a pointer. You want py_object(f) to pass a Python object -- as PyObject * for CPython. Using id to get a base address is specific to CPython, and passing Python integers as arguments also defaults to being converted as 32-bit C int values, which will truncate a 64-bit pointer value.
I'd like to see some backing for "truncating pointers to integers". Python does have a notion of long integers, you know, and there's completely no reason to truncate a c_void_p.
What's your aversion to setting api.PyFile_AsFile.argtypes=(ctypes.py_object,) and calling as api.PyFile_AsFile(f)? It's simpler, and also the intended usage.
@ivan_pozdeev - As a fairly experienced C programmer, this is the first I've heard the notion that using a FILE * as part of a public API is a bad idea. Not saying you're wrong--I'm rarely writing libraries meant for public use. But are you really saying the use of a file number is superior? FILE * is part of the C standard. File descriptors that come from open, e.g. are not. So you're saying while stdio.h is far more portable, it's bad to use for public APIs? Have you ever seen this cause a problem in practice? Read a blog post on it? Or is this purely speculative?