I am relatively new to IPC. I have a C program that collects data and a Python program that analyses the data. I want to be able to:

  1. Call the python program as a subprocess of my main c program
  2. Pass a c struct containing the data to be processed to the python process
  3. Return an int value from the python process back to the c program

I have been briefly looking at pipes and FIFOs, but so far I cannot find any information addressing this kind of problem. As I understand it, fork() for example simply duplicates the calling process, which is not what I want, since I am trying to call a different program.

4 Comments

  • It's unclear (to me at least) what you specifically want to do. I don't think the code you posted is relevant (or is it)? Are you trying to just call some C code from Python, passing back and forth the data shown in the C struct? Commented Dec 16, 2015 at 15:12
  • Thank you Brian, I have edited the question to give it more focus Commented Dec 16, 2015 at 15:21
  • If possible, I would serialize the data to a string, pass it as an argument to the called program, and take the exit code as the return value. This is of course not possible if the amount of data passed or the return value can be too big. Commented Dec 16, 2015 at 15:24
  • The approach I used in a very similar situation was to use shared memory (with mmap). The data to be passed to the Python script was put in the shared memory, then the Python process was called, which did its processing and put the results back into the shared memory. I wrote a library to make dealing with the mmap-backed data easier on the Python side of things. If things are more complex than this, perhaps go with a message-queue-based approach. Commented Dec 16, 2015 at 15:35

4 Answers


About fork() and the need to execute a different process: it is true that fork() creates a copy of the current process, but this is usually coupled with exec() (one of its various forms) to make the copied process execute a different program.
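As an aside, the same fork-then-exec pattern can be sketched with Python's os module, which exposes the identical POSIX calls; the program exec'd in the child and its exit code 7 are arbitrary illustration values:

```python
import os
import sys

def run_child() -> int:
    """fork(), then exec() a different program in the child; return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: replace the copied process image with a different program.
        # Here: a fresh Python interpreter that just exits with code 7.
        os.execvp(sys.executable, [sys.executable, "-c", "import sys; sys.exit(7)"])
    # Parent: the copy continues here; wait for the child and collect its status.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

In C the sequence is the same three calls: fork(), then execvp() in the child, then waitpid() plus WEXITSTATUS() in the parent.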

As for IPC, you have several choices. Someone mentioned a message queue, but something like ZeroMQ is overkill here. You can do IPC with one of several mechanisms.

  1. Pipes (named pipes or anonymous)
  2. Unix domain sockets
  3. TCP or UDP via the sockets API
  4. Shared memory
  5. Message queues

The pipe approach is the easiest. Note that when you pass data back and forth between the C program and Python, you will need to settle on a transfer syntax for the data. If you choose to send raw C structs (whose layout is not portable), you will need to unpack the data on the Python side. Otherwise you can use some textual format: a combination of sprintf/sscanf, or JSON, etc.
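As a sketch of the unpacking side, assume a hypothetical C struct of one int32 and one double, written in a packed little-endian layout. Python's standard struct module can then encode and decode it; note that the C compiler's native struct layout may insert padding, so the C side would have to write the fields in this same fixed layout (e.g. field by field, or with a packed struct):

```python
import struct

# Hypothetical C struct for illustration:
#   struct sample { int32_t id; double value; };
# "<id" = little-endian, no padding: one int32 followed by one double (12 bytes).
FMT = "<id"

def pack_sample(sample_id: int, value: float) -> bytes:
    """Serialize the (id, value) pair the way the C side would write it."""
    return struct.pack(FMT, sample_id, value)

def unpack_sample(buf: bytes) -> tuple:
    """Rebuild the (id, value) pair from the 12 raw bytes read off the pipe."""
    return struct.unpack(FMT, buf)
```

A round trip such as `unpack_sample(pack_sample(42, 1.5))` returns the original `(42, 1.5)` pair.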


4 Comments

With all due respect, a ZeroMQ approach is by far not overkill. Given that Osmond has stated himself to be "relatively new" to inter-process communications, the very ZeroMQ way of thinking may save his future from repeatedly falling into the many common traps of amateur, low-quality raw pipe / socket concurrency programming. This is based on real experience with a team's learning curve and productivity once the architecture moved into re-use of ZeroMQ tools in Scaleable Formal Patterns (highly abstract behaviours, rather than hacking coordination of low-level resources).
That's a good point. But ZeroMQ (or similar) is not always the answer. Each problem calls for a suitable tool. If I have to do simple IPC between two processes to get something done, I am certainly not going to use ZeroMQ. Each tool has its use and place.
With the wisdom of "When the only tool you have is a hammer, every problem begins to resemble a nail." (Maslow's hammer) in mind, I am still pretty sure Maslow's Hammer is not the bottom-line issue. How low is the probability that a new coder will, within a reasonable time, create a distributed (N+M), failure-protected (fail-safe), performance-scalable messaging layer between running processes natively developed in C++ / python / MQL4 / java / FORTRAN / LISP / Go / PHP / ... you name 'em all ... C#? My life-experience-based guesstimate is Zero. (probability)
Just chiming in that msgpack may help with data syntax. There are nice msgpack libraries for C and Python. There are even msgpack libraries for microcontrollers if needed.

I suggest looking at the application and structuring the issues you are confronted with.

Multi-threading

Starting two processes is by far not the biggest issue; as Ziffusion said, you can have another process do something else. Moreover, Python can be embedded in C through its C API, so you could, for example, create another thread (no need for it to be a separate process) and call your Python routines from the C program.

Communication

Sharing information is more interesting, as you have to solve two issues: one is technically getting the data from one place to another and vice versa; the other is how two different things can work on the same data. This goes into messaging patterns and process flow:

  • who generates the data?
  • who receives the data?
  • is there a piece of code waiting for something before proceeding?
  • is there the need to control what happens to the data while the data is processed?
  • do I want to code it myself?
  • can I use libraries in the project?
  • are there security limitations?
  • ...

Once you answer the above questions, you can define how your pieces of the application are going to interact. One main distinction is synchronous vs asynchronous.

Sync vs Async

Synchronous means that for every message there is a reply, which should arrive within a finite (and usually as small as possible) time window, in order to bound latency. This pattern is best used when you have to finely control what's happening, or you need an answer in a timely manner. It is, in fact, how HTTP works when downloading web pages: whenever you load a web site, you want to see the content right now. This pattern is called REQuest/REPly.
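The strict send-then-wait rhythm of REQ/REP can be sketched in a few lines; here socket.socketpair() supplies two already-connected endpoints inside one process, with a thread playing the replier (the payloads are invented for the example):

```python
import socket
import threading

def reply_server(conn: socket.socket) -> None:
    """Toy REP side: read one request, send back one reply, done."""
    request = conn.recv(1024)
    conn.sendall(b"reply to " + request)
    conn.close()

def request(conn: socket.socket, payload: bytes) -> bytes:
    """Toy REQ side: send one request, then block until the reply arrives."""
    conn.sendall(payload)
    return conn.recv(1024)

# Two connected endpoints; in real IPC these would be the two processes' sockets.
a, b = socket.socketpair()
threading.Thread(target=reply_server, args=(b,)).start()
answer = request(a, b"ping")
```

The requester is blocked in recv() until the replier answers, which is exactly the latency trade-off described above.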

Asynchronous is often used in case of heavy processing: the data producer (for example a database interface, or a sensor) sends a bulk of data to a worker thread, without waiting for an answer. The worker thread then starts doing its job on the data, and when it's done sends the results to a data sink/user. This pattern is called PUBlish/SUBscribe.
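A fire-and-forget flavour of this can be sketched with a worker pool: the producer submits bulk work and carries on without awaiting a reply, and the results are collected later by the "data sink" (the chunks and the squaring/summing stand in for real processing):

```python
from concurrent.futures import ThreadPoolExecutor

def heavy_processing(chunk: list) -> int:
    """Placeholder for the real number crunching done by the worker."""
    return sum(chunk)

with ThreadPoolExecutor() as pool:
    # Producer side: hand off bulk data without waiting for any answer.
    futures = [pool.submit(heavy_processing, c) for c in ([1, 2], [3, 4])]
    # ...the producer is free to do other work here...
    # Sink side: collect the results whenever they are wanted.
    results = [f.result() for f in futures]
```

The key point is that submit() returns immediately; only the sink ever blocks, and only when it actually asks for a result.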

There are many others, but these form the basics of communication.

Marshalling

Another issue you face is marshalling: how to structure the data passing, so that the meaning and content of your data survive the trip from one context to a totally different one, for example from your C part to your Python part. Hand-maintaining serialization code is tedious, perilous, and prone to backward-compatibility issues.
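As a minimal illustration of marshalling through a self-describing textual format (the field names are invented for the example), the standard json module keeps the meaning intact across the language boundary; the C side would only need any JSON library to emit and parse the same text:

```python
import json

# Stand-in for the C struct's contents, carried as a self-describing
# JSON object instead of raw, layout-dependent bytes.
record = {"id": 42, "value": 1.5, "label": "sensor-a"}

encoded = json.dumps(record)   # what would travel over the pipe/socket
decoded = json.loads(encoded)  # rebuilt on the other side, field names included
```

Schema-based libraries like the ones mentioned below do the same job with less overhead and compile-time checking, but the round-trip idea is identical.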

Implementation

When you come to implementation, you usually want the cleanest and the most powerful code, and these two goals pull against each other. So I usually go looking for a library that can do exactly what I need. In this case my advice is to try ZeroMQ: it is thin, flexible and low-level, and it will give you a powerful framework to interface threads, processes and even machines.

ZeroMQ provides the link, but you still need a protocol to run over that link. To avoid incredible headaches and to streamline your work on the marshalling issue, I suggest you investigate available marshalling libraries that make this task easy: Cap'n Proto, FlatBuffers, or Protocol Buffers (from Google; I can't post more than 2 links yet). They make it easy to define your data in an intermediate language and parse it from any other language without you having to write all the classes yourself.

As for pipes and shared memory my humble opinion is: forget they exist.

2 Comments

A lovely piece of wisdom, Claudio. More members with your approach to both system design & S/O answers. My hat is raised
@user3666197 thank you for the kind words. I try to keep my approach broader more often than not.

The way you are organizing the architecture is a bit messy. What you really want is message queues. So in your example:

  • Your Python worker listens for new items to process in queue A;
  • Your C program puts data into queue A;
  • Your Python worker processes the data and puts the result into queue B;
  • Your C program listens for new items in queue B.

This may vary, but the concept is simple.
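The two-queue flow can be sketched with in-process queues for brevity; across real C and Python processes the queues would be provided by a broker or a library such as ZeroMQ, but the round trip is the same (the `sum` call is a placeholder for the real analysis):

```python
import queue
import threading

queue_a, queue_b = queue.Queue(), queue.Queue()

def python_worker() -> None:
    """The Python worker: listen on queue A, process, answer on queue B."""
    data = queue_a.get()
    queue_b.put(sum(data))  # placeholder for the real analysis

threading.Thread(target=python_worker).start()

queue_a.put([1, 2, 3])   # "C program" side: put the collected data in queue A
result = queue_b.get()   # "C program" side: listen on queue B for the answer
```

Because each side only ever talks to a queue, neither program needs to know anything about the other's internals, which is the decoupling the answer is advocating.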

They are easy to implement, and there are tons of libraries and tools to aid you with this task. ZeroMQ would certainly do; it works with both C and Python.

1 Comment

I'll take a look at ZeroMQ, it would be interesting to see if this can be done via shared memory though too. Message queuing is not something I was aware of before so thank you for introducing me to the concept

If your struct is simple enough, you may not need IPC at all. Provided you can serialize it as string parameters usable as program arguments, and the int value to return fits in the range 0-127, you could simply:

  • in C code:

    • prepare the command arguments to pass to the Python script
    • fork-exec (assuming a Unix-like system) a Python interpreter with the script path and the script arguments
    • wait for child termination
    • read the script's exit code
  • in Python:

    • get the arguments from command line and rebuild the elements of the struct
    • process it
    • end the script with exit(n), where n is an integer in the range 0-127, which will be returned to the caller.
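The whole "arguments in, exit code out" round trip can be sketched in a few lines; for a self-contained example the caller is written in Python (a C parent would use fork-exec plus waitpid the same way) and the worker script is inlined with -c rather than living in its own file:

```python
import subprocess
import sys

# Illustration worker: rebuild two ints from argv, "process" them, and
# return the result through the exit code (kept inside the 0-127 range).
worker = "import sys; a, b = map(int, sys.argv[1:]); sys.exit((a + b) % 128)"

# The parent serializes its data as string arguments...
proc = subprocess.run([sys.executable, "-c", worker, "40", "2"])
# ...and reads the answer back as the child's exit status.
returned = proc.returncode
```

With the arguments "40" and "2" the child exits with status 42, which is exactly the int the parent wanted back.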

If above does not meet your requirements, next level would be to use pipes:

  • in C code:

    • prepare 2 pipe pairs: one for C->Python (let's call it input), one for Python->C (let's call it output)
    • serialize the struct into a char buffer
    • fork
    • in child
      • close write side of input pipe
      • close read side of output pipe
      • dup the read side of the input pipe to file descriptor 0 (stdin) (see dup2())
      • dup the write side of the output pipe to file descriptor 1 (stdout)
      • exec a Python interpreter with the name of the script
    • in parent
      • close read side of input pipe
      • close write side of output pipe
      • write the buffer (preceded by its size if the size cannot be known a priori) to the write side of the input pipe
      • wait for the child to terminate
      • read the return value from the read side of output pipe
  • in Python:

    • read the serialized data from standard input
    • process it
    • write the output integer to standard output
    • exit
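An end-to-end sketch of this pipe scheme, written entirely in Python so it is self-contained (the parent stands in for the C program, and subprocess wires up the stdin/stdout pipes that the fork/dup2/exec steps above set up by hand); the struct layout, one int32 plus one double, and the worker's arithmetic are invented for the example:

```python
import struct
import subprocess
import sys

# Inlined illustration worker: read a 12-byte packed record from stdin,
# unpack it, and write a 4-byte little-endian int result to stdout.
worker = (
    "import struct, sys;"
    "buf = sys.stdin.buffer.read(12);"
    "ident, value = struct.unpack('<id', buf);"
    "sys.stdout.buffer.write(struct.pack('<i', ident + int(value)))"
)

payload = struct.pack("<id", 40, 2.0)  # the serialized "C struct"
proc = subprocess.run(
    [sys.executable, "-c", worker],
    input=payload,            # goes to the child's stdin (the input pipe)
    stdout=subprocess.PIPE,   # the child's stdout is the output pipe
)
(result,) = struct.unpack("<i", proc.stdout)
```

With the record (40, 2.0) the child writes back the int 42. The C parent would do the same thing with pipe(), fork(), dup2(), execvp(), write() and read(), exactly as listed above.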
