
I work in Python, and I want to find a workflow for enabling two processes (main-process and sub-process) to communicate with each other. By that, I mean the ability of the main-process to send some data to the sub-process (perhaps by writing to the sub-process's stdin) and the ability of the sub-process to send some data back to the main one. This also implies that both can read the data sent to them (I was thinking of reading from stdin).

I was trying to use the subprocess library, but it seems to be intended for processes that produce output only once and then terminate, whereas I want to exchange data dynamically and shut the sub-process down only when such a command is received.

I've read lots of answers here on StackOverflow tackling problems closely related to mine, but I found none of them satisfying, as the questions those answers addressed differ from mine in one important detail: I need my main-process to be able to exchange data with its sub-process dynamically, as many times as needed, not just once, which in turn implies that the sub-process should run until it receives a certain command from the main-process to terminate.

I'm open to using third-party libraries, but it would be much better if you proposed a solution based solely on the Python Standard Library.

6 Comments
  • Possible duplicate of IPC with a Python subprocess Commented Jan 6, 2019 at 10:35
  • @Omer Nope, it's not. First, the task in the question you're referring to is more specific in that the subprocess's standard streams can't be used, as they're reserved for something else. Second, I was asking for dynamic exchange of data, while the implementation proposed there, as I understand it, produces the opposite of what I'm looking for. Commented Jan 6, 2019 at 12:03
  • Been a while since I worked with pipes, but IIRC they allow you to dynamically exchange data between processes; here's a non-Python explanation: tldp.org/LDP/lpg/node7.html. Pretty sure the linked thread talks about something similar; try re-examining it (I might be wrong, though). Commented Jan 6, 2019 at 12:53
  • It still does sound like you're looking for good old IPC... for two Python processes. Which: a) sort of makes the question not really Python specific; b) leaves a few bits missing for choosing the most suitable IPC mechanism (before looking at how to use it in Python, perhaps). We know the processes are expected to exchange data over a longer period of time. Is this communication bi-directional? Is the sub-process only expected to live as long as the "main" process does, or can it actually spin off a life of its own? What is the nature of the information passed?... Commented Jan 6, 2019 at 16:51
  • @OndrejK. Sorry if something in my question appeared vague to you, though I thought I put it as clearly as I could: I need the main-process to be able to spawn the sub-process and send data to it (it can be plain text) while also being able to receive messages (text as well) from it; the sub-process should run until the main-process tells it to terminate (or, perhaps, until the main-process kills it). Thanks for reaching out. Commented Jan 6, 2019 at 17:18

2 Answers


It seems like a pipe might be a suitable choice for your use case. Beware, though, that under normal circumstances both ends block: a read blocks until data is written, and a write blocks once the pipe's buffer fills. Also make sure you do not get surprised by buffering (nothing comes through, because buffers are not flushed automatically except at an expected boundary, unless set up accordingly).

A basic example of how two pipes (pipes are unidirectional, so a pair is needed) can be used between two processes:

import os

def child():
    """This function is executed in a child process."""
    infile = os.fdopen(r1)                      # read end of the parent -> child pipe
    outfile = os.fdopen(w2, 'w', buffering=1)   # write end of the child -> parent pipe, line-buffered
    for line in infile:
        if line.rstrip() == 'quit':
            break
        print(line.upper(), end='', file=outfile)

def parent():
    """This function is executed in a parent process."""
    outfile = os.fdopen(w1, 'w', buffering=1)   # write end of the parent -> child pipe, line-buffered
    infile = os.fdopen(r2)                      # read end of the child -> parent pipe
    print('Foo', file=outfile)
    print(infile.readline(), end='')
    print('bar', file=outfile)
    print(infile.readline(), end='')
    print('quit', file=outfile)                 # tell the child to terminate

(r1, w1) = os.pipe()  # for parent -> child writes
(r2, w2) = os.pipe()  # for child -> parent writes
pid = os.fork()       # POSIX only; execution continues in both processes
if pid == 0:
    child()      # child code runs here
    os._exit(0)  # leave the child without running any of the parent's cleanup
elif pid > 0:
    parent()     # parent code runs here
    os.waitpid(pid, 0)  # wait for child
else:
    raise RuntimeError("This should not have happened.")

Indeed it would be easier and more practical to use subprocess, and you likely want to run another program anyway. The former requires telling Popen not to close the pipe file descriptors (pass_fds), and the latter requires the pipe file descriptors to be inheritable (i.e. not have the O_CLOEXEC flag set).

Child program (saved as child.py and made executable):

#!/usr/bin/env python3
import os
import sys

# The parent passes the pipe file descriptor numbers on the command line.
infile = os.fdopen(int(sys.argv[1]))                     # read end of the parent -> child pipe
outfile = os.fdopen(int(sys.argv[2]), 'w', buffering=1)  # write end of the child -> parent pipe
for line in infile:
    if line.rstrip() == 'quit':
        break
    print(line.upper(), end='', file=outfile)

Parent program:

import os
import subprocess

(r1, w1) = os.pipe2(0)  # for parent -> child writes (flags=0: no O_CLOEXEC, descriptors stay inheritable)
(r2, w2) = os.pipe2(0)  # for child -> parent writes
child = subprocess.Popen(['./child.py', str(r1), str(w2)], pass_fds=(r1, w2))
os.close(r1)  # the child owns these ends now; close the parent's copies
os.close(w2)
outfile = os.fdopen(w1, 'w', buffering=1)
infile = os.fdopen(r2)
print('Foo', file=outfile)
print(infile.readline(), end='')
print('bar', file=outfile)
print(infile.readline(), end='')
print('quit', file=outfile)
child.wait()

If the child program does not need its standard input and standard output for anything else, they can be used to get information into and out of the child program. This is even simpler.

Child program:

#!/usr/bin/env python3
import sys

for line in sys.stdin:
    if line.rstrip() == 'quit':
        break
    print(line.upper(), end='', flush=True)  # flush so the parent sees the reply immediately

Parent program:

import os
import subprocess

(r1, w1) = os.pipe2(0)  # for parent -> child writes
(r2, w2) = os.pipe2(0)  # for child -> parent writes
child = subprocess.Popen(['./child.py'], stdin=r1, stdout=w2)
os.close(r1)  # Popen duplicated these into the child; close the parent's copies
os.close(w2)
outfile = os.fdopen(w1, 'w', buffering=1)
infile = os.fdopen(r2)
print('Foo', file=outfile)
print(infile.readline(), end='')
print('bar', file=outfile)
print(infile.readline(), end='')
print('quit', file=outfile)
child.wait()

As stated in the comments, this is not really Python specific; these are just rough hints on how pipes, as one option, could be used.
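
If both sides can be Python functions rather than separate programs, the standard library also wraps this same mechanism at a higher level. A minimal sketch using multiprocessing.Pipe (an addition for completeness, not part of the pipe examples above):

from multiprocessing import Pipe, Process

def child(conn):
    """Runs in the sub-process; answers messages until told to quit."""
    while True:
        msg = conn.recv()          # blocks until the parent sends something
        if msg == 'quit':
            break
        conn.send(msg.upper())

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()  # a bidirectional pair of connections
    proc = Process(target=child, args=(child_conn,))
    proc.start()
    for msg in ('Foo', 'bar'):
        parent_conn.send(msg)
        print(parent_conn.recv())     # FOO, then BAR
    parent_conn.send('quit')          # ask the child to terminate
    proc.join()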


10 Comments

This still has the crucial problem of deadlock (invisible until the transactions become larger in a real application) as well as flushing.
@DavisHerring Have you worked with the asyncio standard library? If so, do you think it would help? I understand the deadlock problem that comes out of this implementation. Node.js, for example, solves it with listeners (an asynchronous, event-driven architecture) that work in a non-blocking manner. It would be cool to implement something like this in Python. Thoughts?
@DavisHerring I was really just out to get the basics covered. As for the deadlock potential: yes, the parent/child must not blindly assume someone is listening, let alone both write at once. Flushing: everything should be line-buffered across the examples and flushed on each individual message or when the buffer is full (4k, I'd presume).
@OndrejK.: It's true that certain reasonable protocols prevent deadlock; I should have said that. Line-buffering, however, is automatic only on terminals. One can use pseudo-terminals to make it happen (cf. unbuffer), but that's rather more complicated than altering the server program (if you can).
The example uses the higher-level (Python) interface to read/write through io.TextIOWrapper (a file-like object opened in text mode): "If line_buffering is True, flush() is implied when a call to write contains a newline character or a carriage return." I do not entirely disagree with you, but I am also trying to keep the complexity down.

You want to make a Popen object with subprocess.PIPE for standard input and output and use its file objects to communicate—rather than using one of the cantrips like run (and the older, more specific ones like check_output). The challenge is avoiding deadlock: it’s easy to land in a situation where each process is trying to write, the pipe buffers fill (because no one is reading from them), and everything hangs. You also have to remember to flush in both processes, to avoid having a request or response stuck in a file object’s buffer.
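
For instance, here is a minimal sketch of that shape, assuming a line-oriented ./child.py that writes exactly one line of output per line of input (like the children in the other answer):

import subprocess

# Start the child with pipes on both standard streams.
child = subprocess.Popen(['./child.py'],
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                         text=True, bufsize=1)  # text mode, line-buffered

for request in ('Foo', 'bar'):
    print(request, file=child.stdin, flush=True)  # flush, or the request may sit in the buffer
    print(child.stdout.readline(), end='')        # blocks until the child replies

print('quit', file=child.stdin, flush=True)       # tell the child to exit
child.wait()

A strict request/reply protocol like this one is itself a way of avoiding deadlock: only one side writes at a time, and each message fits in the pipe buffer.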

Popen.communicate is provided to avoid these issues, but it supports only a single string (rather than an ongoing conversation). The traditional solution is select, but it also works to use separate threads to send requests and read results. (This is one of the reasons to use CPython threads in spite of the GIL: each exists to run while the other is blocked, so there’s very little contention.) Of course, synchronization is then an issue, and you may need to do some work to make the multithreaded client act like a simple, synchronous function call on the outside.
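
A rough sketch of the thread variant, under the same assumption of a line-per-line ./child.py: a background reader drains the child's output so the child can never block on a full pipe, while the main thread writes requests and waits on a queue.

import subprocess
import threading
import queue

child = subprocess.Popen(['./child.py'],
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                         text=True, bufsize=1)
responses = queue.Queue()

def drain():
    """Read the child's output continuously so the child never blocks writing."""
    for line in child.stdout:
        responses.put(line)

threading.Thread(target=drain, daemon=True).start()

print('Foo', file=child.stdin, flush=True)
print(responses.get(), end='')   # synchronization point: wait for one reply
print('quit', file=child.stdin, flush=True)
child.wait()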

Note that both processes need to flush, but it is enough if either one of them implements such non-blocking I/O; that job normally falls to the process that starts the other, because that is where the need is known (child programs written specially for this kind of conversation are the exception).

2 Comments

Thanks for the response! I actually tried to use Popen, but communicate(), as you pointed out, supports only a single string, which is why I rejected the idea, although I didn't try using Popen differently (reading/writing the standard streams directly rather than calling communicate()). Should I try that? Also, I was thinking of using the asyncio standard library as a way to get around the deadlock problem. Do you have any ideas?
@AntonReient: communicate is a specialized tool provided because it lets people solve one common problem without any (perceived) complexity. My answer says to use the Popen streams, right?
