I have a Python framework that has to execute bash scripts as plugins. We use the multiprocessing module to create worker processes, which pick plugin details from a multiprocessing.JoinableQueue and execute the plugins using subprocess.Popen().
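For reference, the worker setup looks roughly like this (the worker function, queue contents, and script path are simplified, illustrative stand-ins for the real framework code):

```python
import multiprocessing
import subprocess

def worker(plugin_queue):
    while True:
        plugin = plugin_queue.get()          # details of the bash plugin to run
        if plugin is None:                   # sentinel telling the worker to stop
            plugin_queue.task_done()
            break
        proc = subprocess.Popen(['/bin/bash', plugin['script']],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        out, err = proc.communicate()        # wait for the script to finish
        plugin_queue.task_done()

if __name__ == '__main__':
    plugin_queue = multiprocessing.JoinableQueue()
    workers = [multiprocessing.Process(target=worker, args=(plugin_queue,))
               for _ in range(4)]
    for w in workers:
        w.start()
    plugin_queue.put({'script': '/path/to/plugin.sh'})   # illustrative path
    plugin_queue.join()                      # wait until all queued plugins are done
    for _ in workers:
        plugin_queue.put(None)               # tell the workers to exit
    for w in workers:
        w.join()
```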
We have observed that the final output generated by the shell scripts gets truncated, and as a result the entire execution is wasted.
So we tried moving the workers to Python threads, keeping the same subprocess mechanism to spawn the shell script processes, and the truncation no longer happened. But threads are painfully slow (due to the GIL), and their responses to signals and events are also indeterminate (probably owing to GIL release timing).
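The threaded variant we tried is roughly the following; the Popen call is unchanged, only the worker pool uses threads and a plain Queue (names are again illustrative):

```python
import threading
import Queue          # 'queue' on Python 3
import subprocess

def thread_worker(plugin_queue):
    while True:
        plugin = plugin_queue.get()
        if plugin is None:
            plugin_queue.task_done()
            break
        subprocess.Popen(['/bin/bash', plugin['script']]).wait()
        plugin_queue.task_done()

plugin_queue = Queue.Queue()
threads = [threading.Thread(target=thread_worker, args=(plugin_queue,))
           for _ in range(4)]
for t in threads:
    t.start()
```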
I have read in many places, including other Stack Overflow questions, that the multiprocessing module buffers stdout. We know this is the problem, but we are unable to find a proper solution, because we can't call sys.stdout.flush() from Python for data that the shell script echoes to a file.
We also tried os.fsync with some samples, and the truncation did not happen. Again, it can't be used directly for our purpose, as the names of the files created by the shell scripts are not known to the framework; only a final archive is taken back by the framework.
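Roughly what the os.fsync samples looked like (the file name and content here are made up; in the real run the shell scripts choose their own output files, which the framework never sees):

```python
import os

f = open('/tmp/sample_output.txt', 'w')
f.write('sample output from the plugin\n')
f.flush()                # empty Python's userspace buffer
os.fsync(f.fileno())     # ask the kernel to push the data to disk
f.close()
```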
My question is: is there any way to prevent this buffering in the processes spawned by the multiprocessing module? Will the -u option of the Python interpreter help here? Or would some modification to the library code in /usr/lib64/python2.6/multiprocessing clear this problem?