
Consider this working code:

from multiprocessing import Process
from updaters import app1, app2

if __name__ == '__main__':

    apps = [app1, app2]
    for app in apps:
        instance = app()
        p = Process(target=instance.start_listener)
        p.start()
        p.join()

This works fine until one process fails (let's say with some exception). How can I programmatically monitor the process and restart it if it fails?

    Note that since join() has no timeout and is blocking, you will run your processes one after the other, which I don't think is what you are trying to do. Commented Mar 2, 2014 at 10:27
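
To illustrate the point about join(), here is a minimal sketch of starting everything first and only then waiting. The helper name run_concurrently is hypothetical; it assumes it is handed unstarted multiprocessing.Process objects (or anything with the same start/join/exitcode interface).

```python
def run_concurrently(processes):
    """Start every process first, then wait for all of them to exit."""
    for p in processes:
        p.start()  # Returns immediately; the child runs in the background
    for p in processes:
        p.join()   # Blocks until this particular child has exited
    return [p.exitcode for p in processes]
```

With the join() calls moved out of the start loop, the listeners run side by side instead of one after the other.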

1 Answer


Poll Process.is_alive() and, if it returns False, discard the process and start a new one, e.g.:

from multiprocessing import Process
from updaters import app1, app2
from time import sleep

if __name__ == '__main__':

    apps = [app1, app2]
    processes = {}
    n = 0
    for app in apps:
        instance = app()
        p = Process(target=instance.start_listener)
        p.start()
        processes[n] = (p, app) # Keep the process and the app to monitor or restart
        n += 1

    while len(processes) > 0:
        sleep(0.5) # Poll twice a second instead of spinning
        for n in list(processes.keys()): # Copy the keys: entries are deleted inside the loop
            (p, a) = processes[n]
            if p.is_alive():
                continue # Still running, so exitcode is None
            p.join() # Allow tidyup
            if p.exitcode != 0:
                # Negative: killed by a signal (e.g. terminate); positive: error exit such as an uncaught exception
                print('Process ended with an error or a terminate', a)
                # Do your error handling and restarting here, assigning the new process to processes[n];
                # this version just deletes the entry so it is removed from the dictionary, as for else
                del processes[n]
            else:
                print(a, 'finished')
                del processes[n] # Remove finished items from the dictionary
                # When none are left then loop will end
    print('FINISHED')
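
To fill in the restart placeholder, here is a minimal sketch rather than a definitive implementation. The names supervise, factories, max_restarts and poll_interval are hypothetical and not part of multiprocessing. It assumes that a non-zero exitcode means failure and that each factory returns a fresh, unstarted Process, since a Process object cannot be started twice.

```python
from time import sleep

def supervise(factories, max_restarts=3, poll_interval=0.5):
    """Run one process per factory and restart any that exits with an error.

    factories maps a name to a zero-argument callable returning an
    unstarted multiprocessing.Process (hypothetical convention).
    Returns how many times each name was restarted.
    """
    restarts = {name: 0 for name in factories}
    running = {}
    for name, make in factories.items():
        p = make()
        p.start()
        running[name] = p

    while running:
        sleep(poll_interval)
        for name in list(running):  # Copy the keys: entries change inside the loop
            p = running[name]
            if p.is_alive():
                continue
            p.join()  # Reap the finished child
            if p.exitcode != 0 and restarts[name] < max_restarts:
                restarts[name] += 1
                p = factories[name]()  # A fresh Process object for the same job
                p.start()
                running[name] = p
            else:
                del running[name]  # Finished cleanly, or gave up restarting
    return restarts
```

With the question's apps this might be called as supervise({'app1': lambda: Process(target=app1().start_listener)}) inside the if __name__ == '__main__': block. The restart cap is a design choice so that a listener which crashes immediately on startup does not get restarted forever.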

8 Comments

Thanks @steve. Can you please elaborate with a code example? Where do I poll, and how?
Steve: The first for loop needs to increment n after assigning to processes[n]. @alonisser: The running processes are stored in the processes dict. To "poll" it, you could nest the last for loop inside a while processes: loop, provided the error-handling clause deletes anything it doesn't or can't restart.
I can't believe that people are still suggesting busy waiting solutions.
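
For what it's worth, one way to avoid polling on Python 3.3+ is to block on each Process.sentinel with multiprocessing.connection.wait, which returns as soon as any of the processes exits. A minimal sketch, where the helper name wait_for_exit is hypothetical:

```python
from multiprocessing.connection import wait

def wait_for_exit(processes, timeout=None):
    """Block until at least one process has exited and return those that did.

    Returns an empty list if the timeout (in seconds) expires first.
    """
    by_sentinel = {p.sentinel: p for p in processes}
    ready = wait(list(by_sentinel), timeout)  # Sleeps in the OS, no busy loop
    return [by_sentinel[s] for s in ready]
```

Each returned process can then be joined and, depending on its exitcode, restarted.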
What is 'a' here? Did you mean to type 'app'?
a is not app; a is assigned on this line: (p, a) = processes[n]
