
I'm trying to speed up a Python program. I noticed that there is a thread always running that scans input from an external resource; when it gets something, it calls another function that parses the input data and returns understandable information (the parsing function also uses other functions).

A simple model of the scanning() function:

def scanning(x):
    alpha = GetSomething(x)
    if alpha != 0:
        print(Parsing(alpha))

So my idea is to convert this thread into a process that runs in parallel with the main process; when it gets something, it sends it through a Queue to the main process, which then calls the parsing function.

My questions are: is it possible to keep the scanning() function as it is and use it inside a process (even though it calls other functions)?

If not, what are the required modifications on the structure of the scanning() function to be used conveniently with the multiprocessing module?

What is the proper way to multiprocess a function that calls other functions in Python?

  • I guess it would be like you said: a thread to populate a queue, some threads to scan whatever is in the queue Commented Apr 5, 2016 at 11:59
  • the option above is viable if the inputs are faster to get than the scanning part Commented Apr 5, 2016 at 12:03
  • @Whitefret I'm trying to replace threading with multiprocessing, but I wonder what's the proper way to do it, since scanning() has a lot of function calls within it. I would appreciate it if you can help Commented Apr 5, 2016 at 14:48
  • I don't see why you would use multiprocessing except if you want to use several machines at the same time. In that case, I don't know a way to do that in plain Python, but you could use MPI in C, RMI in Java, or even, why not, Map/Reduce Commented Apr 5, 2016 at 16:33
  • @Whitefret the scanning thread is always running, so I want to benefit from multiprocessing so it can run on a separate core, not the same one as the main program Commented Apr 5, 2016 at 16:54

1 Answer


Short answer: yes, it is possible.

To understand why, you need to understand one thing about multiprocessing. It does not move the multiprocessing-invoked function into a separate process: it creates a full replica of your entire process, including its code, loaded modules and any global data that was initialized before you forked your processes.

So if your code defines sub-functions, they will be available to your function after it has been split into a separate process, along with any pre-initialized data. Any modifications to the values, functions and namespaces of your main process after forking will not affect the forked process at all; you need special tools to communicate between processes.

So, let's suppose you have the following abstract code:

import SomeModule
define SomeFunction()
assign SomeValue

define ChildProcess():
    call SomeFunction()
    increase SomeValue
    do ChildProcessStuff

start ChildProcess()
decrease SomeValue
do MainProcessStuff

For both the main and spawned processes, your code executes identically until the line start ChildProcess(). After this line your process splits into two that are fully identical at first but have different points of execution. The main process goes past this line and proceeds straight to do MainProcessStuff, while your child process never reaches that line. Instead, it gets a replica of the entire namespace and starts executing ChildProcess() as if it were called like a normal function followed by an exit().

Note how both the main and child processes have access to SomeValue. Also note how their changes to it are independent, as they're making them in different namespaces (and therefore to different SomeValues). This wouldn't be the case with the threading module, which does not split the namespaces, and that's an important distinction.

Also note that the main process never executes the code in ChildProcess, but it retains a reference to it, which can be used to track its progress, terminate it prematurely, etc.

You might also be interested in more in-depth information about Python threads and processes here.


3 Comments

if we take the example of the scanning process, it will be a new copy of the original application that got it running in the first place, so I can call the parsing function from within it. But how would I send the parsed data back to the main process?
@werberbang Typically, to communicate between processes you use pipes and/or queues. Create a pipe/queue instance before splitting processes, provide Process class with a reference to pipe/queue object (usually done by passing it to constructor) and once you start your child process, both processes will have access to pipe/queue and can read/write from it.
by doing that, will the main process still be active (performing other operations and interacting with the user) while the child process performs the read and parse operations, or will it be in a wait state until it reads something from the shared Queue?
