I am working in Python 3.4, running a naive search against partitioned data in memory, and am attempting to fork processes to take advantage of all available processing power. I say naive because I am certain other things could be done to improve performance, but those are out of scope for the question at hand.
The system I am testing on is a Windows 7 x64 environment.
What I would like to achieve is a relatively even, simultaneous distribution across cpu_count() - 1 cores (from what I have read, distributing across all cores rather than n - 1 cores shows no additional improvement because of baseline OS processes). So, 75% pegged CPU usage on a 4-core machine.
What I am seeing (using the Windows Task Manager 'Performance' and 'Processes' tabs) is that I never exceed 25% overall CPU utilization, and the process view shows computation occurring one core at a time, switching every few seconds between the forked processes.
I haven't instrumented the code for timing, but I am fairly sure my subjective observations are correct: I am not getting the performance increase I expected (3x on an i5-3320M).
I haven't tested on Linux.
Based on the code presented, how can I achieve 75% CPU utilization?
# pseudo code
def search_method(search_term, partition):
    <perform fuzzy search>
    return results

partitions = [<list of lists>]
search_terms = [<list of search terms>]
# real code
import multiprocessing as mp

pool = mp.Pool(processes=mp.cpu_count() - 1)
for search_term in search_terms:
    results = []
    results = [pool.apply(search_method, args=(search_term, partitions[x])) for x in range(len(partitions))]
scikit-learn functions have built-in options for running on multiple localhost cores. The point is whether the solver's computational strategy allows non-intervening, parallelised processing or not. The multiprocessing module has no clue whether the problem can be split into more non-intervening parallel code-execution streams (not to mention data-access mechanics) when running <_search_method_> for a <_search_term_> on a given <_list_of_lists_>. Thus you may harness 10x, 100x, 1000x more CPUs/cores into such a private-cloud / grid-engine tasking.
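As for the code in the question itself: `Pool.apply` blocks until each call has returned, so the list comprehension hands out one task at a time. That would be consistent with seeing only one busy core at any moment. Below is a minimal sketch of submitting all tasks up front with `Pool.starmap`. It assumes each (search term, partition) pair can be searched independently. The body of `search_method` is a hypothetical substring match standing in for the fuzzy search, and `build_tasks` and `run_searches` are illustrative helper names, not part of the original code.

```python
import multiprocessing as mp


def search_method(search_term, partition):
    # Hypothetical stand-in for the fuzzy search in the question:
    # a plain case-insensitive substring match.
    term = search_term.lower()
    return [item for item in partition if term in item.lower()]


def build_tasks(search_terms, partitions):
    # One (search_term, partition) argument tuple per unit of work.
    return [(term, part) for term in search_terms for part in partitions]


def run_searches(search_terms, partitions, processes=None):
    # starmap submits every task up front and blocks only once, at the end,
    # so the workers can run concurrently. Results come back in task order.
    if processes is None:
        processes = max(1, mp.cpu_count() - 1)
    tasks = build_tasks(search_terms, partitions)
    with mp.Pool(processes=processes) as pool:
        flat = pool.starmap(search_method, tasks)
    # Regroup the flat list: one list of per-partition results per search term.
    n = len(partitions)
    return [flat[i * n:(i + 1) * n] for i in range(len(search_terms))]


if __name__ == "__main__":
    # The guard matters on Windows, where child processes re-import this module.
    partitions = [["apple", "apricot"], ["banana", "grape"], ["pineapple"]]
    search_terms = ["app", "an"]
    for term, per_partition in zip(search_terms, run_searches(search_terms, partitions)):
        print(term, per_partition)
```

Note that each partition is pickled and sent to a worker with every task, so for large partitions the transfer cost may eat into the gain; whether it does is something only timing the real workload can show.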