
I have a large set of simple simulations that I need to run, and I'm wondering if they can be done simultaneously. Let me describe the situation: I have 1000 draws of prevalence for ~100 diseases, and 1000 draws of corresponding disability weights for those diseases (how bad it is to have that disease, on a 0-1 scale), for 20 age-groups. The simulation I need to do is to determine, given a set of prevalences, how many people would have different combinations of diseases. Here is what the input data would look like for 10 diseases:

from __future__ import division
import numpy as np
disease_number = np.array([1,2,3,4]*10)                   # disease label for each of the 40 rows
age = np.array([5, 10, 15, 20]*10)                        # age group for each of the 40 rows
prevalence = np.random.uniform(0, 1, (40, 1000))          # 40 rows x 1000 draws of prevalence
disability_weight = np.random.uniform(0, 1, (40, 1000))   # 40 rows x 1000 draws of disability weight

A simulation of a single draw would look something like this, for age 5, draw 1.

prev_draw1 = prevalence[age==5, 1]                 # prevalence of each disease for age 5, draw 1
disability_draw1 = disability_weight[age==5, 1]    # matching disability weights
# simulate 100,000 people: 1 = has the disease, 0 = does not
simulation = np.random.binomial(1, prev_draw1, (100000, prev_draw1.shape[0]))

Then to calculate the disability weight attributable to each disease given the comorbidity of multiple diseases, I do the following: Set the denominator as the sum of present disability weights, and use the disability weight of a given disease as the numerator. For disease 1:

denom = np.sum(disability_draw1**simulation, axis=1)   # per-person denominator
denom[denom==1] = 0
numerator = disability_draw1[0]*simulation[:, 0]       # disease 1's weight, where present
adjusted_dw = np.sum(numerator/denom)

I would need to do this adjusted dw calculation separately for each disease. Is there any way for me to do these 1000 simulations simultaneously? Any help is appreciated, and I'm fairly new to python, so more descriptions are very helpful.
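To be concrete, the per-disease loop for a single draw would look something like this (skipping people with no disease at all is just one way to avoid dividing by zero):

n_diseases = prev_draw1.shape[0]
denom = np.sum(disability_draw1*simulation, axis=1)   # per-person sum of present disability weights
has_disease = denom > 0                               # people with at least one disease
adjusted_dw = np.zeros(n_diseases)
for d in range(n_diseases):
    numerator = disability_draw1[d]*simulation[:, d]  # this disease's weight, where present
    adjusted_dw[d] = np.sum(numerator[has_disease]/denom[has_disease])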

1 Answer


If you have multiple processors/cores, you could take a look at the multiprocessing module.

Running 1000 simulations at the same time might be a bit expensive, though. You should probably run one simulation per core at a time.

You could use a multiprocessing Queue and work with a pool of Process workers.

Here is a mini sample of what it could look like (not tested):

from multiprocessing import Process, JoinableQueue, Queue

def run_simulation(simulations, results):
    # each worker keeps pulling parameter sets until the queue is empty
    while simulations.qsize() > 0:
        simulation_params = simulations.get()
        # run simulation here, producing simulation_result
        results.put(simulation_result)
        simulations.task_done()

if __name__ == '__main__':
    # JoinableQueue supports task_done()/join(); a plain Queue does not
    simulations_to_run = JoinableQueue()
    # simulation parameters go in this dict; add all simulations, one put()
    # per simulation (could be done in a loop, with a list of dicts)
    simulations_to_run.put({})
    results = Queue()
    for i in range(8):  # number of processes you want to run
        p = Process(target=run_simulation, args=(simulations_to_run, results))
        p.start()

    simulations_to_run.join()
    # now, all results should be in the results Queue

http://docs.python.org/library/multiprocessing.html
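
If the full list of parameter dicts is known up front, the same idea can also be expressed more compactly with a multiprocessing Pool; here is a rough, untested sketch (run_one_simulation and params_list are placeholders to fill in):

from multiprocessing import Pool

def run_one_simulation(params):
    # placeholder: run a single simulation for one parameter dict
    # and return whatever result you need from it
    return params

if __name__ == '__main__':
    params_list = [{'age': 5, 'draw': 1}]  # one dict per simulation
    pool = Pool(processes=8)               # roughly one worker per core
    all_results = pool.map(run_one_simulation, params_list)
    pool.close()
    pool.join()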


2 Comments

Thanks very much for this input. Unfortunately, I need to perform this same set of 1000 simulations for 20 regions, 30 age-groups, and 2 sexes -- this is how I will parallelize the overall process on a cluster system.
If you need to run this on a cluster, you could use a queue server like Beanstalkd (if on linux/unix). Instead of fetching the jobs from simulations_to_run, you fetch them from the queue server. Once a task is done, you could put its result on a different tube on the beanstalkd server. This code should be easy to adapt to use a queue server. You would have to run this script on all servers.
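A rough sketch of what such a worker could look like, using the beanstalkc client (the tube names, host, and the JSON encoding are just placeholders):

import json
import beanstalkc

beanstalk = beanstalkc.Connection(host='queue-server', port=11300)
beanstalk.watch('simulations')   # tube holding pending simulation parameters
beanstalk.use('results')         # tube where finished results are put

while True:
    job = beanstalk.reserve(timeout=30)  # returns None if nothing arrives in time
    if job is None:
        break
    params = json.loads(job.body)
    # run the simulation with params here, producing simulation_result
    beanstalk.put(json.dumps(simulation_result))
    job.delete()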
