I have a program which performs a Monte Carlo-type simulation. I currently have versions of it written against both OpenMP and OpenCL, and I want to know the best approach for distributing the workload between the computers on my LAN.
My first idea was to write a sockets-based client/server application, whereby the server divides the work into units and sends them to the clients, which complete them and send back the results. To make use of systems with both a fast CPU and a fast GPU, I could run multiple client instances on the same machine (an -omp and an -ocl executable).
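To make that concrete, this is roughly the kind of wire format I would have to invent for the socket approach. It is only a sketch of what I mean by a "work unit"; none of these names or fields exist in my code yet:

```c
/* Rough sketch of the messages the server and clients would exchange --
 * all names and fields here are placeholders, nothing is implemented. */
#include <stdint.h>

typedef struct {
    uint64_t unit_id;       /* identifies the work unit */
    double   region_lo[3];  /* lower corner of the region to sample */
    double   region_hi[3];  /* upper corner of the region to sample */
    uint64_t num_samples;   /* how many Monte Carlo samples to draw */
} work_unit;

typedef struct {
    uint64_t unit_id;       /* echoes the request so the server can match it */
    double   estimate;      /* accumulated result for the region */
    double   variance;      /* lets the server decide where to refine next */
} work_result;
```

Even for something this small I would still have to handle framing, byte order, reconnects, and so on, which is the part I would rather avoid.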
However, sockets programming is rarely enjoyable and is a pain to get right (deciding on a protocol, etc.), so I started looking at MPI. It seems nice, although I am unsure how well it works when you want to bring both CPUs and GPUs into the mix, or how well my server-prescribed 'work unit' model fits it. (The process of determining which regions of the problem space to sample is non-trivial, hence the requirement for the sentient master process to coordinate things.)
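For reference, the pattern I think MPI would give me is the master/worker sketch below. This is only my rough understanding, not working code: run_simulation() is a stand-in for my actual OpenMP or OpenCL kernel, the two-double "region" payload and the tags are made up, and it assumes there are more work units than workers.

```c
/* Sketch of an MPI master/worker loop: rank 0 hands out regions to sample,
 * the other ranks run the simulation (OpenMP or OpenCL internally). */
#include <mpi.h>

#define TAG_WORK 1
#define TAG_STOP 2

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                          /* the "sentient" master */
        int units_remaining = 1000;           /* placeholder count */
        double result;
        MPI_Status st;

        /* prime every worker with one unit */
        for (int w = 1; w < size; ++w) {
            double region[2] = { 0.0, 1.0 };  /* chosen by the master */
            MPI_Send(region, 2, MPI_DOUBLE, w, TAG_WORK, MPI_COMM_WORLD);
            --units_remaining;
        }
        /* keep feeding workers as results come back */
        while (units_remaining > 0) {
            MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            double region[2] = { 0.0, 1.0 };  /* next region to sample */
            MPI_Send(region, 2, MPI_DOUBLE, st.MPI_SOURCE, TAG_WORK,
                     MPI_COMM_WORLD);
            --units_remaining;
        }
        /* drain outstanding results and shut the workers down */
        for (int w = 1; w < size; ++w) {
            MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            MPI_Send(NULL, 0, MPI_DOUBLE, st.MPI_SOURCE, TAG_STOP,
                     MPI_COMM_WORLD);
        }
    } else {                                  /* worker: CPU or GPU backend */
        double region[2], result;
        MPI_Status st;
        for (;;) {
            MPI_Recv(region, 2, MPI_DOUBLE, 0, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP)
                break;
            result = 0.0;                     /* result = run_simulation(region); */
            MPI_Send(&result, 1, MPI_DOUBLE, 0, TAG_WORK, MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
    return 0;
}
```

If that is roughly how people use MPI for this, it looks like it would replace my hand-rolled protocol entirely, but I don't know whether having some ranks be OpenMP processes and others OpenCL processes causes any practical problems.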
So, I am interested to know whether there are any other options available to me, or what others have decided on in a similar situation.