I need to make a number of GET requests to a server to download a bunch of JSON files, writing each download to disk, and I want to launch some threads to speed that up.
Each download, including writing the file, takes approximately 0.35 seconds.
I would like to know whether, at least under Linux (and under Windows, while we're at it), it is safe to write to disk in parallel, and how many threads I can launch taking into account the waiting time of each thread.
In case it matters (I actually think it does), the program doesn't write to disk directly. It just calls std::system to run wget, because that is currently easier than importing a library. So the waiting time is the time the system call takes to return.
So each write to disk is performed by a different process; my thread only waits for that process to finish. I'm not actually bound by I/O but by the running time of an external process (each wget call creates and writes to a different file, so the processes are completely independent). Each thread just waits for one call to complete.
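Here is a minimal sketch of what I'm doing now (the URL pattern, file names, and thread count are made up for illustration; my real code builds them from the server's API):

```cpp
#include <cstdlib>
#include <string>
#include <thread>
#include <vector>

// Each call blocks until the wget child process exits,
// so the thread spends almost all of its time waiting.
void fetch(const std::string& url, const std::string& out) {
    std::system(("wget -q -O " + out + " \"" + url + "\"").c_str());
}

int main() {
    std::vector<std::thread> workers;
    for (int i = 0; i < 8; ++i) {  // how many threads? that's my question
        workers.emplace_back(fetch,
                             "https://example.com/data/" + std::to_string(i) + ".json",
                             "file" + std::to_string(i) + ".json");
    }
    for (auto& t : workers) t.join();
}
```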
My machine has 4 CPUs.
Some kind of formula to get an ideal number of threads according to CPU concurrency and the "waiting time" per thread would be welcome.
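For illustration, the kind of rule I mean is something like the sizing formula from Brian Goetz's *Java Concurrency in Practice*, threads ≈ cores × (1 + wait time / compute time); I don't know whether that reasoning applies when the thread's only job is to wait for a child process:

```cpp
// The kind of sizing rule I have in mind (Goetz's formula); the inputs
// are my own rough estimates. With ~0.35 s of waiting per call and almost
// no CPU time spent in my own thread, wait/compute is very large, which
// would suggest far more threads than cores -- hence my question about
// a sensible upper bound.
int ideal_threads(int cores, double wait_s, double compute_s) {
    return static_cast<int>(cores * (1.0 + wait_s / compute_s));
}
```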
NOTE: The ideal approach would of course be to do some performance testing, but I could get banned from the server if I abuse it with too many requests.