I have built a structure that provides a dataset as `opts` to a `ProcessPoolExecutor`, where the inputs are indices into the dataset.
I could provide an MWE, but every approach I tried resulted in something like a fork bomb.
I suspect the cause is some internal multiprocess execution in PyTorch, but I am unable to provide proof for this assumption.
So the question: is there a way to go multiprocess in PyTorch with data that cannot be collated into a batch (the items have different shapes), without creating a fork bomb?
The goal is for each process to save its results to disk at the end, so I don't have to deal with synchronization (and the full result set is in any case too large to fit in memory).
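For illustration, here is a minimal sketch of the structure I mean (all names are hypothetical, and a plain Python list stands in for the torch `Dataset` whose items have different shapes). Each worker receives only an index, does its work, and writes its own output file, so no synchronization between processes is needed. Note the `__main__` guard: with a `spawn` start method, creating the pool at import time re-executes the module in every worker, which is one classic way to end up with a fork bomb.

```python
import os
import pickle
import tempfile
from concurrent.futures import ProcessPoolExecutor
from functools import partial

# Stand-in for a dataset whose items cannot be collated into one
# batch because their shapes differ (hypothetical placeholder data).
DATASET = [list(range(n)) for n in (3, 5, 2, 7)]

def process_index(idx, out_dir):
    """Process one dataset item and persist the result to its own file."""
    item = DATASET[idx]
    result = [x * 2 for x in item]  # placeholder for the real per-item work
    path = os.path.join(out_dir, f"{idx}.pkl")
    with open(path, "wb") as f:     # one file per worker/index,
        pickle.dump(result, f)      # so no locking is required
    return path

def main():
    out_dir = tempfile.mkdtemp()
    # Pool creation lives inside a function, not at module top level,
    # so re-imports in spawned workers do not recursively create pools.
    with ProcessPoolExecutor(max_workers=2) as pool:
        paths = list(pool.map(partial(process_index, out_dir=out_dir),
                              range(len(DATASET))))
    return paths

if __name__ == "__main__":
    print(main())
```

With real PyTorch tensors the same shape works; the per-item save could use `torch.save` instead of `pickle`, and any CUDA or model setup should happen inside the worker function rather than at import time.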