I'm implementing an Allreduce algorithm inside the mca/coll framework.
The algorithm requires each node to send only a part of the vector at each step of the computation (like a ring allreduce), but the size and position of this part can change.
More importantly, I should be able to send non-contiguous parts of the vector.
I was thinking about using a custom MPI datatype. The problem is that the normal MPI_Datatype ... is just the final implementation of what is inside the mca (and the other) module(s).
I need to know how to create such a custom datatype using only the low-level calls of the module (no MPI_...).
For the send and receive part, I was thinking about using ompi_datatype_sndrcv (implemented in ompi/datatype/ompi_datatype_sndrcv.c), but suggestions on this are welcome too.
Use the PMPI_* subroutines instead of the MPI_* ones in order not to confuse profilers or tools. What would be the issue with using PMPI_* subroutines?
If you really want to use the internals, look at the C bindings (e.g. ompi/mpi/c/type_contiguous.c); there is generally a straightforward mapping between the C MPI_* subroutines and the internal ompi_* ones.
Start with the PMPI_* subroutines, and if you can showcase some performance benefits, open a pull request. The reviewer(s) will guide you on how to use the internal calls if needed.