How to start an MPI process without mpiexec or mpirun? #13101
Comments
To enable communication between these two processes, some mechanism is needed to exchange the required connection information. This is generally achieved through mpirun, which, in an oversimplified explanation, launches PMIx servers in the background. Another approach is to use direct-run commands (like srun with SLURM), where PMIx servers are also managed behind the scenes. Alternatively, one could develop custom PMIx servers, though this would essentially be reinventing the wheel. If you can describe a specific real-world scenario, we might be able to offer tailored advice on how to implement it.
Here is one way I've done it (if I understand your question) - I call it the "sea of MPI" scenario. You start a PRRTE persistent distributed virtual machine (DVM) - basically a persistent form of From there, the individual processes can use PMIx calls to discover other processes, and then standard MPI connect/accept calls to create communicators. The DVM will provide the infrastructure to wire things up. Takes a bit of fiddling to get your app to work properly, but works fine once you get the hang of it. Your app tends to be a little more complicated as it has to navigate process discovery and wireup, requiring some understanding of PMIx as well as MPI.
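For illustration, the connect/accept wire-up described above might look roughly like the sketch below. It is not taken from this thread: it assumes mpi4py is available, that all processes can read one shared file path (used here in place of PMIx discovery to hand out the port name), and that the runtime infrastructure mentioned above (for example a PRRTE DVM) is already running so that independently started singletons can actually connect. The helper names `publish_port`, `wait_for_port`, and `build_group` are hypothetical.

```python
import os
import time


def publish_port(path, port_name):
    """Write the port name atomically so readers never see a partial file."""
    tmp = f"{path}.tmp.{os.getpid()}"
    with open(tmp, "w") as f:
        f.write(port_name)
    os.replace(tmp, path)


def wait_for_port(path, timeout=60.0, poll=0.1):
    """Poll until the port file appears, then return its contents."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(path) as f:
                return f.read()
        except FileNotFoundError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"no port file at {path}")
            time.sleep(poll)


def build_group(path, world_size, is_root):
    """Grow an intracommunicator one process at a time via connect/accept.

    Assumption: exactly one process is started with is_root=True, and
    `path` is visible to every process (e.g. a shared filesystem).
    """
    # Imported lazily so the file helpers above work without an MPI runtime.
    from mpi4py import MPI

    if is_root:
        port = MPI.Open_port()
        publish_port(path, port)
        comm = MPI.COMM_SELF
    else:
        port = wait_for_port(path)
        inter = MPI.COMM_SELF.Connect(port, root=0)
        # high=True orders the newcomer after the existing members,
        # so the root process stays rank 0 in the merged communicator.
        comm = inter.Merge(high=True)

    # Every current member takes part in accepting the remaining processes.
    # The port name is only significant at the root of the accept.
    while comm.Get_size() < world_size:
        inter = comm.Accept(port, root=0)
        comm = inter.Merge(high=False)

    if is_root:
        MPI.Close_port(port)
    return comm
```

Each of the independently started processes would call something like `build_group("/shared/port.txt", 16, is_root)` with the same path and size, and use the returned communicator in place of `MPI_COMM_WORLD`. A stale port file from an earlier run would need to be removed first, and whether many simultaneous connects behave well depends on the MPI implementation and runtime, so treat this as a starting point rather than a tested recipe.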
Of course, if you just want a simple solution, you could start your first process as a singleton and have it call
@ggouaillardet @rhc54 Thank you very much for your reply. I am using nvshmem to develop an LLM service, and the initialization of nvshmem relies on the MPI environment. I need to start 16 processes across two hosts, with 8 processes on each host. However, due to many constraints from our legacy framework code, I cannot use mpirun or mpiexec to start the program and establish the communication group. Therefore, I need a way to start the processes and establish the communication group programmatically.
I guess your best bet then is, as pointed out by @rhc54, to start a single process in singleton mode, and then have it
@ggouaillardet The 16 processes already exist beforehand, so MPI_Comm_spawn may not be appropriate.
10.1.4.2. Using the scheduler to “direct launch” (without mpirun(1))
shell$ srun -n 40 mpi-hello-world
There is no such thing as "direct startup" in the manner you imply. Slurm's What I'm struggling to understand is why you care? The launcher (srun or mpirun) doesn't care what your app does or what messaging library it is using. It will provide information and infrastructure to support MPI, but you don't have to use it. You say that the processes will already exist - well, something had to start them! If you use srun or mpirun to do it, then MPI can just magically work. If you start them as singletons (i.e., starting them one-at-a-time without a launcher, or using ssh to individually start them), then you need to have something like PRRTE running in the background to provide the infrastructure that the launcher would have done.
@hiworldwzj are you saying your legacy framework starts the 16 processes independently and you expect they will somehow join forces into a single MPI job?
I want to start two processes myself and use code to build the communication group myself.