
how to start a mpi process without mpiexec or mpirun? #13101

Open
hiworldwzj opened this issue Feb 16, 2025 · 9 comments

@hiworldwzj

I want to start the two processes myself, and build the group myself in code.

@ggouaillardet
Contributor

ggouaillardet commented Feb 16, 2025

For these two processes to communicate, some mechanism must exchange the required bootstrap information between them. This is generally achieved with mpirun, which, in an oversimplified explanation, launches PMIx servers in the background. Another approach is direct launch (e.g., srun with Slurm), where PMIx servers are likewise managed behind the scenes. Alternatively, one could develop custom PMIx servers, though that would essentially be reinventing the wheel.

If you can describe a specific real-world scenario, we might be able to offer tailored advice on how to implement it.

@rhc54
Contributor

rhc54 commented Feb 16, 2025

Here is one way I've done it (if I understand your question) - I call it the "sea of MPI" scenario.

You start a PRRTE persistent distributed virtual machine (DVM) - basically a persistent form of mpirun. This is just a set of PMIx servers, one per node. You then start your individual processes as MPI singletons - they will "discover" the local PMIx server during MPI_Init.

From there, the individual processes can use PMIx calls to discover other processes, and then standard MPI connect/accept calls to create communicators. The DVM will provide the infrastructure to wire things up.

Takes a bit of fiddling to get your app to work properly, but works fine once you get the hang of it. Your app tends to be a little more complicated as it has to navigate process discovery and wireup, requiring some understanding of PMIx as well as MPI.
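
A minimal sketch of that wireup, using the standard MPI name-publishing calls and assuming a PRRTE DVM is already running so that each singleton finds a local PMIx server during MPI_Init (the service name "mpi_sea_port" and the server/client split are placeholders, not a fixed API):

#include <mpi.h>
#include <string.h>

int main(int argc, char **argv)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm inter;

    MPI_Init(&argc, &argv);   /* started as a singleton, no mpirun */

    if (argc > 1 && 0 == strcmp(argv[1], "server")) {
        MPI_Open_port(MPI_INFO_NULL, port);
        MPI_Publish_name("mpi_sea_port", MPI_INFO_NULL, port);   /* advertise the port */
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
        MPI_Unpublish_name("mpi_sea_port", MPI_INFO_NULL, port);
    } else {
        MPI_Lookup_name("mpi_sea_port", MPI_INFO_NULL, port);    /* find the published port */
        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
    }

    /* inter is an inter-communicator joining the two processes;
       MPI_Intercomm_merge() turns it into a regular intra-communicator */
    MPI_Finalize();
    return 0;
}

Whether MPI_Publish_name()/MPI_Lookup_name() are serviced by the DVM depends on the runtime configuration; the lower-level alternative is the raw PMIx publish/lookup API mentioned above.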

@rhc54
Contributor

rhc54 commented Feb 16, 2025

Of course, if you just want a simple solution, you could start your first process as a singleton and have it call MPI_Comm_spawn to start the other process(es). This will launch a DVM under-the-covers to support the resulting job.
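
A rough sketch of that path (the binary name ./my_app is a placeholder):

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm parent, children;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (MPI_COMM_NULL == parent) {
        /* started as a singleton: spawn the other process(es); the runtime
           launches a DVM under the covers to host them */
        MPI_Comm_spawn("./my_app", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);
    }
    /* a spawned child finds the parent job through `parent` instead */

    MPI_Finalize();
    return 0;
}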

@hiworldwzj
Author

@ggouaillardet @rhc54 Thank you very much for your reply. I am using nvshmem to develop an LLM service, and the initialization of nvshmem relies on the MPI environment. I need to start 16 processes across two hosts, with 8 processes on each host. However, due to many constraints from our legacy framework code, I cannot use mpirun or mpiexec to start the program and establish the communication group. Therefore, I need a way to start the processes and establish the communication group programmatically.

@ggouaillardet
Contributor

I guess your best bet then is, as pointed out by @rhc54, to start a single process in singleton mode, and then have it MPI_Comm_spawn() the remaining 15 tasks.
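
Extending the singleton sketch above to the 16-process case: the first process spawns the remaining 15, and the inter-communicator is merged into a single 16-rank intra-communicator. The "host" info key is an Open MPI placement extension, and hostA:7,hostB:8 (7 additional ranks on the first host, 8 on the second) is an assumed syntax; check the MPI_Comm_spawn man page for the info keys your version supports.

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm parent, inter, all16;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (MPI_COMM_NULL == parent) {
        /* first process, started as a singleton: spawn the other 15 */
        MPI_Info info;
        MPI_Info_create(&info);
        MPI_Info_set(info, "host", "hostA:7,hostB:8");   /* assumed placement syntax */
        MPI_Comm_spawn("./my_app", MPI_ARGV_NULL, 15, info,
                       0, MPI_COMM_SELF, &inter, MPI_ERRCODES_IGNORE);
        MPI_Info_free(&info);
    } else {
        inter = parent;   /* the 15 spawned ranks join through their parent */
    }

    /* one 16-rank communicator spanning the parent and all children */
    MPI_Intercomm_merge(inter, MPI_COMM_NULL == parent ? 0 : 1, &all16);

    MPI_Finalize();
    return 0;
}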

@hiworldwzj
Author

@ggouaillardet The 16 processes already exist at that point, so MPI_Comm_spawn may not be appropriate.

@hiworldwzj
Author

10.1.4.2. Using the scheduler to “direct launch” (without mpirun(1))
Some schedulers (such as Slurm) have the ability to “direct launch” MPI processes without using Open MPI’s mpirun(1). For example:

shell$ srun -n 40 mpi-hello-world
Hello world, I am 0 of 40 (running on node14.example.com)
Hello world, I am 1 of 40 (running on node14.example.com)
Hello world, I am 2 of 40 (running on node14.example.com)
...
Hello world, I am 39 of 40 (running on node203.example.com)
shell$
Similar to the prior example, this example launches 40 copies of mpi-hello-world, but it does so via the Slurm srun command without using mpirun(1).
@ggouaillardet Can you explain how this direct startup method is implemented?

@rhc54
Contributor

rhc54 commented Feb 17, 2025

There is no such thing as "direct startup" in the manner you imply. Slurm's srun operates exactly the same as mpirun does - it starts a set of daemons (one per node), each hosting a PMIx server, and then fork/exec's your application processes underneath them. There is absolutely no difference.

What I'm struggling to understand is why you care. The launcher (srun or mpirun) doesn't care what your app does or what messaging library it is using. It will provide information and infrastructure to support MPI, but you don't have to use it.

You say that the processes will already exist - well, something had to start them! If you use srun or mpirun to do it, then MPI can just magically work. If you start them as singletons (i.e., starting them one-at-a-time without a launcher, or using ssh to individually start them), then you need to have something like PRRTE running in the background to provide the infrastructure that the launcher would have done.
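
For reference, the DVM route would look roughly like this, in the same spirit as the srun example above (hostnames, the hostfile, and flags are illustrative; see the prte(1) man page for the options your PRRTE release actually supports):

shell$ prte --daemonize --hostfile myhosts   # persistent PMIx servers, one per node
shell$ ssh hostA ./my_app &                  # start your processes yourself, as singletons
shell$ ssh hostB ./my_app &

Each ./my_app instance would then discover its local PMIx server during MPI_Init, as described earlier in the thread.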

@ggouaillardet
Contributor

@hiworldwzj are you saying your legacy framework starts the 16 processes independently and you expect they will somehow join forces into a single MPI job?
