Distributed section needs improvements #115

Open
carstenbauer opened this issue Jul 13, 2024 · 1 comment

Comments

@carstenbauer
Contributor

  • Last time I checked SharedArray only worked when shared memory was available, i.e., when all workers run on a single machine. If that's still the case, then it is not generally useful when doing "real" distributed computing involving multiple machines.
  • IMO the most important help that one can provide for Distributed.jl is to tell people how to practically start workers on multiple machines. Keyword arguments for addprocs like exename or dir are important if the file system isn't shared (Julia and the working directory might be at a different location than on the main computer). On a cluster, you will likely need to use ClusterManagers.jl or learn how to get the hostnames of the machines from a job scheduler (like SLURM). OTOH, you might want to use MPI here anyway.
  • One could showcase an array abstraction for Distributed, like the DArray from DistributedArrays.jl or Dagger.jl (similar to CuArray for GPUs).
  • ...
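
To make the second point concrete, here is a rough sketch of what starting workers on several machines could look like. The paths and the helper names (`machine_specs`, `slurm_hostnames`, `add_remote_workers`) are made up for illustration and are not part of Distributed.jl; only `addprocs` and its `exename` and `dir` keywords come from the standard library. It assumes passwordless SSH to every host, and the SLURM helper assumes it is run inside a job allocation where `scontrol` is available.

```julia
using Distributed

# Pair each hostname with a worker count, giving the (machine_spec, count)
# tuples that `addprocs` accepts for SSH-launched workers.
machine_specs(hosts, nworkers_per_host) = [(h, nworkers_per_host) for h in hosts]

# Inside a SLURM job, `scontrol show hostnames` expands the job's node list
# into one hostname per line. (Hypothetical helper; only works within a job.)
slurm_hostnames() = readlines(`scontrol show hostnames`)

# Hypothetical wrapper: both default paths are placeholders for wherever Julia
# and the project live on the remote machines when the file system isn't shared.
function add_remote_workers(hosts;
                            nworkers_per_host = 1,
                            exename = "/opt/julia/bin/julia",  # assumed remote Julia binary
                            dir = "/scratch/myproject")        # assumed remote working directory
    specs = machine_specs(hosts, nworkers_per_host)
    return addprocs(specs; exename = exename, dir = dir)
end
```

Something like `add_remote_workers(["user@node1", "user@node2"]; nworkers_per_host = 2)` or `add_remote_workers(slurm_hostnames())` would then return the IDs of the new workers. ClusterManagers.jl would replace the manual hostname handling with a scheduler-specific manager.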

If desired, I think I might be able to help with improving the section.

@gdalle
Collaborator

gdalle commented Jul 13, 2024

Thanks, some help would be lovely here! @jacobusmmsmit wrote this section last week and I did some shallow editing to get it ready for JuliaCon, but I didn't check the whole content because I've never used distributed computing. Excited to review a PR in this direction.
