Information request - using GPU-offloaded ecTrans via Atlas #178

Open
l90lpa opened this issue Nov 26, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@l90lpa

l90lpa commented Nov 26, 2024

Is your feature request related to a problem? Please describe.

Hi, I work at the JCSDA with @fmahebert, and I'm reaching out because we're interested in using GPU-offloaded ecTrans via Atlas within JEDI. We have some understanding of the current state of GPU-offloaded ecTrans from previous discussions and meetings, but we'd like to get a clearer sense of what one can do today, what's missing, and what your roadmap looks like. We'd also like to see whether there are ways we might be able to assist in this effort.

Our initial use case would require DIR_TRANS, INV_TRANS, and INV_TRANSAD (including their vordiv-wind support). Initially we would only need the GPU-offloaded versions of these, and only at double precision.

We'd really appreciate any information and a general overview, but with our use case in mind we have some initial questions:

  • I'm aware that the GPU-offloaded versions of the adjoint transforms haven't been implemented. Do you have plans to work on this?
  • What hurdles are left on the ecTrans side to be able to call GPU-offloaded ecTrans from Atlas? And is there more work that needs to be done to support passing device memory in and out?

Organisation

JCSDA

l90lpa added the enhancement label Nov 26, 2024
@samhatfield
Collaborator

Hi Liam, welcome on board!

  • Eventually we would like to have the adjoint code running on GPU. However, we don't have any concrete plans or timelines to do this work. Unfortunately we have limited bandwidth for this kind of technical work, and it is mostly absorbed by getting the regular forward code into a mature state across multiple platforms, not to mention fixing teething problems in using it within the IFS for forecasts. That said, we're always open to collaboration there. If you have a use case for INV_TRANSAD on GPU, do you want to try implementing it yourself?
  • I think it's best if @wdeconinck answers this point. My understanding is that we first have to be able to call the GPU backend from transi in order to use it from Atlas, and that's currently a hot topic (see e.g. https://github.com/wdeconinck/ectrans/tree/feature/transi_gpu).

@wdeconinck
Collaborator

Hi @l90lpa,

@lukasm91
Collaborator

Hi all

"Related to INV_TRANSAD, it could be worthwhile to see if @lukasm91 has any free cycles there for JEDI."

In general, yes, but may I ask what the timeline for this is?

@wdeconinck
Collaborator

@l90lpa when you say "gpu offload versions", do you mean that you expect the fields to be present on the host and copied to the device by ectrans? That is currently the working assumption.

In the future we envision also supporting device-resident fields, but that is not yet done.

@wdeconinck
Collaborator

Hi @l90lpa, another PR in the quest to integrate ectrans better into Atlas now allows for better error handling within Atlas when code paths are not yet implemented: #193

@wdeconinck
Collaborator

wdeconinck commented Dec 19, 2024

... And Atlas PR ecmwf/atlas#252 allows Atlas to link and run with transi_gpu_dp.
As noted, the not-implemented code paths will throw exceptions in Atlas once #193 is merged; until it is merged, they will abort or crash.
