Vectorize np loops in limiter_optim_iter_full #13

Open
amametjanov opened this issue Aug 28, 2015 · 2 comments

@amametjanov
Member

There are 8 np loops in the limiter_optim_iter_full subroutine in prim_advection_mod.F90. In most cases, np is 4 and most of the loops have trip counts of 4-by-4, 4, or 16. Since the call to this subroutine is already inside a nested OMP parallel region, further improvement should come from SIMD. If vectorization is not possible, we should explore unrolling the loops by a factor of 4.
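
For illustration, here is a minimal Fortran sketch of the two options on a hypothetical 4-by-4 loop. The module name `np_loop_sketch`, the routines `scale_simd` and `scale_unrolled`, and the element-wise product are assumptions made for the example; they are not taken from limiter_optim_iter_full. The `!$omp simd` directive only takes effect when the compiler's OpenMP (or OpenMP SIMD) support is enabled; otherwise it is treated as a comment.

```fortran
module np_loop_sketch
  implicit none
  integer, parameter :: rk = kind(1.0d0)
  integer, parameter :: np = 4   ! assumption: np is fixed at 4, as in most cases
contains

  ! Option 1: ask the compiler to vectorize the collapsed 4-by-4 loop nest.
  subroutine scale_simd(a, b, c)
    real(rk), intent(in)  :: a(np,np), b(np,np)
    real(rk), intent(out) :: c(np,np)
    integer :: i, j
    !$omp simd collapse(2)
    do j = 1, np
      do i = 1, np
        c(i,j) = a(i,j) * b(i,j)
      end do
    end do
  end subroutine scale_simd

  ! Option 2: unroll the inner loop by hand by a factor of 4 (valid only for np = 4).
  subroutine scale_unrolled(a, b, c)
    real(rk), intent(in)  :: a(np,np), b(np,np)
    real(rk), intent(out) :: c(np,np)
    integer :: j
    do j = 1, np
      c(1,j) = a(1,j) * b(1,j)
      c(2,j) = a(2,j) * b(2,j)
      c(3,j) = a(3,j) * b(3,j)
      c(4,j) = a(4,j) * b(4,j)
    end do
  end subroutine scale_unrolled

end module np_loop_sketch
```

Both routines compute the same result, so timing them side by side would show whether the directive or the manual unroll pays off with a given compiler.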

@mrnorman

mrnorman commented Sep 1, 2015

Are you talking about manually unrolling? I think this is something the compiler should be doing for us, right? Regarding SIMD, a lot of those loops (not all though) are reductions. Can SIMD instructions run on reduction loops? I know that for the GPU port, we don't thread down into the np x np loops because of reductions over these small np x np chunks of data.
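
On the reduction question: the OpenMP 4.0 `simd` construct accepts a `reduction` clause, so a reduction loop can be marked for vectorization. A minimal sketch follows; the module `np_reduction_sketch`, the function `weighted_sum`, and the flattened length-16 arrays are hypothetical and not from prim_advection_mod.F90. A vectorized sum may add terms in a different order than the scalar loop, so results can differ in the last bits.

```fortran
module np_reduction_sketch
  implicit none
  integer, parameter :: rk = kind(1.0d0)
  integer, parameter :: np = 4   ! assumption: np is fixed at 4
contains

  ! Weighted sum over one np x np chunk, stored as a flat array of 16 values.
  function weighted_sum(w, x) result(total)
    real(rk), intent(in) :: w(np*np), x(np*np)
    real(rk) :: total
    integer :: k
    total = 0.0_rk
    ! Each SIMD lane keeps a private partial sum; the lanes are combined at the end.
    !$omp simd reduction(+:total)
    do k = 1, np*np
      total = total + w(k) * x(k)
    end do
  end function weighted_sum

end module np_reduction_sketch
```

Whether this is profitable for only 16 elements is a separate question that a compiler listing or timers would have to answer.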

@amametjanov
Member Author

A few months ago I looked at compiler-generated listings for other subroutines in derivative_mod.F90 and saw that neither unrolling nor SIMD was happening. SIMD was not done because it was deemed 'not profitable'. IIRC, np was also not deduced to be a compile-time constant, which would have enabled further optimizations. I am logging this issue here to add it to our backlog. This subroutine is called over a million times.
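
One way to make the trip count visible to the compiler is to declare the array bounds from a named constant. The sketch below contrasts that with a bound passed at run time; `np_const_sketch`, `sumsq_runtime`, and `sumsq_fixed` are hypothetical names, and how np is actually declared in the model is not shown here. Whether a particular compiler then unrolls or vectorizes the fixed version should be confirmed from its optimization listing.

```fortran
module np_const_sketch
  implicit none
  integer, parameter :: rk = kind(1.0d0)
  integer, parameter :: np = 4   ! assumption: np is known at compile time
contains

  ! Bound arrives as a dummy argument: the trip count is unknown at compile time.
  function sumsq_runtime(n, a) result(s)
    integer,  intent(in) :: n
    real(rk), intent(in) :: a(n,n)
    real(rk) :: s
    integer :: i, j
    s = 0.0_rk
    do j = 1, n
      do i = 1, n
        s = s + a(i,j) * a(i,j)
      end do
    end do
  end function sumsq_runtime

  ! Bound is the named constant np: the compiler sees exactly 16 iterations.
  function sumsq_fixed(a) result(s)
    real(rk), intent(in) :: a(np,np)
    real(rk) :: s
    integer :: i, j
    s = 0.0_rk
    do j = 1, np
      do i = 1, np
        s = s + a(i,j) * a(i,j)
      end do
    end do
  end function sumsq_fixed

end module np_const_sketch
```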

Based on GPTL timers, I saw an improvement from manual unrolling in edge_mod.F90. I will check how we do after integrating it into ACME/models/atm/cam/src/dynamics/se/share/edge_mod.F90.
