Running the forward models
This wiki page accompanies this release for the purpose of computational reproducibility and as a resource for anyone looking to implement similar methods.
All programs are written in C using the PETSc library. PETSc, along with all optional dependencies (such as HDF5, FFTW, etc.), can be installed using the spack package manager. An alternative is to install PETSc directly from source as described here. The command line invocations shown below will help you get started with running these programs, but optimal performance requires further tuning (NUMA effects, affinity settings, MPI rank pinning, etc.).
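For example, a spack-based install might look like the following; the variant names are taken from the spack petsc package and may differ between spack versions, so check spack info petsc first:
spack install petsc +complex +fftw +hdf5 +int64 ~debug
spack load petsc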
The source code for the full array Fresnel multislice method is available at xwp_petsc/forward/2d/ms/. The program ex_matter_repeat.c simulates the case where the test object does not have any variations along the direction of propagation (like a zone plate). Compiling this program requires a build of PETSc with the FFTW interface and the complex scalar type (and it is good to have debugging turned off and int64 indices turned on). Older versions of PETSc have bugs related to the PETSc-FFTW interface, so the use of version 3.13 or above is suggested.
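As a rough sketch of such a build from source (compiler, MPI, and BLAS settings will differ from machine to machine), the configure step could look like:
./configure --with-scalar-type=complex \
            --with-64-bit-indices=1 \
            --with-debugging=0 \
            --download-fftw
make all
Here --download-fftw fetches and builds FFTW during configure; pointing --with-fftw-dir at an existing FFTW install works as well.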
Once PETSc has been installed, ensure that the environment variables PETSC_DIR and PETSC_ARCH are set. The makefile present in the same directory can then be used to generate the executable, as sketched below.
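A rough sketch of the build step, assuming PETSc was installed under $HOME/petsc with the arch name arch-linux-c-opt and that the makefile target follows the source file name (all of these are assumptions that depend on your install):
export PETSC_DIR=$HOME/petsc
export PETSC_ARCH=arch-linux-c-opt
cd xwp_petsc/forward/2d/ms
make ex_matter_repeat
The executable can then be run with: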
mpirun -np num_mpi_ranks ./executable \
-prop_steps N \
-prop_distance z (in SI units) \
-energy E (in keV) \
-mx Mx (grid size in x) \
-my My (grid size in y) \
-step_grid_x px (pixel size in x) \
-step_grid_y py (pixel size in y) \
-log_view (for detailed log files)
All of these options can be baked directly into the program by editing the source code, or collected in a PETSc options file (an example is sketched below).
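For instance, a hypothetical options file ms_run.opts (the file name and all values here are placeholders only) contains one option per line, with # starting a comment:
# ms_run.opts : placeholder values only
-prop_steps 100
-prop_distance 1e-4
-energy 10
-mx 4096
-my 4096
-step_grid_x 1e-8
-step_grid_y 1e-8
-log_view
It can then be passed to the executable with the built-in -options_file option:
mpirun -np num_mpi_ranks ./executable -options_file ms_run.opts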
The full array Fresnel multislice algorithm uses global 2D fast Fourier transforms and is therefore very bandwidth intensive. If using the Cray MPI library, hugepages and DMAPP options can be turned on for better performance. For more details refer to this talk.
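As a hedged sketch only (the exact module and variable names vary with the Cray programming environment version; check the intro_hugepages and intro_mpi man pages on your system), these settings are typically of the form:
module load craype-hugepages2M     # build and run with 2 MB hugepages
export MPICH_USE_DMAPP_COLL=1      # enable DMAPP-optimized collectives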
The source code for the finite difference method is available at xwp_petsc/forward/2d/fd/. The program ex_matter_repeat.c simulates the case where the test object does not have any variations along the direction of propagation (like a zone plate). Compiling this program requires a build of PETSc with the complex scalar type (with debugging turned off and int64 indices turned on, as before). There is a known bug with linear TS objects and a bugfix is available here; alternatively, one can use the latest release of PETSc and add -snes_lag_jacobian -1 as a command line option.
Once PETSc has been installed, ensure that the environment variables PETSC_DIR and PETSC_ARCH are set. The makefile present in the same directory can then be used to generate the executable. To run with multigrid preconditioning:
mpirun -np num_mpi_ranks ./executable \
-prop_steps N \
-prop_distance z (in SI units) \
-energy E (in keV) \
-mx Mx (grid size in x) \
-my My (grid size in y) \
-L_x Lx (support length along x: Mx * pixel size in x) \
-L_y Ly (support length along y: My * pixel size in y) \
-ts_type cn (Crank-Nicolson) \
-ksp_type fgmres -ksp_rtol 1e-5 \
-pc_type gamg -pc_gamg_type agg \
-pc_gamg_threshold -0.04 -pc_gamg_coarse_eq_limit 5000 \
-pc_gamg_use_parallel_coarse_grid_solver \
-pc_gamg_square_graph 10 \
-mg_levels_ksp_type gmres -mg_levels_pc_type jacobi \
-pc_gamg_reuse_interpolation true \
-ts_monitor \
-log_view (for detailed log files)
To run with ASM (additive Schwarz) preconditioning instead, the solver and preconditioner options used are:
-ksp_type fgmres -ksp_rtol 1e-5 \
-pc_type asm \
-pc_asm_overlap 2 \
-sub_pc_type ilu
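For concreteness, a full finite difference run with the ASM options substituted for the GAMG block might look like the following (same placeholder arguments as above):
mpirun -np num_mpi_ranks ./executable \
-prop_steps N \
-prop_distance z \
-energy E \
-mx Mx -my My \
-L_x Lx -L_y Ly \
-ts_type cn -ts_monitor \
-ksp_type fgmres -ksp_rtol 1e-5 \
-pc_type asm \
-pc_asm_overlap 2 \
-sub_pc_type ilu \
-log_view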
As before, all of these options can be baked directly into the program by editing the source code or collected in a PETSc options file. The meaning of the various linear solver options can be gleaned from the PETSc manual.
When using a large number of nodes, the option -pc_gamg_process_eq_limit can be used for better runtimes. If using the Cray MPI library, MPICH rank reordering leads to more efficient communication patterns; for more details refer to this talk and the man pages.
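As a hedged sketch only (the supported values and tooling are described in the intro_mpi and grid_order man pages), rank reordering on Cray MPICH is typically driven by an environment variable:
export MPICH_RANK_REORDER_METHOD=3   # 3 = read a custom MPICH_RANK_ORDER file
# a MPICH_RANK_ORDER file matched to the 2D process grid can be generated
# with the grid_order utility shipped with Cray perftools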
The options used here come from my own (limited) testing on KNL-based clusters and from extensive feedback obtained from the PETSc developers on the petsc-users@mcs.anl.gov mailing list (archives). Better performance may be possible on other architectures with different solver options; I have only explored a small part of that search space.