Compilation flags in arch.mk for GPU offloading
BerkeleyGW supports offloading to GPUs from three vendors: NVIDIA, AMD, and Intel.
Below, we describe the flags needed in arch.mk to compile BerkeleyGW
with GPU offload enabled. See also the examples in the config/ directory, in
particular:
- perlmutter.nvhpc.gpu.nersc.gov.mk: NVIDIA GPUs
- frontier.cray.gpu.ornl.gov.mk: AMD GPUs
- aurora.intel.gpu.alcf.gov.mk: Intel GPUs
In general, you need to specify a compiler, a library API, and an offloading programming model (OpenACC and/or OpenMP-target).
Compilation on NVIDIA Architectures
The preferred compiler option on NVIDIA architectures is the nvhpc compiler suite (usually provided within the PrgEnv-nvhpc module).
The preferred library API is NVHPC_API (based on CUDA), which is usually provided automatically in the link and include paths of the NVHPC-SDK distribution.
To use this combination, set:
COMPFLAG = -DNVHPC -DNVHPC_API -DNVIDIA_GPU
The nvhpc compiler supports both the OpenACC and OpenMP-target programming models. You can compile with just one of them or with both. In the latter case the
default programming model (ALGO) is OpenACC, and OpenMP-target can be turned on by setting the appropriate flags in the input. Make sure to include the
-DOPENACC and/or -DOMP_TARGET flags in the list after PARAFLAG = (or MATHFLAG =).
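As an illustration, the preprocessor section of arch.mk for an NVIDIA build that compiles both programming models might look like the sketch below. The COMPFLAG line is the one given above. The -DMPI and -DOMP entries on PARAFLAG are assumptions that stand in for whatever parallelization flags your build already uses.

```make
# Sketch of the preprocessor flags in arch.mk for NVIDIA GPUs.
COMPFLAG = -DNVHPC -DNVHPC_API -DNVIDIA_GPU

# -DMPI and -DOMP are assumed placeholders for the flags already present;
# the offload flags are appended to the same list.
PARAFLAG = -DMPI -DOMP -DOPENACC -DOMP_TARGET
```

To build with a single programming model, keep only the corresponding flag (-DOPENACC or -DOMP_TARGET).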
Typical Fortran compiler flags are:
F90free = ftn -Mfree -acc -mp=multicore,gpu -gpu=cc80 -cudalib=cublas,cufft -traceback -Minfo=all,mp,acc -gopt
LINK = ftn -acc -mp=multicore,gpu -gpu=cc80 -cudalib=cublas,cufft
FOPTS = -fast -Mfree -Mlarge_arrays
Compilation on AMD Architectures
The preferred compiler option on AMD architectures is the Cray compiler suite (usually provided within the PrgEnv-cray module).
The preferred library API is HIP_API (based on HIP).
To use this combination, set:
COMPFLAG = -DCRAY -DHIP_API -DAMD_GPU
If the HIP_API is not provided automatically in the link and include paths of the Cray or AMD distribution, you can build it
yourself by following these steps:
- Clone the repository: git@github.com:ROCmSoftwarePlatform/hipfort.git
- Create a build directory and access it: mkdir build ; cd build
- Use CMake to configure the library: cmake ../ -DHIPFORT_COMPILER='ftn' -DHIPFORT_COMPILER_FLAGS='-f free -fopenmp -g -eF ' -DHIPFORT_INSTALL_DIR=<Path_to_lib>
- Build using: make -j 8
Make sure to include the path to the library in the arch.mk file (${ROCM_PATH} is the path to the ROCm installation, which should be set automatically by the rocm module):
HIP_INC = -J/<Path_to_lib>/include/hipfort/amdgcn/ -I${ROCM_PATH}/include/
HIP_LIB = /<Path_to_lib>/lib/libhipfort-amdgcn.a -L${ROCM_PATH}/lib -lamdhip64 -lhipfft -lhipblas
The Cray compiler supports both the OpenACC and OpenMP-target programming models. You can compile with just one of them or with both. In the latter case the
default programming model (ALGO) is OpenACC, and OpenMP-target can be turned on by setting the appropriate flags in the input. Make sure to include the
-DOPENACC and/or -DOMP_TARGET flags in the list after PARAFLAG = (or MATHFLAG =).
Typical Fortran compiler flags are:
F90free = ftn -f free -h acc -homp -g -ef -hacc_model=auto_async_none:no_fast_addr:no_deep_copy ${HIP_INC} ${HIP_LIB}
LINK = ftn -f free -h acc -homp -g -ef -hacc_model=auto_async_none:no_fast_addr:no_deep_copy ${HIP_INC} ${HIP_LIB}
FOPTS = -O1
Compilation on Intel Architectures
The preferred compiler option on Intel architectures is the Intel oneAPI suite with the ifx compiler (usually provided within the oneapi/release/YYY.MM.DD.vvvv module). The preferred library API is ONE_API (which incorporates MKL); it is usually provided automatically in the link and include paths of the oneapi/release distribution.
To use this combination, set:
COMPFLAG = -DINTEL -DINTEL_GPU -DONE_API
The ifx compiler supports only the OpenMP-target programming model. Make sure to include the -DOMP_TARGET flag in the list after PARAFLAG = (or MATHFLAG =).
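For comparison with the other architectures, a minimal sketch of the corresponding preprocessor flags for an Intel build is shown below. Only -DOMP_TARGET is added, since OpenACC is not available with ifx. As before, -DMPI and -DOMP are assumed placeholders for the flags your PARAFLAG list already contains.

```make
# Sketch of the preprocessor flags in arch.mk for Intel GPUs.
COMPFLAG = -DINTEL -DINTEL_GPU -DONE_API

# -DMPI and -DOMP are assumptions; -DOPENACC is omitted because
# ifx supports only the OpenMP-target model.
PARAFLAG = -DMPI -DOMP -DOMP_TARGET
```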
Typical Fortran compiler flags are:
F90free = mpif90 -fc=ifx -free
LINK = mpif90 -fc=ifx -free
FOPTS = -O2 -g -traceback -check shape -fp-model precise -no-ipo -align array64byte -fiopenmp -fopenmp-targets=spir64 -qmkl=sequential -lmkl_sycl -lsycl -lOpenCL
Then append the following to the library path:
-qmkl=sequential -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -liomp5 -lpthread -lm -ldl