Compilation flags in `arch.mk` for GPU offloading

BerkeleyGW supports GPU offloading on hardware from three vendors: NVIDIA, AMD, and Intel. The flags needed in arch.mk to compile BerkeleyGW with GPU offload are described below. See also the examples in the config/ directory, in particular:

- perlmutter.nvhpc.gpu.nersc.gov.mk: NVIDIA GPUs
- frontier.cray.gpu.ornl.gov.mk: AMD GPUs
- aurora.intel.gpu.alcf.gov.mk: Intel GPUs

In general, you need to specify a compiler, a GPU library API, and an offloading programming model (OpenACC and/or OpenMP-target).

Compilation on NVIDIA Architectures

The preferred compiler on NVIDIA architectures is the nvhpc compiler suite (usually provided by the PrgEnv-nvhpc module). The preferred library API is NVHPC_API (based on CUDA), which the NVHPC SDK distribution usually adds to the link and include paths automatically. To use this combination, set:

COMPFLAG = -DNVHPC -DNVHPC_API -DNVIDIA_GPU

The nvhpc compiler supports both the OpenACC and OpenMP-target programming models. You can compile with just one of them or with both; in the latter case the default programming model (ALGO) is OpenACC, and OpenMP-target can be turned on by setting the appropriate flags in the input. Make sure to include the -DOPENACC and/or -DOMP_TARGET flags in the list after PARAFLAG = (or MATHFLAG =).
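As a rough sketch of where the -DOPENACC and -DOMP_TARGET macros could go, the fragment below appends both to MATHFLAG. The other macros shown (-DMPI, -DOMP, -DUSESCALAPACK, -DUNPACKED, -DUSEFFTW3, -DHDF5) are assumptions about a typical MPI + OpenMP build and are not required for offloading; keep whatever your own arch.mk already defines.

```make
# Sketch only: the non-GPU macros are assumed, not prescribed by this section.
PARAFLAG = -DMPI -DOMP
# Both programming models enabled; remove one macro to build with only the other.
MATHFLAG = -DUSESCALAPACK -DUNPACKED -DUSEFFTW3 -DHDF5 -DOPENACC -DOMP_TARGET
```

With both macros present, OpenACC remains the default at run time, as described above.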

Typical Fortran compiler flags are:

F90free = ftn -Mfree -acc -mp=multicore,gpu -gpu=cc80 -cudalib=cublas,cufft -traceback -Minfo=all,mp,acc -gopt
LINK = ftn -acc -mp=multicore,gpu -gpu=cc80 -cudalib=cublas,cufft
FOPTS = -fast -Mfree -Mlarge_arrays

Compilation on AMD Architectures

The preferred compiler on AMD architectures is the Cray compiler suite (usually provided by the PrgEnv-cray module). The preferred library API is HIP_API (based on HIP).

To use this combination, set:

COMPFLAG = -DCRAY -DHIP_API -DAMD_GPU

If the Cray or AMD distribution does not automatically provide HIP_API in the link and include paths, you can build it yourself by following these steps:

- Clone the repository: git@github.com:ROCmSoftwarePlatform/hipfort.git
- Create a build directory and enter it: mkdir build ; cd build
- Use CMake to configure the library: cmake ../ -DHIPFORT_COMPILER='ftn' -DHIPFORT_COMPILER_FLAGS='-f free -fopenmp -g -eF ' -DHIPFORT_INSTALL_DIR=<Path_to_lib>
- Build using: make -j 8

Make sure to include the path to the library in the arch.mk file (${ROCM_PATH} is the path to the ROCm installation, which should be set automatically by the rocm module):

HIP_INC = -J/<Path_to_lib>/include/hipfort/amdgcn/ -I${ROCM_PATH}/include/

HIP_LIB = /<Path_to_lib>/lib/libhipfort-amdgcn.a -L${ROCM_PATH}/lib -lamdhip64 -lhipfft -lhipblas
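To avoid typing the install path twice, the two definitions above could be written in terms of a single make variable. This is only a sketch: HIPFORT_DIR is a hypothetical helper name that does not come from the BerkeleyGW build system, and its value is a placeholder for the directory passed as -DHIPFORT_INSTALL_DIR.

```make
# HIPFORT_DIR is a hypothetical helper variable; replace the placeholder path
# with the directory given to -DHIPFORT_INSTALL_DIR when configuring hipfort.
HIPFORT_DIR = /path/to/hipfort/install
HIP_INC = -J$(HIPFORT_DIR)/include/hipfort/amdgcn/ -I${ROCM_PATH}/include/
HIP_LIB = $(HIPFORT_DIR)/lib/libhipfort-amdgcn.a -L${ROCM_PATH}/lib -lamdhip64 -lhipfft -lhipblas
```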

The Cray compiler supports both the OpenACC and OpenMP-target programming models. You can compile with just one of them or with both; in the latter case the default programming model (ALGO) is OpenACC, and OpenMP-target can be turned on by setting the appropriate flags in the input. Make sure to include the -DOPENACC and/or -DOMP_TARGET flags in the list after PARAFLAG = (or MATHFLAG =).

Typical Fortran compiler flags are:

F90free = ftn -f free -h acc -homp -g -ef -hacc_model=auto_async_none:no_fast_addr:no_deep_copy ${HIP_INC} ${HIP_LIB}
LINK = ftn -f free -h acc -homp -g -ef -hacc_model=auto_async_none:no_fast_addr:no_deep_copy ${HIP_INC} ${HIP_LIB}
FOPTS = -O1

Compilation on Intel Architectures

The preferred compiler on Intel architectures is the ifx compiler from the Intel oneAPI suite (usually provided by the oneapi/release/YYY.MM.DD.vvvv module). The preferred library API is ONE_API (which incorporates MKL), usually provided automatically in the link and include paths by the oneapi/release distribution.

To use this combination, set:

COMPFLAG = -DINTEL -DINTEL_GPU -DONE_API

The ifx compiler supports only the OpenMP-target programming model. Make sure to include the -DOMP_TARGET flag in the list after PARAFLAG = (or MATHFLAG =).

Typical Fortran compiler flags are:

F90free = mpif90 -fc=ifx -free
LINK = mpif90 -fc=ifx -free
FOPTS = -O2 -g -traceback -check shape -fp-model precise -no-ipo -align array64byte -fiopenmp -fopenmp-targets=spir64 -qmkl=sequential -lmkl_sycl -lsycl -lOpenCL

Also append the following to the library path:

-qmkl=sequential -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -liomp5 -lpthread -lm -ldl
