
2.5.1 Introduction 

We recommend using a SLURM batch job file for running GPU-specific jobs. For quick tests, however, the "srun" command is also acceptable.
We will now show batch scripts to run the CUDA version of our DAXPY program, under both the GNU and PGI environments.
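Such a quick test with "srun" could look like the following sketch; the partition name, GPU count, and time limit here mirror the batch scripts below, but should be adjusted to your site's configuration:

```shell
# Interactive quick test: run the CUDA binary on one GPU in the gpu partition
# (assumes the modules and binary from the sections below are already in place)
srun -p gpu --gres=gpu:1 -t 00:05:00 ./daxpy.x.gnu
```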

2.5.2 Running under GNU (plus CUDA 6.0) environment 

Here is a valid script (daxpygnu.slurm) for running under the GNU environment:

#!/bin/bash
#SBATCH -N 1
#SBATCH -p gpu
#SBATCH -t 00:05:00
#SBATCH -J daxpygnu
#SBATCH -o daxpygnu.out.%j
#SBATCH -e daxpygnu.out.%j
#SBATCH --gres=gpu:2
#SBATCH --exclusive

module purge
module load gcc/4.8.2 cuda/6.0
module list

set -xe

cd ${SLURM_SUBMIT_DIR:-.}
pwd

ldd ./daxpy.x.gnu

./daxpy.x.gnu
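The script is submitted with `sbatch daxpygnu.slurm`. Note the `cd ${SLURM_SUBMIT_DIR:-.}` line: SLURM sets `SLURM_SUBMIT_DIR` to the directory the job was submitted from, and the `:-.` expansion falls back to the current directory when the variable is unset (for example, when the script is run by hand outside SLURM). A minimal sketch of that fallback:

```shell
# ${VAR:-default} expands to the default when VAR is unset or empty
unset SLURM_SUBMIT_DIR
echo "${SLURM_SUBMIT_DIR:-.}"    # prints "."
SLURM_SUBMIT_DIR=/wrk/sbs/GuideBull/daxpy/cuda_gcc
echo "${SLURM_SUBMIT_DIR:-.}"    # prints /wrk/sbs/GuideBull/daxpy/cuda_gcc
```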

The output listing (daxpygnu.out.<slurm_jobid_number>) is as follows:

Currently Loaded Modules:
  1) gcc/4.8.2    2) cuda/6.0
+ cd /wrk/sbs/GuideBull/daxpy/cuda_gcc
+ pwd
/wrk/sbs/GuideBull/daxpy/cuda_gcc
+ ldd ./daxpy.x.gnu
        linux-vdso.so.1 =>  (0x00007fff107ff000)
        librt.so.1 => /lib64/librt.so.1 (0x00007fa145e66000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fa145c48000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007fa145a44000)
        libstdc++.so.6 => /appl/opt/gcc/4.8.2/lib64/libstdc++.so.6 (0x00007fa14573b000)
        libm.so.6 => /lib64/libm.so.6 (0x00007fa1454b6000)
        libgcc_s.so.1 => /appl/opt/gcc/4.8.2/lib64/libgcc_s.so.1 (0x00007fa1452a0000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fa144f0c000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fa14607f000)
+ ./daxpy.x.gnu
n=134217728 : vlen=256 : griddim = 524288 1 1 : blockdim.x = 256 1 1
daxpy(n=134217728): sum=9.0072e+15 : check_sum=9.0072e+15 : diff=0

 

2.5.3 Running under PGI (plus CUDA 6.0) environment

Now we proceed under the PGI environment. What is the big deal here? With pure CUDA codes you might as well continue with the GNU environment.
However, if you ever intend to work with OpenACC (as well as CUDA), you need the PGI compiler, which generates GPU code from OpenACC directives. The GNU compiler
is currently unable to do that (support may arrive sometime in 2015, though).

The batch job script in the PGI environment is very similar to the GNU one: only the module to be loaded and the executable name differ.

#!/bin/bash
#SBATCH -N 1
#SBATCH -p gpu
#SBATCH -t 00:05:00
#SBATCH -J daxpypgi
#SBATCH -o daxpypgi.out.%j
#SBATCH -e daxpypgi.out.%j
#SBATCH --gres=gpu:2
#SBATCH --exclusive

module purge
module load pgi/14.4 cuda/6.0
module list

set -xe

cd ${SLURM_SUBMIT_DIR:-.}
pwd

ldd ./daxpy.x.pgi

./daxpy.x.pgi

The output is pretty much what we expected:

Currently Loaded Modules:
  1) pgi/14.4    2) cuda/6.0
+ cd /wrk/sbs/GuideBull/daxpy/cuda_pgi
+ pwd
/wrk/sbs/GuideBull/daxpy/cuda_pgi
+ ldd ./daxpy.x.pgi
        linux-vdso.so.1 =>  (0x00007fff96dff000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f1c3adcc000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f1c3abae000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f1c3a9aa000)
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f1c3a6a4000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f1c3a41f000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f1c3a209000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f1c39e75000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f1c3afe5000)
+ ./daxpy.x.pgi
n=134217728 : vlen=256 : griddim = 524288 1 1 : blockdim.x = 256 1 1
daxpy(n=134217728): sum=9.0072e+15 : check_sum=9.0072e+15 : diff=0

 

2.5.4 Running OpenACC version under PGI and CUDA 6.0 environment

In order to run GPU applications built with OpenACC directives, you need to keep both the PGI environment and CUDA 6.0 active.
As an example, we will show how to run the C version of DAXPY from the previous chapter, followed by the Fortran version (whose output is basically identical).
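As a reminder, the OpenACC executables could have been built roughly along these lines. This is a sketch only: the exact compile commands were given in the previous chapter, and the `-ta=nvidia` target option and source file names here are assumptions.

```shell
# Hypothetical build of the OpenACC DAXPY binaries with PGI 14.4
# (-acc enables OpenACC directive processing; -ta selects the accelerator target)
module load pgi/14.4 cuda/6.0
pgcc  -acc -ta=nvidia -o daxpy.x.acc daxpy.c     # C/OpenACC version
pgf90 -acc -ta=nvidia -o daxpy.x.acc daxpy.f90   # Fortran/OpenACC version
```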

In both cases our SLURM batch script looks as follows:

#!/bin/bash
#SBATCH -N 1
#SBATCH -p gpu
#SBATCH -t 00:05:00
#SBATCH -J daxpyacc
#SBATCH -o daxpyacc.out.%j
#SBATCH -e daxpyacc.out.%j
#SBATCH --gres=gpu:2
#SBATCH --exclusive

module purge
module load pgi/14.4 cuda/6.0
module list

set -xe

cd ${SLURM_SUBMIT_DIR:-.}
pwd

ldd ./daxpy.x.acc

./daxpy.x.acc

2.5.4.1 Output from the C/C++ & OpenACC version of DAXPY 

The output from the C version of OpenACC DAXPY is as follows:

Currently Loaded Modules:
  1) pgi/14.4    2) cuda/6.0
+ cd /wrk/sbs/GuideBull/daxpy/openacc_pgi/C
+ pwd
/wrk/sbs/GuideBull/daxpy/openacc_pgi/C
+ ldd ./daxpy.x.acc
        linux-vdso.so.1 =>  (0x00007fff32bff000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f965894f000)
        libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x00007f9658745000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f9658528000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f96582a4000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f9657f0f000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f9658b64000)
+ ./daxpy.x.acc
daxpy(n=134217728): sum=9.0072e+15 : check_sum=9.0072e+15 : diff=0

2.5.4.2 Output from the Fortran/OpenACC version of DAXPY

As for the Fortran version, the output looks very similar, apart from the ldd part and the Fortran-style number formatting:

Currently Loaded Modules:
  1) pgi/14.4    2) cuda/6.0
+ cd /wrk/sbs/GuideBull/daxpy/openacc_pgi/Fortran
+ pwd
/wrk/sbs/GuideBull/daxpy/openacc_pgi/Fortran
+ ldd ./daxpy.x.acc
        linux-vdso.so.1 =>  (0x00007fffd07a1000)
        libaccapi.so => /homeappl/appl_taito/opt/pgi/14.4/linux86-64/14.4/libso/libaccapi.so (0x00007f61e488e000)
        libaccg.so => /homeappl/appl_taito/opt/pgi/14.4/linux86-64/14.4/libso/libaccg.so (0x00007f61e4779000)
        libaccn.so => /homeappl/appl_taito/opt/pgi/14.4/linux86-64/14.4/libso/libaccn.so (0x00007f61e4662000)
        libaccg2.so => /homeappl/appl_taito/opt/pgi/14.4/linux86-64/14.4/libso/libaccg2.so (0x00007f61e4557000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f61e4342000)
        libpgmp.so => /homeappl/appl_taito/opt/pgi/14.4/linux86-64/14.4/libso/libpgmp.so (0x00007f61e41c4000)
        libnuma.so => /homeappl/appl_taito/opt/pgi/14.4/linux86-64/14.4/libso/libnuma.so (0x00007f61e40c3000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f61e3ea5000)
        libpgf90.so => /homeappl/appl_taito/opt/pgi/14.4/linux86-64/14.4/libso/libpgf90.so (0x00007f61e3a2d000)
        libpgf90_rpm1.so => /homeappl/appl_taito/opt/pgi/14.4/linux86-64/14.4/libso/libpgf90_rpm1.so (0x00007f61e392b000)
        libpgf902.so => /homeappl/appl_taito/opt/pgi/14.4/linux86-64/14.4/libso/libpgf902.so (0x00007f61e3817000)
        libpgf90rtl.so => /homeappl/appl_taito/opt/pgi/14.4/linux86-64/14.4/libso/libpgf90rtl.so (0x00007f61e36f2000)
        libpgftnrtl.so => /homeappl/appl_taito/opt/pgi/14.4/linux86-64/14.4/libso/libpgftnrtl.so (0x00007f61e35be000)
        libpgc.so => /homeappl/appl_taito/opt/pgi/14.4/linux86-64/14.4/libso/libpgc.so (0x00007f61e344e000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f61e3246000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f61e2fc2000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f61e2c2d000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f61e49a1000)
+ ./daxpy.x.acc
daxpy(n=134217728): sum= 0.90072E+16 : check_sum= 0.90072E+16 : diff=  0.0000

 

 
