AdaptiveCpp Notes

AdaptiveCpp is pretty fun to use. I’ll add some more notes after I use it more.

Install instructions (Arch Linux)

Oops! After writing this section, I couldn’t get it to compile: LLVM 18 fails to build AdaptiveCpp against the latest libstdc++ headers.

This is for HIP (despite the CMake option being called ROCM):

doas pacman -Sy --needed cmake ninja boost hip-runtime-amd llvm18 clang18 libxml2

git clone https://github.com/AdaptiveCpp/AdaptiveCpp --depth 1
cd AdaptiveCpp
cmake -B build -G Ninja -DWITH_ROCM_BACKEND=ON -DLLVM_DIR=/usr/lib/llvm18/lib/cmake/llvm -DCLANG_EXECUTABLE_PATH=/usr/lib/llvm18/bin/clang++
cd build/
ninja
doas ninja install
acpp

For CUDA:

doas pacman -Sy --needed cuda

# ...

cmake -B build -G Ninja -DWITH_CUDA_BACKEND=ON -DLLVM_DIR=/usr/lib/llvm18/lib/cmake/llvm -DCLANG_EXECUTABLE_PATH=/usr/lib/llvm18/bin/clang++

You can also use the generic backend, which defers device code generation to program run time (JIT compilation).

It needs hipcc or clang + ROCm installed to compile code for AMD GPUs.

cmake -B build -G Ninja -DLLVM_DIR=/usr/lib/llvm18/lib/cmake/llvm -DCLANG_EXECUTABLE_PATH=/usr/lib/llvm18/bin/clang++ -DCLANG_INCLUDE_PATH=/usr/lib/llvm18/include

Disclaimer for “single pass”

This is also advertised as “the ability to generate a single binary that can offload to all supported devices”.

Take AMD as an example: acpp still needs to run hipcc to compile the LLVM IR embedded in the host executable.

  • acpp --acpp-targets=hip: AOT (device code compiled at build time)
  • acpp --acpp-targets=generic: JIT (device code compiled at run time)
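
For concreteness, here is a sketch of selecting each mode on the command line. `test.cpp` is a placeholder SYCL source file, and `gfx1030` is just an example GPU architecture — substitute your own:

```shell
# AOT: device code for one specific GPU is produced at build time
acpp --acpp-targets=hip:gfx1030 test.cpp -o test_aot

# JIT: LLVM IR is embedded in the binary and lowered at first run
acpp --acpp-targets=generic test.cpp -o test_jit
```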

Either way, the total number of compilers needed to build and run your program is the same. You still need HIP installed on the target machine to run the final executable.

They say the JIT (generic) path is faster.

Do not use sycl::buffer

SYCL buffers and accessors gave me a constant headache. Never use buffers; prefer USM (unified shared memory) pointers instead.

Nvidia vs AMD

AMD: After installing hip-runtime-amd, the thing works out of the box.

Nvidia: You need a CUDA version supported by nvc++. I can’t get this to work.

Thoughts after an hour of use

SYCL is C++-friendly. I hope I can use it in other languages, like the Cilk concurrency model.

The AdaptiveCpp compiler is implemented as a plugin (a compiler pass) inside clang++. The compiler is not very good: if you use an inline static function inside a kernel, the kernel runs only about 90% as fast on the CPU backend. To reproduce, apply the patch below:

diff --git a/examples/bruteforce_nbody/bruteforce_nbody.cpp b/examples/bruteforce_nbody/bruteforce_nbody.cpp
index 37755ac..f37a50e 100644
--- a/examples/bruteforce_nbody/bruteforce_nbody.cpp
+++ b/examples/bruteforce_nbody/bruteforce_nbody.cpp
@@ -19,6 +19,7 @@
 #include "model.hpp"
 
 
+
 arithmetic_type mirror_position(const arithmetic_type mirror_pos,
                                 const arithmetic_type position)
 {
@@ -36,6 +37,13 @@ int get_num_iterations_per_output_step()
 }
 
 
+inline static void ff(particle_type &p, vector_type v, float dt) {
+        // Update position
+        p.x() += v.x() * dt;
+        p.y() += v.y() * dt;
+        p.z() += v.z() * dt;
+}
+
 int main()
 {
   const int iterations_per_output =
@@ -192,10 +200,7 @@ int main()
         // Bring v to the current state
         v += acceleration * dt;
 
-        // Update position
-        p.x() += v.x() * dt;
-        p.y() += v.y() * dt;
-        p.z() += v.z() * dt;
+        ff(p, v, dt);
 
         // Reflect particle position and invert velocities
         // if particles exit the simulation cube

Inside src/compiler/llvm-to-backend/amdgpu/LLVMToAmdgpu.cpp, you can see LLVM bitcode being linked together.

I hope future programs won’t need two LLVM-based compilers to run.

Still, this is the best way I know to write code to run on GPUs.

Thoughts about GPU compilers

With that said, it isn’t impossible to generate PTX and amdgcn.

ILGPU can produce PTX.

Neither acpp nor hipcc/clang++ can compile C99 code to run on the GPU.

References

My experiments with SYCL: https://git.envs.net/iacore/test-sycl/

SYCL Reference: https://github.khronos.org/SYCL_Reference/

SYCL Cheatsheet: https://www.khronos.org/files/sycl/sycl-2020-reference-guide.pdf

AMDGCN: https://gpuopen.com/wp-content/uploads/2016/08/AMD_GCN3_Instruction_Set_Architecture_rev1.1.pdf

Explaining AMD GPU architecture: https://rocm.docs.amd.com/projects/HIP/en/latest/understand/programming_model.html