Improve reduce by key GPU performance. (#301)
Optimized reduce by key used by the GPU back end when accumulating synapse current contributions to compartment currents. This leads to significant speedup in the miniapp for cells with few compartments and many synapses.

* Implement `gpu::reduce_by_key` device function that uses warp intrinsics to perform reduction between threads in a warp before using a global atomic update to store the result.
* Add unit tests for `reduce_by_key` functionality.
* Add micro benchmarks that compare against using CUDA atomics.
* Modify `CudaPrinter` modcc class to emit `reduce_by_key` in place of `cudaAtomicAdd` functions.

Some improvements to meter reporting:

* Shorten names of metering regions in miniapp to make them easier to grep.
* JSON is no longer used as an intermediate data type when gathering distributed meters into a single report, instead conversion to JSON is performed just before writing to file.
* Add a print function for summarizing meter results t...
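The reduce-by-key semantics described in the commit message can be illustrated with a small host-side reference. This is only a sketch and not the `gpu::reduce_by_key` device function itself: the name `reduce_by_key_reference` is hypothetical, the code runs on the CPU, and it assumes that equal keys (for example, compartment indices of synapses) are stored contiguously, which the commit message does not state. Each run of equal keys is summed first and then written with a single update, which mirrors the idea of reducing within a warp before issuing one atomic add. The sketch ignores warp boundaries, so a device implementation would typically issue at least one update per run per warp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference for reduce-by-key; NOT the device
// function added by this commit.
//
// Assumption: equal keys are contiguous in `keys` (e.g. synapses sorted by
// the index of the compartment they contribute to).
//
// For each run of equal keys, the corresponding values are summed and the
// total is added to out[key] with one update. The return value is the
// number of such updates, i.e. how many atomic adds a run-based scheme
// would need instead of one per value.
inline std::size_t reduce_by_key_reference(const std::vector<int>& keys,
                                           const std::vector<double>& values,
                                           std::vector<double>& out)
{
    assert(keys.size() == values.size());

    std::size_t updates = 0;
    std::size_t i = 0;
    while (i < keys.size()) {
        const int key = keys[i];
        assert(key >= 0 && static_cast<std::size_t>(key) < out.size());

        // Sum all values in the current run of identical keys.
        double sum = 0.0;
        while (i < keys.size() && keys[i] == key) {
            sum += values[i];
            ++i;
        }

        // One update per run replaces one update per contribution.
        out[key] += sum;
        ++updates;
    }
    return updates;
}
```

A reference like this could serve as the expected result when checking a device kernel in a unit test: with many synapses per compartment, the number of updates drops from one per synapse to one per run of equal keys.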
Showing
- miniapp/miniapp.cpp: 11 additions, 4 deletions
- modcc/cudaprinter.cpp: 7 additions, 4 deletions
- src/backends/gpu/kernels/reduce_by_key.hpp: 132 additions, 0 deletions
- src/profiling/memory_meter.cpp: 2 additions, 2 deletions
- src/profiling/meter_manager.cpp: 58 additions, 27 deletions
- src/profiling/meter_manager.hpp: 13 additions, 6 deletions
- tests/ubench/CMakeLists.txt: 7 additions, 3 deletions
- tests/ubench/README.md: 58 additions, 0 deletions
- tests/ubench/cuda_compare_and_reduce.cu: 6 additions, 8 deletions
- tests/ubench/cuda_reduce_by_key.cu: 129 additions, 0 deletions
- tests/unit/CMakeLists.txt: 1 addition, 0 deletions
- tests/unit/test_reduce_by_key.cu: 105 additions, 0 deletions