Gpu/restrict all the things (#1026)
Make all pointer arguments to kernels `__restrict__` to avoid unnecessary loads.

The effect on the busyring benchmark (swapped pas -> hh) with 8192 cells on a V100 GPU (time for model-run in seconds):

```
|----------+-------|
| Baseline | After |
|----------+-------|
|    2.347 | 2.268 |
|    2.345 | 2.262 |
|    2.321 | 2.276 |
|    2.323 | 2.267 |
|    2.330 | 2.249 |
|----------+-------|
|    2.321 | 2.249 |
|----------+-------|
```
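For illustration, a minimal sketch of the kind of kernel signature this change produces. The kernel `scale_two` and the host helper `run_scale_two` are hypothetical and are not taken from the Arbor sources. Without `__restrict__`, the compiler has to assume that the store to `x[i]` might alias `scale[0]` and reload it before the second multiply. With the qualifier, it may keep the value in a register.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: scales two arrays by the same device-side factor.
// The __restrict__ qualifiers promise that x, y and scale do not overlap,
// so scale[0] need not be reloaded after the store to x[i].
__global__ void scale_two(double* __restrict__ x,
                          double* __restrict__ y,
                          const double* __restrict__ scale,
                          int n) {
    int i = blockIdx.x*blockDim.x + threadIdx.x;
    if (i < n) {
        x[i] *= scale[0];
        y[i] *= scale[0];
    }
}

// Hypothetical host helper: copies the inputs to the device, runs the
// kernel, and copies the results back in place. Returns false on any
// CUDA error or if the vectors differ in length.
bool run_scale_two(std::vector<double>& x, std::vector<double>& y, double scale) {
    if (x.size() != y.size()) return false;
    const int n = static_cast<int>(x.size());
    const size_t bytes = n*sizeof(double);

    double *dx = nullptr, *dy = nullptr, *ds = nullptr;
    bool ok = cudaMalloc(&dx, bytes) == cudaSuccess
           && cudaMalloc(&dy, bytes) == cudaSuccess
           && cudaMalloc(&ds, sizeof(double)) == cudaSuccess;

    ok = ok && cudaMemcpy(dx, x.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    ok = ok && cudaMemcpy(dy, y.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    ok = ok && cudaMemcpy(ds, &scale, sizeof(double), cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok && n > 0) {
        const int block = 128;
        const int grid = (n + block - 1)/block;
        scale_two<<<grid, block>>>(dx, dy, ds, n);
        ok = cudaGetLastError() == cudaSuccess
          && cudaDeviceSynchronize() == cudaSuccess;
    }

    ok = ok && cudaMemcpy(x.data(), dx, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    ok = ok && cudaMemcpy(y.data(), dy, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    // cudaFree accepts a null pointer, so this is safe after a failed allocation.
    cudaFree(dx);
    cudaFree(dy);
    cudaFree(ds);
    return ok;
}
```

The qualifier is a promise from the caller, not something the compiler checks: passing overlapping buffers to a kernel declared this way is undefined behaviour.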
Changed files:
- arbor/backends/gpu/matrix_fine.cu: 29 additions, 23 deletions
- arbor/backends/gpu/multi_event_stream.cu: 16 additions, 16 deletions
- arbor/backends/gpu/shared_state.cu: 18 additions, 7 deletions
- arbor/backends/gpu/threshold_watcher.cu: 13 additions, 5 deletions
- arbor/memory/fill.cu: 1 addition, 1 deletion