fix gpu warp intrinsics (#2247) (21f50290) · Commits · arbor-sim / arbor

Unverified Commit 21f50290 authored 1 year ago by boeschf; committed by GitHub 1 year ago

fix gpu warp intrinsics (#2247)

`reduce_by_key` relies on warp-level intrinsics to transfer values
between the threads (lanes) participating in the reduction. The
relevant intrinsic is `__shfl_down_sync`, which is accessed through
Arbor's wrapper function `shfl_down`. However, each thread's
contribution to the reduction was erroneously truncated to an integer
value. This PR fixes the signatures of the affected wrapper functions
and modifies the unit test to check that floating-point reductions are
not truncated.
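The effect of the truncation can be reproduced without a GPU. The C++ sketch below emulates a full 32-lane warp on the host and performs the usual shuffle-down tree reduction. It is a simplified full-warp sum, not Arbor's `reduce_by_key`, and the names `shuffle_step`, `warp_sum`, and the template parameter `Param` are hypothetical: `Param` stands in for the declared type of the wrapper's value parameter, so `Param = int` models a truncating signature and `Param = double` a corrected one.

```cpp
#include <array>
#include <cassert>

constexpr int warp_size = 32;

// One shuffle-down step for a whole emulated warp. Every lane hands its value
// to a hypothetical wrapper whose value parameter has type Param; with
// Param = int the double is truncated toward zero before it is exchanged.
template <typename Param>
std::array<double, warp_size> shuffle_step(const std::array<double, warp_size>& v,
                                           int delta) {
    assert(delta > 0 && delta < warp_size);
    std::array<Param, warp_size> published{};
    for (int lane = 0; lane < warp_size; ++lane) {
        // Models the conversion that happens when a double is passed
        // to a parameter of type Param.
        published[lane] = static_cast<Param>(v[lane]);
    }
    std::array<double, warp_size> received{};
    for (int lane = 0; lane < warp_size; ++lane) {
        // Like __shfl_down_sync: read from lane + delta, or keep the lane's
        // own (already converted) value when the source lies outside the warp.
        int src = lane + delta;
        received[lane] = src < warp_size ? published[src] : published[lane];
    }
    return received;
}

// Tree reduction over the warp: after log2(32) = 5 steps lane 0 holds the sum.
template <typename Param>
double warp_sum(std::array<double, warp_size> v) {
    for (int delta = warp_size / 2; delta > 0; delta /= 2) {
        std::array<double, warp_size> other = shuffle_step<Param>(v, delta);
        for (int lane = 0; lane < warp_size; ++lane) {
            v[lane] += other[lane];
        }
    }
    return v[0];
}
```

With every lane contributing 0.5, `warp_sum<double>` returns 16.0, whereas `warp_sum<int>` returns 0.5: each shuffled contribution is truncated to 0, so only lane 0's own, never-shuffled value survives. Since passing a `double` to an `int` parameter compiles without error by default, a test with non-integral inputs is what exposes the problem.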

While cleaning up the CUDA code path, the workaround that used two
32-bit shuffle instructions for 64-bit data types (doubles) was also
removed; it was probably a leftover from CUDA versions prior to 9.0.
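Since CUDA 9.0, `__shfl_down_sync` provides overloads for 64-bit types such as `double`, so splitting a value by hand is not needed. The host-side C++ sketch below illustrates why the split approach and a native 64-bit shuffle are interchangeable: splitting each double into two 32-bit words, shuffling the words independently, and reassembling them reproduces the original bits exactly. The functions `emulated_shfl_down` and `shfl_down_split` are hypothetical emulations, not Arbor's code; on a device such a split would typically use intrinsics like `__double2hiint`, `__double2loint`, and `__hiloint2double`, but whether the removed workaround did exactly this is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr int warp_size = 32;
static_assert(sizeof(double) == sizeof(std::uint64_t), "expects 64-bit double");

// Host emulation of a shuffle-down across a full warp: lane i receives the
// value of lane i + delta, or keeps its own value when i + delta is outside
// the warp.
template <typename T>
std::array<T, warp_size> emulated_shfl_down(const std::array<T, warp_size>& v, int delta) {
    assert(delta > 0 && delta < warp_size);
    std::array<T, warp_size> out{};
    for (int lane = 0; lane < warp_size; ++lane) {
        int src = lane + delta;
        out[lane] = src < warp_size ? v[src] : v[lane];
    }
    return out;
}

// Sketch of a split-style workaround: shuffle each double as two 32-bit
// words, then reassemble the words bit-for-bit.
std::array<double, warp_size> shfl_down_split(const std::array<double, warp_size>& v,
                                              int delta) {
    std::array<std::uint32_t, warp_size> lo{}, hi{};
    for (int lane = 0; lane < warp_size; ++lane) {
        std::uint64_t bits;
        std::memcpy(&bits, &v[lane], sizeof bits);
        lo[lane] = static_cast<std::uint32_t>(bits);        // low 32 bits
        hi[lane] = static_cast<std::uint32_t>(bits >> 32);  // high 32 bits
    }
    const auto lo_r = emulated_shfl_down(lo, delta);  // first 32-bit shuffle
    const auto hi_r = emulated_shfl_down(hi, delta);  // second 32-bit shuffle
    std::array<double, warp_size> out{};
    for (int lane = 0; lane < warp_size; ++lane) {
        std::uint64_t bits = (static_cast<std::uint64_t>(hi_r[lane]) << 32) | lo_r[lane];
        std::memcpy(&out[lane], &bits, sizeof bits);
    }
    return out;
}
```

For every shift `delta`, `shfl_down_split(v, delta)` yields the same array as a direct 64-bit `emulated_shfl_down(v, delta)`, which is why relying on the native `double` overload should not change results.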

parent d4579b12

Showing with 23 additions and 32 deletions