Commit 5f9f2a5a authored by thorstenhater, committed by GitHub

Gpu/fuse set dt (#1025)

Fuse kernels `gather` and `vec_minus` into a single kernel `set_dt_impl` for a small
performance improvement.
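
As a rough illustration of what fusing the two steps means, here is a CPU analogue in C++. It is a sketch under assumptions, not the code from this commit: it assumes `vec_minus` is an elementwise difference and `gather` is an indexed read, and the names `set_dt_fused`, `time_to`, `time`, `cv_to_domain`, `dt_domain` and `dt_cv` are hypothetical. On a GPU each loop in the unfused version would correspond to its own kernel launch, while the fused version needs only one.

```cpp
#include <cstddef>
#include <vector>

// Unfused step 1 (assumed semantics of `vec_minus`): out[i] = a[i] - b[i].
void vec_minus(const std::vector<double>& a,
               const std::vector<double>& b,
               std::vector<double>& out) {
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = a[i] - b[i];
    }
}

// Unfused step 2 (assumed semantics of `gather`): out[i] = src[index[i]].
void gather(const std::vector<double>& src,
            const std::vector<int>& index,
            std::vector<double>& out) {
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = src[index[i]];
    }
}

// Fused version (hypothetical name): a single pass over the gathered
// output computes the difference and writes both results. The
// intermediate array is not re-read from memory in a second pass.
// Caveat: dt_domain[d] is written only for domains d that appear in
// cv_to_domain, so this assumes every domain owns at least one entry.
void set_dt_fused(const std::vector<double>& time_to,
                  const std::vector<double>& time,
                  const std::vector<int>& cv_to_domain,
                  std::vector<double>& dt_domain,
                  std::vector<double>& dt_cv) {
    for (std::size_t i = 0; i < dt_cv.size(); ++i) {
        const int d = cv_to_domain[i];
        const double dt = time_to[d] - time[d];
        dt_domain[d] = dt;  // redundant writes of the same value are harmless
        dt_cv[i] = dt;
    }
}
```

The trade-off in this sketch is that the fused loop may recompute the same difference several times, once per entry mapping to a given domain, in exchange for one pass instead of two.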

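Here is the effect on the busyring benchmark (with `pas` swapped for `hh`) with 8192 cells on a
V100 GPU. Values are the time for `model-run` in seconds over five runs; the last row is the
minimum of each column, a difference of roughly 0.5%.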

```
|----------+-------|
| Baseline | After |
|----------+-------|
|    2.318 | 2.314 |
|    2.335 | 2.307 |
|    2.345 | 2.315 |
|    2.333 | 2.306 |
|    2.331 | 2.320 |
|----------+-------|
|    2.318 | 2.306 |
|----------+-------|
```
parent d0aaf5ee