Performance/copy to swap (#1027) (a316dd87) · Commits · arbor-sim / arbor

Commit a316dd87, authored 5 years ago by thorstenhater, committed by GitHub 5 years ago

Performance/copy to swap (#1027)

Remove a redundant copy in favor of a swap operation to improve performance,
especially on the GPU, where copies are synchronous. Similarly, instead of solving
the linear system into an intermediate array and then copying the result, write the
output directly into the target.
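
To illustrate the two changes, here is a minimal host-side sketch using `std::vector`. It is not the code from this commit: the `state` struct, the function names, and the toy diagonal solver are all hypothetical stand-ins chosen for illustration. The idea it shows is that swapping two buffers exchanges ownership without touching the elements, and that solving directly into the destination avoids a temporary plus a second pass over the data.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for per-cell state; not Arbor's actual types.
struct state {
    std::vector<double> voltage;  // values read by the next time step
    std::vector<double> scratch;  // values produced by the current step
};

// Before: copy every element of scratch into voltage (O(n) work).
inline void commit_by_copy(state& s) {
    s.voltage = s.scratch;
}

// After: exchange the two buffers (O(1), no element copies).
// Only valid when the old contents of scratch are not needed afterwards.
inline void commit_by_swap(state& s) {
    std::swap(s.voltage, s.scratch);
}

// Toy solver for a diagonal system d[i] * x[i] = rhs[i] (assumption: the
// real solver is more involved; only the data flow matters here).
// Before: solve into a temporary, then copy the temporary into the target.
inline void solve_then_copy(const std::vector<double>& d,
                            const std::vector<double>& rhs,
                            std::vector<double>& target) {
    std::vector<double> tmp(rhs.size());
    for (std::size_t i = 0; i < rhs.size(); ++i) {
        tmp[i] = rhs[i] / d[i];
    }
    target = tmp;  // extra pass over the data
}

// After: write the solution straight into the caller's buffer.
inline void solve_into(const std::vector<double>& d,
                       const std::vector<double>& rhs,
                       std::vector<double>& target) {
    assert(d.size() == rhs.size() && target.size() == rhs.size());
    for (std::size_t i = 0; i < rhs.size(); ++i) {
        target[i] = rhs[i] / d[i];
    }
}
```

Both variants of each pair leave the same values in the destination; the swap and `solve_into` versions simply skip the intermediate copy. On a GPU the analogous copy would be a device memory transfer, which is where the commit message expects the larger gain.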

Here is the effect on the busyring benchmark (swapped pas -> hh) with 8192 cells on a
V100 GPU (time for model-run in seconds). The last row gives the minimum of each column.
```
|----------+--------------------------------+------------------------------------|
| Baseline | fvm_lowered_cell: copy -> swap | matrix: solve + copy -> solve_into |
|----------+--------------------------------+------------------------------------|
|    2.230 |                          2.199 |                              2.129 |
|    2.231 |                          2.209 |                              2.132 |
|    2.225 |                          2.209 |                              2.136 |
|    2.227 |                          2.186 |                              2.130 |
|    2.220 |                          2.204 |                              2.133 |
|----------+--------------------------------+------------------------------------|
|    2.220 |                          2.186 |                              2.129 |
|----------+--------------------------------+------------------------------------|
```

parent a9f0b2fa


Diff: 68 additions and 68 deletions
