Commit a316dd87 authored by thorstenhater, committed by GitHub

Performance/copy to swap (#1027)

Replace a redundant copy with a swap operation to gain performance,
especially on GPU, where copies are synchronous. Similarly, instead of solving the
linear system into an intermediate array and then copying the result, write the
output directly into the target.
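
The two changes can be illustrated with a minimal C++ sketch. This is not the code touched by this commit: the types `state` and `tridiag`, the function names, and the use of a plain tridiagonal solver on host vectors are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical state holder (illustrative only).
struct state {
    std::vector<double> voltage;       // values for the current step
    std::vector<double> voltage_prev;  // values kept from the previous step
};

// Before: preserve the old values by copying every element, O(n).
inline void keep_previous_by_copy(state& s) {
    s.voltage_prev = s.voltage;
}

// After: exchange the two buffers in O(1). This is only valid when
// `voltage` is fully overwritten before it is read again, because it
// holds stale data after the swap.
inline void keep_previous_by_swap(state& s) {
    std::swap(s.voltage, s.voltage_prev);
}

// Hypothetical tridiagonal system; lower[0] and upper[n-1] are unused.
struct tridiag {
    std::vector<double> lower, diag, upper, rhs;
};

// Solve the system with the Thomas algorithm and write the result straight
// into `out`, instead of into an internal buffer that is copied afterwards.
// The solve is destructive: `diag` and `rhs` are modified in place.
inline void solve_into(tridiag& m, std::vector<double>& out) {
    const std::size_t n = m.diag.size();
    assert(out.size() == n && m.rhs.size() == n);
    assert(m.lower.size() == n && m.upper.size() == n);
    if (n == 0) return;

    // Forward elimination.
    for (std::size_t i = 1; i < n; ++i) {
        const double w = m.lower[i] / m.diag[i - 1];
        m.diag[i] -= w * m.upper[i - 1];
        m.rhs[i] -= w * m.rhs[i - 1];
    }

    // Back substitution, writing directly into the target.
    out[n - 1] = m.rhs[n - 1] / m.diag[n - 1];
    for (std::size_t i = n - 1; i-- > 0;) {
        out[i] = (m.rhs[i] - m.upper[i] * out[i + 1]) / m.diag[i];
    }
}
```

In this sketch the caller passes the destination buffer (for example `s.voltage`) as `out`, so no intermediate solution array and no second pass over the data are needed.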

Here is the effect on the busyring benchmark (swapped pas -> hh) with 8192 cells on a
V100 GPU (time for model-run in seconds; the last row is the minimum of each column).
```
|----------+--------------------------------+------------------------------------|
| Baseline | fvm_lowered_cell: copy -> swap | matrix: solve + copy -> solve_into |
|----------+--------------------------------+------------------------------------|
|    2.230 |                          2.199 |                              2.129 |
|    2.231 |                          2.209 |                              2.132 |
|    2.225 |                          2.209 |                              2.136 |
|    2.227 |                          2.186 |                              2.130 |
|    2.220 |                          2.204 |                              2.133 |
|----------+--------------------------------+------------------------------------|
|    2.220 |                          2.186 |                              2.129 |
|----------+--------------------------------+------------------------------------|
```
parent a9f0b2fa