Commits · 64e3d18e629fa5a4d3fcc7974c7f582c510ff2d8 · arbor-sim / arbor

Jan 22, 2019

Output accumulated meter values in meter printer (#672) · 64e3d18e

* Print a `meter-total` value of the accumulated meter values across all checkpoints to the meter `ostream` output.
* Add default behavior that prints the mean of a meter across all ranks if the meter type is not one of the known special cases (i.e. not one of time, memory or energy).

Fixes #671
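
The two reductions described in this commit can be sketched as follows. This is a minimal illustration, not Arbor's meter code: the `values[checkpoint][rank]` layout and the names `rank_mean` and `meter_total` are assumptions made for the example.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical layout: values[c][r] is the reading at checkpoint c on rank r.
using meter_values = std::vector<std::vector<double>>;

// Mean across ranks for one checkpoint: the default reduction for a meter
// that is not one of the known special cases.
double rank_mean(const std::vector<double>& per_rank) {
    if (per_rank.empty()) return 0.0;
    return std::accumulate(per_rank.begin(), per_rank.end(), 0.0) / per_rank.size();
}

// Accumulate the per-checkpoint values into a single total (assumed here to
// be the sum of the per-checkpoint means).
double meter_total(const meter_values& values) {
    double total = 0.0;
    for (const auto& per_rank: values) total += rank_mean(per_rank);
    return total;
}
```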

64e3d18e

Bugfix: SIMD indirect unit tests + indirect add (#674) · f824112b

Sam Yates authored 6 years ago

* Fix bug in optimized (scalar) unconstrained indirect addition.
* Fix bug in indirect arithmetic and scatter tests that tested only a subset of the test data.

f824112b

Jan 18, 2019

Optimize vectorized compound_indexed_add (#673) · c23c694a

noraabiakar authored 6 years ago and

Sam Yates committed 6 years ago

* Optimize "none" index_constraint specialization of compound_indexed_add, so that it only reads/writes each distinct memory index once per vector.

Related to issue #637.

c23c694a

Jan 14, 2019

fix avx512 gather specialization (#670) · 5574e05f

noraabiakar authored 6 years ago and

Sam Yates committed 6 years ago

* Fix incorrect specialization of AVX512 gather in SIMD library.

Related to #637.
Improves avx512 performance on Intel Xeon Gold 6130.

5574e05f

Dec 18, 2018

mpi-gpu affinity part II (#659) · cfee0abd

Benjamin Cumming authored 6 years ago and

Sam Yates committed 6 years ago

Extend sup library to support assigning unique GPUs to MPI ranks.

Fixes #648.

cfee0abd

Wrap std::function for sup::on_scope_exit. (#665) · f0b5892c

Sam Yates authored 6 years ago

* Provide a helper wrapper for use behind the scenes in the
implementation of `sup::on_scope_exit` so that we can work around
`std::function` not being nothrow move constructible (and maintaining
the nothrow move on the `sup::scope_exit` structure).

Fixes #664.
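
A minimal sketch of such a wrapper, for standard libraries where `std::function` has no `noexcept` move constructor. The name `nothrow_function` and the choice to swallow a throwing move (leaving the target empty) are assumptions for illustration, not necessarily what the `sup` implementation does.

```cpp
#include <functional>
#include <type_traits>
#include <utility>

// Hypothetical wrapper: calls through to a std::function<void ()>, but its
// own move constructor is declared noexcept.
struct nothrow_function {
    std::function<void ()> f;

    nothrow_function() noexcept {}
    nothrow_function(std::function<void ()> g): f(std::move(g)) {}

    // If moving the std::function throws, leave `f` empty rather than let
    // the exception escape.
    nothrow_function(nothrow_function&& other) noexcept {
        try {
            f = std::move(other.f);
        }
        catch (...) {}
    }

    void operator()() const { if (f) f(); }
};

static_assert(std::is_nothrow_move_constructible<nothrow_function>::value,
              "wrapper must be nothrow move constructible");
```

A scope-exit guard that stores this wrapper instead of a bare `std::function` can then keep its own move constructor `noexcept`.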

f0b5892c

Dec 17, 2018

Assertion fix (#663) · 6db581c1

noraabiakar authored 6 years ago and

Sam Yates committed 6 years ago

Events arrive already sorted first by index then by time. 
* Remove sort by event index.
* Replace assertion that events are sorted by time with assertion that they are sorted by index. Assertion that the subrange of events with the same index is sorted by time already exists.
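
The two ordering checks can be sketched like this, assuming a hypothetical `event` record with `index` and `time` fields. Note that a list ordered this way is generally not sorted by time overall, which is why a global time-order assertion does not fit.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical event record; the field names are illustrative.
struct event {
    unsigned index;
    double time;
};

// True if events are ordered by index (the replacement assertion).
bool sorted_by_index(const std::vector<event>& ev) {
    return std::is_sorted(ev.begin(), ev.end(),
        [](const event& a, const event& b) { return a.index < b.index; });
}

// True if, within each run of equal index, events are ordered by time
// (the assertion that already existed).
bool sorted_by_time_within_index(const std::vector<event>& ev) {
    for (std::size_t i = 1; i < ev.size(); ++i) {
        if (ev[i].index == ev[i-1].index && ev[i].time < ev[i-1].time) return false;
    }
    return true;
}
```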

6db581c1

Dec 05, 2018

Refactor hardware detection to sup (#654) · 712070f1

Benjamin Cumming authored 6 years ago and

Sam Yates committed 6 years ago

Refactoring that moves the logic for determining available concurrency and available GPUs from the core Arbor library to the sup library. This also constitutes work towards providing functionality for allocating GPUs to particular ranks when multiple GPUs are visible per rank.

* Move core/thread estimation code to sup library.
* Change default resource behaviour to use one thread and no GPU.
* Provide an interface in the sup library for: acquiring a default GPU; for coordinating an allocation of GPUs across multiple MPI ranks.

712070f1

Nov 29, 2018

Fix thread-GPU affinity bug. (#656) · 5e3865cf

Benjamin Cumming authored 6 years ago

Ensure that all threads use the same GPU, which wasn't the case before.

* add `gpu_context::set_gpu()` method that makes all subsequent GPU calls from the calling thread run on the GPU of `gpu_context`.
* `fvm_lowered_cell_impl` now calls the `set_gpu` method on construction and `advance`.
* Also changed GPU memory allocation errors in `arb::memory` to throw `arb_exception` instead of calling `std::terminate` on error. Now errors due to poor GPU configuration can be caught by the calling application, and unit tests fail gracefully and allow other tests to run.

Fixes #655

5e3865cf

Nov 27, 2018

Workaround for CMake 3.12 bug passing -pthread to nvcc (#649) · af15856d

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

CMake wants to run a device link pass with nvcc despite
there being no CUDA separable compilation enabled anywhere,
and then passes on -pthread to that unnecessary nvcc
invocation when we use the Threads dependency. The latter,
at least, is fixed in CMake 3.13.

We used the prefer -pthread option for compatibility with
our earlier build configuration; turning it off will
hopefully have no consequence.

We also enable device linking on the arbor library. This
is not needed, but if CMake is going to insist on doing it,
it should be on the library rather than the executable.

CMake then goes and does it on the executable anyway. Great.

Fixes #645.

af15856d

Nov 21, 2018

Forward cuda header paths to host compiler (#652) · 276baf03

Benjamin Cumming authored 6 years ago and

Sam Yates committed 6 years ago

* Forward CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES to compilation of arbor library and unit tests.

Fixes #651

276baf03

Nov 13, 2018

squashed merge for fine matrix solver · 0b7f88ca

Felix Huber authored 6 years ago and

Benjamin Cumming committed 6 years ago

0b7f88ca

Revert "Squashed merge for fine matrix solver (#640)" · 67b70a80

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

This reverts commit be2a8a9f.

67b70a80

Squashed merge for fine matrix solver (#640) · be2a8a9f

Benjamin Cumming authored 6 years ago and

Sam Yates committed 6 years ago

Add a new Hines matrix solver implementation for the GPU that can solve a single tree in parallel with multiple threads. It replaces the interleaved solver, which used a single thread to solve each matrix.
Branches with the same common root in the tree can be solved independently on each of the forward and backward solution passes. 

* Add a matrix storage type, `arb::gpu::matrix_state_fine` that stores the branches of multiple trees for efficient backward and forward substitution.
* Extend the `arb::tree` data structure to support operations for choosing a new root node and determining a root node which minimises the maximum distance between the root and any of the tree's leaves.
* Implement code for rebalancing a set of matrix trees, a.k.a. a "forest" of trees.
* Add CUDA kernels for efficiently performing matrix assembly and matrix solution steps.
* Add CMake option `ARB_WITH_GPU_FINE_MATRIX` for toggling the new solver (default `on`).
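
The root-selection step can be illustrated with a brute-force sketch. It assumes an undirected adjacency-list tree and measures distance in hops; both are assumptions for this example, as are the names `max_depth` and `balanced_root`, which are not part of `arb::tree`.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

using adjacency = std::vector<std::vector<std::size_t>>;

// Largest hop count from `root` to any other node, by breadth-first search.
// In a tree with more than one node, the farthest node is always a leaf.
std::size_t max_depth(const adjacency& tree, std::size_t root) {
    std::vector<int> depth(tree.size(), -1);
    std::queue<std::size_t> q;
    depth[root] = 0;
    q.push(root);
    std::size_t deepest = 0;
    while (!q.empty()) {
        std::size_t n = q.front();
        q.pop();
        // BFS visits nodes in nondecreasing depth, so the last one is deepest.
        deepest = static_cast<std::size_t>(depth[n]);
        for (std::size_t m: tree[n]) {
            if (depth[m] < 0) {
                depth[m] = depth[n] + 1;
                q.push(m);
            }
        }
    }
    return deepest;
}

// Try every node as root and keep the one with the smallest maximum depth.
// O(n^2), which is fine for a sketch; ties go to the lowest node id.
// Precondition: the tree has at least one node.
std::size_t balanced_root(const adjacency& tree) {
    std::size_t best = 0;
    std::size_t best_depth = max_depth(tree, 0);
    for (std::size_t r = 1; r < tree.size(); ++r) {
        std::size_t d = max_depth(tree, r);
        if (d < best_depth) {
            best = r;
            best_depth = d;
        }
    }
    return best;
}
```

Re-rooting an unbranched chain of five nodes at its middle node halves the depth, which shortens the sequential part of each substitution pass.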

be2a8a9f

Oct 16, 2018

Further python3 fixes for tsplot (#630) · dfc2b673

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

dfc2b673

Oct 15, 2018

Patch up Julia scripts for Julia 1.0 (#629) · c822f8b9

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

* Use `Unitful.uconvert` for scalar conversions (Float64 cast apparently does not work at the moment).
* Use .+ for scalar/array addition.
* Replace `immutable` with `struct`.
* Qualify included modules with `Main.` for using statements.
* Add informational note to FindJulia as component identification can take a long time as Julia may compile them from source.

c822f8b9

update html links in README to point to new arbor-sim (#628) · 7bd98a2a

Benjamin Cumming authored 6 years ago

fixes #627

7bd98a2a

Rename 'aux' namespace and paths to 'sup'. (#625) · e0203f34

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

Fixes #622.

e0203f34

Oct 12, 2018

Make tsplot python2/python3 compatible. (#624) · 004d6737

Sam Yates authored 6 years ago

* Use python3 version of print.
* Use dict update method instead of item concatenation, as in Python3 dict.items() no longer returns a list.

004d6737

Bump version post 0.1 for development. (#623) · 3f3cd9f9

Sam Yates authored 6 years ago

cf. CMake issue 16716: https://gitlab.kitware.com/cmake/cmake/issues/16716

* Bump version post 0.1 for development.
* Read version string from file VERSION.
* Strip suffix to make a numerical, CMake-compatible PROJECT_VERSION.

3f3cd9f9

Smaller default build; check MPI support via find_package component. (#619) · 28e45aee

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

Fixes #618 and fixes #617.

*  Add convenience targets: 'examples' for all examples; 'tests' for all tests.
* Add support for component-testing in installed CMake package.
* Allow test for MPI support via find_package via component.
* Remove REQUIRED specification from `find_dependency()` commands in generated config.
* Update `mech_vec.cpp` to match new `fvm_lowered_cell_impl` constructor.

v0.1

28e45aee

Oct 11, 2018

fix weights in ring benchmark (#620) · 51fb4f3a

Benjamin Cumming authored 6 years ago and

Sam Yates committed 6 years ago

Fix potential numeric instabilities in the ring benchmark caused by passing arguments to an event generator in the wrong order.

51fb4f3a

Oct 10, 2018

Add installable CMake config for arbor (#616) · 7ade5c26

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

Fixes #612.

* Fix issues with permissions on directories created at install time (at least for CMake 3.11+).
* Add CMake export guff to various targets and install an `arbor-config.cmake` for consumption by other CMake-based projects.

7ade5c26

Oct 04, 2018

Extend ring (#611) · 488ece0c

Benjamin Cumming authored 6 years ago

Extend the ring benchmark to have an optional number of synapses attached to each cell, instead of a fixed count of one synapse per cell.
This doesn't change the behavior of the model: only the first synapse is used for communication. The other synapses' only effect is to increase the per-cell computational overheads, to more effectively mimic real world performance.

488ece0c

Oct 03, 2018

pass correct index to the NMODL procedures (#610) · face9915

noraabiakar authored 6 years ago and

Benjamin Cumming committed 6 years ago

Fixes an error in vectorized kernels that sees the incorrect index passed to PROCEDURE calls.
The loop index variable was being passed, instead of the pack of vector indexes.

Fixes #609

face9915

Oct 01, 2018

Add CMake options for V100 support (#608) · 2334ada8

noraabiakar authored 6 years ago and

Benjamin Cumming committed 6 years ago

Add CMake options for V100 support. Fixes #605

2334ada8

Fix GPU installation (#607) · 9129b2eb

noraabiakar authored 6 years ago and

Benjamin Cumming committed 6 years ago

Updates the install docs. Fixes #604

9129b2eb

Integrating Mac OS X and clang compiler into Travis CI (#601) · e755a420

akuesters authored 6 years ago and

Benjamin Cumming committed 6 years ago

changes: 
- .travis.yml:
  - added matrix for different osx's, since enumeration style only works for `env` and `compiler`

- scripts/travis/build.sh:
  - changed getting compiler version from ``${CXX} -dumpversion`` to ``${CXX} --version | grep -m1 ""`` 
  - added `--oversubscribe` flag to `mpiexec` on Mac to allow more processes on a node than processing elements
  - added `--mca btl tcp,self` flag for Open MPI to use the "tcp" and "self" BTLs for transporting MPI messages on Mac

e755a420

Fix double throw of captured exception in thread group. (#606) · d6aec81a

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

Fixes #603.

* Clear exception pointer in exception_state helper class after move of state.
* Rename exception_state::get() method to reset().
* Call std::terminate() if task_group is destroyed before tasks are collected with wait().
* Do not attempt to collect tasks in destructor for task_group.
* Do not attempt to rethrow exception in destructor for exception_state.
* Add unit test to verify correct exception behaviour when a task_group runs and waits on a series of tasks.
* Add unit test for terminate behaviour as above.

Code quality fix ups:
* Remove unused variable warning in threading exception tests.
* Address if-statement spacing in threading.hpp.
* Use ARB_HAVE_MPI in execution_context.cpp instead of introducing a dependency on generated version header via feature macro ARB_MPI_ENABLED.

d6aec81a

Sep 26, 2018

Threading exceptions (#595) · b5662870

noraabiakar authored 6 years ago and

Benjamin Cumming committed 6 years ago

Propagate exceptions generated in `task_group` tasks on different threads in the threading backend, so that they are thrown on the main thread on `task_group.wait()`.

Add tests that verify that exceptions are propagated correctly.

Fixes #310.
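
The capture-and-rethrow pattern behind this kind of propagation can be sketched as below. This is an assumption-laden illustration, not the threading backend itself: `exception_state`, `run_task` and `rethrow_if_set` are hypothetical names, and the thread pool that would call `run_task` from worker threads is left out.

```cpp
#include <exception>
#include <mutex>
#include <stdexcept>
#include <utility>

// Hypothetical helper: remembers the first exception raised by any task.
class exception_state {
    std::exception_ptr ptr_;
    std::mutex mutex_;

public:
    // Called from a worker's catch block; only the first exception is kept.
    void set(std::exception_ptr p) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!ptr_) ptr_ = std::move(p);
    }

    // Hand back the stored exception and clear it, so that it cannot be
    // rethrown a second time.
    std::exception_ptr reset() {
        std::lock_guard<std::mutex> lock(mutex_);
        return std::exchange(ptr_, nullptr);
    }
};

// Wrap a task so that an escaping exception is captured instead of being
// lost on the thread that ran it.
template <typename F>
void run_task(exception_state& state, F&& task) {
    try {
        task();
    }
    catch (...) {
        state.set(std::current_exception());
    }
}

// What the waiting thread would do once all tasks have finished.
void rethrow_if_set(exception_state& state) {
    if (auto p = state.reset()) std::rethrow_exception(p);
}
```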

b5662870

Sep 19, 2018

Fixed warnings of signed-unsigned integer comparison in unit tests · ad26b114

akuesters authored 6 years ago and

Benjamin Cumming committed 6 years ago

ad26b114

Sep 18, 2018

Remove explicit template specialization of dry_run_info (#599) · 8a81de71

akuesters authored 6 years ago and

Sam Yates committed 6 years ago

Fixes compilation error with clang.

8a81de71

Sep 17, 2018

Dry-run mode (#582) · a2b39382

noraabiakar authored 6 years ago and

Benjamin Cumming committed 6 years ago

Dry-run mode: 
* An implementation of distributed_context that is used to mimic the performance of running an MPI distributed simulation with n ranks.
* Verifiable against an MPI run with the same parameters. 

Implementation: 
* Describe the model on a single domain (tile) and translate it to however many domains we want to mimic using arb::tile and arb::symmetric_recipe. This allows us to know the exact behavior of the entire system by only running the simulation on a single node.
* Mimic communication between domains using arb::dry_run_context
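
One way to picture the tiling is with index arithmetic, assuming cells are numbered tile by tile and that connection endpoints are described relative to the first tile. The helper names below are hypothetical and do not reflect the `arb::tile` or `arb::symmetric_recipe` interfaces.

```cpp
#include <cstddef>

// Which cell of the single described tile a global gid is a copy of.
std::size_t tile_gid(std::size_t gid, std::size_t cells_per_tile) {
    return gid % cells_per_tile;
}

// Shift a connection endpoint, given relative to tile 0, so that it applies
// to the tile containing `gid`, wrapping around the global cell count.
std::size_t shift_to_tile(std::size_t endpoint, std::size_t gid,
                          std::size_t cells_per_tile, std::size_t num_tiles)
{
    std::size_t offset = (gid / cells_per_tile) * cells_per_tile;
    return (endpoint + offset) % (cells_per_tile * num_tiles);
}
```

With 4 cells per tile and 3 tiles, global cell 9 is a copy of tile cell 1, and an endpoint of 5 seen from tile 0 becomes cell 1 when seen from the last tile.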

Example: 
* dryrun in example/ is a verifiable example of using dry-run mode with mc_cells

Other:
* Documentation of dry-run mode 
* unit test for dry_run_context

a2b39382

Sep 07, 2018

removed the explicit template specialization for compilation of MPI back end with clang (#593) · 2ff590ea

akuesters authored 6 years ago and

Benjamin Cumming committed 6 years ago

fixes #591

2ff590ea

repair compiler warnings with AppleClang (#592) · 6c89c7cd

Benjamin Cumming authored 6 years ago

Turns out that CMake thinks Clang and AppleClang are different things.

6c89c7cd

Sep 06, 2018

Clarify vectorization-enabled build errors. (#588) · 1ffccf2d

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

Fixes #587.

* Eliminate Clang warnings from GCC-tree-optimization bug work-around.
* Error with static-assert if simd type is used with a missing simd abi.
* Clarify install documentation regarding use of ARB_VECTORIZE with ARB_ARCH.
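
The static-assert guard can be sketched as follows. The trait `native_abi`, the placeholder type `fake_double4` and the `simd` template are hypothetical stand-ins, assuming a missing ABI is represented by `void`.

```cpp
#include <type_traits>

// Hypothetical ABI lookup: `type` is void unless an implementation exists.
template <typename T, unsigned N>
struct native_abi { using type = void; };

struct fake_double4 {};  // stands in for a real 4-wide double implementation

template <>
struct native_abi<double, 4> { using type = fake_double4; };

template <typename T, unsigned N>
struct simd {
    using impl = typename native_abi<T, N>::type;

    // Fail early, with a readable message, if no ABI backs this type.
    static_assert(!std::is_void<impl>::value,
                  "no SIMD ABI available for this value type and width");

    static constexpr unsigned width = N;
};
```

Here `simd<double, 4>` compiles, while instantiating `simd<float, 8>` would stop the build with the message above instead of a long chain of template errors.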

1ffccf2d

Sep 05, 2018

Tweak fix for CUDA not-enabled with ARB_ARCH specification. (#586) · f8da6eaf

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

Fixes #584.

* Add CUDA compile guard generator expression to architecture options iff CUDA is an enabled language.

f8da6eaf

Only make CUDA -march workaround if compiling with CUDA target (#585) · 2d9980cc

Benjamin Cumming authored 6 years ago

Fixes #584.

2d9980cc

Sep 01, 2018

Profiler fix (#580) · 2059c285

noraabiakar authored 6 years ago and

Benjamin Cumming committed 6 years ago

Remove redundant profiler calls that caused crashes when using event generators.

2059c285

Aug 30, 2018

Opaque Public Context (#576) · d637c8bc

Benjamin Cumming authored 6 years ago

Make the execution context presented to users an opaque handle, moving all implementation of the gpu, thread and distributed contexts into the back end.

* move `execution_context` and `distributed_context` definitions to the back end
* create `execution_context` handle called `context` in the public API
* provide `make_context` helper functions that build different context configurations (default, user-specified local resources, with MPI)
* update documentation for all parts of the public API that touch contexts
* move `distributed_context` docs to the developer documentation (from the public API docs)
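
The opaque-handle arrangement can be sketched in a single file as below. The names `context`, `execution_context` and `make_context` come from the commit message; everything else (`local_resources`, the deleter, the accessor functions, and the default of one thread and no GPU) is assumed for illustration.

```cpp
#include <memory>

// --- public header: users only ever see a forward declaration ---
struct execution_context;

struct execution_context_deleter {
    void operator()(execution_context*) const;
};

using context = std::unique_ptr<execution_context, execution_context_deleter>;

struct local_resources {  // hypothetical resource description
    unsigned num_threads;
    int gpu_id;           // -1 means "no GPU"
};

context make_context();                 // default resources
context make_context(local_resources);  // user-specified local resources
unsigned num_threads(const context&);
bool has_gpu(const context&);

// --- back end: the definition stays out of the public API ---
struct execution_context {
    local_resources resources;
};

void execution_context_deleter::operator()(execution_context* p) const {
    delete p;
}

context make_context() {
    return make_context(local_resources{1, -1});  // assumed default
}

context make_context(local_resources r) {
    return context(new execution_context{r});
}

unsigned num_threads(const context& c) { return c->resources.num_threads; }
bool has_gpu(const context& c) { return c->resources.gpu_id >= 0; }
```

Because user code only holds a `context` and calls free functions on it, the back end can change the layout of `execution_context` without touching the public API.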

d637c8bc