Commits · af15856d944937d008f08b2d1e6a0b69a926c8bc · arbor-sim / arbor

Nov 27, 2018

Workaround for CMake 3.12 bug passing -pthread to nvcc (#649) · af15856d

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

CMake wants to run a device link pass with nvcc despite
there being no CUDA separable compilation enabled anywhere,
and then passes on -pthread to that unnecessary nvcc
invocation when we use the Threads dependency. The latter,
at least, is fixed in CMake 3.13.

We used the "prefer -pthread" option for compatibility with
our earlier build configuration; turning it off will
hopefully have no consequence.

We also enable device linking on the arbor library, which
is not needed, but if CMake is going to insist on doing it,
it should be on the library rather than the executable.

CMake then goes and does it on the executable anyway. Great.

Fixes #645.

af15856d

Nov 21, 2018
- Forward cuda header paths to host compiler (#652) · 276baf03
  Benjamin Cumming authored 6 years ago and Sam Yates committed 6 years ago
```
* Forward CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES to compilation of arbor library and unit tests.

Fixes #651
```
  276baf03
Nov 13, 2018

squashed merge for fine matrix solver · 0b7f88ca
Felix Huber authored 6 years ago and Benjamin Cumming committed 6 years ago

0b7f88ca
Revert "Squashed merge for fine matrix solver (#640)" · 67b70a80
Sam Yates authored 6 years ago and Benjamin Cumming committed 6 years ago
```
This reverts commit be2a8a9f.
```
67b70a80

Squashed merge for fine matrix solver (#640) · be2a8a9f

Benjamin Cumming authored 6 years ago and

Sam Yates committed 6 years ago

Add a new Hines matrix solver implementation for the GPU that can solve a single tree in parallel with multiple threads. It replaces the interleaved solver, which used a single thread to solve each matrix.
Branches with the same common root in the tree can be solved independently on each of the forward and backward solution passes. 

* Add a matrix storage type, `arb::gpu::matrix_state_fine` that stores the branches of multiple trees for efficient backward and forward substitution.
* Extend the `arb::tree` data structure to support operations for choosing a new root node and determining a root node which minimises the maximum distance between the root and any of the tree's leaves.
* Implement code for rebalancing a set of matrix trees, a.k.a. a "forest" of trees.
* Add CUDA kernels for efficiently performing matrix assembly and matrix solution steps.
* Add CMake option `ARB_WITH_GPU_FINE_MATRIX` for toggling the new solver (default `on`).
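
As a rough illustration of the two solution passes described above, here is a minimal serial sketch of a Hines solve in C++. It is not the GPU solver added by this commit: the function name `hines_solve`, the array layout (`d` for the diagonal, `u` for the coupling to the parent, `p` for the parent index with `p[i] < i`) and the assumption of a symmetric matrix are all illustrative choices. In the first sweep each node only updates its parent, and in the second it only reads from its parent, so branches hanging off the same root do not depend on one another.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Solve A x = rhs in place for a symmetric tree-structured (Hines) matrix.
//   d[i] : diagonal entry of row i
//   u[i] : off-diagonal entry coupling node i to its parent p[i] (u[0] unused)
//   p[i] : parent index, with p[i] < i for i > 0; node 0 is the root
// On return rhs holds the solution x; d is overwritten.
inline void hines_solve(std::vector<double>& d,
                        const std::vector<double>& u,
                        std::vector<double>& rhs,
                        const std::vector<std::size_t>& p)
{
    const std::size_t n = d.size();
    assert(u.size() == n && rhs.size() == n && p.size() == n);
    if (n == 0) return;

    // Elimination sweep, leaves to root: fold each node into its parent.
    for (std::size_t i = n - 1; i > 0; --i) {
        assert(p[i] < i);
        const double factor = u[i] / d[i];
        d[p[i]]   -= factor * u[i];
        rhs[p[i]] -= factor * rhs[i];
    }
    rhs[0] /= d[0];

    // Substitution sweep, root to leaves: each node reads its parent's solution.
    for (std::size_t i = 1; i < n; ++i) {
        rhs[i] = (rhs[i] - u[i] * rhs[p[i]]) / d[i];
    }
}
```

For a four-node tree with parents `{0, 0, 0, 2}`, diagonal entries of `4` and couplings of `-1`, the right-hand side `{-1, 7, 7, 13}` gives a solution of approximately `{1, 2, 3, 4}`.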

be2a8a9f

Oct 16, 2018
- Further python3 fixes for tsplot (#630) · dfc2b673
  Sam Yates authored 6 years ago and Benjamin Cumming committed 6 years ago
  
  dfc2b673
Oct 15, 2018

Patch up Julia scripts for Julia 1.0 (#629) · c822f8b9

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

* Use `Unitful.uconvert` for scalar conversions (Float64 cast apparently does not work at the moment).
* Use .+ for scalar/array addition.
* Replace `immutable` with `struct`.
* Qualify included modules with `Main.` for using statements.
* Add informational note to FindJulia: component identification can take a long time, as Julia may compile the components from source.

c822f8b9

update html links in README to point to new arbor-sim (#628) · 7bd98a2a
Benjamin Cumming authored 6 years ago
```
fixes #627
```
7bd98a2a
Rename 'aux' namespace and paths to 'sup'. (#625) · e0203f34
Sam Yates authored 6 years ago and Benjamin Cumming committed 6 years ago
```
Fixes #622.
```
e0203f34

Oct 12, 2018

Make tsplot python2/python3 compatible. (#624) · 004d6737

Sam Yates authored 6 years ago

* Use python3 version of print.
* Use dict update method instead of item concatenation, as in Python3 dict.items() no longer returns a list.

004d6737

Bump version post 0.1 for development. (#623) · 3f3cd9f9

Sam Yates authored 6 years ago

cf. CMake issue 16716: https://gitlab.kitware.com/cmake/cmake/issues/16716

* Bump version post 0.1 for development.
* Read version string from file VERSION.
* Strip suffix to make a numerical, CMake-compatible PROJECT_VERSION.

3f3cd9f9

Smaller default build; check MPI support via find_package component. (#619) · 28e45aee

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

Fixes #618 and fixes #617.

* Add convenience targets: 'examples' for all examples; 'tests' for all tests.
* Add support for component-testing in installed CMake package.
* Allow testing for MPI support through a find_package component.
* Remove REQUIRED specification from `find_dependency()` commands in generated config.
* Update `mech_vec.cpp` to match new `fvm_lowered_cell_impl` constructor.

v0.1

28e45aee

Oct 11, 2018

fix weights in ring benchmark (#620) · 51fb4f3a

Benjamin Cumming authored 6 years ago and

Sam Yates committed 6 years ago

Fix potential numeric instabilities in the ring benchmark caused by passing arguments to an event generator in the wrong order.

51fb4f3a

Oct 10, 2018

Add installable CMake config for arbor (#616) · 7ade5c26

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

Fixes #612.

* Fix issues with permissions on directories created at install time (at least for CMake 3.11+).
* Add CMake export guff to various targets and install an `arbor-config.cmake` for consumption by other CMake-based projects.

7ade5c26

Oct 04, 2018

Extend ring (#611) · 488ece0c

Benjamin Cumming authored 6 years ago

Extend the ring benchmark to have a configurable number of synapses attached to each cell, instead of a fixed count of one synapse per cell.
This doesn't change the behavior of the model: only the first synapse is used for communication. The other synapses' only effect is to
increase the per-cell computational overheads, to more effectively mimic real-world performance.

488ece0c

Oct 03, 2018

pass correct index to the NMODL procedures (#610) · face9915

noraabiakar authored 6 years ago and

Benjamin Cumming committed 6 years ago

Fixes an error in vectorized kernels that sees the incorrect index passed to PROCEDURE calls.
The loop index variable was being passed, instead of the pack of vector indexes.

Fixes #609

face9915

Oct 01, 2018

Add CMake options for V100 support (#608) · 2334ada8
noraabiakar authored 6 years ago and Benjamin Cumming committed 6 years ago
```
Add CMake options for V100 support. fixes #605
```
2334ada8
Fix GPU installation (#607) · 9129b2eb
noraabiakar authored 6 years ago and Benjamin Cumming committed 6 years ago
```
Updates the install docs. Fixes #604
```
9129b2eb

Integrating Mac OS X and clang compiler into Travis CI (#601) · e755a420

akuesters authored 6 years ago and

Benjamin Cumming committed 6 years ago

changes: 
- .travis.yml:
  - added matrix for different osx's, since enumeration style only works for `env` and `compiler`

- scripts/travis/build.sh:
  - changed getting compiler version from ``${CXX} -dumpversion`` to ``${CXX} --version | grep -m1 ""`` 
  - added `--oversubscribe` flag to `mpiexec` on Mac to allow more processes on a node than processing elements
  - added `--mca btl tcp,self` flag for Open MPI to use the "tcp" and "self" BTLs for transporting MPI messages on Mac

e755a420

Fix double throw of captured exception in thread group. (#606) · d6aec81a

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

Fixes #603.

* Clear exception pointer in exception_state helper class after move of state.
* Rename exception_state::get() method to reset().
* Call std::terminate() if task_group is destroyed before tasks are collected with wait().
* Do not attempt to collect tasks in destructor for task_group.
* Do not attempt to rethrow exception in destructor for exception_state.
* Add unit test to verify correct exception behaviour when a task_group runs and waits on a series of tasks.
* Add unit test for terminate behaviour as above.

Code quality fix ups:
* Remove unused variable warning in threading exception tests.
* Address if-statement spacing in threading.hpp.
* Use ARB_HAVE_MPI in execution_context.cpp instead of introducing a dependency on generated version header via feature macro ARB_MPI_ENABLED.

d6aec81a

Sep 26, 2018

Threading exceptions (#595) · b5662870

noraabiakar authored 6 years ago and

Benjamin Cumming committed 6 years ago

Propagate exceptions generated in `task_group` tasks on different threads in the threading backend, so that they are thrown on the main thread on `task_group.wait()`.

Add tests that verify that exceptions are propagated correctly.

Fixes #310.
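
The propagation pattern described above can be sketched in standard C++ with `std::exception_ptr`. This is a simplified stand-in, not Arbor's threading backend: the class name `mini_task_group`, the use of one `std::thread` per task and the "keep only the first exception" policy are assumptions made for illustration.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <mutex>
#include <stdexcept>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a task group; not Arbor's actual implementation.
class mini_task_group {
    std::vector<std::thread> threads_;
    std::mutex mutex_;
    std::exception_ptr first_error_;

public:
    // Run a task on its own thread, recording the first exception it throws.
    void run(std::function<void()> task) {
        assert(task);
        threads_.emplace_back([this, task = std::move(task)] {
            try {
                task();
            }
            catch (...) {
                std::lock_guard<std::mutex> guard(mutex_);
                if (!first_error_) first_error_ = std::current_exception();
            }
        });
    }

    // Join all tasks, then rethrow any captured exception on the calling thread.
    void wait() {
        for (auto& t: threads_) t.join();
        threads_.clear();
        // Clear the stored pointer before rethrowing so it is thrown only once.
        if (auto e = std::exchange(first_error_, nullptr)) {
            std::rethrow_exception(e);
        }
    }

    // Destroying the group before wait() has collected its tasks is an error.
    ~mini_task_group() {
        if (!threads_.empty()) std::terminate();
    }
};
```

A caller would wrap `wait()` in a `try` block on the main thread; a second call to `wait()` on the same group does not throw again, because the stored pointer has been cleared.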

b5662870

Sep 19, 2018
- Fixed warnings of signed-unsigned integer comparison in unit tests · ad26b114
  akuesters authored 6 years ago and Benjamin Cumming committed 6 years ago
  
  ad26b114
Sep 18, 2018
- Remove explicit template specialization of dry_run_info (#599) · 8a81de71
  akuesters authored 6 years ago and Sam Yates committed 6 years ago
```
Fixes compilation error with clang.
```
  8a81de71
Sep 17, 2018

Dry-run mode (#582) · a2b39382

noraabiakar authored 6 years ago and

Benjamin Cumming committed 6 years ago

Dry-run mode: 
* An implementation of distributed_context that is used to mimic the performance of running an MPI distributed simulation with n ranks.
* Verifiable against an MPI run with the same parameters. 

Implementation: 
* Describe the model on a single domain (tile) and translate it to however many domains we want to mimic using arb::tile and arb::symmetric_recipe. This allows us to know the exact behavior of the entire system by only running the simulation on a single node.
* Mimic communication between domains using arb::dry_run_context

Example: 
* dryrun in example/ is a verifiable example of using dry-run mode with mc_cells

Other:
* Documentation of dry-run mode 
* unit test for dry_run_context
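
One way to picture the tile-to-domains translation is as index arithmetic on cell gids. The sketch below is an assumption about how such a mapping could look, not the code behind `arb::tile` or `arb::symmetric_recipe`: the types `ring_tile` and `tiled_model`, and the rule that sources are shifted by the owning tile's first gid, are hypothetical.

```cpp
#include <cassert>
#include <vector>

using gid_type = unsigned;

// Hypothetical tile: a ring of num_cells cells, each listening to its predecessor.
struct ring_tile {
    gid_type num_cells;

    // Source gids of the connections arriving at a tile-local cell.
    std::vector<gid_type> sources_on(gid_type local_gid) const {
        assert(local_gid < num_cells);
        return {(local_gid + num_cells - 1) % num_cells};
    }
};

// Replicate one tile over num_tiles mimicked domains.
struct tiled_model {
    ring_tile tile;
    gid_type num_tiles;

    gid_type num_cells() const { return tile.num_cells * num_tiles; }

    // Answer a query about any global gid using only the single tile description.
    std::vector<gid_type> sources_on(gid_type gid) const {
        assert(gid < num_cells());
        const gid_type local = gid % tile.num_cells;
        const gid_type offset = gid - local;  // first gid of the tile that owns gid
        auto sources = tile.sources_on(local);
        for (auto& s: sources) s += offset;   // shift tile-local sources to global gids
        return sources;
    }
};
```

With a tile of 4 cells replicated 3 times, the model reports 12 cells, and every global gid is resolved by consulting the one tile description.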

a2b39382

Sep 07, 2018
- removed the explicit template specialization for compilation of MPI back end with clang (#593) · 2ff590ea
  akuesters authored 6 years ago and Benjamin Cumming committed 6 years ago
```
fixes #591
```
  2ff590ea
- repair compiler warnings with AppleClang (#592) · 6c89c7cd
  Benjamin Cumming authored 6 years ago
```
Turns out that CMake thinks Clang and AppleClang are different things.
```
  6c89c7cd
Sep 06, 2018

Clarify vectorization-enabled build errors. (#588) · 1ffccf2d

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

Fixes #587.

* Eliminate Clang warnings from GCC-tree-optimization bug work-around.
* Error with static-assert if simd type is used with a missing simd abi.
* Clarify install documentation regarding use of ARB_VECTORIZE with ARB_ARCH.

1ffccf2d

Sep 05, 2018
- Tweak fix for CUDA not-enabled with ARB_ARCH specification. (#586) · f8da6eaf
  Sam Yates authored 6 years ago and Benjamin Cumming committed 6 years ago
```
Fixes #584.

* Add CUDA compile guard generator expression to architecture options iff CUDA is an enabled language.
```
  f8da6eaf
- Only make CUDA -march workaround if compiling with CUDA target (#585) · 2d9980cc
  Benjamin Cumming authored 6 years ago
```
Fixes #584.
```
  2d9980cc
Sep 01, 2018
- Profiler fix (#580) · 2059c285
  noraabiakar authored 6 years ago and Benjamin Cumming committed 6 years ago
```
Remove redundant profiler calls that caused crashes when using event generators.
```
  2059c285
Aug 30, 2018

Opaque Public Context (#576) · d637c8bc

Benjamin Cumming authored 6 years ago

Make the execution context presented to users an opaque handle, moving all implementation of the gpu, thread and distributed contexts into the back end.

* move `execution_context` and `distributed_context` definitions to the back end
* create `execution_context` handle called `context` in the public API
* provide `make_context` helper functions that build different context configurations (default, user-specified local resources, with MPI)
* update documentation for all parts of the public API that touch contexts
* move `distributed_context` docs to the developer documentation (from the public API docs)
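
The opaque-handle idea can be sketched in a single C++ file as follows. The names `local_resources`, `num_threads` and `has_gpu`, and the fields they use, are illustrative assumptions and are not claimed to match the signatures in the public API; the point is only that user code holds a pointer to an incomplete type while the back end owns the definition.

```cpp
#include <cassert>
#include <memory>

// Public header (sketch): users see only an incomplete type behind a handle.
struct execution_context;  // defined in the back end
using context = std::shared_ptr<execution_context>;

// Hypothetical description of user-specified local resources.
struct local_resources {
    unsigned num_threads = 1;
    int gpu_id = -1;  // -1 means no GPU
};

context make_context();                        // default configuration
context make_context(const local_resources&);  // user-specified local resources
unsigned num_threads(const context&);          // queries go through free functions
bool has_gpu(const context&);

// Back end (sketch): the only place that knows the layout of the context.
struct execution_context {
    local_resources resources;
};

context make_context() {
    return make_context(local_resources{});
}

context make_context(const local_resources& r) {
    assert(r.num_threads > 0);
    return std::make_shared<execution_context>(execution_context{r});
}

unsigned num_threads(const context& c) { return c->resources.num_threads; }
bool has_gpu(const context& c) { return c->resources.gpu_id >= 0; }
```

Because user code never sees the members of `execution_context`, the back end can change how thread, GPU and distributed state are stored without breaking code written against the handle.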

d637c8bc

Aug 29, 2018
- Fix cpu architecture specification vs nvcc bug. (#578) · c14a6e35
  Sam Yates authored 6 years ago and Benjamin Cumming committed 6 years ago
```
Fixes #575.

* Guard CPU architecture option for nvcc with generator expression.
```
  c14a6e35
Aug 24, 2018

Ring benchmark (#570) · eb4ed472

Benjamin Cumming authored 6 years ago and

Sam Yates committed 6 years ago

* Add new ring benchmark to examples.
* Refactor common functionality for reading miniapp parameters from a JSON file into `aux` (used by both bench and ring).

Fixes #516.

eb4ed472

remove ARB_HAVE_GPU from header file (#574) · 38337981

Benjamin Cumming authored 6 years ago

Move implementation of `gpu_context` from header to `cpp` file, so that `ARB_WITH_CUDA` doesn't leak from library implementation.

38337981

Aug 22, 2018

Create gpu_context and manage it as part of execution_context (#566) · 2c135d75

noraabiakar authored 6 years ago and

Sam Yates committed 6 years ago

* Add gpu_context as part of execution context containing information about GPU availability, managed_memory synchronization, and atomic double availability.
* Choose between ON and OFF for ARB_GPU in CMake. If ON, compile for K20, K80, and P100.

Note that we still need compile-time information about the GPU in cuda_atomic.hpp for atomicAdd(double*, double). This is because the function is only defined when the program is compiled for sm_60 or higher.

2c135d75

Refine gcc version test for FMA work-around. (#573) · 30374945
Sam Yates authored 6 years ago and Benjamin Cumming committed 6 years ago
```
Fixes #568.
```
30374945
Update redundant info in installation docs (#567) · fd93306a
Benjamin Cumming authored 6 years ago and Sam Yates committed 6 years ago
```
Fixes #564
```
fd93306a

Work-around for gcc version < 8.2 versus std::fma (#572) · 120316d0

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

Use a compat::fma wrapper for std::fma to avoid a bug in the tree optimizer in GCC version < 8.2.

See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87046
Fixes #568.
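
A wrapper of this kind might look like the sketch below. The commit message does not say how the wrapper avoids the bug, so the mechanism shown here (turning off tree vectorization for the wrapper on affected GCC versions) is an assumption, as are the macro name `COMPAT_FMA_ATTR` and the exact version test.

```cpp
#include <cassert>
#include <cmath>

// Assumed guard: GCC proper (Clang also defines __GNUC__) with version < 8.2.
#if defined(__GNUC__) && !defined(__clang__) && \
    (__GNUC__ < 8 || (__GNUC__ == 8 && __GNUC_MINOR__ < 2))
#define COMPAT_FMA_ATTR __attribute__((optimize("no-tree-vectorize")))
#else
#define COMPAT_FMA_ATTR
#endif

namespace compat {

// Forward to std::fma; on affected compilers the attribute changes how this
// one function is optimized, and elsewhere the macro expands to nothing.
template <typename T>
COMPAT_FMA_ATTR
inline T fma(T a, T b, T c) {
    return std::fma(a, b, c);
}

} // namespace compat
```

Call sites then use `compat::fma(a, b, c)` in place of `std::fma(a, b, c)`, which keeps the work-around in one place and makes it easy to remove once older compilers are no longer supported.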

120316d0

Aug 20, 2018

Global temperature for NMODL mechanisms. (#565) · fa0d7aef

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

Global temperature for mechanisms.

* Make 'celsius' magic in modcc: now an indexed variable.
* Add a new temperature data source for indexed variables.
* Add support to printers for indexed variables that reference a scalar.
* Check that indexed variables aren't used in PROCEDURE blocks (this is a problem not just for 'celsius').
* Modify built-in mod files to pass celsius as a parameter to rates() procedures.
* Add global temperature to shared_state classes, and initialize through backend mechanism superclasses.
* Add some infrastructure for unit-test only mechanisms.
* Set modcc flags globally in top level CMakeLists.txt.
* Add test mechanism/module for checking celsius setting.
* Add unit test for multicore and gpu mechanism celsius setting.
* Make common mechanism private field data access helper for unit tests.
* Use helper in temperature, synapses tests.
* Fix warning in `distributed_context.hpp` about errant semicolon.
* Fix global scalar ref for SIMD printing.
* Use correct ARB_CXXOPT_ARCH instead of incorrect CXXOPT_ARCH in various CMakeLists.txt files.
* Add special case for no non-scalar indexed variables in API loop in SIMD printing.

Fixes #386

fa0d7aef

Aug 06, 2018

Bugfix/build osx macports (#563) · 3bafa1b3

Sam Yates authored 6 years ago

Two MacPorts/gcc7 issues:

* std::uint64_t is unsigned long long on OS X, breaking an assumption about size_t in the distributed_context interface.
* Problems with missing errno defines in the standard library headers. With MacPorts gcc7, the installed c++config.h defines _GLIBCXX_HAVE_EOWNERDEAD and _GLIBCXX_HAVE_ENOTRECOVERABLE, but the corresponding errno defines are not provided by sys/errno.h unless __DARWIN_C_SOURCE, which takes its value from _POSIX_C_SOURCE if defined, is greater than or equal to 200809L. Technically a MacPorts configuration bug, but easily worked around.

Changes:

* Use basic integral types for communication collectives interfaces.
* Define _POSIX_C_SOURCE to be 200809L for glob.cpp.

Fixes #562.

3bafa1b3