Commits · af15856d944937d008f08b2d1e6a0b69a926c8bc · arbor-sim / arbor

Nov 27, 2018

Workaround for CMake 3.12 bug passing -pthread to nvcc (#649) · af15856d

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

CMake wants to run a device link pass with nvcc despite
there being no CUDA separable compilation enabled anywhere,
and then passes on -pthread to that unnecessary nvcc
invocation when we use the Threads dependency. The latter,
at least, is fixed in CMake 3.13.

We used the 'prefer -pthread' option for compatibility with
our earlier build configuration; turning it off will
hopefully have no consequence.

We also enable device linking on the arbor library. This
is not needed, but if CMake is going to insist on doing it,
it should be on the library rather than the executable.

CMake then goes and does it on the executable anyway. Great.

Fixes #645.

af15856d

Nov 21, 2018
- Forward cuda header paths to host compiler (#652) · 276baf03
  Benjamin Cumming authored 6 years ago and Sam Yates committed 6 years ago
```
* Forward CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES to compilation of arbor library and unit tests.

Fixes #651
```
  276baf03
Nov 13, 2018

squashed merge for fine matrix solver · 0b7f88ca
Felix Huber authored 6 years ago and Benjamin Cumming committed 6 years ago

0b7f88ca
Revert "Squashed merge for fine matrix solver (#640)" · 67b70a80
Sam Yates authored 6 years ago and Benjamin Cumming committed 6 years ago
```
This reverts commit be2a8a9f.
```
67b70a80

Squashed merge for fine matrix solver (#640) · be2a8a9f

Benjamin Cumming authored 6 years ago and

Sam Yates committed 6 years ago

Add a new Hines matrix solver implementation for the GPU that can solve a single tree in parallel with multiple threads. It replaces the interleaved solver, which used a single thread to solve each matrix.
Branches with the same common root in the tree can be solved independently on each of the forward and backward solution passes. 

* Add a matrix storage type, `arb::gpu::matrix_state_fine` that stores the branches of multiple trees for efficient backward and forward substitution.
* Extend the `arb::tree` data structure to support operations for choosing a new root node and determining a root node which minimises the maximum distance between the root and any of the tree's leaves.
* Implement code for rebalancing a set of matrix trees, a.k.a. a "forest" of trees.
* Add CUDA kernels for efficiently performing matrix assembly and matrix solution steps.
* Add CMake option `ARB_WITH_GPU_FINE_MATRIX` for toggling the new solver (default `on`).
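
The root-selection step described above can be sketched as follows. This is only an illustration under stated assumptions: distances are counted in hops, the tree is given as a parent-index array with `-1` for the current root, and the names `bfs_sketch` and `balanced_root_sketch` are hypothetical rather than part of `arb::tree`. It uses the standard two-pass breadth-first search: the midpoint of a longest path minimises the maximum distance to any leaf.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Result of one breadth-first search: hop distances, predecessors, farthest node.
struct bfs_sketch {
    std::vector<int> dist, pred;
    int farthest;
};

inline bfs_sketch run_bfs(const std::vector<std::vector<int>>& adj, int start) {
    bfs_sketch r{std::vector<int>(adj.size(), -1), std::vector<int>(adj.size(), -1), start};
    std::queue<int> pending;
    r.dist[start] = 0;
    pending.push(start);
    while (!pending.empty()) {
        int n = pending.front();
        pending.pop();
        if (r.dist[n] > r.dist[r.farthest]) r.farthest = n;
        for (int m: adj[n]) {
            if (r.dist[m] < 0) {
                r.dist[m] = r.dist[n] + 1;
                r.pred[m] = n;
                pending.push(m);
            }
        }
    }
    return r;
}

// Return a node that minimises the maximum hop distance to any other node.
inline int balanced_root_sketch(const std::vector<int>& parent) {
    assert(!parent.empty());
    const int n = static_cast<int>(parent.size());

    // Build an undirected adjacency list from the parent array.
    std::vector<std::vector<int>> adj(n);
    for (int i = 0; i < n; ++i) {
        if (parent[i] >= 0 && parent[i] != i) {
            adj[i].push_back(parent[i]);
            adj[parent[i]].push_back(i);
        }
    }

    // Two passes find the end points u, v of a longest path.
    int u = run_bfs(adj, 0).farthest;
    bfs_sketch from_u = run_bfs(adj, u);
    int v = from_u.farthest;

    // Walk half way back from v towards u to reach the middle of the path.
    int centre = v;
    for (int step = 0; step < from_u.dist[v] / 2; ++step) {
        centre = from_u.pred[centre];
    }
    return centre;
}
```

For a chain of five nodes rooted at one end the sketch picks the middle node, which halves the depth that the forward and backward passes have to traverse.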

be2a8a9f

Oct 15, 2018
- Rename 'aux' namespace and paths to 'sup'. (#625) · e0203f34
  Sam Yates authored 6 years ago and Benjamin Cumming committed 6 years ago
```
Fixes #622.
```
  e0203f34
Oct 12, 2018

Bump version post 0.1 for development. (#623) · 3f3cd9f9

Sam Yates authored 6 years ago

cf. CMake issue 16716: https://gitlab.kitware.com/cmake/cmake/issues/16716

* Bump version post 0.1 for development.
* Read version string from file VERSION.
* Strip suffix to make a numerical, CMake-compatible PROJECT_VERSION.

3f3cd9f9

Smaller default build; check MPI support via find_package component. (#619) · 28e45aee

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

Fixes #618 and fixes #617.

*  Add convenience targets: 'examples' for all examples; 'tests' for all tests.
* Add support for component-testing in installed CMake package.
* Allow testing for MPI support through a `find_package` component.
* Remove REQUIRED specification from `find_dependency()` commands in generated config.
* Update `mech_vec.cpp` to match new `fvm_lowered_cell_impl` constructor.

v0.1

28e45aee

Oct 10, 2018

Add installable CMake config for arbor (#616) · 7ade5c26

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

Fixes #612.

* Fix issues with permissions on directories created at install time (at least for CMake 3.11+).
* Add CMake export guff to various targets and install an `arbor-config.cmake` for consumption by other CMake-based projects.

7ade5c26

Oct 01, 2018
- Add CMake options for V100 support (#608) · 2334ada8
  noraabiakar authored 6 years ago and Benjamin Cumming committed 6 years ago
```
Add CMake options for V100 support. fixes #605
```
  2334ada8
Aug 22, 2018

Create gpu_context and manage it as part of execution_context (#566) · 2c135d75

noraabiakar authored 6 years ago and

Sam Yates committed 6 years ago

* Add gpu_context as part of execution context containing information about GPU availability, managed_memory synchronization, and atomic double availability.
* Choose between ON and OFF for ARB_GPU in CMake. If ON, compile for K20, K80, and P100.

Note that we still need compile time information about the GPU in cuda_atomic.hpp for atomicAdd(double*, double). This is because the function is only defined when the program is compiled for sm_60 or more.

2c135d75

Aug 20, 2018

Global temperature for NMODL mechanisms. (#565) · fa0d7aef

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

Global temperature for mechanisms.

* Make 'celsius' magic in modcc: now an indexed variable.
* Add a new temperature data source for indexed variables.
* Add support to printers for indexed variables that reference a scalar.
* Check that indexed variables aren't used in PROCEDURE blocks (this is a problem not just for 'celsius').
* Modify built-in mod files to pass celsius as a parameter to rates() procedures.
* Add global temperature to shared_state classes, and initialize through backend mechanism superclasses.
* Add some infrastructure for unit-test only mechanisms.
* Set modcc flags globally in top level CMakeLists.txt.
* Add test mechanism/module for checking celsius setting.
* Add unit test for multicore and gpu mechanism celsius setting.
* Make common mechanism private field data access helper for unit tests.
* Use helper in temperature, synapses tests.
* Fix warning in `distributed_context.hpp` about errant semicolon.
* Fix global scal...

fa0d7aef

Jul 24, 2018

task_system as part of an execution_context (#537) · 7a6c1031

noraabiakar authored 6 years ago and

Benjamin Cumming committed 6 years ago

- Task system is no longer a single system private to the implementation of the threading backend and used everywhere. A separate task_system can be used (with a specified number of threads) for every simulation.
- arb::execution_context is the interface to task_system and the previously defined distributed_context.
- TBB and serial support have been removed. Cthreads is the only threading backend available.

7a6c1031

Jul 06, 2018

Migrate source/build to c++14 ... · 3ee79191

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

Migrate source/build to c++14 (#522)

* Update `CMakeLists.txt` for C++14 option.
* Update to gcc 6 minimum.
* Update travis CI from gcc-5 to gcc-6
* Use `std::..._t` style type traits, replacing `util::` aliases.
* Use `std::cbegin`, `std::cend`, and `std::make_unique`, replacing `util::` versions.
* Remove `DEDUCED_RETURN_TYPE` macros.
* Remove redundant return type specifications.
* Use correct ADL for `begin` and `end` in (almost all) the range utilities.
* Remove redundant `mechinfo` ctor (aggregate initialization suffices).
* Use lambda capture initializers where appropriate.
* Use generic `std::equal_to`.
* Use variable templates for `math::infinity` and `math::pi`.
* Remove `enum_hash` workaround.
* Use `""s` string literals where we were using our own `""_s` construction.
* Use generic lambda for recursive lambda instead of `std::function` wrapper.
* Use generic lambda for generic arithmetic tests.
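
Three of the C++14 idioms listed above, sketched in isolation. The names `pi_sketch`, `factorial_sketch` and `label_sketch` are hypothetical and not taken from the Arbor source; the point is only to show a variable template, a recursive generic lambda that needs no `std::function` wrapper, and the standard `""s` literal.

```cpp
#include <cassert>
#include <string>

// Variable template: one definition serves float, double, long double.
template <typename T>
constexpr T pi_sketch = T(3.14159265358979323846L);

// Recursive generic lambda: the closure receives itself as its first argument,
// so no type-erased std::function is required.
inline unsigned long factorial_sketch(unsigned n) {
    auto fact = [](const auto& self, unsigned k) -> unsigned long {
        return k < 2 ? 1ul : k * self(self, k - 1);
    };
    return fact(fact, n);
}

// Standard library string literal in place of a hand-written `""_s` operator.
inline std::string label_sketch() {
    using namespace std::string_literals;
    return "c++"s + "14"s;
}

inline void usage_example() {
    static_assert((pi_sketch<float>) > 3.14f, "variable template instantiates per type");
    assert(factorial_sketch(5) == 120ul);
    assert(label_sketch() == "c++14");
}
```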

Fixes #358.

3ee79191

Jul 05, 2018

Test for xlC and refuse to build with it. (#519) · 775fe807

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

Fixes issue #517.

Deprecate the IBM xlC compiler.
xlC generates code that is an order of magnitude slower than gcc, while generating spurious warnings, and requiring hacks and workarounds to pass all tests.
Supporting it makes no sense.

* Add test and fatal error for xlC detection in CheckCompilerXLC.cmake.
* Move xlC 13 misdetection work around to CheckCompilerXLC.cmake.
* Remove xlC-specific compatibility workarounds from code.

775fe807

Jun 25, 2018

Feature/lib install target part i (#506) · ad1c78ab

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

CMake and build refactoring

*   Use CUDA as first-class language (leading to CMake 3.9 minimum version requirement).

*   Use 'modern CMake' interface libraries for compiler options, include file and library dependency tracking. Interface library targets:
    * `arbor-deps`: compiler options and library requirements for the `libarbor.a` static library, as governed by configure-time options and environment.
    * `arbor-private-headers`: include path for non-installed headers, as required by unit tests and arbor itself.
    * `arbor-aux`: helper classes and utilities used across tests and examples.
    * `ext-json`, `ext-tclap`, `ext-tbb`, `ext-benchmark`, `ext-sphinx_rtd_theme`: externally maintained software that we include (directly or via submodule) in the `ext/` subdirectory.
 
*   Single static library `libarbor.a` includes all built-in modules and CUDA objects.

*   Simplify configuration options:
    *  `ARB_WITH_TRACE`, `ARB_AUTORUN_MODCC_ON_CHANGES`, `ARB_SYSTEM_TYPE` removed.
    * External `modcc` is provided by `ARB_MODCC` configuration option; if provided, `modcc` is still buildable, but is not included in the default target.
    * `ARB_PRIVATE_TBBLIB`, defaulting to `OFF`, instructs the build to make TBB from the included submodule.

*   Extend `ErrorTarget` functionality to provide a dummy target or an error target based on a condition.
*   Generate header version defines and library version variables based on git status and project version, via new script `include/git-source-id`.
*   All generated binaries now placed in `bin/` subdirectory at build.
*   Install target installs: public headers (incomplete); static library; `modcc` tool; `lmorpho` executable; `html` documentation (examples, tests and validation data are currently not installed).
*   Executable targets have had the `.exe` suffix removed; unit tests are labelled `unit` (arbor unit tests), `unit-modcc` (modcc unit tests), `unit-local` (distributed tests with local context), `unit-mpi` (distributed tests with MPI context).
*   More graceful handling of configure-time detection of `nrniv`, Julia and required Julia modules for validation data generation.
*   Add `cmake/FindJulia.cmake`, `cmake/FindTBB.cmake`  package finders, and adjust `cmake/FindUnwind.cmake` to use link library-style properties.
*  Adjust travis script to test `unit-local` and `unit-mpi` if appropriate.
*  Simplify documentation `conf.py`.

Source relocation and reorganization

* All external project sources and files moved to `ext/`.
* Source code refactoring to decouple library-using code from the configure-time definitions that govern arbor behaviour: removes conditional code in public headers that depends upon `ARB_WITH_X`-type definitions at compile time. Affected code is in the public interfaces for MPI, the threading implementation, and the profiler.
* Remove `util/debug.hpp`; split out functionality for pretty-printing from assertion handling.
* Make FVM cell non-physical voltage check a run-time cell-group parameter.
* Move spike double buffer implementation to `simulation.cpp`.
* Make timer utility wrap POSIX `clock_gettime` independent of threading configuration.
* Make `mpi_error` derive from `system_error` and follow C++11 `system_error` semantics.
* `EXPECTS` macro replaced by `arb_assert` macro.
* JSON dependency removed from `libarbor.a` and header files: moved to auxiliary library.
* Publicly visible macros garner an `ARB_` prefix as required.
* Move SWC test file to `test/unit` directory.
* Work-in-progress splitting of public from private includes: as a convention not entirely adhered to as yet, private headers within arbor source are included with `""`, public headers with `<>`.
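
The `system_error` pattern mentioned above for `mpi_error` can be sketched without MPI itself. Everything named `sketch_*` here is hypothetical, and the message text is a placeholder; a real category would presumably obtain its text from the MPI library.

```cpp
#include <cassert>
#include <string>
#include <system_error>

// A custom error category gives integer error codes a domain and a message.
class sketch_mpi_category_impl: public std::error_category {
public:
    const char* name() const noexcept override { return "sketch_mpi"; }
    std::string message(int code) const override {
        return "MPI error code " + std::to_string(code);  // placeholder text
    }
};

// Categories are compared by identity, so expose a single instance.
inline const std::error_category& sketch_mpi_category() {
    static sketch_mpi_category_impl instance;
    return instance;
}

// The exception type carries an error_code in the custom category.
class sketch_mpi_error: public std::system_error {
public:
    sketch_mpi_error(int code, const std::string& where):
        std::system_error(code, sketch_mpi_category(), where) {}
};

// Callers can catch the generic base class and still inspect the code.
inline int caught_code_sketch() {
    try {
        throw sketch_mpi_error(5, "MPI_Bcast");
    }
    catch (const std::system_error& e) {
        assert(e.code().category() == sketch_mpi_category());
        return e.code().value();
    }
    return 0;
}
```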

Modcc interface changes

* Expose via `--namespace` option the functionality that sets the namespace in generated code.
* Use `--profile` option to add profiler hooks to generated code; uses public function interface directly rather than `PE/PL` macros in order to avoid public `PE` and `PL` defines.

ad1c78ab

Jun 01, 2018

Runtime distributed context (#485) · 5fde0b00

Benjamin Cumming authored 6 years ago and

Sam Yates committed 6 years ago

Move the choice of distributed communication model from compile time (the old `arb::communication::communication_policy` type) to run time.

* Add `arb::distributed_context` class that provides the required interface for distributed communication implementations, using type-erasure to provide value semantics.
* Add two implementations for the distributed context: `arb::mpi_context` and `arb::local_context`.
* Allow distribution over a user-supplied MPI communicator by providing it as an argument to `arb::mpi_context`.
* Add `mpi_error` exception type to wrap MPI errors.
* Move contents of the `arb::communication` namespace to the `arb` namespace.
* Add preprocessor for-each utility `ARB_PP_FOREACH`.
* Rewrite all examples and tests to use the new distributed context interface.
* Add documentation for distributed context class and semantics, and update documentation for load balancer and simulation classes accordingly.
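
A minimal sketch of the type-erasure technique named above, assuming a much smaller interface than the real `arb::distributed_context`. The names `context_sketch` and `local_sketch` and the three member functions are illustrative assumptions only.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

class context_sketch {
    // Abstract interface hidden inside the wrapper.
    struct interface {
        virtual int id() const = 0;
        virtual int size() const = 0;
        virtual std::string name() const = 0;
        virtual ~interface() = default;
    };

    // Adapter that forwards to any type with matching member functions.
    template <typename Impl>
    struct wrap: interface {
        explicit wrap(Impl impl): wrapped(std::move(impl)) {}
        int id() const override { return wrapped.id(); }
        int size() const override { return wrapped.size(); }
        std::string name() const override { return wrapped.name(); }
        Impl wrapped;
    };

    std::unique_ptr<interface> impl_;

public:
    // Any suitable implementation can be stored by value; no inheritance needed.
    template <typename Impl>
    context_sketch(Impl impl): impl_(std::make_unique<wrap<Impl>>(std::move(impl))) {}

    context_sketch(context_sketch&&) = default;
    context_sketch& operator=(context_sketch&&) = default;

    int id() const { return impl_->id(); }
    int size() const { return impl_->size(); }
    std::string name() const { return impl_->name(); }
};

// Stand-in for a single-process context.
struct local_sketch {
    int id() const { return 0; }
    int size() const { return 1; }
    std::string name() const { return "local"; }
};

inline void usage_example() {
    context_sketch ctx{local_sketch{}};
    assert(ctx.size() == 1);
}
```

The calling code holds a plain value and never sees the virtual interface, which is what allows the communication model to be chosen at run time.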

Fixes #472

5fde0b00

May 09, 2018

Mechanism Refactor: multicore and simd (#484) · 68135148

Sam Yates authored 6 years ago

First commit of two for mechanism refactor work (refer to PR #484 and PR #483).

FVM/mechanism code:
* Refactor mechanism data structures to decouple backend-specific implementations and mechanism metadata.
* Add mechanism catalogue for managing mechanism metadata and concrete implementation prototypes.
* Add fingerprint-checking to mechanism metadata and implementations to confirm they come from the same NMODL source (fingerprint is not yet computed, but tests are in place).
* Split FVM discretization work out from FVM integrator code.
* Use abstract base class over backend-templated FVM integrator class `fvm_lowered_cell_impl` to allow separate compilation of `mc_cell_group` and to remove the dummy backend code.
* Add a new FVM-specific scalar type `fvm_index_type` that is an alias for `int` to replace `fvm_size_type` in fvm layouts and mechanisms. This was chosen as an alternative to making `unsigned` versions of all our SIMD implementation classes.
* Extend `cable1d_neuron` global data to encompass: mechanism catalogue; default ion concentrations and charges; global temperature (only for Nernst); initial membrane potential.

Modcc:
* Collect printer sources in modcc under `printer/`.
* Move common functionality across printers into `printer/printerutil.{hpp,cpp}`.
* Add string to file I/O implemented in routines read_all and write_all in `io/bulkio.hpp`.
* Implement indent-friendly source code generation via a `std::streambuf` filter `io::prefixbuf` defined in `io/prefixbuf.hpp`, together with manipulators and a corresponding std::ostream-derived wrapper.
* Rewrite printers to use new infrastructure: cpu target incorporates SIMD printing options; CUDA printer at this point produces only stubs for CUDA kernel wrappers.
* Modify SIMD printing command line options for modcc: `-s` enables explicit vectorization using the SIMD classes;  `-S <N>` allows a specific data width to be prescribed.
* Fix problem in `test_ca.mod` with uninitialized ion current.
* Add infrastructure support to allow future pre-computation of SIMD index conflict cases for (hopefully) faster scatters and updates.
* Simplify `IndexedVariable` expressions in the AST, making data source explicit via a `sourceKind` enum, and leaving the indexing method and index names up to the printers.
* Allow state variables in the AST to 'shadow' an ion concentration — these are assigned in the
generated `write_ions` method.
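
The indentation filter idea can be illustrated with a small `std::streambuf` subclass. This is a sketch of the general technique, not the `io::prefixbuf` code itself: the class name `prefix_buf_sketch` is hypothetical, and the choice to leave empty lines unprefixed is an assumption.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>
#include <utility>

// Unbuffered filter: every character arrives through overflow() and is
// forwarded to the inner buffer, preceded by the prefix at each line start.
class prefix_buf_sketch: public std::streambuf {
public:
    prefix_buf_sketch(std::streambuf* inner, std::string prefix):
        inner_(inner), prefix_(std::move(prefix)) {}

protected:
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof())) {
            return traits_type::not_eof(ch);
        }
        char c = traits_type::to_char_type(ch);
        if (at_line_start_ && c != '\n') {
            inner_->sputn(prefix_.data(), static_cast<std::streamsize>(prefix_.size()));
        }
        at_line_start_ = (c == '\n');
        return inner_->sputc(c);
    }

    int sync() override { return inner_->pubsync(); }

private:
    std::streambuf* inner_;
    std::string prefix_;
    bool at_line_start_ = true;
};

// Write a small block through the filter and return the indented text.
inline std::string indent_demo_sketch() {
    std::ostringstream out;
    prefix_buf_sketch buf(out.rdbuf(), "    ");
    std::ostream indented(&buf);
    indented << "if (x) {\n" << "\n" << "}\n";
    return out.str();
}
```

A printer written against such a stream can emit nested code without tracking the indentation level in every output statement.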

SIMD classes:
* Add `simd_cast` operation between SIMD value types of the same width, and with `std::array`. (Note: this was tested and used in an early development version of the code, but not in this version. It was still a lacuna in the original SIMD wrappers, so it has been left in.)
* Restructure SIMD gather/scatter API to use a `simd::indirect` expression,  which encapsulates a pointer and SIMD offset.
* Add `simd::index_constraint` scoped enum to describe knowledge of contention in indirect indices, so that we can branch on this to the appropriate implementation.
* Add SIMD concrete implementation routines `reduce_add` for horizontal reduction and `element0` for access to first lane scalar value.
* Add SIMD value method `sum()` that exposes implementation `reduce_add`.
* Add SIMD concrete implementation routine `compound_indexed_add` that provides the implementation for `indirect(p, simd_indices) += simd_value` construction.
* Fix SIMD `implbase` bug where some static methods were using the `implbase` fall-back functions instead of the derived class specialized implementations.
* Move SIMD mathematical functions into friend routines of `simd_impl` in order to resolve implicit conversions from scalars in mixed SIMD-scalar operations.
* Use a templated `tag` class to dispatch on SIMD concrete implementation types, to avoid problems with incomplete types in method signatures.
* Remove old SIMD intrinsics.
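
Why knowledge of index contention matters for an indirect `+=` can be shown with a scalar emulation. This is a sketch only: `constraint_sketch` has just two values, the lane count of 4 is arbitrary, and none of the function names come from the SIMD classes described above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t lane_count = 4;
using lanes_d = std::array<double, lane_count>;
using lanes_i = std::array<int, lane_count>;

// What the caller knows about the indices.
enum class constraint_sketch { none, independent };

// Vector-style update: gather, add, scatter. Correct only if all indices differ,
// because lanes that share an index overwrite each other on the scatter.
inline void add_gather_scatter(double* p, const lanes_i& index, const lanes_d& value) {
    lanes_d tmp{};
    for (std::size_t i = 0; i < lane_count; ++i) tmp[i] = p[index[i]] + value[i];
    for (std::size_t i = 0; i < lane_count; ++i) p[index[i]] = tmp[i];
}

// Safe fallback: accumulate one lane at a time.
inline void add_lane_by_lane(double* p, const lanes_i& index, const lanes_d& value) {
    for (std::size_t i = 0; i < lane_count; ++i) p[index[i]] += value[i];
}

// Branch on the constraint to pick the cheapest correct implementation.
inline void indexed_add_sketch(double* p, const lanes_i& index, const lanes_d& value,
                               constraint_sketch constraint) {
    if (constraint == constraint_sketch::independent) {
        add_gather_scatter(p, index, value);
    }
    else {
        add_lane_by_lane(p, index, value);
    }
}

inline void usage_example() {
    double data[2] = {0.0, 0.0};
    indexed_add_sketch(data, lanes_i{0, 0, 1, 1}, lanes_d{1, 2, 3, 4}, constraint_sketch::none);
    assert(data[0] == 3.0 && data[1] == 7.0);
}
```

Claiming `independent` for indices that actually repeat loses updates, which is the hazard the constraint is meant to make explicit.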

CMake infrastructure:
* Downcase some variables in `CMakeLists.txt` files to  distinguish them visually from CMake keywords and variables.
* Split arbor modcc vectorization option (now `ARB_VECTORIZE`) and target-architecture optimization (now `ARB_ARCH`).
* For `arbor` and `arbormech` targets, and in particular not the `modcc` target, use `ARB_ARCH` to generate corresponding target-appropriate binaries, including, for example, appropriate SIMD support.
* Extend `CompilerOptions.cmake` to map as best as able between the various target architecture names (we use the gcc names) and the correct option to pass to the compiler based on the compiler and platform.
* Add work-around for misidentification by CMake of XL C as Clang.
* As a temporary work-around, include `arbormech` library twice on link line to resolve circular arbor–arbormech dependencies.

Unit tests:
* Extend repertoire of generic sequence equality/near equality testing support  in `common.hpp`.
* Add warning suppression for icc for the malloc instrumentation code.
* SIMD unit tests for indirect expressions, compound indirect add, reduction.
* Make some exact tests into floating point 'near' tests when comparing computed areas and lengths in swc and fvm layout tests, to account for compiler (e.g. icc) performing semantically inequivalent floating point operation reordering or fusion at `-O3`.
* Split out some of the CUDA tests into separate .cpp/.cu files for  separate-compilation purposes.

Other:
* The `padded_allocator` has been modified to propagate alignment/padding on move and copy (these semantics make their use much easier and safer in the multicore mechanism instantiation code).
* Map/table searching utilities in `util/maputil.hpp`.
* Fixes for correct sequence type categorization and `begin/end` ADL.
* Fixes for type guards for range methods that take universal references.
* Removal of some redundant code in range utilities through the use of universal references.
* Add new range view `reverse_view` for ranges delineated by bidirectional iterators.
* Add single argument form of `make_span` to count up from zero, and associated helper `count_along` that gives a span that indexes a supplied container.
* Moved `prefixbuf` to `modcc` source.
* Make sequence positive and negative tests in algorithms generic.
* Add `private`-subverting helper code/macro to `tests/unit/common.hpp` to reduce the number of public testing-only interfaces in the library code.
* Add virtual destructors for virtual base classes.
* Add new arb::math:: functions: `next_pow2` for unsigned integral types, `round_up` to round a number away from zero to next largest magnitude multiple.
* New `index_into` implementation that supports bidirectional access (moved to `util::` namespace).
* Fix problem in `test_ca.mod` with uninitialized ion current.
* Rework dangerous `memory::array(Iter, Iter)` constructor to be less dangerous (and do the expected thing).
* Allow ranges to be constructed from other ranges if the iterators are compatible.
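
The two math helpers named above can be sketched as follows. The semantics are assumptions read off the description: `next_pow2_sketch` returns the smallest power of two not less than its argument, and `round_up_sketch` rounds away from zero to a multiple of the base. The `_sketch` names and the choice of `long` are illustrative.

```cpp
#include <cassert>
#include <type_traits>

// Smallest power of two >= x, by smearing the highest set bit of x-1 downwards.
template <typename U>
constexpr U next_pow2_sketch(U x) {
    static_assert(std::is_unsigned<U>::value, "unsigned integral types only");
    --x;
    for (unsigned shift = 1; shift < sizeof(U) * 8; shift <<= 1) {
        x |= (x >> shift);
    }
    return ++x;
}

// Round v away from zero to the next multiple of b (values already on a
// multiple are returned unchanged).
inline long round_up_sketch(long v, long b) {
    assert(b != 0);
    long remainder = v % b;  // has the sign of v in C++
    if (remainder == 0) return v;
    long step = b < 0 ? -b : b;
    return v - remainder + (remainder > 0 ? step : -step);
}
```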

68135148

Apr 11, 2018

Fix support for Kepler (K20 & K80) GPUs. (#470) · 6b659a39

Ben Cumming authored 6 years ago and

Sam Yates committed 6 years ago

Fixes issue #467 

* Add GPU synchronization points where required for Kepler to coordinate CPU access of managed memory.
* Use hand-rolled double precision atomic addition for Kepler targets.
* Replace `ARB_WITH_CUDA` build option with `ARB_GPU_MODEL` option that takes one of 'none', 'K20', 'K80' or 'P100', and set up source-code defines accordingly.
* Clean up of redundant compiler flags and defines no longer required now that the project uses separate compilation for CUDA sources.
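
A hand-rolled atomic addition is typically built from a compare-and-swap loop. The sketch below shows that loop on the host with `std::atomic<double>`; it is an analogue for illustration and not the CUDA code from this commit, which would have to operate on the 64-bit pattern of the value with the device's compare-and-swap primitive. The name `atomic_add_sketch` is hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Add `value` to `target` atomically and return the value seen before the add.
inline double atomic_add_sketch(std::atomic<double>& target, double value) {
    double observed = target.load();
    // On failure compare_exchange_weak refreshes `observed` with the current
    // contents, so the sum is recomputed from up-to-date data on each retry.
    while (!target.compare_exchange_weak(observed, observed + value)) {
    }
    return observed;
}

inline void usage_example() {
    std::atomic<double> total{1.0};
    double before = atomic_add_sketch(total, 2.5);
    assert(before == 1.0 && total.load() == 3.5);
}
```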

6b659a39

Mar 27, 2018

Installation Guide (#459) · 0cf65a4c

Ben Cumming authored 7 years ago

* Added an installation guide to the Read The Docs.
* Removed the outdated build/install information from README.md.
* Link from README to Read The Docs.
* Updated the splash page for Read The Docs.

0cf65a4c

wrap warp intrinsics to fix deprecated warnings (#456) · 7e6ea389

Ben Cumming authored 7 years ago

CUDA 9 introduced new, fine-grained, thread synchronization primitives.
In doing so, it introduced new forms of the warp intrinsics like __shfl_up, deprecating the old symbols in the process.

It will be a while before we can use CUDA 9 as the default minimum, so we have to support compilers that expect the new and old behavior.

There are two options: wrap the intrinsics in question, or pass nvcc a flag to not issue warnings about deprecated symbols. I go for the approach of wrapping, because I would rather keep the compiler warning turned on.

Fixes #379.

7e6ea389

Mar 20, 2018
- change project name to arbor in CMakeLists (#455) · 3019ae1e
  Ben Cumming authored 7 years ago
```
fixes #446.
```
  3019ae1e
Mar 16, 2018

SIMD wrappers for Arbor generated mechanisms. (#450) · 2dff9c41

Sam Yates authored 7 years ago

This provides a bunch of SIMD intrinsic wrappers as a precursor to the SIMD printers.

The aim is that the SIMD printer can be agnostic regarding the particular vector architecture.

The design is based rather loosely on the proposal P0214R6 for C++ Parallelism TS 2. The transcendental function implementations are adapted from the existing SIMD architecture-specific code, which in turn are based on the Cephes library algorithms.

The custom CSS for the html documentation has been tweaked.

2dff9c41

update tbb cmake to use check_git_submodule (#452) · 5fe81e83
Ben Cumming authored 7 years ago

5fe81e83

refactor git submodule support in cmake (#448) · 4c66432f

Ben Cumming authored 7 years ago

In some places our CMake scripts were attempting to check out git submodules when required, if they have not already been checked out. The code that does this was cut and pasted, and was getting unwieldy.

To minimise the responsibilities of CMake, this PR:

* removes calls to git;
* introduces a function `check_git_submodule` that can be used to test if a git submodule is installed, and print a helpful message that informs the user how to check it out if needed;
* introduces a function `add_error_target` that makes a target that prints a message then quits with an error. This can be used to generate a proxy target when a problem is detected during CMake setup. This means that an error is only generated when building a target with a missing dependency, instead of an error during CMake setup;
* refactors the CMake setup for the docs and ubenches targets to use these new features.

4c66432f

Mar 15, 2018

Improve TBB vs. CMake (#451) · 459d6562

Ben Cumming authored 7 years ago

This replaces the CMake templates provided by TBB with a much more sane alternative!

The TBB CMake templates had a very strange workflow, that involved downloading the TBB source and compiling it, which made it impossible to configure the TBB build, and caused problems on systems without connection to the internet.

We replace this with a fork of the TBB repository maintained by Github user @wjakob:
https://github.com/wjakob/tbb
This fork provides a sane CMakeLists.txt that can be configured from our CMake setup.
It is added as a git submodule, so it can be downloaded with the rest of the repository, hence not requiring connection to the internet during CMake configuration.

It could be extended to use a user-provided build of TBB instead of building it.

fixes #332.

459d6562

Dec 20, 2017

Move miniapps path to 'example/' (#423) · f94f0eab

Ben Cumming authored 7 years ago and

Sam Yates committed 7 years ago

* Rename `miniapps` subdirectory to `example`.
* Have all example executables be built under `example` in the build directory.
* Update Travis CI to run miniapp from new path.

f94f0eab

Add granule cell mechanisms (#421) · a80df6fa

Ben Cumming authored 7 years ago and

Sam Yates committed 7 years ago

* Add three new mechanisms: `nax.mod`, `kdrmt.mod` and `kamt.mod`.
* Add new built-in math operators to `modcc`: `min`, `max`, `abs` and `exprelr`. `exprelr` is defined as the reciprocal of the 'exprel' function: exprelr(x)=x/(exp(x)-1), with exprelr(0)=1. This function occurs frequently in HH-style mechanisms, and having a built-in operator avoids the ad hoc `vtrap` functions found in NMODL files in the wild.
* Split Arbor SIMD intrinsics support into AVX2- and AVX512-specific files.
* Add unit tests for new maths operators for C++, SIMD and CUDA implementations.
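
A scalar sketch of `exprelr`, assuming the definition x/(exp(x)-1) with the removable singularity at zero filled in as 1. The names `exprelr_sketch` and `alpha_n_sketch` are hypothetical and this is not the implementation added here; the rate function is the textbook Hodgkin-Huxley potassium activation rate, included only to show where the operator replaces a `vtrap`-style helper.

```cpp
#include <cassert>
#include <cmath>

// x/(exp(x)-1), returning the limit value 1 when x is too small to
// distinguish 1+x from 1 in double precision.
inline double exprelr_sketch(double x) {
    if (1.0 + x == 1.0) return 1.0;
    return x / std::expm1(x);  // expm1 keeps precision for small |x|
}

// 0.01*(v+55)/(1-exp(-(v+55)/10)) rewritten with exprelr; v in mV.
inline double alpha_n_sketch(double v) {
    return 0.1 * exprelr_sketch(-(v + 55.0) / 10.0);
}

inline void usage_example() {
    assert(exprelr_sketch(0.0) == 1.0);
    assert(alpha_n_sketch(-55.0) == 0.1);  // no 0/0 at the singular voltage
}
```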

a80df6fa

Nov 30, 2017

Move miniapp sources to a miniapps directory (#408) · 3b5b0386

Wouter Klijn authored 7 years ago

Restructure the miniapp in such a way that we have the option to have multiple parallel mini-applications.

* Move the original miniapp directory to a miniapps directory.
* Output executable also in a nested miniapps directory.
* Update the Travis CI configuration to point to the new executable location.

3b5b0386

Nov 28, 2017

Tidy `modcc` driver, remove optimize flag. (#404) · 998ee724

Sam Yates authored 7 years ago

* Remove optimization option (use SIMD options for vectorization).
* Remove arbor utility library dependencies from modcc (pending separation of utility lib from arbor lib source).
* Split target (cpu, gpu) specification from vectorization architecture (avx2, avx512).
* Remove `Options` singleton; replace with structure local to `modcc.cpp`.
* Tidy `modcc` option parsing and main function; allow a single invocation of `modcc` to generate code for multiple backends.
* Rename generated sources to include backend target in filename.
* Always run a constant simplification pass on generated procedures.
* Remove file i/o code from `Module` and `modcc` main function; move functionality to new functions in `io` namespace. (Note: in on-going mechanism revamp, other i/o utility code will reside in the `io` namespace and subdirectory.)
* Remove classes `ConstantFolderVisitor` and `ExpressionClassifierVisitor` that are no longer used.
* Modify CMakeLists.txt files, `backends/*/fvm.cpp` to reflect the new filenames of generated sources.
* Small formatting changes in `modcc` source to reflect coding guidelines (incomplete).

998ee724

Sep 28, 2017

Rename NestMC references, names etc. to Arbor. (#363) · d9f38b2a

Sam Yates authored 7 years ago

* Use ARB_ and arb_ as variable prefixes in place of NMC_ and nmc_.
* Replace references to 'NestMC' and 'NEST MC' to refer instead to Arbor.
* Use 'arbor' as the sim name in generated validation data.
* Reflow long-line paragraphs in `tests/ubench/README.md`.
* Change names of CUDA mechanism and CUDA kernel libraries to include arbor name.

d9f38b2a

Sep 21, 2017

Separable compilation of mechanism kernels on GPU (#353) · 3c283219

Ben Cumming authored 7 years ago and

Sam Yates committed 7 years ago

Separable compilation of the CUDA kernels generated by modcc from NMODL files.

CMake scripts:
* Update the `build_modules()` helper function to cleanly handle calls to modcc that generate multiple output files.
* Add a new library target `gpu_mechanisms` for the separately compiled CUDA kernels and the implementation of their C wrappers.
* Reduce verbosity of compilation messages.

* Simplify mechanism C++ namespace use: move everything in nest::mc::mechanisms::gpu::_mechanism-name_ into `nest::mc::gpu`, and similarly for multicore mechanism implementations, ions.
* Remove template parameters for `value_type` and `size_type` from all of the mechanism implementations, and use `fvm_value_type` and `fvm_size_type` everywhere instead.

modcc changes:
* Modify `CUDAPrinter` to keep track of 3 text buffers, one each for "implementation", "interface" and "implementation interface":
* Write the CUDA implementation interface to `X_impl.hpp`, comprising the definition of the mechanism-specific `X_ParamPack` struct used to pass function arguments to the CUDA kernels.
* Write the CUDA kernels and C wrappers to `X_impl.cu`.
* Write the public C++ mechanism interface (with calls to implementation wrappers) to `X.hpp`.
* Modify modcc driver to support multiple generated output files.

3c283219

Sep 20, 2017

Stand alone CUDA compilation for threshold_watcher in gpu backend (#345) · 180a7ace

Ben Cumming authored 7 years ago and

Sam Yates committed 7 years ago

Refactor the threshold_watcher and stack data structures in the gpu backend so that they are amenable to separable compilation.

* Make `gpu::stack<T>` have a host-only interface that wraps a POD type `gpu::stack_base<T>`.
* Implement a `push_back(stack_base, value)` method in `backends/gpu/kernels/stack.hpp` that is visible only to device code.
* Move `test_thresholds` kernel to a .cu file, replacing template parameters with types provided by `backends/fvm_types.hpp`.
* Add a simple C function interface, callable from host side code, defined in `backends/gpu/threshold_common.hpp`.
* Simplify the `gpu::impl::padded_size` function (both to read and in terms of efficiency).
* Use `typeid` as the default for pretty-printing types in the memory back end.
* Update the `test_gpu_stack` unit test to support new gpu stack interface.
* Fix bug in the `test_spikes` unit test, which was not running the GPU back end in the cuda unit tests.

180a7ace

Sep 11, 2017

Basic CI support with TravisCI (#340) · 137c5b5f

Ben Cumming authored 7 years ago

Add support for continuous integration with Travis CI.
This implements bare bones support that can be extended over time.

Travis CI test environments:

* All use gcc 5.
* Test the serial distributed back end with serial and cthread threading backends.
* Test mpi with cthread.
* The tbb test failed sporadically because of CMake, so it is disabled for now.

The test script:

* Builds the unit tests, global_communication tests and miniapp.
* Asserts that all unit and global_communication tests pass.
* Asserts that the miniapp runs successfully.
    * Does not test miniapp output for now.

There is plenty of scope for improving the tests.
A key improvement will be to compare the output of the validation tests and the miniapp
against validated reference output.

Some small fixes were required to make the tests pass on Travis:

* `communication/mpi.hpp` now sets default size and rank values of 1 and 0 respectively to allow all unit tests to pass when built with MPI.
* The wrappers around MPI API calls use `const_cast` to support MPI implementations that are not "const aware".
* A missing header was added to `tests/unit/test_range` to make `std::unordered_multimap` available.

137c5b5f

Aug 24, 2017

Basic Sphinx Documentation (#328) · 610fd857

Ben Cumming authored 7 years ago and

Sam Yates committed 7 years ago

Adds support for building documentation with Sphinx from reStructuredText-formatted files in the `doc` subdirectory. Automatic building has been verified with ReadTheDocs.

* Add basic documentation to the `doc` path.
* Use a git submodule and associated CMake to pull in ReadTheDocs theme at configuration time.

610fd857

Aug 18, 2017

Better TBB CMake integration (#331) · 6dce9fa4

Ben Cumming authored 7 years ago and

Sam Yates committed 7 years ago

* Add support for CMake scripts provided by TBB.
* Update required cmake version to 3.0.

* hack to get linking to work on Cray PE

* improve comments and remove redundant include in CMakeLists

* firewall the tbb cmake files

* tbb threading back end to_string includes version number

6dce9fa4

Jun 15, 2017

AVX512 CMake target (#288) · 153aeaee

Vasileios Karakasis authored 7 years ago

Adds a new AVX512 target for processors supporting only the core AVX512 functionality, which currently means SkyLake Xeon processors.

153aeaee

May 15, 2017

Fix incorrect GPU backend determination · 18098783

Ben Cumming authored 7 years ago and

Sam Yates committed 7 years ago

Fixes #266.

Use CUDA to compile the `cell_group_factory` so that the CUDA back end is compiled correctly, instead of the null back end proxy.
  * Added bonus: the miniapp is now compiled using host C++ compiler instead of `nvcc`.

This is a little bit hacky, because this is a stop gap until we have separate compilation of CUDA code.

18098783

Apr 18, 2017

Add power meter and refactor meter interfaces. · 99a0b1c8

Ben Cumming authored 7 years ago and

Sam Yates committed 7 years ago

Fixes #190.

The final piece in the metering features.

* Add a `power_meter` which currently records energy used on each node of Cray XC{30,40,50} systems, which all have a built-in `pm_counters` interface to power measurement.
* Add information about which node each MPI rank runs on to the metering output in `meters.json`, which is needed to analyse energy recordings, which are per node, not per MPI rank.
* Refactor collation of measurements: now the responsibility of the meter manager.
* Add support for `gather` with `std::string` to the global communication policy, which required a back end MPI implementation and corresponding unit test.
* Add `src/util/config.hpp` that populates the `nest::mc::config` namespace with `constexpr bool` flags describing system or environment capabilities.

99a0b1c8

Mar 29, 2017

Change default threading model from serial to cthread (#214) · 9a6a551e

Ben Cumming authored 8 years ago and

Sam Yates committed 8 years ago

Fixes #212.

* Update the main `CMakeLists.txt` file to select the cthread back end by default, and present the threading options in the order: cthread, tbb, serial.

9a6a551e