Commits · b068a9d859c9365d08703bc0a69bc42a40e2f09c · arbor-sim / arbor

Jul 10, 2018
- Do not build/use local modcc if ARB_MODCC set (#527) · b068a9d8
  Sam Yates authored 6 years ago and Benjamin Cumming committed 6 years ago
```
Fixes #526.
```
  b068a9d8
Jul 06, 2018

Migrate source/build to c++14 ... · 3ee79191

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

Migrate source/build to c++14 (#522)

* Update `CMakeLists.txt` for C++14 option.
* Update to gcc 6 minimum.
* Update travis CI from gcc-5 to gcc-6
* Use `std::..._t` style type traits, replacing `util::` aliases.
* Use `std::cbegin`, `std::cend`, and `std::make_unique`, replacing `util::` versions.
* Remove `DEDUCED_RETURN_TYPE` macros.
* Remove redundant return type specifications.
* Use correct ADL for `begin` and `end` in (almost all) the range utilities.
* Remove redundant `mechinfo` ctor (aggregate initialization suffices).
* Use lambda capture initializers where appropriate.
* Use generic `std::equal_to`.
* Use variable templates for `math::infinity` and `math::pi`.
* Remove `enum_hash` workaround.
* Use `""s` string literals where we were using our own `""_s` construction.
* Use generic lambda for recursive lambda instead of `std::function` wrapper.
* Use generic lambda for generic arithmetic tests.
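
A minimal sketch of a few of the C++14 idioms listed above, for illustration only. The names `pi_v`, `factorial` and `greet` are hypothetical and are not taken from the Arbor sources.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

using namespace std::string_literals;  // enables the ""s literal

// Variable template (the name `pi_v` is illustrative, not Arbor's).
template <typename T>
constexpr T pi_v = T(3.14159265358979323846L);

// Generic lambda that recurses by receiving itself as an argument,
// which avoids a std::function wrapper.
inline unsigned long long factorial(unsigned n) {
    assert(n <= 20);  // 21! no longer fits in 64 bits
    auto fact = [](auto&& self, unsigned k) -> unsigned long long {
        return k <= 1 ? 1ull : k * self(self, k - 1);
    };
    return fact(fact, n);
}

// Lambda capture initializer: the unique_ptr is moved into the closure.
inline std::string greet() {
    auto name = std::make_unique<std::string>("arbor"s);
    auto hello = [p = std::move(name)] { return "hello, "s + *p; };
    return hello();
}
```

With these definitions, `factorial(5)` evaluates to 120 and `greet()` returns `hello, arbor`.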

Fixes #358.

3ee79191

Jul 03, 2018

Move cell description types to public includes. (#508) · a1894edc

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

Further work to public install target.

* Move SIMD classes, cell description classes, simple sampler to public include.
* Rename `cell` to `mc_cell`, `segment` to `mc_segment`, and remove `_description` from cell description class names and includes.
* Move `compartment_model` out of `mc_cell` interface and use only in `fvm_layout.cpp`.
* (Provisionally) remove area/volume methods on `mc_cell` and `mc_segment`.

a1894edc

Jun 25, 2018

Feature/lib install target part i (#506) · ad1c78ab

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

CMake and build refactoring

*   Use CUDA as first-class language (leading to CMake 3.9 minimum version requirement).

*   Use 'modern CMake' interface libraries for compiler options, include file and library dependency tracking. Interface library targets:
    * `arbor-deps`: compiler options and library requirements for the `libarbor.a` static library, as governed by configure-time options and environment.
    * `arbor-private-headers`: include path for non-installed headers, as required by unit tests and arbor itself.
    * `arbor-aux`: helper classes and utilities used across tests and examples.
    * `ext-json`, `ext-tclap`, `ext-tbb`, `ext-benchmark`, `ext-sphinx_rtd_theme`: externally maintained software that we include (directly or via submodule) in the `ext/` subdirectory.
 
*   Single static library `libarbor.a` includes all built-in modules and CUDA objects.

*   Simplify configuration options:
    *  `ARB_WITH_TRACE`, `ARB_AUTORUN_MODCC_ON_CHA...

ad1c78ab

Jun 07, 2018

profile multicore mechanism state and current calls individually (#492) · 5e65a939

Benjamin Cumming authored 6 years ago

The built in profiler generates timings for state and current for individual multicore mechanisms.

Modcc generates `PE(advance_integrate_{state,current}_X)` profiler calls, along with the corresponding `PL()` calls, around the multicore mechanism `nrn_state` and `nrn_current` API calls.

No timings are made for the gpu back end, which is not properly supported by the current profiling tools.

5e65a939

Jun 04, 2018

Simd partition by constraint (#494) · 64171e43

noraabiakar authored 6 years ago and

Benjamin Cumming committed 6 years ago

Changes have been made to the simd implementation of mechanism functions:

- The node_index array (array of indices that specifies for each mechanism the CVs where it is present), is now partitioned into 4 arrays according to the constraint on each simd_vector in node_index:
1. contiguous array: contains the indices of all simd_vectors in node_index where the elements in simd_vector are contiguous
2. constant array: contains the indices of all simd_vectors in node_index where the elements in simd_vector are identical
3. independent array: contains the indices of all simd_vectors in node_index where the elements in simd_vector are independent (no repetitions) but not contiguous
4. none array: contains the indices of all simd_vectors in node_index where none of the above constraints apply

When mechanism functions are executed, they loop over each of the 4 arrays separately. This allows for optimizations in every category.

- The modcc compiler was modified to generate code for the previous changes, including the optimizations per constraint:
1. contiguous array: we use vector load/store and vector arithmetic.
2. constant array: we load only one element and broadcast it into a simd_vector; we use vector arithmetic; we reduce the result; we store one element.
3. independent array: we use vector scatter/gather and vector arithmetic.
4. none array: we cannot operate on the simd_vector in parallel, we loop over the elements to read, perform arithmetic and write back

- Added a mechanism benchmark for pas, hh and expsyn

- Moved/modified some functions in simd.hpp to ensure that the correct implementation of a function is being called.
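
The partition into the four constraint categories described above could be sketched as follows. This is an illustration, not the Arbor implementation: the names `classify`, `constraint_partition` and `partition_by_constraint` are hypothetical, and the sketch assumes each SIMD-width chunk of `node_index` is sorted in non-decreasing order so that repeated indices are adjacent.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class index_constraint { none, independent, contiguous, constant };

// Classify one SIMD-width chunk of node indices.
// Assumption for this sketch: indices within a chunk are non-decreasing.
inline index_constraint classify(const int* idx, std::size_t width) {
    assert(width > 0);
    bool contiguous = true, constant = true, independent = true;
    for (std::size_t i = 1; i < width; ++i) {
        if (idx[i] != idx[i-1] + 1) contiguous = false;
        if (idx[i] != idx[0]) constant = false;
        if (idx[i] == idx[i-1]) independent = false;
    }
    if (contiguous) return index_constraint::contiguous;
    if (constant) return index_constraint::constant;
    if (independent) return index_constraint::independent;
    return index_constraint::none;
}

// Offsets into node_index of the chunks that fall in each category.
struct constraint_partition {
    std::vector<std::size_t> contiguous, constant, independent, none;
};

inline constraint_partition partition_by_constraint(
    const std::vector<int>& node_index, std::size_t width)
{
    assert(width > 0 && node_index.size() % width == 0);
    constraint_partition p;
    for (std::size_t i = 0; i < node_index.size(); i += width) {
        switch (classify(node_index.data() + i, width)) {
        case index_constraint::contiguous:  p.contiguous.push_back(i); break;
        case index_constraint::constant:    p.constant.push_back(i); break;
        case index_constraint::independent: p.independent.push_back(i); break;
        case index_constraint::none:        p.none.push_back(i); break;
        }
    }
    return p;
}
```

A mechanism kernel would then loop over each of the four offset lists in turn, choosing plain vector load/store, broadcast and reduce, gather/scatter, or a scalar loop respectively.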

64171e43

May 15, 2018

Generic external variables for CUDA printer. (#490) · b07875e1

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

Replace hard-coded index variable names in modcc cuda printer with ones derived from the external variable.

Uses `decode_indexed_variable`.

b07875e1

May 09, 2018

CUDA back end for the new mechanism infrastructure (#487) · e0f0b5d7

Benjamin Cumming authored 6 years ago and

Sam Yates committed 6 years ago

Completes CUDA printing in modcc.
* Add CudaPrinter visitor, overriding CPrinter.
* Add `ostream` `operator<<` overloads for `arb::gpu::shared_state` and `device_view` for debugging.
* Fix GPU back-end bugs.

e0f0b5d7

Mechanism Refactor: multicore and simd (#484) · 68135148

Sam Yates authored 6 years ago

First commit of two for mechanism refactor work (refer to PR #484 and PR #483).

FVM/mechanism code:
* Refactor mechanism data structures to decouple backend-specific implementations and mechanism metadata.
* Add mechanism catalogue for managing mechanism metadata and concrete implementation prototypes.
* Add fingerprint-checking to mechanism metadata and implementations to confirm they come from the same NMODL source (fingerprint is not yet computed, but tests are in place).
* Split FVM discretization work out from FVM integrator code.
* Use abstract base class over backend-templated FVM integrator class `fvm_lowered_cell_impl` to allow separate compilation of `mc_cell_group` and to remove the dummy backend code.
* Add a new FVM-specific scalar type `fvm_index_type` that is an alias for `int` to replace `fvm_size_type` in fvm layouts and mechanisms. This was chosen as an alternative to making `unsigned` versions of all our SIMD implementation classes.
* Extend `cable1d_neuron` global data to encompass: mechanism catalogue; default ion concentrations and charges; global temperature (only for Nernst); initial membrane potential.

Modcc:
* Collect printer sources in modcc under `printer/`.
* Move common functionality across printers into `printer/printerutil.{hpp,cpp}`.
* Add string to file I/O implemented in routines read_all and write_all in `io/bulkio.hpp`.
* Implement indent-friendly source code generation via a `std::streambuf` filter `io::prefixbuf` defined in `io/prefixbuf.hpp`, together with manipulators and a corresponding std::ostream-derived wrapper.
* Rewrite printers to use new infrastructure: cpu target incorporates SIMD printing options; CUDA printer at this point produces only stubs for CUDA kernel wrappers.
* Modify SIMD printing command line options for modcc: `-s` enables explicit vectorization using the SIMD classes;  `-S <N>` allows a specific data width to be prescribed.
* Fix problem in `test_ca.mod` with uninitialized ion current.
* Add infrastructure support to allow future pre-computation of SIMD index conflict cases for (hopefully) faster scatters and updates.
* Simplify `IndexedVariable` expressions in the AST, making data source explicit via a `sourceKind` enum, and leaving the indexing method and index names up to the printers.
* Allow state variables in the AST to 'shadow' an ion concentration; these are assigned in the generated `write_ions` method.
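
The idea of an indenting `std::streambuf` filter, as mentioned above for `io::prefixbuf`, can be sketched roughly as below. The class `prefix_filter` and helper `indented` are hypothetical stand-ins; the real `io::prefixbuf` and its manipulators are not reproduced here.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>
#include <utility>

// Forwards characters to an inner streambuf, writing a prefix at the
// start of every non-empty line.
class prefix_filter: public std::streambuf {
public:
    prefix_filter(std::streambuf* inner, std::string prefix):
        inner_(inner), prefix_(std::move(prefix))
    {
        assert(inner_);
    }

protected:
    // No put area is set, so every character written arrives here.
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof())) {
            return traits_type::not_eof(ch);
        }
        char c = traits_type::to_char_type(ch);
        if (bol_ && c != '\n') {
            inner_->sputn(prefix_.data(),
                          static_cast<std::streamsize>(prefix_.size()));
        }
        bol_ = (c == '\n');
        return inner_->sputc(c);
    }

private:
    std::streambuf* inner_;
    std::string prefix_;
    bool bol_ = true;  // at beginning of line?
};

// Convenience wrapper: return `text` with each non-empty line prefixed.
inline std::string indented(const std::string& text, const std::string& prefix) {
    std::ostringstream sink;
    prefix_filter buf(sink.rdbuf(), prefix);
    std::ostream out(&buf);
    out << text;
    return sink.str();
}
```

Filtering at the `streambuf` level means printer code can keep writing to an ordinary `std::ostream` while the indentation is applied underneath.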

SIMD classes:
* Add `simd_cast` operation between SIMD value types of the same width, and with `std::array`. (Note: this was tested and used in an early development version of the code, but not in this version. It was still a lacuna in the original SIMD wrappers, so it has been left in.)
* Restructure SIMD gather/scatter API to use a `simd::indirect` expression,  which encapsulates a pointer and SIMD offset.
* Add `simd::index_constraint` scoped enum to describe knowledge of contention in indirect indices, so that we can branch on this to the appropriate implementation.
* Add SIMD concrete implementation routines `reduce_add` for horizontal reduction and `element0` for access to first lane scalar value.
* Add SIMD value method `sum()` that exposes implementation `reduce_add`.
* Add SIMD concrete implementation routine `compound_indexed_add` that provides the implementation for `indirect(p, simd_indices) += simd_value` construction.
* Fix SIMD `implbase` bug where some static methods were using the `implbase` fall-back functions instead of the derived class specialized implementations.
* Move SIMD mathematical functions into friend routines of `simd_impl` in order to resolve implicit conversions from scalars in mixed SIMD-scalar operations.
* Use a templated `tag` class to dispatch on SIMD concrete implementation types, to avoid problems with incomplete types in method signatures.
* Remove old SIMD intrinsics.

CMake infrastructure:
* Downcase some variables in `CMakeLists.txt` files to  distinguish them visually from CMake keywords and variables.
* Split arbor modcc vectorization option (now `ARB_VECTORIZE`) and target-architecture optimization (now `ARB_ARCH`).
* For `arbor` and `arbormech` targets, and in particular not the `modcc` target, use `ARB_ARCH` to generate corresponding target-appropriate binaries, including, for example, appropriate SIMD support.
* Extend `CompilerOptions.cmake` to map as best as able between the various target architecture names (we use the gcc names) and the correct option to pass to the compiler based on the compiler and platform.
* Add work-around for misidentification by CMake of XL C as Clang.
* As a temporary work-around, include `arbormech` library twice on link line to resolve circular arbor–arbormech dependencies.

Unit tests:
* Extend repertoire of generic sequence equality/near equality testing support  in `common.hpp`.
* Add warning suppression for icc for the malloc instrumentation code.
* SIMD unit tests for indirect expressions, compound indirect add, reduction.
* Make some exact tests into floating point 'near' tests when comparing computed areas and lengths in swc and fvm layout tests, to account for compiler (e.g. icc) performing semantically inequivalent floating point operation reordering or fusion at `-O3`.
* Split out some of the CUDA tests into separate .cpp/.cu files for  separate-compilation purposes.

Other:
* The `padded_allocator` has been modified to propagate alignment/padding on move and copy (these semantics make their use much easier and safer in the multicore mechanism instantiation code).
* Map/table searching utilities in `util/maputil.hpp`.
* Fixes for correct sequence type categorization and `begin/end` ADL.
* Fixes for type guards for range methods that take universal references.
* Removal of some redundant code in range utilities through the use of universal references.
* Add new range view `reverse_view` for ranges delineated by bidirectional iterators.
* Add single argument form of `make_span` to count up from zero, and associated helper `count_along` that gives a span that indexes a supplied container.
* Moved `prefixbuf` to `modcc` source.
* Make sequence positive and negative tests in algorithms generic.
* Add `private`-subverting helper code/macro to `tests/unit/common.hpp` to reduce the number of public testing-only interfaces in the library code.
* Add virtual destructors for virtual base classes.
* Add new arb::math:: functions: `next_pow2` for unsigned integral types, `round_up` to round a number away from zero to next largest magnitude multiple.
* New `index_into` implementation that supports bidirectional access (moved to `util::` namespace).
* Rework dangerous `memory::array(Iter, Iter)` constructor to be less dangerous (and do the expected thing).
* Allow ranges to be constructed from other ranges if the iterators are compatible.
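
For the `next_pow2` and `round_up` helpers mentioned above, a sketch under assumed semantics follows: `next_pow2` is taken to return the smallest power of two not less than its argument, and `round_up` to round away from zero to a multiple of the magnitude of `b`. The signatures below are assumptions for integer arguments, not the `arb::math` declarations.

```cpp
#include <cassert>
#include <type_traits>

// Smallest power of two >= x, by smearing the highest set bit of x-1
// downwards. In this sketch x == 0 wraps around and yields 0.
template <typename U>
U next_pow2(U x) {
    static_assert(std::is_unsigned<U>::value, "next_pow2 expects an unsigned type");
    --x;
    for (unsigned shift = 1; shift < sizeof(U)*8; shift <<= 1) {
        x |= x >> shift;
    }
    return ++x;
}

// Round v away from zero to the nearest multiple of |b|.
inline long round_up(long v, long b) {
    assert(b != 0);
    long m = v % b;  // has the sign of v (truncating division)
    if (m == 0) return v;
    long step = b < 0 ? -b : b;
    return v - m + (m > 0 ? step : -step);
}
```

For example, `next_pow2(5u)` gives 8, `round_up(7, 4)` gives 8 and `round_up(-7, 4)` gives -8.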

68135148

Dec 21, 2017

Fix indirection in ion concentration write. (#425) · 17f7db98

Sam Yates authored 7 years ago

* Fix indirection in ion concentration write.
* Remove second indirection in ion write assignment.
* Extend ion write unit test to cover non-contiguous ion CV cases and verify correct ion concentration averaging.

Fixes #424.

17f7db98

Dec 20, 2017

Add granule cell mechanisms (#421) · a80df6fa

Ben Cumming authored 7 years ago and

Sam Yates committed 7 years ago

* Add three new mechanisms: `nax.mod`, `kdrmt.mod` and `kamt.mod`.
* Add new built-in math operators to `modcc`: `min`, `max`, `abs` and `exprelr`. `exprelr` is defined as the reciprocal of the 'exprel' function: exprelr(x)=x/(exp(x)-1), with exprelr(0)=1. This function occurs frequently in HH-style mechanisms, and having a built-in operator avoids the ad hoc `vtrap` functions found in NMODL files in the wild.
* Split Arbor SIMD intrinsics support into AVX2- and AVX512-specific files.
* Add unit tests for new maths operators for C++, SIMD and CUDA implementations.
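
As an illustration of why a built-in `exprelr` helps, the sketch below evaluates x/(exp(x)-1) with the removable singularity at x=0 filled in as 1, and uses it for a Hodgkin-Huxley style rate. The function `alpha_n` and its constants are a textbook HH example chosen for illustration; they are not taken from the mechanisms added in this commit.

```cpp
#include <cassert>
#include <cmath>

// exprelr(x) = x/(exp(x)-1), with exprelr(0) = 1.
inline double exprelr(double x) {
    assert(std::isfinite(x));  // this sketch does not handle infinities or NaN
    // If x is too small to change 1.0, return the limit and avoid 0/0.
    if (1.0 + x == 1.0) return 1.0;
    return x / std::expm1(x);
}

// HH potassium activation rate, usually written as
//   0.01*(v+55)/(1-exp(-(v+55)/10)),
// which is singular at v = -55 unless a `vtrap`-style guard is used.
inline double alpha_n(double v) {
    return 0.1 * exprelr(-(v + 55.0) / 10.0);
}
```

Written this way, `alpha_n(-55.0)` evaluates cleanly to 0.1 without any special-case code in the mechanism itself.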

a80df6fa

Dec 19, 2017

Reduce differences between `util::optional` and `std::optional`. (#420) · 80fe2f01

Sam Yates authored 7 years ago

Reduce differences between `util::optional` and `std::optional`.

* Rename `util::nothing` to `util::nullopt`.
* Replace `util::get()` by `util::optional::value()`.
* Add correct move semantics to `util::optional::value()`.
* Add `util::optional::value_or()` method.
* Remove unused monadic functionality from `util::optional` and `util::uninitialized`.
* Update code formatting to adhere more closely to coding guidelines.
* Add convenience `""_s` string constructor for unit tests.
* Update unit tests accordingly.

Fixes #419.

80fe2f01

Dec 18, 2017
- Fix Clang warnings with modcc (#412) · 25b0674d
  Ben Cumming authored 7 years ago and Sam Yates committed 7 years ago
```
* Fix various compiler warnings in `modcc` when using Apple Clang.
```
  25b0674d
Dec 05, 2017
- Fix AVX2 compilation regression. (#409) · 6eeba7f0
  Vasileios Karakasis authored 7 years ago and Sam Yates committed 7 years ago
```
Also removes vestigial `intitial` procedure generated from a MOD file with no INITIAL block.
Fixes #407. 
```
  6eeba7f0
Nov 29, 2017

support ion species concentration updates (#398) · fa7d99b6

Ben Cumming authored 7 years ago and

Sam Yates committed 7 years ago

* Add querying methods to modcc ion species type to simplify code generation.
* Add `WriteBack` type that records per-mechanism ion usage metadata.
* Add `write_back` implementation for C and CUDA printers that adds weighted concentration
  contributions to global concentration fields.
* Extend `uses_ion` interface to provide information about which concentration fields are modified by mechanisms.
* Update `update_ion_symbols` lambda that is responsible for adding metadata about interactions between mechanism fields and external ion species fields, creating a write-back when the mechanism field is a state variable or an indexed variable if it is a current/reversal potential, or else an error.
* Add `test_ca.mod` mechanism that writes calcium ion concentration for testing purposes.
* Add back-end callbacks that initialize concentration values.
* Update `mechanism::uses_ion()` to return information regarding mechanism ion concentration updates.
* Add `mechanism::write_back()` method.
* Update stimulus mechanism specialization to use new mechanism interface.
* Update `ion` type to calculate default concentration contributions in CVs where the concentration is only partially determined by mechanisms.
* Update `fvm_multicell` to calculate the default ion concentration weights.
* Add unit test for ion concentration weight determination.

Fixes #373

fa7d99b6

Nov 28, 2017

Tidy `modcc` driver, remove optimize flag. (#404) · 998ee724

Sam Yates authored 7 years ago

* Remove optimization option (use SIMD options for vectorization).
* Remove arbor utility library dependencies from modcc (pending separation of utility lib from arbor lib source).
* Split target (cpu, gpu) specification from vectorization architecture (avx2, avx512).
* Remove `Options` singleton; replace with structure local to `modcc.cpp`.
* Tidy `modcc` option parsing and main function; allow a single invocation of `modcc` to generate code for multiple backends.
* Rename generated sources to include backend target in filename.
* Always run a constant simplification pass on generated procedures.
* Remove file i/o code from `Module` and `modcc` main function; move functionality to new functions in `io` namespace. (Note: in on-going mechanism revamp, other i/o utility code will reside in the `io` namespace and subdirectory.)
* Remove classes `ConstantFolderVisitor` and `ExpressionClassifierVisitor` that are no longer used.
* Modify CMakeLists.txt files, `backends/*/fvm.cpp` to reflect the new filenames of generated sources.
* Small formatting changes in `modcc` source to reflect coding guidelines (incomplete).

998ee724

Fix binary operator parentheses bug. (#400) · 67721f02

Sam Yates authored 7 years ago

Fixes #399.

Update `test_printers.cpp`:
* Fix compilation and scope management issues, and re-include in CMakeLists.txt.
* Update to use `verbose_print` (run-time verbosity).
* Apply expected output tests to `CPrinter` and `CUDAPrinter` outputs.

Address binary operation rendering issue:
* Move C-style expression rendering common to `CPrinter` and `CUDAPrinter` to a specialized `Visitor`, `CExprRenderer`.
* Use operator associativity to determine need for parentheses around sub-expressions of a binary expression.
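
The parenthesization rule can be sketched on a toy expression tree: a sub-expression is wrapped when it binds less tightly than its parent, or when it has equal precedence but sits on the side that the parent's associativity does not group naturally. The `expr` type, the operator set (including `^` as a right-associative power operator) and `render` are hypothetical and do not reproduce `CExprRenderer`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct expr {
    char op = 0;                    // 0 marks a leaf
    std::string name;               // leaf text
    std::unique_ptr<expr> lhs, rhs;
};
using expr_ptr = std::unique_ptr<expr>;

inline expr_ptr leaf(std::string s) {
    auto e = std::make_unique<expr>();
    e->name = std::move(s);
    return e;
}

inline expr_ptr binary(char op, expr_ptr l, expr_ptr r) {
    assert(l && r);
    auto e = std::make_unique<expr>();
    e->op = op;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

inline int precedence(char op) {
    switch (op) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    case '^': return 3;
    default: return 4;              // leaves bind tightest
    }
}

inline bool right_assoc(char op) { return op == '^'; }

inline std::string render(const expr& e) {
    if (!e.op) return e.name;
    auto wrap = [&](const expr& child, bool is_rhs) {
        int pc = precedence(child.op), pp = precedence(e.op);
        // Equal precedence needs parentheses only on the non-associating side.
        bool need = pc < pp || (pc == pp && is_rhs != right_assoc(e.op));
        std::string s = render(child);
        return need ? "(" + s + ")" : s;
    };
    return wrap(*e.lhs, false) + e.op + wrap(*e.rhs, true);
}
```

With this rule `a-(b-c)` keeps its parentheses while `(a-b)-c` is rendered as `a-b-c`.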

67721f02

Nov 20, 2017

modcc now enforces derivatives only on state variables (#392) · 87a52776

Ben Cumming authored 7 years ago and

Sam Yates committed 7 years ago

Derivatives should only appear on the left hand side of expressions that describe the time evolution of state variables. Without this check `modcc` segfaulted when processing a derivative expression of a non-state variable.

* Specialize the semantic analysis of `DerivativeExpression` to enforce that derivatives are only applied to state variables.

87a52776

Nov 09, 2017

Convert currents to current densities in FVM (#381) · bca33966

Ben Cumming authored 7 years ago and

Sam Yates committed 7 years ago

Update the FVM formulation to use current densities instead of currents.

Modifications to modcc:
* Update printers to store and use weights for point process mechanisms.
* Scale ion species current contributions by area proportion, similarly to contributions to the accumulated current.

Changes to FVM code:
* Update weights calculation for density and point processes mechanisms:
    * density channels use relative proportion of CV area, i.e. "density",
    * point processes use the reciprocal of the CV area to convert to a density.
* Add `cv_area` parameter for matrix constructor, which is used by matrix assembly to convert current densities to currents.
* Update stimulus implementations (gpu and cpu backends) to contribute current densities.
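
The weighting scheme can be illustrated with a small sketch. It deliberately ignores unit conversion factors, and the function names are hypothetical rather than those used in the FVM code.

```cpp
#include <cassert>

// Density mechanism: weight is the fraction of the CV area it covers.
inline double density_weight(double covered_area, double cv_area) {
    assert(cv_area > 0 && covered_area >= 0 && covered_area <= cv_area);
    return covered_area / cv_area;
}

// Point process: weight is the reciprocal of the CV area, so that a
// current becomes a current density.
inline double point_weight(double cv_area) {
    assert(cv_area > 0);
    return 1.0 / cv_area;
}

// Matrix assembly: convert an accumulated current density back to a current.
inline double cv_current(double current_density, double cv_area) {
    return current_density * cv_area;
}
```

A point current scaled by `point_weight` and later multiplied by the CV area in `cv_current` recovers the original current, which is why the change of formulation leaves point process contributions unchanged.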

Other changes:
* Update unit tests to use new interfaces.
* Update units section in LaTeX docs.

Fixes #374.

bca33966

Nov 02, 2017

Add mechanism parameter setting/new implementation. (#377) · b7623d13

Sam Yates authored 7 years ago

Fixes #350 

* Replace parameter_list with mechanism_spec.
* Add prototype for mechanism parameter schema checking.
* Allow mechanism weights to be set after construction.
* Combine range parameters on density mechanisms by linear contribution in CVs.
* Cable segment electrical parameters are now member variables.
* Publish mechanism parameter information through new method `mechanism::field_info`; note this will be replaced/improved in upcoming dynamic mechanism catalog work.
* Access mechanism parameter scalars and range data via `mechanism::field_view_ptr` and `mechanism::field_value_ptr` methods.
* Allow mechanism 'global' parameters to be set via a method of specializing mechanisms (and giving them corresponding aliases).
* Extend recipe interface to allow querying of per-cell-kind global information for use by cell group implementations.
* Add unit tests for above - note that linear density mechanism parameter test is tightly coupled with the FVM discretization scheme.

b7623d13

Nov 01, 2017
- Add support for NMODL files without INITIAL blocks [1] (#372) · 8fbe804a
  Ben Cumming authored 7 years ago and Sam Yates committed 7 years ago
```
Fixes #367.

* Add empty INITIAL block if NMODL file does not supply one.
```
  8fbe804a
Sep 28, 2017

Rename NestMC references, names etc. to Arbor. (#363) · d9f38b2a

Sam Yates authored 7 years ago

* Use ARB_ and arb_ as variable prefixes in place of NMC_ and nmc_.
* Replace references to 'NestMC' and 'NEST MC' with references to Arbor.
* Use 'arbor' as the sim name in generated validation data.
* Reflow long-line paragraphs in `tests/ubench/README.md`.
* Change names of CUDA mechanism and CUDA kernel libraries to include arbor name.

d9f38b2a

Change nest::mc namespaces to arb (#362) · d9f99489
Sam Yates authored 7 years ago
```
Change nest::mc namespaces to arb
```
d9f99489

Sep 25, 2017

Finish Separable CUDA compilation (#356) · ddbece13

Ben Cumming authored 7 years ago and

Sam Yates committed 7 years ago

Separate compilation for all CUDA code.

* Move all CUDA kernels to their own .cu files, together with C++ function wrappers.
* Compile all CUDA .cu files to a single static library.
* Merge gpu and multicore backend validation tests.
* Simplify and clean up cruft from CMakeLists.txt files.

ddbece13

Sep 21, 2017

Separable compilation of mechanism kernels on GPU (#353) · 3c283219

Ben Cumming authored 7 years ago and

Sam Yates committed 7 years ago

Separable compilation of the CUDA kernels generated by modcc from NMODL files.

CMake scripts:
* Update the `build_modules()` helper function to cleanly handle calls to modcc that generate multiple output files.
* Add a new library target `gpu_mechanisms` for the separately compiled CUDA kernels and the implementation of their C wrappers.
* Reduce verbosity of compilation messages.

* Simplify mechanism C++ namespace use: move everything in nest::mc::mechanisms::gpu::_mechanism-name_ into `nest::mc::gpu`, and similarly for multicore mechanism implementations, ions.
* Remove template parameters for `value_type` and `size_type` from all of the mechanism implementations, and use `fvm_value_type` and `fvm_size_type` everywhere instead.

modcc changes:
* Modify `CUDAPrinter` to keep track of 3 text buffers, one each for "implementation", "interface" and "implementation interface":
* Write the CUDA implementation interface to `X_impl.hpp`, comprising the definition of the mechanism-specific `X_ParamPack` struct used to pass function arguments to the CUDA kernels.
* Write the CUDA kernels and C wrappers to `X_impl.cu`.
* Write the public C++ mechanism interface (with calls to implementation wrappers) to `X.hpp`.
* Modify modcc driver to support multiple generated output files.

3c283219

Sep 20, 2017

AVX2 transcendentals intrinsics (#329) · a2393eea

Vasileios Karakasis authored 7 years ago

Provides the following transcendentals intrinsics:

* `nmc_mm256_exp_pd`
* `nmc_mm256_log_pd`
* `nmc_mm256_pow_pd`
* `nmc_mm256_frexp_pd`

The first three are the equivalent of the corresponding SVML intrinsics without the `nmc` prefix.

The last one is used by the `log` function, but I decided it's nice to have it public.

All results are tested against full precision standard library implementation and provide equal results (except for the `pow` case). `NaN`s and infinities are treated according to the standard.

Limitations:

* Subnormals are treated as zeros by `frexp` and as a result by `log`.

a2393eea

Batched sampling. (#347) · a5ce1d3e

Sam Yates authored 7 years ago

Collect samples in FVM lowered cells across an integration period, prior to delivery to sampler callbacks.

Fixes #283.

* Make `multi_event_stream` a generic data structure, based on the event type. These are now used for both `deliverable_event` objects and `sample_event` objects.
* Add generic accessors for event data, separating access to event metadata (cell index, time) from payload (weight, sampling info etc.).
* Make a flat, CUDA-friendly data structure describing the `multi_event_queue` marked event state, for passing to mechanisms and kernels. This also brings the underlying representations of the `multicore` and `gpu` versions of `multi_event_queue` closer.
* Implement batched sampling kernel for GPU.
* Additional utility function: `util::is_sorted_by`.

Overhead of implementation and sampling process is negligible for CPU, and with no samples less than 1% on GPU. Running the miniapp with 10'000 cells and sampling 1% of the somata at 0.01ms adds approximately 10% to simulation time.

a5ce1d3e

Jul 04, 2017

Improve reduce by key GPU performance. (#301) · 1f188dcd

Ben Cumming authored 7 years ago and

Sam Yates committed 7 years ago

Optimized reduce by key used by the GPU back end when accumulating synapse current contributions to compartment currents. This leads to significant speedup in the miniapp for cells with few compartments and many synapses.

* Implement `gpu::reduce_by_key` device function that uses warp intrinsics to perform reduction between threads in a warp before using a global atomic update to store the result.
* Add unit tests for `reduce_by_key` functionality.
* Add micro benchmarks that compare against using CUDA atomics.
* Modify `CudaPrinter` modcc class to emit `reduce_by_key` in place of `cudaAtomicAdd` functions.
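
The effect of reduce by key can be shown with a serial host-side sketch: contributions that target the same index are summed first, and only one update per run of equal keys touches the output. On the GPU the inner summation is done between threads of a warp and the single update is an atomic; none of that is modelled here, and the signature below is hypothetical. The sketch assumes the keys are sorted so that equal keys are adjacent.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Adds values[i] into out[keys[i]], performing one update per run of
// equal keys. Returns the number of updates made to `out`.
inline std::size_t reduce_by_key(const std::vector<int>& keys,
                                 const std::vector<double>& values,
                                 std::vector<double>& out)
{
    assert(keys.size() == values.size());
    std::size_t updates = 0;
    std::size_t i = 0;
    while (i < keys.size()) {
        int k = keys[i];
        assert(k >= 0 && static_cast<std::size_t>(k) < out.size());
        double sum = 0;
        for (; i < keys.size() && keys[i] == k; ++i) {
            sum += values[i];  // on the GPU: reduction within a warp
        }
        out[k] += sum;         // on the GPU: one global atomic update
        ++updates;
    }
    return updates;
}
```

With many synapses per compartment the number of updates is much smaller than the number of contributions, which is where the reduction in atomic traffic comes from.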

Some improvements to meter reporting:
* Shorten names of metering regions in miniapp to make them easier to grep.
* JSON is no longer used as an intermediate data type when gathering distributed meters into a single report, instead conversion to JSON is performed just before writing to file.
* Add a print function for summarizing meter results to a stream.

1f188dcd

Jun 19, 2017

Implement device-side event delivery. · 9319b302

Sam Yates authored 7 years ago

Finalizes #184 and includes fix for #285. This PR constitutes a brutal rebase of the `devel/async-integration` branch onto current master: git patch, baby.

Implements asynchronous integration of cells within an `fvm_multicell` instance together with the code to implement efficient device-side event delivery.

Summarized patch notes from the development branch:

* Add four new views in mechanism state:
   * `vec_ci`: cv index to cell index (w.r.t. one lowered multicell instance) map.
   * `vec_t`: current or integration starting time, indexed by cell.
   * `vec_t_to`: integration stopping time, indexed by cell.
   * `vec_dt`: pre-computed delta between `vec_t` and `vec_t_to`, to address performance regression on multicore backend.
* Extend notion of indexed view in `modcc` to encompass `CellIndexedVariable` variables, which have a per-cell value.
* Add/change 'built-in' variables `t`, `t_to`, and `dt` for NMODL.
* Additional utility functions:
   * `util...

9319b302

Jun 14, 2017

modcc: AVX2 backend for mechanisms (#282) · 02d41881

Vasileios Karakasis authored 7 years ago

Add AVX2 intrinsics back end for Haswell and Broadwell architectures.

We're still 3.5% and 5% slower than the icc `#pragma` version on Haswell and Broadwell, respectively.

02d41881

May 19, 2017

Fix consistency issue of the SIMD i/f of modcc (#278) · 56fd0532

Vasileios Karakasis authored 7 years ago

The `emit_gather()` function emitted the "wrong" instruction in terms of its arguments, but the instruction actually generated was correct because the `simd_printer` passed the arguments to `emit_gather()` in a different order, which happened to be the correct order for the finally emitted instruction. Complicated? This commit cleans this up.

56fd0532

May 16, 2017

Padé approximation of exp in 'cnexp' integration (#268) · 5b254146

Ben Cumming authored 7 years ago and

Sam Yates committed 7 years ago

Fixes #265.

In the `modcc`-generated mechanism code, the `cnexp` solver method uses an expensive call to `exp` to integrate dependent variables over one time step. This commit replaces the exponential with a second-order Padé approximation.

  * Modify `modcc` to insert `exp_pade_11` and `exp_pade_22` functions into every module, which define Padé approximations of second and fourth order respectively (m=n=1 and m=n=2).
  * Have `cnexp` solver use `exp_pade_11` instead of the built in `exp` unary operator.

The validation tests pass for both the 2nd and 4th order approximations; the second order approximation suffices.
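
For reference, the two Padé approximants of exp with m=n=1 and m=n=2 have the standard closed forms sketched below. The function bodies are written from those formulas and are not copied from the code that `modcc` emits; `pade_11_error` is a hypothetical helper added for comparison.

```cpp
#include <cassert>
#include <cmath>

// [1/1] Padé approximant of exp(x): (1 + x/2)/(1 - x/2).
inline double exp_pade_11(double x) {
    assert(x != 2.0);  // the approximant has a pole at x = 2
    return (1 + 0.5*x) / (1 - 0.5*x);
}

// [2/2] Padé approximant of exp(x): (12 + 6x + x^2)/(12 - 6x + x^2).
// The denominator has no real roots.
inline double exp_pade_22(double x) {
    return (12 + 6*x + x*x) / (12 - 6*x + x*x);
}

// Absolute error of the [1/1] approximant against std::exp.
inline double pade_11_error(double x) {
    return std::fabs(exp_pade_11(x) - std::exp(x));
}
```

In a `cnexp` update the argument is a rate multiplied by the time step, which is typically small in magnitude, and the approximants are most accurate in that regime.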

5b254146

remove ineffectual compiler directives from nmodl generated code (#270) · 07dd8f35

Ben Cumming authored 7 years ago

Remove some old compiler directives and properties from modcc-generated kernels that were not used by the intel compiler, and were not recognised by gcc.

Now the optimized x86 kernels can be compiled by both gcc and intel compilers.

07dd8f35

Mar 31, 2017

block-interleaved gpu matrix solver (#208) · 15230c69

Ben Cumming authored 8 years ago and

Sam Yates committed 8 years ago

Fixes #185.

Add a new back end GPU Hines matrix solver that uses a block-interleaved storage pattern to improve memory coalescing during the matrix solve.

  * Refactor the `src/backends` path into `src/backends/gpu` and `src/backends/multicore` paths that contain `gpu` and `multicore` implementations.
  * Refactor the matrix state and threshold detection members that were declared inline in the back end specifications to separate files.
  * Add a new interleaved matrix state back end.
  * Refactor all of the GPU kernels that were originally in the one back end header file into their own header files.
  * Write more comprehensive unit tests for the GPU matrix solver back end to test the `interleave` and `reverse_interleave` operations in isolation, as well as ensure that the flat and interleaved back ends produce identical results.
  * Add the GPU versions of the kinetic scheme validation tests.
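
A simplified picture of the `interleave` and `reverse_interleave` operations, assuming all matrices have the same length and ignoring the block width and padding that a block-interleaved layout also needs. The signatures are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Input: n_mat matrices of length len stored back to back.
// Output: element i of matrix j is stored at position i*n_mat + j, so that
// neighbouring threads (one per matrix) read neighbouring addresses.
inline std::vector<double> interleave(const std::vector<double>& flat,
                                      std::size_t n_mat, std::size_t len)
{
    assert(flat.size() == n_mat*len);
    std::vector<double> out(flat.size());
    for (std::size_t j = 0; j < n_mat; ++j) {
        for (std::size_t i = 0; i < len; ++i) {
            out[i*n_mat + j] = flat[j*len + i];
        }
    }
    return out;
}

// Inverse permutation: recover the back-to-back layout.
inline std::vector<double> reverse_interleave(const std::vector<double>& inter,
                                              std::size_t n_mat, std::size_t len)
{
    assert(inter.size() == n_mat*len);
    std::vector<double> out(inter.size());
    for (std::size_t j = 0; j < n_mat; ++j) {
        for (std::size_t i = 0; i < len; ++i) {
            out[j*len + i] = inter[i*n_mat + j];
        }
    }
    return out;
}
```

Applying `reverse_interleave` to the result of `interleave` returns the original data, which is the kind of round trip that unit tests of the two operations in isolation can check.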

15230c69

Add more general indirect access view. (#216) · ca328a21

Sam Yates authored 8 years ago

* Implement `indirect_view` for indexed access via `transform_view`.
* Extend `transform_iterator` to permit non-const access to reference-returning functor results.
* Replace use of `indexed_view` with `indirect_view`.
* Fix missing cpu target for vectorized modcc outputs.

ca328a21

Mar 28, 2017

Bug fix for issue #196 (#211) · 1db73767

Sam Yates authored 8 years ago

Fixes #196.

Correct treatment of missing coefficients in `cnexp` solver.

* Extend `EXPECT_EXPR_EQ` functionality with wrapper that works with `Expression *` and `expression_ptr` arguments.
* Replace string comparison checks in `test_symdiff.cpp` with equivalents that use `EXPECT_EXPR_EQ`.
* Check explicitly for missing coefficient in `cnexp` solver, which should be treated equivalently to zero.

1db73767

Mar 20, 2017

modcc: AVX512 vectorisation backend (#154) · c55d9d66

Vasileios Karakasis authored 8 years ago

Basic features:

* Compile with -t avx512
* Automatically set up by CMake if USE_OPTIMIZED_KERNELS is on and VECTORIZE_TARGET is set to KNL 
* Generic SIMD printer that contacts a SIMD backend for emitting the actual SIMD intrinsics

Note: compilation for the avx512 target requires the Intel compiler.

c55d9d66

Mar 15, 2017

Simple `net_receive` device kernels (#193) · 9df68703

Sam Yates authored 8 years ago

Fixes #183

Use a device kernel for net_receive state updates.
Note: very naive, but gives about a 30% speed up on the 1000 cell miniapp test. All the fun optimization will end up under issue #184.

This also incorporates PR #192, so this PR will be amended if that one is rejected.

9df68703

Mar 08, 2017

Use native cuda atomicAdd on Pascal (#174) · 0e0bcd8f

Ben Cumming authored 8 years ago and

Sam Yates committed 8 years ago

Fixes #125

* Add `cuda_atomic_add` and `cuda_atomic_sub` wrappers for atomic addition and subtraction.
* Choose native atomic add for Pascal and later architectures.
* Choose CAS workaround for devices earlier than Pascal.
* Add unit test for wrappers.
* Change default CUDA architecture target to `sm_60` in `CMakeLists.txt`.
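
The compare-and-swap (CAS) workaround follows a well-known pattern: read the current value, compute the sum, and retry until no other thread has modified the value in between. The sketch below is a host-side analogue using `std::atomic<double>`, not the device code from this commit; the function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Atomically add `value` to `target` with a CAS loop.
// Returns the value held before the addition.
inline double atomic_add_cas(std::atomic<double>& target, double value) {
    double old = target.load();
    // On failure compare_exchange_weak reloads `old` with the current
    // contents of `target`, so the sum is recomputed on each retry.
    while (!target.compare_exchange_weak(old, old + value)) {
    }
    return old;
}

// Subtraction expressed through the same loop.
inline double atomic_sub_cas(std::atomic<double>& target, double value) {
    return atomic_add_cas(target, -value);
}
```

Where the hardware provides a native double precision atomic add, the retry loop is unnecessary, which is the reason for selecting the implementation by architecture.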

0e0bcd8f

Mar 07, 2017
- Fix compilation with Clang 3.9 (#172) · 2ca1d47f
  Vasileios Karakasis authored 8 years ago and Sam Yates committed 8 years ago
```
* Add missing `<string>` header to `modcc/msparse.hpp`
```
  2ca1d47f