  1. Jul 10, 2018
  2. Jul 06, 2018
    • Sam Yates's avatar
      Migrate source/build to c++14 ... · 3ee79191
      Sam Yates authored and Benjamin Cumming's avatar Benjamin Cumming committed
      Migrate source/build to c++14 (#522)
      
      * Update `CMakeLists.txt` for C++14 option.
      * Update to gcc 6 minimum.
      * Update travis CI from gcc-5 to gcc-6
      * Use `std::..._t` style type traits, replacing `util::` aliases.
      * Use `std::cbegin`, `std::cend`, and `std::make_unique`, replacing `util::` versions.
      * Remove `DEDUCED_RETURN_TYPE` macros.
      * Remove redundant return type specifications.
      * Use correct ADL for `begin` and `end` in (almost all) the range utilities.
      * Remove redundant `mechinfo` ctor (aggregate initialization suffices).
      * Use lambda capture initializers where appropriate.
      * Use generic `std::equal_to`.
      * Use variable templates for `math::infinity` and `math::pi`.
      * Remove `enum_hash` workaround.
      * Use `""s` string literals where we were using our own `""_s` construction.
      * Use generic lambda for recursive lambda instead of `std::function` wrapper.
      * Use generic lambda for generic arithmetic tests.
      
      Fixes #358.
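
Several of the C++14 features named in the list above can be illustrated in a few lines. This is only a sketch: the `sketch` namespace and the function names are hypothetical and are not taken from the Arbor sources; only the language features (variable templates, a generic lambda used recursively, lambda capture initializers, `""s` literals) mirror the list.

```cpp
#include <cassert>
#include <string>
#include <utility>

namespace sketch {  // hypothetical namespace, for illustration only

// Variable template: one definition serves every floating point type.
template <typename T>
constexpr T pi = T(3.14159265358979323846L);

// Generic lambda used recursively: the lambda receives itself as an
// argument, so no std::function wrapper is required.
inline int factorial(int n) {
    assert(n >= 0);
    auto impl = [](auto& self, int k) -> int {
        return k <= 1 ? 1 : k * self(self, k - 1);
    };
    return impl(impl, n);
}

// Lambda capture initializer plus the standard ""s string literal.
inline std::string greet(std::string name) {
    using namespace std::string_literals;
    auto f = [n = std::move(name)]() { return "hello, "s + n; };
    return f();
}

} // namespace sketch
```

The explicit `-> int` return type on the recursive lambda is needed because its return type cannot be deduced before the recursive call is seen.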
      3ee79191
  3. Jul 03, 2018
    • Sam Yates's avatar
      Move cell description types to public includes. (#508) · a1894edc
      Sam Yates authored and Benjamin Cumming's avatar Benjamin Cumming committed
      Further work to public install target.
      
      * Move SIMD classes, cell description classes, simple sampler to public include.
      * Rename `cell` to `mc_cell`, `segment` to `mc_segment`, and remove `_description` from cell description class names and includes.
      * Move `compartment_model` out of `mc_cell` interface and use only in `fvm_layout.cpp`.
      * (Provisionally) remove area/volume methods on `mc_cell` and `mc_segment`.
      a1894edc
  4. Jun 25, 2018
    • Sam Yates's avatar
      Feature/lib install target part i (#506) · ad1c78ab
      Sam Yates authored and Benjamin Cumming's avatar Benjamin Cumming committed
      CMake and build refactoring
      
      *   Use CUDA as first-class language (leading to CMake 3.9 minimum version requirement).
      
      *   Use 'modern CMake' interface libraries for compiler options, include file and library dependency tracking. Interface library targets:
          * `arbor-deps`: compiler options and library requirements for the `libarbor.a` static library, as governed by configure-time options and environment.
          * `arbor-private-headers`: include path for non-installed headers, as required by unit tests and arbor itself.
          * `arbor-aux`: helper classes and utilities used across tests and examples.
          * `ext-json`, `ext-tclap`, `ext-tbb`, `ext-benchmark`, `ext-sphinx_rtd_theme`: externally maintained software that we include (directly or via submodule) in the `ext/` subdirectory.
       
      *   Single static library `libarbor.a` includes all built-in modules and CUDA objects.
      
      *   Simplify configuration options:
          *  `ARB_WITH_TRACE`, `ARB_AUTORUN_MODCC_ON_CHA...
      ad1c78ab
  5. Jun 07, 2018
    • Benjamin Cumming's avatar
      profile multicore mechanism state and current calls individually (#492) · 5e65a939
      Benjamin Cumming authored
      The built in profiler generates timings for state and current for individual multicore mechanisms.
      
      Modcc generates `PE(advance_integrate_{state,current}_X)` profiler calls, along with the corresponding `PL()` calls, around calls to the multicore mechanism `nrn_state` and `nrn_current` API functions.
      
      No timings are made for the gpu back end, which is not properly supported by the current profiling tools.
      5e65a939
  6. Jun 04, 2018
    • noraabiakar's avatar
      Simd partition by constraint (#494) · 64171e43
      noraabiakar authored and Benjamin Cumming's avatar Benjamin Cumming committed
      Changes have been made to the simd implementation of mechanism functions:
      
      - The node_index array (array of indices that specifies for each mechanism the CVs where it is present), is now partitioned into 4 arrays according to the constraint on each simd_vector in node_index:
          1. contiguous array: contains the indices of all simd_vectors in node_index where the elements in simd_vector are contiguous
          2. constant array: contains the indices of all simd_vectors in node_index where the elements in simd_vector are identical
          3. independent array: contains the indices of all simd_vectors in node_index where the elements in simd_vector are independent (no repetitions) but not contiguous 
          4. none array: contains the indices of all simd_vectors in node_index where none of the above constraints apply
      
          When mechanism functions are executed, they loop over each of the 4 arrays separately. This allows for optimizations in every category. 
      
      - The modcc compiler was modified to generate code for the previous changes, including the optimizations per constraint:
          1. contiguous array: we use vector load/store and vector arithmetic. 
          2. constant array: we load only one element and broadcast it into a simd_vector; we use vector arithmetic; we reduce the result; we store one element.   
          3. independent array: we use vector scatter/gather and vector arithmetic. 
          4. none array: we cannot operate on the simd_vector in parallel, we loop over the elements to read, perform arithmetic and write back 
      
      - Added a mechanism benchmark for pas, hh and expsyn
      
      - Moved/modified some functions in simd.hpp to ensure that the correct implementation of a function is being called. 
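
The partitioning described above can be sketched in scalar C++ as follows. This is an assumption-laden illustration, not the Arbor implementation: the `sketch` namespace, the `classify` and `partition_by_constraint` functions, and the choice to record each chunk by its starting offset are all hypothetical, and `node_index` is assumed to be padded to a multiple of the SIMD width.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {  // hypothetical names throughout

enum class index_constraint { contiguous, constant, independent, none };

// Classify one SIMD-width chunk of indices idx[0..width).
inline index_constraint classify(const int* idx, std::size_t width) {
    bool contiguous = true, constant = true, independent = true;
    for (std::size_t i = 1; i < width; ++i) {
        if (idx[i] != idx[i - 1] + 1) contiguous = false;
        if (idx[i] != idx[0]) constant = false;
        for (std::size_t j = 0; j < i; ++j) {
            if (idx[j] == idx[i]) independent = false;  // repeated index
        }
    }
    if (contiguous) return index_constraint::contiguous;
    if (constant) return index_constraint::constant;
    if (independent) return index_constraint::independent;
    return index_constraint::none;
}

// Starting offsets of the chunks that fall into each category.
struct partition {
    std::vector<std::size_t> contiguous, constant, independent, none;
};

inline partition partition_by_constraint(const std::vector<int>& node_index,
                                         std::size_t width) {
    assert(width > 0);
    partition p;
    for (std::size_t i = 0; i + width <= node_index.size(); i += width) {
        switch (classify(node_index.data() + i, width)) {
        case index_constraint::contiguous:  p.contiguous.push_back(i); break;
        case index_constraint::constant:    p.constant.push_back(i); break;
        case index_constraint::independent: p.independent.push_back(i); break;
        case index_constraint::none:        p.none.push_back(i); break;
        }
    }
    return p;
}

} // namespace sketch
```

A mechanism kernel would then loop over each of the four offset lists in turn, using the load/store strategy appropriate to that category.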
      64171e43
  7. May 15, 2018
  8. May 09, 2018
    • Benjamin Cumming's avatar
      CUDA back end for the new mechanism infrastructure (#487) · e0f0b5d7
      Benjamin Cumming authored and Sam Yates's avatar Sam Yates committed
      Completes CUDA printing in modcc.
      * Add CudaPrinter visitor, overriding CPrinter.
      * Add `ostream` `operator<<` overloads for `arb::gpu::shared_state` and `device_view` for debugging.
      * Fix GPU back-end bugs.
      e0f0b5d7
    • Sam Yates's avatar
      Mechanism Refactor: multicore and simd (#484) · 68135148
      Sam Yates authored
      First commit of two for mechanism refactor work (refer to PR #484 and PR #483).
      
      FVM/mechanism code:
      * Refactor mechanism data structures to decouple backend-specific implementations and mechanism metadata.
      * Add mechanism catalogue for managing mechanism metadata and concrete implementation prototypes.
      * Add fingerprint-checking to mechanism metadata and implementations to confirm they come from the same NMODL source (fingerprint is not yet computed, but tests are in place).
      * Split FVM discretization work out from FVM integrator code.
      * Use abstract base class over backend-templated FVM integrator class `fvm_lowered_cell_impl` to allow separate compilation of `mc_cell_group` and to remove the dummy backend code.
      * Add a new FVM-specific scalar type `fvm_index_type` that is an alias for `int` to replace `fvm_size_type` in fvm layouts and mechanisms. This was chosen as an alternative to making `unsigned` versions of all our SIMD implementation classes.
      * Extend `cable1d_neuron` global data to encompass: mechanism catalogue; default ion concentrations and charges; global temperature (only for Nernst); initial membrane potential.
      
      Modcc:
      * Collect printer sources in modcc under `printer/`.
      * Move common functionality across printers into `printer/printerutil.{hpp,cpp}`.
      * Add string to file I/O implemented in routines read_all and write_all in `io/bulkio.hpp`.
      * Implement indent-friendly source code generation via a `std::streambuf` filter `io::prefixbuf` defined in `io/prefixbuf.hpp`, together with manipulators and a corresponding std::ostream-derived wrapper.
      * Rewrite printers to use new infrastructure: cpu target incorporates SIMD printing options; CUDA printer at this point produces only stubs for CUDA kernel wrappers.
      * Modify SIMD printing command line options for modcc: `-s` enables explicit vectorization using the SIMD classes;  `-S <N>` allows a specific data width to be prescribed.
      * Fix problem in `test_ca.mod` with uninitialized ion current.
      * Add infrastructure support to allow future pre-computation of SIMD index conflict cases for (hopefully) faster scatters and updates.
      * Simplify `IndexedVariable` expressions in the AST, making data source explicit via a `sourceKind` enum, and leaving the indexing method and index names up to the printers.
      * Allow state variables in the AST to 'shadow' an ion concentration — these are assigned in the
      generated `write_ions` method.
      
      SIMD classes:
      * Add `simd_cast` operation between SIMD value types of the same width, and with `std::array`. (Note: this was tested and used in an early development version of the code, but not in this version. It was still a lacuna in the original SIMD wrappers, so it has been left in.)
      * Restructure SIMD gather/scatter API to use a `simd::indirect` expression,  which encapsulates a pointer and SIMD offset.
      * Add `simd::index_constraint` scoped enum to describe knowledge of contention in indirect indices, so that we can branch on this to the appropriate implementation.
      * Add SIMD concrete implementation routines `reduce_add` for horizontal reduction and `element0` for access to first lane scalar value.
      * Add SIMD value method `sum()` that exposes implementation `reduce_add`.
      * Add SIMD concrete implementation routine `compound_indexed_add` that provides the implementation for `indirect(p, simd_indices) += simd_value` construction.
      * Fix SIMD `implbase` bug where some static methods were using the `implbase` fall-back functions instead of the derived class specialized implementations.
      * Move SIMD mathematical functions into friend routines of `simd_impl` in order to resolve implicit conversions from scalars in mixed SIMD-scalar operations.
      * Use a templated `tag` class to dispatch on SIMD concrete implementation types, to avoid problems with incomplete types in method signatures.
      * Remove old SIMD intrinsics.
      
      CMake infrastructure:
      * Downcase some variables in `CMakeLists.txt` files to  distinguish them visually from CMake keywords and variables.
      * Split arbor modcc vectorization option (now `ARB_VECTORIZE`) and target-architecture optimization (now `ARB_ARCH`).
      * For `arbor` and `arbormech` targets, and in particular not the `modcc` target, use `ARB_ARCH` to generate corresponding target-appropriate binaries, including, for example, appropriate SIMD support.
      * Extend `CompilerOptions.cmake` to map as best as able between the various target architecture names (we use the gcc names) and the correct option to pass to the compiler based on the compiler and platform.
      * Add work-around for misidentification by CMake of XL C as Clang.
      * As a temporary work-around, include `arbormech` library twice on link line to resolve circular arbor–arbormech dependencies.
      
      Unit tests:
      * Extend repertoire of generic sequence equality/near equality testing support  in `common.hpp`.
      * Add warning suppression for icc for the malloc instrumentation code.
      * SIMD unit tests for indirect expressions, compound indirect add, reduction.
      * Make some exact tests into floating point 'near' tests when comparing computed areas and lengths in swc and fvm layout tests, to account for compiler (e.g. icc) performing semantically inequivalent floating point operation reordering or fusion at `-O3`.
      * Split out some of the CUDA tests into separate .cpp/.cu files for  separate-compilation purposes.
      
      Other:
      * The `padded_allocator` has been modified to propagate alignment/padding on move and copy (these semantics make their use much easier and safer in the multicore mechanism instantiation code).
      * Map/table searching utilities in `util/maputil.hpp`.
      * Fixes for correct sequence type categorization and `begin/end` ADL.
      * Fixes for type guards for range methods that take universal references.
      * Removal of some redundant code in range utilities through the use of universal references.
      * Add new range view `reverse_view` for ranges delineated by bidirectional iterators.
      * Add single argument form of `make_span` to count up from zero, and associated helper `count_along` that gives a span that indexes a supplied container.
      * Moved `prefixbuf` to `modcc` source.
      * Make sequence positive and negative tests in algorithms generic.
      * Add `private`-subverting helper code/macro to `tests/unit/common.hpp` to reduce the number of public testing-only interfaces in the library code.
      * Add virtual destructors for virtual base classes.
      * Add new arb::math:: functions: `next_pow2` for unsigned integral types, `round_up` to round a number away from zero to next largest magnitude multiple.
      * New `index_into` implementation that supports bidirectional access (moved to `util::` namespace).
      * Rework dangerous `memory::array(Iter, Iter)` constructor to be less dangerous (and do the expected thing).
      * Allow ranges to be constructed from other ranges if the iterators are compatible.
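
As an illustration of the two math helpers mentioned in the list above, here is a minimal sketch. The function names follow the commit message, but the bodies, the `sketch` namespace, the behaviour at zero, and the restriction to a positive multiple are assumptions rather than the library's actual code.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {  // hypothetical namespace; bodies are assumptions

// Smallest power of two that is >= x, for unsigned integral types.
// In this sketch, x == 0 yields 0.
template <typename U, typename = std::enable_if_t<std::is_unsigned<U>::value>>
constexpr U next_pow2(U x) {
    --x;
    // Smear the highest set bit into every lower bit position.
    for (unsigned s = 1; s < sizeof(U) * 8; s <<= 1) {
        x |= x >> s;
    }
    return ++x;
}

// Round v away from zero to the next multiple of b (b > 0 assumed).
template <typename T>
T round_up(T v, T b) {
    assert(b > 0);
    T m = v % b;  // the remainder carries the sign of v
    if (m == 0) return v;
    return v - m + (m < 0 ? -b : b);
}

} // namespace sketch
```

For example, `round_up(7, 4)` gives 8 and `round_up(-7, 4)` gives -8, since rounding is away from zero.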
      68135148
  9. Dec 21, 2017
    • Sam Yates's avatar
      Fix indirection in ion concentration write. (#425) · 17f7db98
      Sam Yates authored
      * Fix indirection in ion concentration write.
      * Remove second indirection in ion write assignment.
      * Extend ion write unit test to cover non-contiguous ion CV cases and verify correct ion concentration averaging.
      
      Fixes #424.
      17f7db98
  10. Dec 20, 2017
    • Ben Cumming's avatar
      Add granule cell mechanisms (#421) · a80df6fa
      Ben Cumming authored and Sam Yates's avatar Sam Yates committed
      * Add three new mechanisms: `nax.mod`, `kdrmt.mod` and `kamt.mod`.
      * Add new built-in math operators to `modcc`: `min`, `max`, `abs` and `exprelr`. `exprelr` is defined as the reciprocal of the 'exprel' function: exprelr(x) = x/(exp(x)-1), with exprelr(0) = 1. This function occurs frequently in HH-style mechanisms, and having a built-in operator avoids the ad hoc `vtrap` functions found in NMODL files in the wild.
      * Split Arbor SIMD intrinsics support into AVX2- and AVX512-specific files.
      * Add unit tests for new maths operators for C++, SIMD and CUDA implementations.
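
A scalar version of the `exprelr` operator can be sketched as below, taking exprelr(x) = x/(exp(x) - 1) with the limiting value 1 at x = 0. The guard near zero and the `vtrap` rewrite are assumptions made for illustration; the code that `modcc` actually generates may differ.

```cpp
#include <cassert>
#include <cmath>

namespace sketch {  // hypothetical namespace; not the modcc-generated code

// exprelr(x) = x / (exp(x) - 1), continued by its limit 1 at x = 0.
inline double exprelr(double x) {
    // When x is too small to change 1.0, the quotient is 1 to working
    // precision; returning early also avoids evaluating 0/0.
    if (1.0 + x == 1.0) return 1.0;
    return x / std::expm1(x);  // expm1 avoids cancellation for small x
}

// The common NMODL helper vtrap(x, y) = x / (exp(x/y) - 1),
// rewritten in terms of exprelr.
inline double vtrap(double x, double y) {
    assert(y != 0.0);
    return y * exprelr(x / y);
}

} // namespace sketch
```

Expressing `vtrap` through `exprelr` removes the need for a hand-written special case at x = 0 in each NMODL file.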
      a80df6fa
  11. Dec 19, 2017
    • Sam Yates's avatar
      Reduce differences between `util::optional` and `std::optional`. (#420) · 80fe2f01
      Sam Yates authored
      Reduce differences between `util::optional` and `std::optional`.
      
      * Rename `util::nothing` to `util::nullopt`.
      * Replace `util::get()` by `util::optional::value()`.
      * Add correct move semantics to `util::optional::value()`.
      * Add `util::optional::value_or()` method.
      * Remove unused monadic functionality from `util::optional` and `util::uninitialized`.
      * Update code formatting to adhere more closely to coding guidelines.
      * Add convenience ""_s string constructor for unit tests.
      * Update unit tests accordingly.
      
      Fixes #419.
      80fe2f01
  12. Dec 18, 2017
  13. Dec 05, 2017
  14. Nov 29, 2017
    • Ben Cumming's avatar
      support ion species concentration updates (#398) · fa7d99b6
      Ben Cumming authored and Sam Yates's avatar Sam Yates committed
      * Add querying methods to modcc ion species type to simplify code generation.
      * Add `WriteBack` type that records per-mechanism ion usage metadata.
      * Add `write_back` implementation for C and CUDA printers that adds weighted concentration
        contributions to global concentration fields.
      * Extend `uses_ion` interface to provide information about which concentration fields are modified by mechanisms.
      * Update the `update_ion_symbols` lambda, which is responsible for adding metadata about interactions between mechanism fields and external ion species fields: it creates a write-back when the mechanism field is a state variable, creates an indexed variable when the field is a current or reversal potential, and otherwise reports an error.
      * Add `test_ca.mod` mechanism that writes calcium ion concentration for testing purposes.
      * Add back-end callbacks that initialize concentration values.
      * Update `mechanism::uses_ion()` to return information regarding mechanism ion concentration updates.
      * Add `mechanism::write_back()` method.
      * Update stimulus mechanism specialization to use new mechanism interface.
      * Update `ion` type to calculate default concentration contributions in CVs where the concentration is only partially determined by mechanisms.
      * Update `fvm_multicell` to calculate the default ion concentration weights.
      * Add unit test for ion concentration weight determination.
      
      Fixes #373 
      fa7d99b6
  15. Nov 28, 2017
    • Sam Yates's avatar
      Tidy `modcc` driver, remove optimize flag. (#404) · 998ee724
      Sam Yates authored
      * Remove optimization option (use SIMD options for vectorization).
      * Remove arbor utility library dependencies from modcc (pending separation of utility lib from arbor lib source).
      * Split target (cpu, gpu) specification from vectorization architecture (avx2, avx512).
      * Remove `Options` singleton; replace with structure local to `modcc.cpp`.
      * Tidy `modcc` option parsing and main function; allow a single invocation of `modcc` to generate code for multiple backends.
      * Rename generated sources to include backend target in filename.
      * Always run a constant simplification pass on generated procedures.
      * Remove file i/o code from `Module` and `modcc` main function; move functionality to new functions in `io` namespace. (Note: in on-going mechanism revamp, other i/o utility code will reside in the `io` namespace and subdirectory.)
      * Remove classes `ConstantFolderVisitor` and `ExpressionClassifierVisitor` that are no longer used.
      * Modify CMakeLists.txt files, `backends/*/fvm.cpp` to reflect the new filenames of generated sources.
      * Small formatting changes in `modcc` source to reflect coding guidelines (incomplete).
      998ee724
    • Sam Yates's avatar
      Fix binary operator parentheses bug. (#400) · 67721f02
      Sam Yates authored
      Fixes #399.
      
      Update `test_printers.cpp`:
      * Fix compilation and scope management issues, and re-include in CMakeLists.txt.
      * Update to use `verbose_print` (run-time verbosity).
      * Apply expected output tests to `CPrinter` and `CUDAPrinter` outputs.
      
      Address binary operation rendering issue:
      * Move C-style expression rendering common to `CPrinter` and `CUDAPrinter` to a specialized `Visitor`, `CExprRenderer`.
      * Use operator associativity to determine need for parentheses around sub-expressions of a binary expression.
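
The idea of deciding parentheses from precedence and associativity can be sketched with a tiny expression tree. Everything here (the `expr` struct, `leaf`, `binary`, `render`) is hypothetical and far simpler than the `CExprRenderer` visitor; it handles only the four left-associative arithmetic operators.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

namespace sketch {  // hypothetical miniature AST, not modcc's

struct expr {
    std::string text;              // identifier or literal when op == 0
    char op = 0;                   // '+', '-', '*' or '/' for a binary node
    std::unique_ptr<expr> lhs, rhs;
};

inline std::unique_ptr<expr> leaf(std::string s) {
    auto e = std::make_unique<expr>();
    e->text = std::move(s);
    return e;
}

inline std::unique_ptr<expr> binary(char op, std::unique_ptr<expr> l,
                                    std::unique_ptr<expr> r) {
    auto e = std::make_unique<expr>();
    e->op = op;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

inline int precedence(char op) { return (op == '+' || op == '-') ? 1 : 2; }

inline std::string render(const expr& e) {
    if (!e.op) return e.text;
    assert(e.lhs && e.rhs);
    auto child = [&](const expr& c, bool is_rhs) {
        std::string s = render(c);
        if (!c.op) return s;
        int pc = precedence(c.op), pe = precedence(e.op);
        // All four operators are left-associative: an equal-precedence
        // child needs parentheses only on the right, e.g. a-(b-c).
        bool paren = pc < pe || (pc == pe && is_rhs);
        return paren ? "(" + s + ")" : s;
    };
    return child(*e.lhs, false) + e.op + child(*e.rhs, true);
}

} // namespace sketch
```

This sketch also parenthesizes `a+(b+c)`, which is conservative but preserves the floating point evaluation order of the tree.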
      67721f02
  16. Nov 20, 2017
    • Ben Cumming's avatar
      modcc now enforces derivatives only on state variables (#392) · 87a52776
      Ben Cumming authored and Sam Yates's avatar Sam Yates committed
      Derivatives should only appear on the left hand side of expressions that describe the time evolution of state variables. Without this check `modcc` segfaulted when processing a derivative expression of a non-state variable.
      
      * Specialize the semantic analysis of `DerivativeExpression` to enforce that derivatives are only applied to state variables.
      87a52776
  17. Nov 09, 2017
    • Ben Cumming's avatar
      Convert currents to current densities in FVM (#381) · bca33966
      Ben Cumming authored and Sam Yates's avatar Sam Yates committed
      Update the FVM formulation to use current densities instead of currents.
      
      Modifications to modcc:
      * Update printers to store and use weights for point process mechanisms,
      * Scale ion species current contributions by area proportion, similarly to contributions to the accumulated current.
      
      Changes to FVM code:
      * Update weights calculation for density and point processes mechanisms:
          * density channels use relative proportion of CV area, i.e. "density",
          * point processes use the reciprocal of the CV area to convert to a density.
      * Add `cv_area` parameter for matrix constructor, which is used by matrix assembly to convert current densities to currents.
      * Update stimulus implementations (gpu and cpu backends) to contribute current densities.
      
      Other changes:
      * Update unit tests to use new interfaces.
      * Update units section in LaTeX docs.
      
      Fixes #374.
      bca33966
  18. Nov 02, 2017
    • Sam Yates's avatar
      Add mechanism parameter setting/new implementation. (#377) · b7623d13
      Sam Yates authored
      Fixes #350 
      
      * Replace parameter_list with mechanism_spec.
      * Add prototype for mechanism parameter schema checking.
      * Allow mechanism weights to be set after construction.
      * Combine range parameters on density mechanisms by linear contribution in CVs.
      * Cable segment electrical parameters are now member variables.
      * Publish mechanism parameter information through new method `mechanism::field_info`; note this will be replaced/improved in upcoming dynamic mechanism catalog work.
      * Access mechanism parameter scalars and range data via `mechanism::field_view_ptr` and `mechanism::field_value_ptr` methods.
      * Allow mechanism 'global' parameters to be set via a method of specializing mechanisms (and giving them corresponding aliases).
      * Extend recipe interface to allow querying of per-cell-kind global information for use by cell group implementations.
      * Add unit tests for above - note that linear density mechanism parameter test is tightly coupled with the FVM discretization scheme.
      b7623d13
  19. Nov 01, 2017
  20. Sep 28, 2017
    • Sam Yates's avatar
      Rename NestMC references, names etc. to Arbor. (#363) · d9f38b2a
      Sam Yates authored
      * Use ARB_ and arb_ as variable prefixes in place of NMC_ and nmc_.
      * Replace references to 'NestMC' and 'NEST MC' with references to Arbor.
      * Use 'arbor' as the sim name in generated validation data.
      * Reflow long-line paragraphs in `tests/ubench/README.md`.
      * Change names of CUDA mechanism and CUDA kernel libraries to include arbor name.
      d9f38b2a
    • Sam Yates's avatar
      Change nest::mc namespaces to arb (#362) · d9f99489
      Sam Yates authored
      Change nest::mc namespaces to arb
      d9f99489
  21. Sep 25, 2017
    • Ben Cumming's avatar
      Finish Separable CUDA compilation (#356) · ddbece13
      Ben Cumming authored and Sam Yates's avatar Sam Yates committed
      Separate compilation for all CUDA code.
      
      * Move all CUDA kernels to their own .cu files, together with C++ function wrappers.
      * Compile all CUDA .cu files to a single static library.
      * Merge gpu and multicore backend validation tests.
      * Simplify and clean up cruft from CMakeLists.txt files.
      ddbece13
  22. Sep 21, 2017
    • Ben Cumming's avatar
      Separable compilation of mechanism kernels on GPU (#353) · 3c283219
      Ben Cumming authored and Sam Yates's avatar Sam Yates committed
      Separable compilation of the CUDA kernels generated by modcc from NMODL files.
      
      CMake scripts:
      * Update the `build_modules()` helper function to cleanly handle calls to modcc that generate multiple output files.
      * Add a new library target `gpu_mechanisms` for the separately compiled CUDA kernels and the implementation of their C wrappers.
      * Reduce verbosity of compilation messages.
      
      * Simplify mechanism C++ namespace use: move everything in nest::mc::mechanisms::gpu::_mechanism-name_ into `nest::mc::gpu`, and similarly for multicore mechanism implementations, ions.
      * Remove template parameters for `value_type` and `size_type` from all of the 
      mechanism implementations, and use `fvm_value_type` and `fvm_size_type` everywhere instead.
      
      modcc changes:
      * Modify `CUDAPrinter` to keep track of 3 text buffers, one each for 
        "implementation", "interface" and "implementation interface":
      * Write the CUDA implementation interface to `X_impl.hpp`, comprising the definition of the mechanism-specific `X_ParamPack` struct used to pass function arguments to the CUDA kernels.
      * Write the CUDA kernels and C wrappers to `X_impl.cu`.
      * Write the public C++ mechanism interface (with calls to implementation wrappers) to `X.hpp`.
      * Modify modcc driver to support multiple generated output files.
      3c283219
  23. Sep 20, 2017
    • Vasileios Karakasis's avatar
      AVX2 transcendentals intrinsics (#329) · a2393eea
      Vasileios Karakasis authored
      Provides the following transcendentals intrinsics:
      
      * `nmc_mm256_exp_pd`
      * `nmc_mm256_log_pd`
      * `nmc_mm256_pow_pd`
      * `nmc_mm256_frexp_pd`
      
      The first three are the equivalent of the corresponding SVML intrinsics without the `nmc` prefix.
      
      The last one is used by the `log` function, but I decided it's nice to have it public.
      
      All results are tested against full precision standard library implementation and provide equal results (except for the `pow` case). `NaN`s and infinities are treated according to the standard.
      
      Limitations:
      
      * Subnormals are treated as zeros by `frexp` and as a result by `log`.
      a2393eea
    • Sam Yates's avatar
      Batched sampling. (#347) · a5ce1d3e
      Sam Yates authored
      Collect samples in FVM lowered cells across an integration period, prior to delivery to sampler callbacks.
      
      Fixes #283.
      
      * Make `multi_event_stream` a generic data structure, based on the event type. These are now used for both `deliverable_event` objects and `sample_event` objects.
      * Add generic accessors for event data, separating access to event metadata (cell index, time) from payload (weight, sampling info etc.).
      * Make a flat, CUDA-friendly data structure describing the `multi_event_queue` marked event state, for passing to mechanisms and kernels. This also brings the underlying representations of the `multicore` and `gpu` versions of `multi_event_queue` closer.
      * Implement batched sampling kernel for GPU.
      * Additional utility function: `util::is_sorted_by`.
      
      Overhead of implementation and sampling process is negligible for CPU, and with no samples less than 1% on GPU. Running the miniapp with 10'000 cells and sampling 1% of the somata at 0.01ms adds approximately 10% to simulation time. 
      a5ce1d3e
  24. Jul 04, 2017
    • Ben Cumming's avatar
      Improve reduce by key GPU performance. (#301) · 1f188dcd
      Ben Cumming authored and Sam Yates's avatar Sam Yates committed
      Optimize the reduce-by-key operation used by the GPU back end when accumulating synapse current contributions to compartment currents. This leads to significant speedup in the miniapp for cells with few compartments and many synapses.
      
      * Implement `gpu::reduce_by_key` device function that uses warp intrinsics to perform reduction between threads in a warp before using a global atomic update to store the result.
      * Add unit tests for `reduce_by_key` functionality.
      * Add micro benchmarks that compare against using CUDA atomics.
      * Modify `CudaPrinter` modcc class to emit `reduce_by_key` in place of `cudaAtomicAdd` functions.
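
The effect of a reduce-by-key can be shown with a serial C++ analogue. This is not the device function: the real `gpu::reduce_by_key` uses warp intrinsics, whereas the sketch below (with a hypothetical signature, and assuming that equal keys are stored contiguously) only demonstrates that each run of equal keys is summed first and then written to the output once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {  // hypothetical serial analogue of the device function

// Accumulate contrib[i] into out[key[i]], summing each run of equal keys
// locally and touching the output once per run.
inline void reduce_by_key(const std::vector<double>& contrib,
                          const std::vector<int>& key,
                          std::vector<double>& out) {
    assert(contrib.size() == key.size());
    std::size_t i = 0;
    while (i < key.size()) {
        int k = key[i];
        double sum = 0.0;
        for (; i < key.size() && key[i] == k; ++i) {
            sum += contrib[i];
        }
        // On the device this would be the single atomic update per run.
        out[static_cast<std::size_t>(k)] += sum;
    }
}

} // namespace sketch
```

With many synapses per compartment, the number of updates to shared output falls from one per synapse to one per run of equal keys, which is where the saving comes from.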
      
      Some improvements to meter reporting:
      * Shorten names of metering regions in miniapp to make them easier to grep.
      * JSON is no longer used as an intermediate data type when gathering distributed meters into a single report, instead conversion to JSON is performed just before writing to file.
      * Add a print function for summarizing meter results to a stream.
      1f188dcd
  25. Jun 19, 2017
    • Sam Yates's avatar
      Implement device-side event delivery. · 9319b302
      Sam Yates authored
      Finalizes #184 and includes fix for #285. This PR constitutes a brutal rebase of the `devel/async-integration` branch onto current master: git patch, baby.
      
      Implements asynchronous integration of cells within an `fvm_multicell` instance together with the code to implement efficient device-side event delivery.
      
      Summarized patch notes from the development branch:
      
      * Add four new views in mechanism state:
         * `vec_ci`: cv index to cell index (w.r.t. one lowered multicell instance) map.
         * `vec_t`: current or integration starting time, indexed by cell.
         * `vec_t_to`: integration stopping time, indexed by cell.
         * `vec_dt`: pre-computed delta between `vec_t` and `vec_t_to`, to address performance regression on multicore backend.
      * Extend notion of indexed view in `modcc` to encompass `CellIndexedVariable` variables, which have a per-cell value.
      * Add/change 'built-in' variables `t`, `t_to`, and `dt` for NMODL.
      * Additional utility functions:
         * `util...
      9319b302
  26. Jun 14, 2017
  27. May 19, 2017
    • Vasileios Karakasis's avatar
      Fix consistency issue of the SIMD i/f of modcc (#278) · 56fd0532
      Vasileios Karakasis authored
      The `emit_gather()` function emitted the "wrong" instruction in terms of its
      arguments, but the instruction actually generated was correct because
      the `simd_printer` passed the arguments to `emit_gather()` in a
      different order, which happened to be the correct order for the finally
      emitted instruction. Complicated? This commit cleans this up.
      56fd0532
  28. May 16, 2017
    • Ben Cumming's avatar
      Padé approximation of exp in 'cnexp' integration (#268) · 5b254146
      Ben Cumming authored and Sam Yates's avatar Sam Yates committed
      Fixes #265.
      
      In the `modcc`-generated mechanism code, the `cnexp` solver method uses an expensive call to `exp` to integrate dependent variables over one time step. This commit replaces the exponential with a second-order Padé approximation.
      
        * Modify `modcc` to insert `exp_pade_11` and `exp_pade_22` functions into every module, which define Padé approximations of second and fourth order respectively (m=n=1 and m=n=2).
        * Have `cnexp` solver use `exp_pade_11` instead of the built in `exp` unary operator.
      
      The validation tests pass for both the 2nd and 4th order approximations; the second order approximation suffices.
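
For reference, the standard (1,1) and (2,2) Padé approximants of exp(x) can be written as below. The function names come from the commit message, but the bodies are the textbook formulas and are an assumption about what `modcc` inserts, not a copy of it.

```cpp
#include <cassert>
#include <cmath>

namespace sketch {  // textbook formulas under the commit's function names

// (1,1) Pade approximant of exp(x): error is O(x^3).
inline double exp_pade_11(double x) {
    return (1.0 + 0.5 * x) / (1.0 - 0.5 * x);
}

// (2,2) Pade approximant of exp(x): error is O(x^5).
inline double exp_pade_22(double x) {
    double q = x * x / 12.0;
    return (1.0 + 0.5 * x + q) / (1.0 - 0.5 * x + q);
}

} // namespace sketch
```

Both forms need only a handful of multiplications and one division, which is why they are attractive for the small arguments that arise over a single time step.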
      5b254146
    • Ben Cumming's avatar
      remove ineffectual compiler directives from nmodl generated code (#270) · 07dd8f35
      Ben Cumming authored
      Remove some old compiler directives and properties from modcc-generated kernels that were not used by the intel compiler, and were not recognised by gcc.
      
      Now the optimized x86 kernels can be compiled by both gcc and intel compilers.
      07dd8f35
  29. Mar 31, 2017
    • Ben Cumming's avatar
      block-interleaved gpu matrix solver (#208) · 15230c69
      Ben Cumming authored and Sam Yates's avatar Sam Yates committed
      Fixes #185.
      
      Add a new back end GPU Hines matrix solver that uses a block-interleaved storage pattern to improve memory coalescing during the matrix solve.
      
        * Refactor the `src/backends` path into `src/backends/gpu` and `src/backends/multicore` paths that contain `gpu` and `multicore` implementations.
        * Refactor the matrix state and threshold detection members that were declared inline in the back end specifications to separate files.
        * Add a new interleaved matrix state back end.
        * Refactor all of the GPU kernels that were originally in the one back end header file into their own header files.
        * Write more comprehensive unit tests for the GPU matrix solver back end to test the `interleave` and `reverse_interleave` operations in isolation, as well as ensure that the flat and interleaved back ends produce identical results.
        * Add the GPU versions of the kinetic scheme validation tests.
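
A block-interleaved layout can be sketched on the host as follows. The names `interleave` and `reverse_interleave` appear in the commit message, but the signatures here are hypothetical, and the sketch assumes every matrix is padded to the same length `len`, which is a simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {  // hypothetical signatures; equal padded lengths assumed

// flat: nmat vectors of length len stored back to back.
// Result: within each block of `width` vectors, entry i of lane j is
// stored at (block*len + i)*width + j, so neighbouring lanes are adjacent.
inline std::vector<double> interleave(const std::vector<double>& flat,
                                      std::size_t nmat, std::size_t len,
                                      std::size_t width) {
    assert(width > 0 && flat.size() == nmat * len);
    std::size_t nblock = (nmat + width - 1) / width;
    std::vector<double> out(nblock * len * width, 0.0);  // zero padding
    for (std::size_t m = 0; m < nmat; ++m) {
        std::size_t block = m / width, lane = m % width;
        for (std::size_t i = 0; i < len; ++i) {
            out[(block * len + i) * width + lane] = flat[m * len + i];
        }
    }
    return out;
}

// Inverse of interleave: recover the flat layout.
inline std::vector<double> reverse_interleave(const std::vector<double>& in,
                                              std::size_t nmat,
                                              std::size_t len,
                                              std::size_t width) {
    assert(width > 0);
    std::vector<double> flat(nmat * len);
    for (std::size_t m = 0; m < nmat; ++m) {
        std::size_t block = m / width, lane = m % width;
        for (std::size_t i = 0; i < len; ++i) {
            flat[m * len + i] = in[(block * len + i) * width + lane];
        }
    }
    return flat;
}

} // namespace sketch
```

When one thread handles one matrix, threads in the same block read adjacent addresses at each step of the solve, which is the coalescing benefit the commit describes.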
      15230c69
    • Sam Yates's avatar
      Add more general indirect access view. (#216) · ca328a21
      Sam Yates authored
      * Implement `indirect_view` for indexed access via `transform_view`.
      * Extend `transform_iterator` to permit non-const access to reference-returning functor results.
      * Replace use of `indexed_view` with `indirect_view`.
      * Fix missing cpu target for vectorized modcc outputs.
      ca328a21
  30. Mar 28, 2017
    • Sam Yates's avatar
      Bug fix for issue #196 (#211) · 1db73767
      Sam Yates authored
      Fixes #196.
      
      Correct treatment of missing coefficients in `cnexp` solver.
      
      * Extend `EXPECT_EXPR_EQ` functionality with wrapper that works with `Expression *` and `expression_ptr` arguments.
      * Replace string comparison checks in `test_symdiff.cpp` with equivalents that use `EXPECT_EXPR_EQ`.
      * Check explicitly for missing coefficient in `cnexp` solver, which should be treated equivalently to zero.
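
To see why a missing coefficient needs explicit handling, consider an exact exponential update for a linear ODE s' = a*s + b over one step. The sketch below is an assumption about the general form of such an update, with a hypothetical function name; it is not the code that `modcc` emits.

```cpp
#include <cassert>
#include <cmath>

namespace sketch {  // hypothetical helper, not modcc output

// Advance s' = a*s + b exactly over a step dt.
// A coefficient absent from the ODE is passed as zero.
inline double cnexp_step(double s, double a, double b, double dt) {
    if (a == 0.0) {
        // No linear term: the general formula would divide by zero,
        // and the exact solution is simply linear in time.
        return s + b * dt;
    }
    double ba = b / a;
    return -ba + (s + ba) * std::exp(a * dt);
}

} // namespace sketch
```

With `b` absent the update reduces to pure exponential decay or growth, `s * exp(a*dt)`.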
      1db73767
  31. Mar 20, 2017
    • Vasileios Karakasis's avatar
      modcc: AVX512 vectorisation backend (#154) · c55d9d66
      Vasileios Karakasis authored
      Basic features:
      
      * Compile with -t avx512
      * Automatically set up by CMake if USE_OPTIMIZED_KERNELS is on and VECTORIZE_TARGET is set to KNL 
      * Generic SIMD printer that contacts a SIMD backend for emitting the actual SIMD intrinsics
      
      Note: compilation for the avx512 target requires the Intel compiler. 
      c55d9d66
  32. Mar 15, 2017
    • Sam Yates's avatar
      Simple `net_receive` device kernels (#193) · 9df68703
      Sam Yates authored
      Fixes #183
      
      Use a device kernel for net_receive state updates.
      Note: very naive, but gives about a 30% speed up on the 1000 cell miniapp test. All the fun optimization will end up under issue #184.
      
      This also incorporates PR #192, so this PR will be amended if that one is rejected.
      9df68703
  33. Mar 08, 2017
    • Ben Cumming's avatar
      Use native cuda atomicAdd on Pascal (#174) · 0e0bcd8f
      Ben Cumming authored and Sam Yates's avatar Sam Yates committed
      Fixes #125
      
      * Add `cuda_atomic_add` and `cuda_atomic_sub` wrappers for atomic addition.
      * Choose native atomic add for Pascal and later architectures.
      * Choose CAS workaround for devices earlier than Pascal.
      * Add unit test for wrappers.
      * Change default CUDA architecture target to `sm_60` in `CMakeLists.txt`.
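
The compare-and-swap workaround can be illustrated with a host-side analogue built on `std::atomic<double>`. This is not the CUDA wrapper itself; the function name `atomic_add_cas` is hypothetical, and the loop merely mirrors the retry pattern that a CAS-based device implementation would follow on hardware without a native double-precision atomic add.

```cpp
#include <atomic>
#include <cassert>

namespace sketch {  // host-side analogue, hypothetical name

// Add `value` to `target` atomically using only compare-and-swap.
// Returns the value held before the addition.
inline double atomic_add_cas(std::atomic<double>& target, double value) {
    double old = target.load();
    // On failure compare_exchange_weak refreshes `old` with the current
    // contents, so each retry recomputes the sum from fresh data.
    while (!target.compare_exchange_weak(old, old + value)) {
    }
    return old;
}

} // namespace sketch
```

The retry loop can spin under heavy contention, which is one reason to prefer a native atomic add where the architecture provides one.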
      0e0bcd8f
  34. Mar 07, 2017