  1. Aug 18, 2017
    • Better TBB CMake integration (#331) · 6dce9fa4
      Ben Cumming authored and Sam Yates committed
      * Add support for CMake scripts provided by TBB.
      * Update required cmake version to 3.0.
      
      * hack to get linking to work on Cray PE
      
      * improve comments and remove redundant include in CMakeLists
      
      * firewall the tbb cmake files
      
      * tbb threading back end to_string includes version number
    • Refactor domain decomposition for arbitrary gid distribution. (#326) · cba9d458
      Ben Cumming authored and Sam Yates committed
      Changes to `domain_decomposition`:
        * `domain_decomposition` performs two-pass load balancing in its constructor:
            1. the first pass performs a global load balance,
            2. the second pass distributes cells locally between cpu and gpu cell_groups.
          The current logic is deliberately simple and naive; a follow-up pull request will replace it with a load balancer that returns a lighter domain decomposition description.
        * Provides a simple `group_description` type that contains gid, `cell_kind` and target backend information for `cell_group_factory`.
      
      Changes to `communicator`:
        * Constructor takes a `domain_decomposition` and recipe.
        * The interface for adding connections and constructing connection table has been removed, as this is now performed within the constructor.
        * Construction is more complicated, as connections are partitioned by source gid which requires multiple passes over the connection information in the recipe.
        * `make_event_queues` updated: spikes and connections are now partitioned by source domain, and an optimization dynamically chooses to iterate over either connection or spike list, whichever is shorter.
        * The `exchange` method now sorts `local_spikes` before global gather to facilitate the optimized spike/connection searching.
      
      Changes to `miniapp`:
        * Automatically use gpu if available and compiled with gpu support.
        * Banner prints out useful information about number of cores, gpus and ranks.
        * Remove -g cell group size flag.
      
      Changes to `cell_group`:
        * `cell_group` interface takes a list of gid values instead of a range.
        * Updated internal `cell_group` logic to convert between gid and local indices: use a vector for local index to gid map, and a hash table for gid to local index in `cell_group` implementations that need this lookup.
      
      Changes to unit tests:
        * tests for the domain decomposition
        * tests for the communicator that test ring and all2all networks
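The gid to local index conversion described for `cell_group` can be sketched roughly as follows. This is an illustration only: `gid_index_map`, `gid_type` and all member names are hypothetical and not taken from the code base. Only the layout (a vector for the local index to gid map, a hash table for the gid to local index map) comes from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical names; only the vector + hash table layout is from the commit.
using gid_type = std::uint32_t;

class gid_index_map {
public:
    explicit gid_index_map(std::vector<gid_type> gids): gids_(std::move(gids)) {
        for (std::size_t i = 0; i < gids_.size(); ++i) {
            // Assumption: a gid appears at most once in a group.
            bool inserted = index_.emplace(gids_[i], i).second;
            assert(inserted);
            (void)inserted;
        }
    }

    // Local index -> gid: direct vector lookup (throws if out of range).
    gid_type gid(std::size_t local) const { return gids_.at(local); }

    // gid -> local index: hash table lookup.
    // Returns false if the gid does not belong to this group.
    bool local_index(gid_type gid, std::size_t& out) const {
        auto it = index_.find(gid);
        if (it == index_.end()) return false;
        out = it->second;
        return true;
    }

    std::size_t size() const { return gids_.size(); }

private:
    std::vector<gid_type> gids_;                       // local index -> gid
    std::unordered_map<gid_type, std::size_t> index_;  // gid -> local index
};
```

Because the gids in a group no longer form a contiguous range, the reverse lookup cannot be done by subtracting an offset, which is why a hash table is used in this sketch.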
  2. Jul 29, 2017
  3. Jul 18, 2017
    • Feature/node description (#325) · ecb1b049
      Ben Cumming authored and Sam Yates committed
      Refactor hardware-querying code.
      
      Comprises part of the work on issue #318.
      
      * Create `src/hardware` path for hardware-querying code.
      * Create `nest::mc::hw` namespace for hardware-querying code.
      * Move memory, affinity and power code from util and threading code to the new directory and namespace.
      * Add `nest::mc::threading::num_threads` function that attempts to determine the number of threads to use according to an environment variable, or by using the low level hardware querying if no environment variable is set.
      * Add hardware query for counting the number of available GPUs.
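The thread-count logic described above (environment variable first, hardware query as fallback) might look roughly like the sketch below. The function names and the variable name `EXAMPLE_NUM_THREADS` are hypothetical, since the commit message does not name the environment variable, and `std::thread::hardware_concurrency` stands in for the project's own low-level hardware query.

```cpp
#include <cassert>
#include <cstdlib>
#include <exception>
#include <string>
#include <thread>

// Parse a thread count from the value of an environment variable, falling
// back to a hardware query if the value is missing or unusable.
inline unsigned num_threads_from(const char* env_value) {
    if (env_value) {
        try {
            int n = std::stoi(env_value);
            if (n > 0) return static_cast<unsigned>(n);
        }
        catch (const std::exception&) {
            // Not a number: fall through to the hardware query.
        }
    }
    // hardware_concurrency() may return 0 if the value is not computable.
    unsigned hw = std::thread::hardware_concurrency();
    return hw > 0 ? hw : 1u;
}

// env_name is chosen by the caller; "EXAMPLE_NUM_THREADS" is a made-up name.
inline unsigned num_threads(const char* env_name = "EXAMPLE_NUM_THREADS") {
    assert(env_name != nullptr);
    return num_threads_from(std::getenv(env_name));
}
```

Splitting the parsing from the `getenv` call keeps the fallback behaviour testable without modifying the process environment.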
  4. Jul 07, 2017
    • 306 spike from file (#314) · 90b7966a
      Wouter Klijn authored
      Created a new cell kind that injects a user-supplied stream of spikes, along with a corresponding `cell_group` implementation called `dss_cell_group`.
      
      The miniapp is updated to test this functionality:
        * An ostream spike time parser.
        * A new command line argument `-I` that takes a path.
  5. Jul 05, 2017
  6. Jul 04, 2017
    • Improve reduce by key GPU performance. (#301) · 1f188dcd
      Ben Cumming authored and Sam Yates committed
      Optimized reduce by key used by the GPU back end when accumulating synapse current contributions to compartment currents. This leads to significant speedup in the miniapp for cells with few compartments and many synapses.
      
      * Implement `gpu::reduce_by_key` device function that uses warp intrinsics to perform reduction between threads in a warp before using a global atomic update to store the result.
      * Add unit tests for `reduce_by_key` functionality.
      * Add micro benchmarks that compare against using CUDA atomics.
      * Modify `CudaPrinter` modcc class to emit `reduce_by_key` in place of `cudaAtomicAdd` functions.
      
      Some improvements to meter reporting:
      * Shorten names of metering regions in miniapp to make them easier to grep.
      * JSON is no longer used as an intermediate data type when gathering distributed meters into a single report, instead conversion to JSON is performed just before writing to file.
      * Add a print function for summarizing meter results to a stream.
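The idea behind reduce by key, combining contributions that share a target before issuing a single update, can be illustrated with a sequential host-side analogue. This is not the `gpu::reduce_by_key` device function: the real version works across the threads of a warp using warp intrinsics and finishes with a global atomic update. The name `reduce_by_key_host` and the assumption that equal keys are adjacent are made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Accumulate values[i] into out[keys[i]], issuing one update per run of
// equal keys. Returns the number of updates made to `out`.
// Assumption: contributions with the same key are adjacent in `keys`.
inline std::size_t reduce_by_key_host(const std::vector<double>& values,
                                      const std::vector<std::size_t>& keys,
                                      std::vector<double>& out) {
    assert(values.size() == keys.size());
    std::size_t updates = 0;
    std::size_t i = 0;
    while (i < keys.size()) {
        const std::size_t key = keys[i];
        double sum = 0.0;
        // Reduce the run of contributions that share this key.
        while (i < keys.size() && keys[i] == key) {
            sum += values[i];
            ++i;
        }
        // One update per run; the GPU version uses an atomic add here.
        out.at(key) += sum;
        ++updates;
    }
    return updates;
}
```

With many synapses per compartment the runs are long, so far fewer updates reach the output array than with one atomic add per contribution. This matches the case the commit message reports as benefiting most.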
  7. Jul 03, 2017
    • Fix bug with comparison operator for spikes (#316) · 1a58e003
      Ben Cumming authored
      The less than operator for spikes was not in the `nest::mc` namespace,
      so it was not being picked up by STL algorithms and containers.
      This patch makes it a friend operator of `nest::mc::basic_spike`,
      and adds some unit tests to verify that STL algorithms can use
      containers of spikes.
      
      fixes #315.
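A minimal sketch of the fix pattern, assuming a simplified spike type: the namespace `example`, the member names and the ordering by time alone are all hypothetical. The point is that an operator defined as a friend inside the class belongs to the class's namespace, so argument-dependent lookup finds it when `std::sort` compares elements from inside namespace `std`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

namespace example {  // hypothetical stand-in for nest::mc

template <typename Source>
struct basic_spike {
    Source source;
    float time;

    // Friend defined in the class body: it is a member of namespace
    // example and is found by argument-dependent lookup.
    // Ordering by time alone is an assumption made for this sketch.
    friend bool operator<(const basic_spike& a, const basic_spike& b) {
        return a.time < b.time;
    }
};

} // namespace example

using spike = example::basic_spike<int>;

// std::sort compares elements from inside namespace std, where an
// operator< declared in the global namespace would not be visible.
inline std::vector<spike> sorted_spikes(std::vector<spike> spikes) {
    std::sort(spikes.begin(), spikes.end());
    assert(std::is_sorted(spikes.begin(), spikes.end()));
    return spikes;
}
```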
  8. Jun 28, 2017
  9. Jun 23, 2017
    • Issue/241 regular spiking source cell (#287) · 70d1a1b4
      Wouter Klijn authored
      Replace the 20 artificial spikes we inserted until now with spikes generated with a frequency spiking cell.
      This is a cell that spikes regularly for a set period.
      The outgoing connections of this cell mirror all the outgoing connections of cells with gid % 20 == 0.
      This means that 5% extra connections have been added to the model to implement the equivalent behaviour with an 'implemented' neuron.
      
      To implement this behaviour a new cell_group type has been introduced.
      The fs_cell is extremely small and is not implemented on the backends.
      
      Includes:
      Unit test for the rss_cell functionality
  10. Jun 20, 2017
  11. Jun 19, 2017
    • Implement device-side event delivery. · 9319b302
      Sam Yates authored
      Finalizes #184 and includes fix for #285. This PR constitutes a brutal rebase of the `devel/async-integration` branch onto current master: git patch, baby.
      
      Implements asynchronous integration of cells within an `fvm_multicell` instance together with the code to implement efficient device-side event delivery.
      
      Summarized patch notes from the development branch:
      
      * Add four new views in mechanism state:
         * `vec_ci`: cv index to cell index (w.r.t. one lowered multicell instance) map.
         * `vec_t`: current or integration starting time, indexed by cell.
         * `vec_t_to`: integration stopping time, indexed by cell.
         * `vec_dt`: pre-computed delta between `vec_t` and `vec_t_to`, to address performance regression on multicore backend.
      * Extend notion of indexed view in `modcc` to encompass `CellIndexedVariable` variables, which have a per-cell value.
      * Add/change 'built-in' variables `t`, `t_to`, and `dt` for NMODL.
      * Additional utility functions:
         * `util::minmax_value`: return one-pass reduction for minimum and maximum.
         * `util::fill`: range-compatible wrapper for `std::fill`.
         * `util::empty` now works across arbitrary sequences (defaults to calling `empty()` method if it exists).
         * Extends C++17-like functionality for path testing and manipulation (c.f. PR #233).
      * Use a cached copy of the per-cell time vector for queries in `fvm_multicell`.
      * Modifies back-end matrix assembly, threshold detection and stimulus implementations to be asynchronous-friendly.
      * Rename some back-end mechanism variables for clarity, now that there are multiple sorts of cell index vectors present. `cell_index` for example has been renamed to `cell_cv_divs` where it acts as a partition division of cv indices (length 1+#cells), and `cell_to_cv` where it acts as a map from cell to index of first cv for that cell (length #cells).
      * Move common event-delivery types (`target_handle`, `deliverable_event`) to `src/backends/event.hpp`.
      * Add `multi_event_stream` data structure for `gpu` and `multicore` backends, which handles multiple streams of events, bulk loaded but individually popped.
      * Add unit tests for `multi_event_stream`.
      * Document event delivery process and abstraction in `src/backends/event_delivery.md`.
      * Virtualize `mechanism::deliver_events`.
      * Perform event delivery in `fvm_multicell::step_integration()`, using the backend-provided interfaces.
      * Add zero dt check/support to matrix state implementations.
      * Add zero dt unit tests for multicore and gpu back-end matrices.
      * Add debugging helper `util::sepval` for printing/tracing container values.
      * Add trace csv output option to miniapp.
      * Add GPU kernel and unit test for end-of-integration time step test. By default, keep using the copy-to-host-and-test method, as it is faster for cell counts up to circa 10k; investigate adaptive/threshold solutions in the future.
      * Explicitly compute and store per-cell and per-compartment `dt` from integration time bounds.
      * Determine lower bound on number of integration steps per interval in order to avoid explicit checking of minimum cell times each step.
      * Avoid any time value checking for samplers in the integration loop if none of them could be triggered in the interval.
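The relationship between the partition vector `cell_cv_divs` and the `vec_ci` view (cv index to cell index) can be sketched as below. The function name `make_cv_to_cell` is hypothetical, and the sketch assumes the partition starts at zero. It shows one plausible way to derive the map, not the code used in the back ends.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// cell_cv_divs has length 1 + #cells: cell i owns the CV indices in the
// half-open range [cell_cv_divs[i], cell_cv_divs[i+1]).
// The result maps each CV index to the index of the cell that owns it.
inline std::vector<std::size_t>
make_cv_to_cell(const std::vector<std::size_t>& cell_cv_divs) {
    assert(!cell_cv_divs.empty());
    assert(cell_cv_divs.front() == 0);  // assumption of this sketch

    std::vector<std::size_t> cv_to_cell(cell_cv_divs.back());
    for (std::size_t cell = 0; cell + 1 < cell_cv_divs.size(); ++cell) {
        for (std::size_t cv = cell_cv_divs[cell]; cv < cell_cv_divs[cell + 1]; ++cv) {
            cv_to_cell[cv] = cell;
        }
    }
    return cv_to_cell;
}
```

With such a map, per-cell quantities like `vec_t` or `vec_dt` can be looked up from a CV index as `vec_t[vec_ci[cv]]`.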
  12. Jun 15, 2017
  13. Jun 14, 2017
  14. May 24, 2017
    • Add required thread synchronization to matrix kernel. (#280) · bd1e56a5
      Sam Yates authored
      There is a potential data race in the `assemble_matrix_interleaved` kernel, where threads in a different warp can overwrite the `buffer_v` and `buffer_i` values before they are used to update the `d` and `rhs` vectors.
      
      This race has been exercised in the asynchronous event delivery branch.
      
      * Add `__syncthreads()` to assemble matrix interleaved kernel after `d` and `rhs` update.
  15. May 19, 2017
    • Fix consistency issue of the SIMD i/f of modcc (#278) · 56fd0532
      Vasileios Karakasis authored
      The `emit_gather()` function emitted the "wrong" instruction in terms of its
      arguments, but the instruction actually generated was correct because
      the `simd_printer` passed the arguments to `emit_gather()` in a
      different order, which happened to be the correct order for the finally emitted
      instruction. Complicated? This commit cleans this up.
    • Fix target to handle synapse assignment. (#277) · bf286e63
      Sam Yates authored
      Fixes #273.
      
      Note: this incorporates the unit test patch in PR #275.
      
      * Store index into targets collection with synapse CV before sorting, and use this target index to store the permuted instance index in the correct target slot.
    • Fix bug in target handle unit test. (#275) · 6067f69f
      Sam Yates authored
      Unit test did not take into account that the target handles set by the lowered cell will be grouped by cell.
  16. May 18, 2017
  17. May 16, 2017
    • Padé approximation of exp in 'cnexp' integration (#268) · 5b254146
      Ben Cumming authored and Sam Yates committed
      Fixes #265.
      
      In the `modcc`-generated mechanism code, the `cnexp` solver method uses an expensive call to `exp` to integrate dependent variables over one time step. This commit replaces the exponential with a second-order Padé approximation.
      
        * Modify `modcc` to insert `exp_pade_11` and `exp_pade_22` functions into every module, which define Padé approximations of second and fourth order respectively (m=n=1 and m=n=2).
        * Have `cnexp` solver use `exp_pade_11` instead of the built in `exp` unary operator.
      
      The validation tests pass for both the 2nd and 4th order approximations; the second order approximation suffices.
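For reference, the two Padé approximants named above have the standard closed forms (1 + x/2)/(1 - x/2) for m=n=1 and (1 + x/2 + x²/12)/(1 - x/2 + x²/12) for m=n=2. The sketch below writes them as plain C++ functions. The function names follow the commit message, but the bodies are an illustration and not the code emitted by `modcc`.

```cpp
#include <cassert>
#include <cmath>

// (1,1) Padé approximant of exp(x): (1 + x/2) / (1 - x/2).
// Agrees with the Taylor series of exp up to and including the x^2 term.
inline double exp_pade_11(double x) {
    assert(x != 2.0);  // the denominator vanishes at x = 2
    return (1.0 + 0.5*x) / (1.0 - 0.5*x);
}

// (2,2) Padé approximant of exp(x):
// (1 + x/2 + x^2/12) / (1 - x/2 + x^2/12).
// Agrees with the Taylor series of exp up to and including the x^4 term.
inline double exp_pade_22(double x) {
    const double even = 1.0 + x*x/12.0;
    const double odd = 0.5*x;
    return (even + odd) / (even - odd);
}
```

For small arguments, such as those that typically arise from a rate multiplied by a short time step, both forms stay close to `std::exp` while needing only a handful of multiplications and one division.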
    • remove ineffectual compiler directives from nmodl generated code (#270) · 07dd8f35
      Ben Cumming authored
      Remove some old compiler directives and properties from modcc-generated kernels that were not used by the Intel compiler and were not recognised by gcc.
      
      Now the optimized x86 kernels can be compiled by both the gcc and Intel compilers.
    • Add micro-benchmark for device vector comparison. (#262) · 0bd89928
      Sam Yates authored
      * Add micro-benchmark `cuda_compare_and_reduce`.
      * Add support for `.cu` cuda benchmarks in `tests/ubench/CMakeLists.txt`.
      * Update `tests/ubench/README.md` with benchmark summary and results.
      * Update the version of the google benchmark library, for fixed benchmark iteration support.
  18. May 15, 2017
    • Fix incorrect GPU backend determination · 18098783
      Ben Cumming authored and Sam Yates committed
      Fixes #266.
      
      Use CUDA to compile the `cell_group_factory` so that the CUDA back end is compiled correctly, instead of the null back end proxy.
        * Added bonus: the miniapp is now compiled using host C++ compiler instead of `nvcc`.
      
      This is a little hacky: it is a stopgap until we have separate compilation of CUDA code.
  19. May 11, 2017
  20. May 10, 2017
    • Feature/generic cell groups (#259) · 2b2f89c4
      Ben Cumming authored
      Refactor model and recipe to build models that have different cell types.
      
      * Refactor `recipe::get_cell` to return `unique_any`.
          * All recipe definitions in tests and miniapp had to be updated to use the new interface.
      * Make a `cell_group_factory` that forwards arguments for building a cell group to the appropriate `cell_group` constructor.
      * Refactor `model` to use generic cell types.
          * Constructor now delegates `cell_group` generation to the `cell_group_factory`.
          * Add an implementation file `model.cpp` for `model` to reduce compilation times (by 2-7 seconds on my desktop).
      * Refactor probe enumeration code in `model` and `cell_group`.
          * Add interface to `cell_group` for querying enumeration of probes in a `cell_group`.
          * Use this interface instead of directly computing enumeration in the `model` constructor, which no longer has easy access to probe information.
  21. May 09, 2017
    • util::unique_any: a non-copyable util::any (#257) · 23e94eed
      Ben Cumming authored and Sam Yates committed
      Add a non-copyable variant of `util::any`.
      
      The two main use cases for such a container are:
       1. Storing types that are not copyable.
       2. Ensuring that no copies are made of copyable types that have to be stored in a type-erased container.
      
      `unique_any` has the same semantics as `any` with the exception of copy construction and copy assignment, which are explicitly forbidden. The requirement that the contained type be copy-constructible has also been relaxed.
      
      The `any_cast` non-member functions have been overloaded for `unique_any`, with the same semantics as for `any`.
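A move-only type-erased container along these lines could be sketched as follows. This is not the `util::unique_any` implementation: the class name `unique_any_sketch` and the accessors `get_if` and `get` are hypothetical stand-ins (the real interface uses `any_cast`), and the sketch always allocates dynamically.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <typeinfo>
#include <utility>

class unique_any_sketch {
    struct holder_base {
        virtual ~holder_base() = default;
        virtual const std::type_info& type() const = 0;
    };

    template <typename T>
    struct holder: holder_base {
        T value;
        explicit holder(T&& v): value(std::move(v)) {}
        const std::type_info& type() const override { return typeid(T); }
    };

    std::unique_ptr<holder_base> state_;

public:
    unique_any_sketch() = default;

    // Movable, but copying is explicitly forbidden.
    unique_any_sketch(unique_any_sketch&&) = default;
    unique_any_sketch& operator=(unique_any_sketch&&) = default;
    unique_any_sketch(const unique_any_sketch&) = delete;
    unique_any_sketch& operator=(const unique_any_sketch&) = delete;

    // Store any move-constructible value; no copy constructor is required.
    template <
        typename T,
        typename D = typename std::decay<T>::type,
        typename = typename std::enable_if<!std::is_same<D, unique_any_sketch>::value>::type
    >
    unique_any_sketch(T&& v): state_(new holder<D>(D(std::forward<T>(v)))) {}

    bool has_value() const { return static_cast<bool>(state_); }

    // Returns a pointer to the stored value, or nullptr on a type mismatch.
    template <typename T>
    T* get_if() {
        if (!state_ || state_->type() != typeid(T)) return nullptr;
        return &static_cast<holder<T>*>(state_.get())->value;
    }

    // Reference access; the stored type must match T.
    template <typename T>
    T& get() {
        T* p = get_if<T>();
        assert(p != nullptr);
        return *p;
    }
};
```

Because the holder is owned through a `std::unique_ptr`, a moved-from container is left empty and non-copyable payloads such as `std::unique_ptr<int>` can be stored directly.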
  22. May 08, 2017
    • Implement `util::any` · c2633503
      Ben Cumming authored and Sam Yates committed
      Partial implementation of `std::any` from C++17 standard. See: http://en.cppreference.com/w/cpp/utility/any
      
      The implementation is in the `util` library as `util::any`.
      
      Deviations from the standard's description of `std::any`:
        * Does not avoid dynamic allocation of small objects.
        * Does not implement the `in_place_type<T>` constructors from the standard.
        * Does not implement the `emplace` modifier from the standard.
  23. May 04, 2017
    • modularize domain decomposition (#244) · d0b6b475
      Ben Cumming authored
      The domain decomposition, whereby cells were partitioned across domains/MPI
      ranks and then grouped together, was performed in an ad-hoc manner.
      This PR modularizes the domain decomposition.
      
      * A `domain_decomposition` class performs the cell partitioning given the recipe
        and some parameters.
      * This implementation has a single `domain_decomposition` class, and flags are
        provided in a `group_rules` struct.
        * The `domain_decomposition` could be specialized, or the simple rules struct
          could provide some sort of policy implementation in the future when the
          need arises.
      * The `domain_decomposition` class is initialized inside the model constructor
        and is maintained as state of the model.
      * The cell model constructor interface has been simplified to a single constructor
        that takes a recipe and a reference to a `domain_decomposition`.
        * In the future we might pass in only a moveable subset of the
          `domain_decomposition` information required to build the model.
      
      fixes #242.
    • Apply different, simpler work-around for icc bug. (#251) · bb19add0
      Sam Yates authored
      Refer to issue #247 and pr #248.
      
      * Replace `compat::sink()` usage with a simpler `compat::barrier_if_icc_leq`, as the icc bug also could be tickled depending on how the `sink()` code was structured.
  24. May 03, 2017
    • Use cmake source directory instead of current source dir as working dir. (#250) · dd214f13
      w-klijn authored and Sam Yates committed
      Fixes #249.
      
      * Use CMake source directory instead of current source directory as the working directory in git submodule update for `google-benchmark`, in order to support git versions prior to 1.8.4.
    • Work-around for inlining bug in icc 16.0.3 (#248) · 208ca28e
      Sam Yates authored
      Fixes #247.
      
      Forces evaluation of partition bounds call within the `div_component_sampler` constructor, which for subtle and obscure reasons is mis- or un-computed with icpc and the `-xMIC-AVX512` target option.
      
      * Add `compat::sink()` function that forces evaluation of the argument.
      * Add `compat::sink_if_icc_leq(version, const X&)` function that performs this only when run with the Intel compiler subject to version bound.
      * Use the `compat::sink_if_icc_leq` function in the `div_component_sampler` constructor.
    • Fix warnings and errors with icpc build. (#246) · a16b9dc1
      Sam Yates authored
      Fixes #245.
      
      * Add missing header `<system_error>` to `src/util/strprintf.hpp`.
      * Remove redundant `const` in `cell_kind` returns.
  25. Apr 28, 2017
    • Issue/238 get cell kind (#240) · 282d1eee
      w-klijn authored
      * Add cell-kind function on the cell_group
      * Move the cell_kind to common types
      * Have the cell also return its kind when requested
  26. Apr 21, 2017
    • Virtualize cell_group (#236) · 6beb1ce0
      Ben Cumming authored and Sam Yates committed
      Virtualization of the `cell_group` interface is necessary for support of other (i.e. non-multicompartment) cell models, including e.g. Poisson spike generators.
      
      * Make `cell_group` an abstract base class; the previous `cell_group` class that is parameterized by the back-end FVM implementation is renamed `mc_cell_group` ('mc' stands for 'multi-compartment') and derives from the abstract `cell_group`.
      * Remove template parameter `Cell` from `model` type: a `model` can in principle now manage multiple types of concrete objects derived from `cell_group`.
      * Extend `model` constructor to take a hint about which back-end to use when constructing cell groups: `use_multicore` or `prefer_gpu`. This is a placeholder for a more sophisticated implementation once we have the requirements for a richer "ecosystem" of cell types.
      * Simplified some generic types to remove template dependencies between front and back ends:
          * Define a global `using time_type=float` in `common_types.hpp`.
          * Define a concrete `sampler_function` type alias for the `std::function<...>` type used for samplers.
      * Use a `null` back-end fallback for GPU if support is not there at compile time.
    • Try more places for validation data; workaround CMake FindCUDA bug. (#233) · 5f85bd7d
      Sam Yates authored
      Fixes #232.
      
      * Try `NMC_DATADIR` environment variable for validation data path, or else if the #defined `NMC_DATADIR` does not point to a directory, try `./validation/data` and `../validation/data`.
      * Don't define `NMC_DATADIR` if the CMake version is 3.7 or 3.8.
      * Extend C++17 filesystem emulation with (POSIX) implementation of `is_directory` and supporting classes, functions and enums `filesystem_error`, `file_status`, `status(..)`, `file_type` and `perms`.
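A POSIX-only sketch of the directory test and the fallback search described above, assuming `stat(2)` as the underlying call. The names `is_directory_sketch` and `first_directory` are hypothetical and this is not the project's filesystem emulation, which also provides `file_status`, `file_type`, `perms` and `filesystem_error`.

```cpp
#include <cassert>
#include <string>
#include <vector>
#include <sys/stat.h>

// True if `path` names an existing directory (following symlinks).
// A path that is missing or inaccessible is reported as "not a directory".
inline bool is_directory_sketch(const char* path) {
    assert(path != nullptr);
    struct stat st;
    if (::stat(path, &st) != 0) return false;
    return S_ISDIR(st.st_mode);
}

// Return the first candidate that is a directory, or an empty string if
// none of them is. The candidates are tried in the order given.
inline std::string first_directory(const std::vector<std::string>& candidates) {
    for (const std::string& c: candidates) {
        if (is_directory_sketch(c.c_str())) return c;
    }
    return std::string();
}
```

A caller could pass the value of the `NMC_DATADIR` environment variable followed by `./validation/data` and `../validation/data` as the candidate list, mirroring the search order in the commit message.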
  27. Apr 18, 2017
    • Fail gracefully if git missing or broken. (#235) · 2b1ba7b7
      Sam Yates authored
      Fixes #234.
      
      * Adds dummy `DOWNLOAD_COMMAND` to `ExternalProject_Add` invocation.
      * More thorough status/warning messages regarding the git submodule update process.
    • Add power meter and refactor meter interfaces. · 99a0b1c8
      Ben Cumming authored and Sam Yates committed
      Fixes #190.
      
      The final piece in the metering features.
      
      * Add a `power_meter`, which currently records the energy used on each node of Cray XC{30,40,50} systems, all of which have a built-in `pm_counters` interface to power measurement.
      * Add information about which node each MPI rank runs on to the metering output in `meters.json`, which is needed to analyse energy recordings, which are per node, not per MPI rank.
      * Refactor collation of measurements: now the responsibility of the meter manager.
      * Add support for `gather` with `std::string` to the global communication policy, which required a back end MPI implementation and corresponding unit test.
      * Add `src/util/config.hpp`, which populates the `nest::mc::config` namespace with `constexpr bool` flags describing system or environment capabilities.
  28. Apr 13, 2017
    • Remove generic event interface on cell_group · a0640a11
      Ben Cumming authored and Sam Yates committed
      `cell_group` had a template method `cell_group::enqueue_events()`, which was parameterized on the type of container used to pass the set of events to enqueue. This PR removes the template, and makes `time_type` a globally defined type in `common_types.hpp`.
      
      The `time_type` that permeated the code is taken from `spike`, which is itself a specialized type alias of `basic_spike`. This is not an intuitive location to define the `time_type`, and hides the fact that as implemented it was effectively a global typedef.
      
        * Define the default time type in `common_types.hpp`: `using time_type = float`.
        * Use this global `time_type` in the definition of `spike` and `postsynaptic_spike_event`.
        * Replace generic `cell_group::enqueue_events` method with concrete `cell_group::enqueue_events(const std::vector<post_synaptic_event>&)`.
    • Simplify cell_group interface · 2e57545c
      Ben Cumming authored and Sam Yates committed
      Preparatory work for abstract cell group interface.
      
      * Remove `cell_group` public member functions that are not needed as part of the interface: `clear_events`, `remove_samplers`, `probe` and `reset_samplers`.
      * Remove value_type from model, and declare explicitly that samplers receive values that are doubles.