- Jul 10, 2018
-
-
Fixes #526.
-
- Jul 06, 2018
-
-
Migrate source/build to c++14 (#522) * Update `CMakeLists.txt` for C++14 option. * Update to gcc 6 minimum. * Update travis CI from gcc-5 to gcc-6 * Use `std::..._t` style type traits, replacing `util::` aliases. * Use `std::cbegin`, `std::cend`, and `std::make_unique`, replacing `util::` versions. * Remove `DEDUCED_RETURN_TYPE` macros. * Remove redundant return type specifications. * Use correct ADL for `begin` and `end` in (almost all) the range utilities. * Remove redundant `mechinfo` ctor (aggregate initialization suffices). * Use lambda capture initializers where appropriate. * Use generic `std::equal_to`. * Use variable templates for `math::infinity` and `math::pi`. * Remove `enum_hash` workaround. * Use `""s` string literals where we were using our own `""_s` construction. * Use generic lambda for recursive lambda instead of `std::function` wrapper. * Use generic lambda for generic arithmetic tests. Fixes #358.
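A few of the C++14 features listed above can be sketched together as follows. This is an illustrative sketch only: the `sketch` namespace and the helper names `factorial`, `consume` and `greeting` are hypothetical and do not come from the Arbor source.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

namespace sketch {
// Variable template, in the style described for `math::pi`
// (hypothetical namespace and definition).
template <typename T>
constexpr T pi = T(3.14159265358979323846L);
}

// Recursive generic lambda: the closure is passed to itself, so no
// std::function wrapper is required.
inline unsigned long factorial(unsigned n) {
    auto fact = [](auto&& self, unsigned k) -> unsigned long {
        return k <= 1 ? 1ul : k * self(self, k - 1);
    };
    return fact(fact, n);
}

// std::make_unique plus a lambda capture initializer that moves the
// unique_ptr into the closure.
inline int consume(int value) {
    auto p = std::make_unique<int>(value);
    auto f = [q = std::move(p)]() { return *q + 1; };
    return f();
}

// Standard ""s literal in place of a home-grown ""_s construction.
inline std::string greeting() {
    using namespace std::string_literals;
    return "arbor"s + "!";
}
```

For example, `factorial(5)` evaluates to 120 and `sketch::pi<float>` gives a single-precision constant without a separate function template.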
-
- Jul 03, 2018
-
-
Further work to public install target. * Move SIMD classes, cell description classes, simple sampler to public include. * Rename `cell` to `mc_cell`, `segment` to `mc_segment`, and remove `_description` from cell description class names and includes. * Move `compartment_model` out of `mc_cell` interface and use only in `fvm_layout.cpp`. * (Provisionally) remove area/volume methods on `mc_cell` and `mc_segment`.
-
- Jun 25, 2018
-
-
CMake and build refactoring * Use CUDA as first-class language (leading to CMake 3.9 minimum version requirement). * Use 'modern CMake' interface libraries for compiler options, include file and library dependency tracking. Interface library targets: * `arbor-deps`: compiler options and library requirements for the `libarbor.a` static library, as governed by configure-time options and environment. * `arbor-private-headers`: include path for non-installed headers, as required by unit tests and arbor itself. * `arbor-aux`: helper classes and utilities used across tests and examples. * `ext-json`, `ext-tclap`, `ext-tbb`, `ext-benchmark`, `ext-sphinx_rtd_theme`: externally maintained software that we include (directly or via submodule) in the `ext/` subdirectory. * Single static library `libarbor.a` includes all built-in modules and CUDA objects. * Simplify configuration options: * `ARB_WITH_TRACE`, `ARB_AUTORUN_MODCC_ON_CHA...
-
- Jun 07, 2018
-
-
Benjamin Cumming authored
The built-in profiler generates timings for the state and current updates of individual multicore mechanisms. Modcc generates `PE(advance_integrate_{state,current}_X)` profiler calls (along with the corresponding `PL()` calls) around calls to the multicore mechanism `nrn_state` and `nrn_current` API. No timings are made for the GPU back end, which is not properly supported by the current profiling tools.
-
- Jun 04, 2018
-
-
Changes have been made to the SIMD implementation of mechanism functions: - The node_index array (array of indices that specifies for each mechanism the CVs where it is present) is now partitioned into 4 arrays according to the constraint on each simd_vector in node_index: 1. contiguous array: contains the indices of all simd_vectors in node_index where the elements in simd_vector are contiguous 2. constant array: contains the indices of all simd_vectors in node_index where the elements in simd_vector are identical 3. independent array: contains the indices of all simd_vectors in node_index where the elements in simd_vector are independent (no repetitions) but not contiguous 4. none array: contains the indices of all simd_vectors in node_index where none of the above constraints apply When mechanism functions are executed, they loop over each of the 4 arrays separately. This allows for optimizations in every category. - The modcc compiler was modified to generate code for the previous changes, including the optimizations per constraint: 1. contiguous array: we use vector load/store and vector arithmetic. 2. constant array: we load only one element and broadcast it into a simd_vector; we use vector arithmetic; we reduce the result; we store one element. 3. independent array: we use vector scatter/gather and vector arithmetic. 4. none array: we cannot operate on the simd_vector in parallel, so we loop over the elements to read, perform arithmetic and write back - Added a mechanism benchmark for pas, hh and expsyn - Moved/modified some functions in simd.hpp to ensure that the correct implementation of a function is being called.
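The partitioning described above might be sketched as below. The names `classify`, `partition_blocks` and `constraint_partition`, the enumerator names, and the assumption that `node_index` is padded to a multiple of the SIMD width are all illustrative, not taken from the Arbor source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical enumerators for the four categories in the commit message.
enum class index_constraint { none, independent, contiguous, constant };

// Classify one SIMD-width block of indices.
inline index_constraint classify(const int* idx, std::size_t width) {
    bool contiguous = true, constant = true, independent = true;
    for (std::size_t i = 1; i < width; ++i) {
        if (idx[i] != idx[i - 1] + 1) contiguous = false;
        if (idx[i] != idx[0]) constant = false;
    }
    // Independent means no index is repeated anywhere in the block.
    for (std::size_t i = 0; i < width; ++i)
        for (std::size_t j = i + 1; j < width; ++j)
            if (idx[i] == idx[j]) independent = false;

    if (contiguous) return index_constraint::contiguous;
    if (constant) return index_constraint::constant;
    if (independent) return index_constraint::independent;
    return index_constraint::none;
}

// Start offsets of the blocks in node_index, grouped by constraint.
struct constraint_partition {
    std::vector<std::size_t> contiguous, constant, independent, none;
};

// Assumes node_index.size() is a multiple of width (padded).
inline constraint_partition partition_blocks(const std::vector<int>& node_index,
                                             std::size_t width) {
    constraint_partition part;
    for (std::size_t i = 0; i + width <= node_index.size(); i += width) {
        switch (classify(node_index.data() + i, width)) {
        case index_constraint::contiguous:  part.contiguous.push_back(i); break;
        case index_constraint::constant:    part.constant.push_back(i); break;
        case index_constraint::independent: part.independent.push_back(i); break;
        case index_constraint::none:        part.none.push_back(i); break;
        }
    }
    return part;
}
```

A mechanism kernel would then loop over each of the four offset lists in turn, using the cheapest load/store strategy that the constraint permits.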
-
- May 15, 2018
-
-
Replace hard-coded index variable names in modcc cuda printer with ones derived from the external variable. Uses `decode_indexed_variable`.
-
- May 09, 2018
-
-
Completes CUDA printing in modcc. * Add CudaPrinter visitor, overriding CPrinter. * Add `ostream` `operator<<` overloads for `arb::gpu::shared_state` and `device_view` for debugging. * Fix GPU back-end bugs.
-
Sam Yates authored
First commit of two for mechanism refactor work (refer to PR #484 and PR #483). FVM/mechanism code: * Refactor mechanism data structures to decouple backend-specific implementations and mechanism metadata. * Add mechanism catalogue for managing mechanism metadata and concrete implementation prototypes. * Add fingerprint-checking to mechanism metadata and implementations to confirm they come from the same NMODL source (fingerprint is not yet computed, but tests are in place). * Split FVM discretization work out from FVM integrator code. * Use abstract base class over backend-templated FVM integrator class `fvm_lowered_cell_impl` to allow separate compilation of `mc_cell_group` and to remove the dummy backend code. * Add a new FVM-specific scalar type `fvm_index_type` that is an alias for `int` to replace `fvm_size_type` in fvm layouts and mechanisms. This was chosen as an alternative to making `unsigned` versions of all our SIMD implementation classes. * Extend `cable1d_neuron` global data to encompass: mechanism catalogue; default ion concentrations and charges; global temperature (only for Nernst); initial membrane potential. Modcc: * Collect printer sources in modcc under `printer/`. * Move common functionality across printers into `printer/printerutil.{hpp,cpp}`. * Add string to file I/O implemented in routines read_all and write_all in `io/bulkio.hpp`. * Implement indent-friendly source code generation via a `std::streambuf` filter `io::prefixbuf` defined in `io/prefixbuf.hpp`, together with manipulators and a corresponding std::ostream-derived wrapper. * Rewrite printers to use new infrastructure: cpu target incorporates SIMD printing options; CUDA printer at this point produces only stubs for CUDA kernel wrappers. * Modify SIMD printing command line options for modcc: `-s` enables explicit vectorization using the SIMD classes; `-S <N>` allows a specific data width to be prescribed. * Fix problem in `test_ca.mod` with uninitialized ion current. 
* Add infrastructure support to allow future pre-computation of SIMD index conflict cases for (hopefully) faster scatters and updates. * Simplify `IndexedVariable` expressions in the AST, making data source explicit via a `sourceKind` enum, and leaving the indexing method and index names up to the printers. * Allow state variables in the AST to 'shadow' an ion concentration — these are assigned in the generated `write_ions` method. SIMD classes: * Add `simd_cast` operation between SIMD value types of the same width, and with `std::array`. (Note: this was tested and used in an early development version of the code, but not in this version. It was still a lacuna in the original SIMD wrappers, so it has been left in.) * Restructure SIMD gather/scatter API to use a `simd::indirect` expression, which encapsulates a pointer and SIMD offset. * Add `simd::index_constraint` scoped enum to describe knowledge of contention in indirect indices, so that we can branch on this to the appropriate implementation. * Add SIMD concrete implementation routines `reduce_add` for horizontal reduction and `element0` for access to first lane scalar value. * Add SIMD value method `sum()` that exposes implementation `reduce_add`. * Add SIMD concrete implementation routine `compound_indexed_add` that provides the implementation for `indirect(p, simd_indices) += simd_value` construction. * Fix SIMD `implbase` bug where some static methods were using the `implbase` fall-back functions instead of the derived class specialized implementations. * Move SIMD mathematical functions into friend routines of `simd_impl` in order to resolve implicit conversions from scalars in mixed SIMD-scalar operations. * Use a templated `tag` class to dispatch on SIMD concrete implementation types, to avoid problems with incomplete types in method signatures. * Remove old SIMD intrinsics. 
CMake infrastructure: * Downcase some variables in `CMakeLists.txt` files to distinguish them visually from CMake keywords and variables. * Split arbor modcc vectorization option (now `ARB_VECTORIZE`) and target-architecture optimization (now `ARB_ARCH`). * For `arbor` and `arbormech` targets, and in particular not the `modcc` target, use `ARB_ARCH` to generate corresponding target-appropriate binaries, including, for example, appropriate SIMD support. * Extend `CompilerOptions.cmake` to map as best as able between the various target architecture names (we use the gcc names) and the correct option to pass to the compiler based on the compiler and platform. * Add work-around for misidentification by CMake of XL C as Clang. * As a temporary work-around, include `arbormech` library twice on link line to resolve circular arbor–arbormech dependencies. Unit tests: * Extend repertoire of generic sequence equality/near equality testing support in `common.hpp`. * Add warning suppression for icc for the malloc instrumentation code. * SIMD unit tests for indirect expressions, compound indirect add, reduction. * Make some exact tests into floating point 'near' tests when comparing computed areas and lengths in swc and fvm layout tests, to account for compiler (e.g. icc) performing semantically inequivalent floating point operation reordering or fusion at `-O3`. * Split out some of the CUDA tests into separate .cpp/.cu files for separate-compilation purposes. Other: * The `padded_allocator` has been modified to propagate alignment/padding on move and copy (these semantics make their use much easier and safer in the multicore mechanism instantiation code). * Map/table searching utilities in `util/maputil.hpp`. * Fixes for correct sequence type categorization and `begin/end` ADL. * Fixes for type guards for range methods that take universal references. * Removal of some redundant code in range utilities through the use of universal references. 
* Add new range view `reverse_view` for ranges delineated by bidirectional iterators. * Add single argument form of `make_span` to count up from zero, and associated helper `count_along` that gives a span that indexes a supplied container. * Moved `prefixbuf` to `modcc` source. * Make sequence positive and negative tests in algorithms generic. * Add `private`-subverting helper code/macro to `tests/unit/common.hpp` to reduce the number of public testing-only interfaces in the library code. * Add virtual destructors for virtual base classes. * Add new arb::math:: functions: `next_pow2` for unsigned integral types, `round_up` to round a number away from zero to next largest magnitude multiple. * New `index_into` implementation that supports bidirectional access (moved to `util::` namespace). * Fix problem in `test_ca.mod` with uninitialized ion current. * Rework dangerous `memory::array(Iter, Iter)` constructor to be less dangerous (and do the expected thing). * Allow ranges to be constructed from other ranges if the iterators are compatible.
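As an illustration of the two `arb::math::` helpers named above, the following is a minimal sketch of what `next_pow2` and `round_up` could look like for integral arguments. The signatures and bodies are assumptions made for illustration; the actual Arbor implementations may differ.

```cpp
#include <cassert>
#include <limits>
#include <type_traits>

// Smallest power of two greater than or equal to x (assumed semantics),
// computed by smearing the highest set bit of x-1 into all lower bits.
template <typename U>
U next_pow2(U x) {
    static_assert(std::is_unsigned<U>::value, "unsigned integral types only");
    --x;
    for (unsigned shift = 1;
         shift < static_cast<unsigned>(std::numeric_limits<U>::digits);
         shift <<= 1) {
        x |= x >> shift;
    }
    return ++x;
}

// Round v away from zero to the next multiple of base; v is returned
// unchanged if it is already a multiple.
template <typename T>
T round_up(T v, T base) {
    static_assert(std::is_integral<T>::value, "integral sketch only");
    T mag = base < 0 ? -base : base;
    T rem = v % mag;  // takes the sign of v
    if (rem == 0) return v;
    return rem > 0 ? v - rem + mag : v - rem - mag;
}
```

With these definitions `next_pow2(5u)` is 8, `round_up(7, 4)` is 8 and `round_up(-7, 4)` is -8.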
-
- Dec 21, 2017
-
-
Sam Yates authored
* Fix indirection in ion concentration write. * Remove second indirection in ion write assignment. * Extend ion write unit test to cover non-contiguous ion CV cases and verify correct ion concentration averaging. Fixes #424.
-
- Dec 20, 2017
-
-
* Add three new mechanisms: `nax.mod`, `kdrmt.mod` and `kamt.mod`. * Add new built-in math operators to `modcc`: `min`, `max`, `abs` and `exprelr`. `exprelr` is defined as the reciprocal of the 'exprel' function: exprelr(x)=x/(exp(x)-1), with exprelr(0)=1. This function occurs frequently in HH-style mechanisms, and having a built-in operator avoids the ad hoc `vtrap` functions found in NMODL files in the wild. * Split Arbor SIMD intrinsics support into AVX2- and AVX512-specific files. * Add unit tests for new maths operators for C++, SIMD and CUDA implementations.
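A scalar sketch of `exprelr`, using exprelr(x) = x/(exp(x)-1) with the limiting value 1 at x = 0, is given below. The implementation details (the guard near zero and the use of `std::expm1`) are assumptions for illustration and not the code that `modcc` generates; the `vtrap` helper mirrors the ad hoc function commonly seen in HH-style NMODL files.

```cpp
#include <cassert>
#include <cmath>

// exprelr(x) = x / (exp(x) - 1), with the removable singularity at
// x = 0 filled in by the limit value 1.
inline double exprelr(double x) {
    if (1.0 + x == 1.0) return 1.0;  // x is zero to within rounding
    return x / std::expm1(x);        // expm1 avoids cancellation for small x
}

// The usual hand-written vtrap(x, y) = x / (exp(x/y) - 1) expressed
// through the built-in operator.
inline double vtrap(double x, double y) {
    return y * exprelr(x / y);
}
```

Written this way, a rate expression that would otherwise divide by zero at a particular membrane potential stays finite without a special case in the mechanism source.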
-
- Dec 19, 2017
-
-
Sam Yates authored
Reduce differences between `util::optional` and `std::optional`. * Rename `util::nothing` to `util::nullopt`. * Replace `util::get()` by `util::optional::value()`. * Add correct move semantics to `util::optional::value()`. * Add `util::optional::value_or()` method. * Remove unused monadic functionality from `util::optional` and `util::uninitialized`. * Update code formatting to adhere more closely to coding guidelines. * Add convenience `""_s` string constructor for unit tests. * Update unit tests accordingly. Fixes #419.
-
- Dec 18, 2017
-
-
* Fix various compiler warnings in `modcc` when using Apple Clang.
-
- Dec 05, 2017
-
-
Also removes vestigial `initial` procedure generated from a MOD file with no INITIAL block. Fixes #407.
-
- Nov 29, 2017
-
-
* Add querying methods to modcc ion species type to simplify code generation. * Add `WriteBack` type that records per-mechanism ion usage metadata. * Add `write_back` implementation for C and CUDA printers that adds weighted concentration contributions to global concentration fields. * Extend `uses_ion` interface to provide information about which concentration fields are modified by mechanisms. * Update `update_ion_symbols` lambda that is responsible for adding metadata about interactions between mechanism fields and external ion species fields, creating a write-back when the mechanism field is a state variable or an indexed variable if it is a current/reversal potential, or else an error. * Add `test_ca.mod` mechanism that writes calcium ion concentration for testing purposes. * Add back-end callbacks that initialize concentration values. * Update `mechanism::uses_ion()` to return information regarding mechanism ion concentration updates. * Add `mechanism::write_back()` method. * Update stimulus mechanism specialization to use new mechanism interface. * Update `ion` type to calculate default concentration contributions in CVs where the concentration is only partially determined by mechanisms. * Update `fvm_multicell` to calculate the default ion concentration weights. * Add unit test for ion concentration weight determination. Fixes #373
-
- Nov 28, 2017
-
-
Sam Yates authored
* Remove optimization option (use SIMD options for vectorization). * Remove arbor utility library dependencies from modcc (pending separation of utility lib from arbor lib source). * Split target (cpu, gpu) specification from vectorization architecture (avx2, avx512). * Remove `Options` singleton; replace with structure local to `modcc.cpp`. * Tidy `modcc` option parsing and main function; allow a single invocation of `modcc` to generate code for multiple backends. * Rename generated sources to include backend target in filename. * Always run a constant simplification pass on generated procedures. * Remove file i/o code from `Module` and `modcc` main function; move functionality to new functions in `io` namespace. (Note: in on-going mechanism revamp, other i/o utility code will reside in the `io` namespace and subdirectory.) * Remove classes `ConstantFolderVisitor` and `ExpressionClassifierVisitor` that are no longer used. * Modify CMakeLists.txt files, `backends/*/fvm.cpp` to reflect the new filenames of generated sources. * Small formatting changes in `modcc` source to reflect coding guidelines (incomplete).
-
Sam Yates authored
Fixes #399. Update `test_printers.cpp`: * Fix compilation and scope management issues, and re-include in CMakeLists.txt. * Update to use `verbose_print` (run-time verbosity). * Apply expected output tests to `CPrinter` and `CUDAPrinter` outputs. Address binary operation rendering issue: * Move C-style expression rendering common to `CPrinter` and `CUDAPrinter` to a specialized `Visitor`, `CExprRenderer`. * Use operator associativity to determine need for parentheses around sub-expressions of a binary expression.
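The associativity rule for parentheses can be sketched on a toy expression tree. Everything here (`expr`, `leaf`, `binop`, `render`) is hypothetical and far simpler than the `CExprRenderer` visitor; it only covers the left-associative operators `+`, `-`, `*` and `/`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct expr {
    char op = 0;        // 0 marks a leaf
    std::string name;   // identifier for leaves
    std::shared_ptr<expr> lhs, rhs;
};
using expr_ptr = std::shared_ptr<expr>;

inline expr_ptr leaf(std::string n) {
    auto e = std::make_shared<expr>();
    e->name = std::move(n);
    return e;
}

inline expr_ptr binop(char op, expr_ptr l, expr_ptr r) {
    auto e = std::make_shared<expr>();
    e->op = op;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

inline int precedence(char op) {
    return (op == '*' || op == '/') ? 2 : 1;
}

// A sub-expression needs parentheses if it binds less tightly than its
// parent, or binds equally and sits on the right of a left-associative
// operator.
inline std::string render(const expr_ptr& e) {
    if (!e->op) return e->name;
    const int parent = precedence(e->op);
    auto child = [parent](const expr_ptr& c, bool is_rhs) -> std::string {
        std::string s = render(c);
        if (!c->op) return s;
        const int prec = precedence(c->op);
        const bool parens = prec < parent || (prec == parent && is_rhs);
        return parens ? "(" + s + ")" : s;
    };
    return child(e->lhs, false) + e->op + child(e->rhs, true);
}
```

With this rule `a-(b-c)` keeps its parentheses while `(a-b)-c` is printed as `a-b-c`, which is the distinction that precedence alone cannot make.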
-
- Nov 20, 2017
-
-
Derivatives should only appear on the left hand side of expressions that describe the time evolution of state variables. Without this check `modcc` segfaulted when processing a derivative expression of a non-state variable. * Specialize the semantic analysis of `DerivativeExpression` to enforce that derivatives are only applied to state variables.
-
- Nov 09, 2017
-
-
Update the FVM formulation to use current densities instead of currents. Modifications to modcc: * Update printers to store and use weights for point process mechanisms, * Scale ion species current contributions by area proportion, similarly to contributions to the accumulated current. Changes to FVM code: * Update weights calculation for density and point processes mechanisms: * density channels use relative proportion of CV area, i.e. "density", * point processes use the reciprocal of the CV area to convert to a density. * Add `cv_area` parameter for matrix constructor, which is used by matrix assembly to convert current densities to currents. * Update stimulus implementations (gpu and cpu backends) to contribute current densities. Other changes: * Update unit tests to use new interfaces. * Update units section in LaTeX docs. Fixes #374.
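The weight rules above amount to the following arithmetic. This is a sketch with hypothetical function names, and it deliberately leaves out the unit conversion factors that the real FVM code would need.

```cpp
#include <cassert>

// Density mechanism: weight is the proportion of the CV area that the
// mechanism covers.
inline double density_weight(double covered_area, double cv_area) {
    return covered_area / cv_area;
}

// Point process: weight is the reciprocal of the CV area, which turns a
// current into a current density.
inline double point_weight(double cv_area) {
    return 1.0 / cv_area;
}

// Matrix assembly: multiply by the CV area to recover a current from an
// accumulated current density.
inline double to_current(double current_density, double cv_area) {
    return current_density * cv_area;
}
```

A point-process current scaled by `point_weight` and later passed through `to_current` with the same `cv_area` comes back unchanged up to rounding, which is the consistency the `cv_area` matrix parameter provides.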
-
- Nov 02, 2017
-
-
Sam Yates authored
Fixes #350 * Replace parameter_list with mechanism_spec. * Add prototype for mechanism parameter schema checking. * Allow mechanism weights to be set after construction. * Combine range parameters on density mechanisms by linear contribution in CVs. * Cable segment electrical parameters are now member variables. * Publish mechanism parameter information through new method `mechanism::field_info`; note this will be replaced/improved in upcoming dynamic mechanism catalog work. * Access mechanism parameter scalars and range data via `mechanism::field_view_ptr` and `mechanism::field_value_ptr` methods. * Allow mechanism 'global' parameters to be set via a method of specializing mechanisms (and giving them corresponding aliases). * Extend recipe interface to allow querying of per-cell-kind global information for use by cell group implementations. * Add unit tests for above - note that linear density mechanism parameter test is tightly coupled with the FVM discretization scheme.
-
- Nov 01, 2017
-
-
Fixes #367. * Add empty INITIAL block if NMODL file does not supply one.
-
- Sep 28, 2017
-
-
Sam Yates authored
* Use ARB_ and arb_ as variable prefixes in place of NMC_ and nmc_. * Replace references to 'NestMC' and 'NEST MC' to refer instead to Arbor. * Use 'arbor' as the sim name in generated validation data. * Reflow long-line paragraphs in `tests/ubench/README.md`. * Change names of CUDA mechanism and CUDA kernel libraries to include arbor name.
-
Sam Yates authored
Change nest::mc namespaces to arb
-
- Sep 25, 2017
-
-
Separate compilation for all CUDA code. * Move all CUDA kernels to their own .cu files, together with C++ function wrappers. * Compile all CUDA .cu files to a single static library. * Merge gpu and multicore backend validation tests. * Simplify and clean up cruft from CMakeLists.txt files.
-
- Sep 21, 2017
-
-
Separable compilation of the CUDA kernels generated by modcc from NMODL files. CMake scripts: * Update the `build_modules()` helper function to cleanly handle calls to modcc that generate multiple output files. * Add a new library target `gpu_mechanisms` for the separately compiled CUDA kernels and the implementation of their C wrappers. * Reduce verbosity of compilation messages. * Simplify mechanism C++ namespace use: move everything in nest::mc::mechanisms::gpu::_mechanism-name_ into `nest::mc::gpu`, and similarly for multicore mechanism implementations, ions. * Remove template parameters for `value_type` and `size_type` from all of the mechanism implementations, and use `fvm_value_type` and `fvm_size_type` everywhere instead. modcc changes: * Modify `CUDAPrinter` to keep track of 3 text buffers, one each for "implementation", "interface" and "implementation interface": * Write the CUDA implementation interface to `X_impl.hpp`, comprising the definition of the mechanism-specific 'X_ParamPack' struct used to pass function arguments to the CUDA kernels. * Write the CUDA kernels and C wrappers to `X_impl.cu`. * Write the public C++ mechanism interface (with calls to implementation wrappers) to `X.hpp`. * Modify modcc driver to support multiple generated output files.
-
- Sep 20, 2017
-
-
Vasileios Karakasis authored
Provides the following transcendental intrinsics: * `nmc_mm256_exp_pd` * `nmc_mm256_log_pd` * `nmc_mm256_pow_pd` * `nmc_mm256_frexp_pd` The first three are the equivalent of the corresponding SVML intrinsics without the `nmc` prefix. The last one is used by the `log` function, but I decided it's nice to have it public. All results are tested against full precision standard library implementation and provide equal results (except for the `pow` case). `NaN`s and infinities are treated according to the standard. Limitations: * Subnormals are treated as zeros by `frexp` and as a result by `log`.
-
Sam Yates authored
Collect samples in FVM lowered cells across an integration period, prior to delivery to sampler callbacks. Fixes #283. * Make `multi_event_stream` a generic data structure, based on the event type. These are now used for both `deliverable_event` objects and `sample_event` objects. * Add generic accessors for event data, separating access to event metadata (cell index, time) from payload (weight, sampling info etc.). * Make a flat, CUDA-friendly data structure describing the `multi_event_queue` marked event state, for passing to mechanisms and kernels. This also brings the underlying representations of the `multicore` and `gpu` versions of `multi_event_queue` closer. * Implement batched sampling kernel for GPU. * Additional utility function: `util::is_sorted_by`. Overhead of implementation and sampling process is negligible for CPU, and with no samples less than 1% on GPU. Running the miniapp with 10'000 cells and sampling 1% of the somata at 0.01ms adds approximately 10% to simulation time.
-
- Jul 04, 2017
-
-
Optimized reduce by key used by the GPU back end when accumulating synapse current contributions to compartment currents. This leads to significant speedup in the miniapp for cells with few compartments and many synapses. * Implement `gpu::reduce_by_key` device function that uses warp intrinsics to perform reduction between threads in a warp before using a global atomic update to store the result. * Add unit tests for `reduce_by_key` functionality. * Add micro benchmarks that compare against using CUDA atomics. * Modify `CudaPrinter` modcc class to emit `reduce_by_key` in place of `cudaAtomicAdd` functions. Some improvements to meter reporting: * Shorten names of metering regions in miniapp to make them easier to grep. * JSON is no longer used as an intermediate data type when gathering distributed meters into a single report, instead conversion to JSON is performed just before writing to file. * Add a print function for summarizing meter results to a stream.
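The idea behind reduce by key can be shown with a sequential host-side analogue: contributions that share a target index are summed first, and the target is updated once per run instead of once per contribution. This sketch is not the warp-intrinsic device function; the name, signature and the assumption that keys arrive sorted are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Accumulate contrib[i] into out[key[i]], assuming equal keys are
// adjacent. Returns the number of updates made to `out`; on a GPU each
// such update would correspond to one atomic operation.
inline std::size_t reduce_by_key(const std::vector<double>& contrib,
                                 const std::vector<int>& key,
                                 std::vector<double>& out) {
    std::size_t updates = 0;
    std::size_t i = 0;
    while (i < key.size()) {
        double sum = 0.0;
        std::size_t j = i;
        // Sum the whole run of contributions with the same key.
        while (j < key.size() && key[j] == key[i]) {
            sum += contrib[j];
            ++j;
        }
        out[key[i]] += sum;
        ++updates;
        i = j;
    }
    return updates;
}
```

For cells with few compartments and many synapses the runs are long, so the number of updates falls well below the number of contributions. In the device version the reduction is limited to threads within a warp, so a run that spans several warps would still need one atomic update per warp.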
-
- Jun 19, 2017
-
-
Sam Yates authored
Finalizes #184 and includes fix for #285. This PR constitutes a brutal rebase of the `devel/async-integration` branch onto current master: git patch, baby. Implements asynchronous integration of cells within an `fvm_multicell` instance together with the code to implement efficient device-side event delivery. Summarized patch notes from the development branch: * Add four new views in mechanism state: * `vec_ci`: cv index to cell index (w.r.t. one lowered multicell instance) map. * `vec_t`: current or integration starting time, indexed by cell. * `vec_t_to`: integration stopping time, indexed by cell. * `vec_dt`: pre-computed delta between `vec_t` and `vec_t_to`, to address performance regression on multicore backend. * Extend notion of indexed view in `modcc` to encompass `CellIndexedVariable` variables, which have a per-cell value. * Add/change 'built-in' variables `t`, `t_to`, and `dt` for NMODL. * Additional utility functions: * `util...
-
- Jun 14, 2017
-
-
Vasileios Karakasis authored
Add AVX2 intrinsics back end for Haswell and Broadwell architectures. We're still 3.5% and 5% slower than the icc `#pragma` version on Haswell and Broadwell, respectively.
-
- May 19, 2017
-
-
Vasileios Karakasis authored
The `emit_gather()` function emitted the "wrong" instruction in terms of its arguments, but the instruction actually generated was correct because the `simd_printer` passed the arguments to `emit_gather()` in a different order, which happened to be the correct order for the finally emitted instruction. Complicated? This commit cleans this up.
-
- May 16, 2017
-
-
Fixes #265. In the `modcc`-generated mechanism code, the `cnexp` solver method uses an expensive call to `exp` to integrate dependent variables over one time step. This commit replaces the exponential with a second-order Padé approximation. * Modify `modcc` to insert `exp_pade_11` and `exp_pade_22` functions into every module, which define Padé approximations of second and fourth order respectively (m=n=1 and m=n=2). * Have `cnexp` solver use `exp_pade_11` instead of the built-in `exp` unary operator. The validation tests pass for both the 2nd and 4th order approximations; the second-order approximation suffices.
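The two approximants named above are the standard [1/1] and [2/2] Padé approximants of the exponential, which match its Taylor series through the x² and x⁴ terms respectively. A plain C++ sketch follows; the function names come from the commit message, but the bodies are written from the textbook formulas rather than copied from `modcc` output.

```cpp
#include <cassert>

// [1/1] Padé approximant: exp(x) ~ (1 + x/2) / (1 - x/2).
inline double exp_pade_11(double x) {
    return (1.0 + 0.5 * x) / (1.0 - 0.5 * x);
}

// [2/2] Padé approximant:
// exp(x) ~ (1 + x/2 + x^2/12) / (1 - x/2 + x^2/12).
inline double exp_pade_22(double x) {
    const double x2 = x * x / 12.0;
    return (1.0 + 0.5 * x + x2) / (1.0 - 0.5 * x + x2);
}
```

At x = 0.1, where exp(x) is approximately 1.105171, `exp_pade_11` gives about 1.105263 and `exp_pade_22` agrees with the exact value to roughly seven significant digits. The arguments seen by `cnexp` scale with the time step, so they are typically small.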
-
Ben Cumming authored
Remove some old compiler directives and properties from modcc-generated kernels that were not used by the Intel compiler and were not recognised by gcc. Now the optimized x86 kernels can be compiled by both the gcc and Intel compilers.
-
- Mar 31, 2017
-
-
Fixes #185. Add a new back end GPU Hines matrix solver that uses a block-interleaved storage pattern to improve memory coalescing during the matrix solve. * Refactor the `src/backends` path into `src/backends/gpu` and `src/backends/multicore` paths that contain `gpu` and `multicore` implementations. * Refactor the matrix state and threshold detection members that were declared inline in the back end specifications to separate files. * Add a new interleaved matrix state back end. * Refactor all of the GPU kernels that were originally in the one back end header file into their own header files. * Write more comprehensive unit tests for the GPU matrix solver back end to test the `interleave` and `reverse_interleave` operations in isolation, as well as ensure that the flat and interleaved back ends produce identical results. * Add the GPU versions of the kinetic scheme validation tests.
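A block-interleaved layout can be sketched as follows, under the simplifying assumption that every matrix has the same length. The function names match the operations mentioned in the commit message, but the signatures, the zero padding and the fixed-length restriction are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Input: n_mat matrices of `len` entries stored back to back.
// Output: matrices grouped into blocks of `width`; within a block,
// entry i of each matrix is stored contiguously, so adjacent threads
// working on adjacent matrices touch adjacent memory.
inline std::vector<double> interleave(const std::vector<double>& flat,
                                      std::size_t n_mat, std::size_t len,
                                      std::size_t width) {
    const std::size_t n_block = (n_mat + width - 1) / width;
    std::vector<double> out(n_block * width * len, 0.0);  // padding stays 0
    for (std::size_t m = 0; m < n_mat; ++m) {
        const std::size_t base = (m / width) * width * len;
        for (std::size_t i = 0; i < len; ++i) {
            out[base + i * width + m % width] = flat[m * len + i];
        }
    }
    return out;
}

// Inverse of interleave: recover the back-to-back layout.
inline std::vector<double> reverse_interleave(const std::vector<double>& inter,
                                              std::size_t n_mat,
                                              std::size_t len,
                                              std::size_t width) {
    std::vector<double> flat(n_mat * len);
    for (std::size_t m = 0; m < n_mat; ++m) {
        const std::size_t base = (m / width) * width * len;
        for (std::size_t i = 0; i < len; ++i) {
            flat[m * len + i] = inter[base + i * width + m % width];
        }
    }
    return flat;
}
```

Testing the two operations as a round trip, independently of the solver, is the kind of isolation test the commit message describes.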
-
Sam Yates authored
* Implement `indirect_view` for indexed access via `transform_view`. * Extend `transform_iterator` to permit non-const access to reference-returning functor results. * Replace use of `indexed_view` with `indirect_view`. * Fix missing cpu target for vectorized modcc outputs.
-
- Mar 28, 2017
-
-
Sam Yates authored
Fixes #196. Correct treatment of missing coefficients in `cnexp` solver. * Extend `EXPECT_EXPR_EQ` functionality with wrapper that works with `Expression *` and `expression_ptr` arguments. * Replace string comparison checks in `test_symdiff.cpp` with equivalents that use `EXPECT_EXPR_EQ`. * Check explicitly for missing coefficient in `cnexp` solver, which should be treated equivalently to zero.
-
- Mar 20, 2017
-
-
Vasileios Karakasis authored
Basic features: * Compile with -t avx512 * Automatically set up by CMake if USE_OPTIMIZED_KERNELS is on and VECTORIZE_TARGET is set to KNL * Generic SIMD printer that contacts a SIMD backend for emitting the actual SIMD intrinsics Note: compilation for the avx512 target requires the Intel compiler.
-
- Mar 15, 2017
-
-
Sam Yates authored
Fixes #183. Use a device kernel for net_receive state updates. Note: very naive, but gives about a 30% speed up on the 1000 cell miniapp test. All the fun optimization will end up under issue #184. This also incorporates PR #192, so this PR will be amended if that one is rejected.
-
- Mar 08, 2017
-
-
Fixes #125 * Add `cuda_atomic_add` and `cuda_atomic_sub` wrappers for atomic addition. * Choose native atomic add for Pascal and later architectures. * Choose CAS workaround for devices earlier than Pascal. * Add unit test for wrappers. * Change default CUDA architecture target to `sm_60` in `CMakeLists.txt`.
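The compare-and-swap (CAS) workaround follows a well-known pattern: read the current value, compute the sum, and retry until no other thread has changed the value in between. The sketch below shows the same loop on the host with `std::atomic<double>`; it is an analogue for illustration, not the CUDA wrapper itself, and the name `atomic_add_cas` is hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Add `value` to `target` using a CAS retry loop and return the value
// held before the addition.
inline double atomic_add_cas(std::atomic<double>& target, double value) {
    double old = target.load();
    // On failure compare_exchange_weak reloads `old` with the current
    // contents of `target`, so the sum is recomputed on each retry.
    while (!target.compare_exchange_weak(old, old + value)) {
    }
    return old;
}
```

On devices that provide a native double-precision atomic add the loop is unnecessary, which is why the wrappers choose between the two paths by architecture.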
-
- Mar 07, 2017
-
-
* Add missing `<string>` header to `modcc/msparse.hpp`
-