Commits · 5da63dde25bad77abdf665b16ab54d361b5317f4 · arbor-sim / arbor

Jul 20, 2018

Refactor modccutil.hpp (#542) · 5da63dde

Sam Yates authored 6 years ago

Fixes #139.

* Split colours and `pprintf(...)` into `io/pprintf.hpp` header.
* Remove generic `to_string()` function, replacing its very occasional usage with `pprintf`.
* Move block pretty printing into own .cpp file; this is the only place that the vector ostream printer was used.
* Remove `enum_hash`, as not needed with C++14.
* Move `is_in` utility function to `util.hpp`.
* Remove old SIMD printer backend code.
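
A minimal sketch of why `enum_hash` is no longer needed: since C++14 the standard library provides `std::hash` for enumeration types, so a scoped enum can key an unordered container directly. The `tok` enum and `spelling` function are hypothetical and not taken from modcc.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical token kinds; not the actual modcc enum.
enum class tok { plus, minus, times };

// With C++14, std::hash<tok> is supplied by the standard library,
// so the map needs no custom hasher argument.
inline std::string spelling(tok t) {
    static const std::unordered_map<tok, std::string> table = {
        {tok::plus, "+"}, {tok::minus, "-"}, {tok::times, "*"},
    };
    return table.at(t);
}
```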

5da63dde

Bugfix: a[i]=b[i] for memory::device_vector (#541) · 48cb9e53
Sam Yates authored 6 years ago and Benjamin Cumming committed 6 years ago
```
* Perform device-to-device copy when device_reference is assigned a device_reference.
```
48cb9e53
Merge pull request #540 from halfflat/feature/modcc-graceful-missing-file · c1a58409
Benjamin Cumming authored 6 years ago
```
Better error message on missing file.
```
c1a58409
Better error message on missing file. · a3d87e73
Sam Yates authored 6 years ago
```
* Don't call a missing file an 'internal compiler error'.
```
a3d87e73

Jul 19, 2018

Cthreads: implement task queue per thread with task stealing (#528) · 4d63988a

noraabiakar authored 6 years ago and

Sam Yates committed 6 years ago

Cthreads classes:
- Notification queue : Manages tasks: tries or forces popping and pushing tasks.  
- Task system : manages the notification queues; controls which queue to pop from/push to; controls spinning on queues if necessary; manages creating/joining threads. Is a singleton.
- Task group : manages synchronization on a group of tasks. 

Operation: 
- Each thread has an associated queue
- Task system _tries to_ push tasks in one of the available queues. If it is unable to acquire a lock on a queue, it tries the next in a round robin fashion. If it has looped over all queues and still hasn't pushed the task, it spins on a single queue until the lock is acquired and the task is pushed.
- Task system _tries to_ pop a task from the calling thread's queue. If it is unable to acquire the lock, it tries to steal a task from another thread's queue, in a round robin loop. If it is still unable to pop a task, it spins on the calling thread's queue until the lock is acquired.
- Task group keeps a counter for the number of tasks in the group which it increments/decrements when calling push/pop on the task system. The counter is used to know when all tasks in the group have been executed. 
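
The push/pop policy described above can be sketched as follows. This is an illustration under stated assumptions, not the Arbor implementation: the class `task_queues` and its members are hypothetical, a blocking lock stands in for spinning on push, and the final "spin on own queue" step of pop is omitted, so `try_pop` simply reports failure.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

using task = std::function<void()>;

// One queue per thread, guarded by its own mutex.
class notification_queue {
    std::deque<task> tasks_;
    std::mutex mutex_;
public:
    // Non-blocking push: fails (leaving t intact) if the lock is held.
    bool try_push(task& t) {
        std::unique_lock<std::mutex> lock(mutex_, std::try_to_lock);
        if (!lock) return false;
        tasks_.push_back(std::move(t));
        return true;
    }
    // Blocking push, used as the fallback.
    void push(task t) {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push_back(std::move(t));
    }
    // Non-blocking pop: fails if the lock is held or the queue is empty.
    bool try_pop(task& out) {
        std::unique_lock<std::mutex> lock(mutex_, std::try_to_lock);
        if (!lock || tasks_.empty()) return false;
        out = std::move(tasks_.front());
        tasks_.pop_front();
        return true;
    }
};

// Hypothetical stand-in for the task system's queue management.
class task_queues {
    std::vector<notification_queue> queues_;
    std::atomic<std::size_t> next_{0};
public:
    explicit task_queues(std::size_t n): queues_(n) { assert(n > 0); }

    // Try every queue once in round-robin order; if all are locked,
    // fall back to a blocking push on the starting queue.
    void push(task t) {
        const std::size_t n = queues_.size();
        const std::size_t start = next_++ % n;
        for (std::size_t k = 0; k < n; ++k) {
            if (queues_[(start + k) % n].try_push(t)) return;
        }
        queues_[start].push(std::move(t));
    }

    // Try the caller's own queue first, then try to steal from the others.
    bool try_pop(std::size_t self, task& out) {
        const std::size_t n = queues_.size();
        for (std::size_t k = 0; k < n; ++k) {
            if (queues_[(self + k) % n].try_pop(out)) return true;
        }
        return false;
    }
};
```

A worker with index `self` would call `try_pop(self, t)` in a loop and run `t()` on success; tasks pushed to other queues are picked up by stealing.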

Unit tests: 
- Basic tests to pop/push from notification_queue, task_system and task_group. 
- Simple tests for deadlock 
- Simple tests for correctness 

Benchmark: 
- Benchmarks performance for various task sizes

4d63988a

Jul 13, 2018

fix to compile bench without mpi (#533) · bbe99176
noraabiakar authored 6 years ago and Benjamin Cumming committed 6 years ago

bbe99176

Feature/lib install target part 4 (#531) · d6af0c4d

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

All example code and validation tests no longer require access to private include directories. This provides the minimal requirement for an installable target.

Note that it is still not possible to separately build mechanisms from NMODL with just the public includes, and there is not yet any package configuration file creation for use with CMake or pkg-config.

* Replace `hw::node_info` with `proc_allocation`, describing local resources for the purposes of domain decomposition.
* Group processor counting and gpu counting implementation under `node_info.cpp`.
* Remove `domain_decomposition` dependency from `cell_group_factory.hpp` so we can use the latter to test for backend support for a cell kind.
* Add `arb::cell_kind_implementation()` which performs the mapping from cell kind and backend kind to a `cell_group_ptr`-producing function (this will then become the site for custom cell group kind mapping support in future work).
* Move headers for aux library ...

d6af0c4d

Jul 10, 2018
- Do not build/use local modcc if ARB_MODCC set (#527) · b068a9d8
  Sam Yates authored 6 years ago and Benjamin Cumming committed 6 years ago
```
Fixes #526.
```
  b068a9d8
- Fix simd/native.hpp header after move to public includes (#525) · a38e73f9
  Sam Yates authored 6 years ago
```
Fixes issue #524.
```
  a38e73f9
Jul 06, 2018

Remove NDEBUG tests in memory utils. (#523) · 0c9906bd
Sam Yates authored 6 years ago and Benjamin Cumming committed 6 years ago
```
Fixes #182.
```
0c9906bd

Migrate source/build to c++14 ... · 3ee79191

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

Migrate source/build to c++14 (#522)

* Update `CMakeLists.txt` for C++14 option.
* Update to gcc 6 minimum.
* Update travis CI from gcc-5 to gcc-6
* Use `std::..._t` style type traits, replacing `util::` aliases.
* Use `std::cbegin`, `std::cend`, and `std::make_unique`, replacing `util::` versions.
* Remove `DEDUCED_RETURN_TYPE` macros.
* Remove redundant return type specifications.
* Use correct ADL for `begin` and `end` in (almost all) the range utilities.
* Remove redundant `mechinfo` ctor (aggregate initialization suffices).
* Use lambda capture initializers where appropriate.
* Use generic `std::equal_to`.
* Use variable templates for `math::infinity` and `math::pi`.
* Remove `enum_hash` workaround.
* Use `""s` string literals where we were using our own `""_s` construction.
* Use generic lambda for recursive lambda instead of `std::function` wrapper.
* Use generic lambda ...
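
A few of the C++14 idioms listed above, as a small sketch. The names `pi`, `factorial` and `greeting` are illustrative only and do not reproduce the Arbor definitions (for example, the real `math::pi` may be declared differently).

```cpp
#include <cassert>
#include <string>

// Variable template: one definition, usable at any floating-point precision.
template <typename T>
constexpr T pi = T(3.14159265358979323846L);

// Generic lambda used recursively: the lambda receives itself as an
// argument, which avoids wrapping it in std::function.
inline unsigned long factorial(unsigned n) {
    auto fact = [](auto& self, unsigned k) -> unsigned long {
        return k <= 1 ? 1ul : k * self(self, k - 1);
    };
    return fact(fact, n);
}

// The standard ""s literal yields a std::string directly.
inline std::string greeting() {
    using namespace std::string_literals;
    return "arbor"s + " c++14"s;
}
```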

3ee79191

Fix GPU breakage in last PR (#520) · 26eda785
Sam Yates authored 6 years ago and Benjamin Cumming committed 6 years ago
```
Who broke the build? Sam did!
```
26eda785

Jul 05, 2018

Test for xlC and refuse to build with it. (#519) · 775fe807

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

Fixes issue #517.

Deprecate the IBM xlC compiler.
xlC generates code that is an order of magnitude slower than gcc, while generating spurious warnings, and requiring hacks and workarounds to pass all tests.
Supporting it makes no sense.

* Add test and fatal error for xlC detection in CheckCompilerXLC.cmake.
* Move xlC 13 misdetection work around to CheckCompilerXLC.cmake.
* Remove xlC-specific compatibility workarounds from code.

775fe807

Feature/lib install target part 3 (#518) · 40612fa7

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

This time we're moving `recipe.hpp` and `simulation.hpp`, plus the requirements they bring.

Code changes:
* Pimplize `simulation`.
* Consolidate arbor exceptions: all non-cell kind specific exceptions that might be expected to reach user code now have consistent messages and fit in an exception hierarchy based at `arb::arbor_exception`. Internal errors throw an `arb::arbor_internal_error` exception.
* Renamed `postsynaptic_spike_event` to `spike_event`. (Note: `pse_vector` name is unchanged.)
* Repurposed `pprintf` and moved it into `strprintf.h` — further consolidation is a TODO.
* Made a generic `util::to_string` to avoid redundancy of `operator<<` overloads and other `to_string` definitions. Defaults to ADL `to_string`, `std::to_string`, and finally tries using `operator<<`.
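
One way to get the fallback order described for the generic `to_string` (ADL `to_string` or `std::to_string` first, then `operator<<`) is ranked overloads with expression SFINAE. This is a sketch under that assumption, not the code from the commit; the namespaces `sketch` and `demo` and their contents are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>

namespace demo {
    // Type with its own to_string, found by ADL.
    struct id { int value; };
    inline std::string to_string(const id& x) { return "id:" + std::to_string(x.value); }

    // Type that only offers stream output.
    struct point { int x, y; };
    inline std::ostream& operator<<(std::ostream& o, const point& p) {
        return o << '(' << p.x << ',' << p.y << ')';
    }
}

namespace sketch {
namespace impl {
    using std::to_string; // makes std::to_string a candidate next to ADL results

    // rank<1> converts to rank<0>, so the rank<1> overload wins when viable.
    template <int N> struct rank: rank<N - 1> {};
    template <> struct rank<0> {};

    // Preferred: unqualified to_string (ADL, then std::to_string).
    template <typename T>
    auto to_string_impl(const T& v, rank<1>) -> decltype(to_string(v)) {
        return to_string(v);
    }

    // Fallback: format via operator<<.
    template <typename T>
    std::string to_string_impl(const T& v, rank<0>) {
        std::ostringstream out;
        out << v;
        return out.str();
    }
}

template <typename T>
std::string to_string(const T& v) {
    return impl::to_string_impl(v, impl::rank<1>{});
}
}
```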

40612fa7

Jul 03, 2018

Move cell description types to public includes. (#508) · a1894edc

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

Further work to public install target.

* Move SIMD classes, cell description classes, simple sampler to public include.
* Rename `cell` to `mc_cell`, `segment` to `mc_segment`, and remove `_description` from cell description class names and includes.
* Move `compartment_model` out of `mc_cell` interface and use only in `fvm_layout.cpp`.
* (Provisionally) remove area/volume methods on `mc_cell` and `mc_segment`.

a1894edc

Jun 25, 2018

Feature/lib install target part i (#506) · ad1c78ab

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

CMake and build refactoring

*   Use CUDA as first-class language (leading to CMake 3.9 minimum version requirement).

*   Use 'modern CMake' interface libraries for compiler options, include file and library dependency tracking. Interface library targets:
    * `arbor-deps`: compiler options and library requirements for the `libarbor.a` static library, as governed by configure-time options and environment.
    * `arbor-private-headers`: include path for non-installed headers, as required by unit tests and arbor itself.
    * `arbor-aux`: helper classes and utilities used across tests and examples.
    * `ext-json`, `ext-tclap`, `ext-tbb`, `ext-benchmark`, `ext-sphinx_rtd_theme`: externally maintained software that we include (directly or via submodule) in the `ext/` subdirectory.
 
*   Single static library `libarbor.a` includes all built-in modules and CUDA objects.

*   Simplify configuration options:
    *  `ARB_WITH_TRACE`, `ARB_AUTORUN_MODCC_ON_CHA...

ad1c78ab

Jun 22, 2018

Benchmark cell type (#500) · 6ba39a92

Benjamin Cumming authored 6 years ago

Add a new cell type, and corresponding cell_group implementation, for benchmarking the simulator library architecture.

Add a benchmark_cell_group, where each cell in the group:

* generates a spike train prescribed by a time_seq;
* takes a prescribed time interval per cell to perform the cell_group::advance method.

With this cell type, one can easily build arbitrary networks with prescribed spiking and cell update overheads.
A miniapp that uses this cell type to build a benchmark model is implemented in example/bench.

Fixes #493
Fixes #501

6ba39a92

Jun 07, 2018

profile multicore mechanism state and current calls individually (#492) · 5e65a939

Benjamin Cumming authored 6 years ago

The built in profiler generates timings for state and current for individual multicore mechanisms.

Modcc generates PE(advance_integrate_{state,current}_X) profiler calls, along with the corresponding PL() calls, around calls to the multicore mechanism nrn_state and nrn_current API.

No timings are made for the gpu back end, which is not properly supported by the current profiling tools.

5e65a939

Jun 04, 2018

Simd partition by constraint (#494) · 64171e43

noraabiakar authored 6 years ago and

Benjamin Cumming committed 6 years ago

Changes have been made to the simd implementation of mechanism functions:

- The node_index array (array of indices that specifies for each mechanism the CVs where it is present), is now partitioned into 4 arrays according to the constraint on each simd_vector in node_index:
1. contiguous array: contains the indices of all simd_vectors in node_index where the elements in simd_vector are contiguous
2. constant array: contains the indices of all simd_vectors in node_index where the elements in simd_vector are identical
3. independent array: contains the indices of all simd_vectors in node_index where the elements in simd_vector are independent (no repetitions) but not contiguous
4. none array: contains the indices of all simd_vectors in node_index where none of the above constraints apply

When mechanism functions are executed, they loop over each of the 4 arrays separately. This allows for optimizations in every category.

- The modcc compiler was modified to generate code for the previous changes, including the optimizations per constraint:
1. contiguous array: we use vector load/store and vector arithmetic.
2. constant array: we load only one element and broadcast it into a simd_vector; we use vector arithmetic; we reduce the result; we store one element.
3. independent array: we use vector scatter/gather and vector arithmetic.
4. none array: we cannot operate on the simd_vector in parallel, we loop over the elements to read, perform arithmetic and write back

- Added a mechanism benchmark for pas, hh and expsyn

- Moved/modified some functions in simd.hpp to ensure that the correct implementation of a function is being called.
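
The partitioning of node_index by constraint can be illustrated with a scalar sketch. The names `classify`, `constraint_partition` and `partition_by_constraint` are hypothetical, and the check order (contiguous, then constant, then independent) is an assumption; the real implementation may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

enum class index_constraint { contiguous, constant, independent, none };

// Classify one SIMD-width group of node indices.
inline index_constraint classify(const std::vector<int>& idx) {
    assert(!idx.empty());
    bool contiguous = true, constant = true;
    for (std::size_t i = 1; i < idx.size(); ++i) {
        contiguous = contiguous && idx[i] == idx[0] + static_cast<int>(i);
        constant = constant && idx[i] == idx[0];
    }
    if (contiguous) return index_constraint::contiguous;
    if (constant) return index_constraint::constant;

    // Independent means no index is repeated within the group.
    std::vector<int> sorted(idx);
    std::sort(sorted.begin(), sorted.end());
    bool repeats = std::adjacent_find(sorted.begin(), sorted.end()) != sorted.end();
    return repeats ? index_constraint::none : index_constraint::independent;
}

// For each constraint, the offsets into node_index of the groups it applies to.
struct constraint_partition {
    std::vector<std::size_t> contiguous, constant, independent, none;
};

inline constraint_partition partition_by_constraint(const std::vector<int>& node_index,
                                                    std::size_t width) {
    assert(width > 0 && node_index.size() % width == 0);
    constraint_partition part;
    for (std::size_t i = 0; i < node_index.size(); i += width) {
        std::vector<int> group(node_index.begin() + i, node_index.begin() + i + width);
        switch (classify(group)) {
        case index_constraint::contiguous:  part.contiguous.push_back(i); break;
        case index_constraint::constant:    part.constant.push_back(i); break;
        case index_constraint::independent: part.independent.push_back(i); break;
        case index_constraint::none:        part.none.push_back(i); break;
        }
    }
    return part;
}
```

A mechanism kernel would then loop over each of the four offset lists separately, using the cheapest load/store strategy that the constraint allows.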

64171e43

generalize time sequences (#496) · 3082607f

Benjamin Cumming authored 6 years ago

Changes to libarbor
-------------------------

Time sequences were added in `src/time_sequence.hpp`:
- added new `time_seq` type that implements a type-erasure interface for the
  concept of a time sequence generator.
- added poisson, regular and vector-backed implementations of the time sequence
  concept.

Event generators:
- The poisson, regular and vector-backed implementations of the event generator
  concept were refactored to use the.

Cell groups:
- Removed the `dss_cell_group` and `rss_cell_group` and associated types.
- Added a generic spike source cell  that generates a sequence of spikes
  at time points specified by a `time_seq`. Using this approach, an
  additional `cell_group` specialization is not required for each type of
  sequence, and user-defined sequences can be used with minimal overhead.

Unit tests
------------

- Added unit tests for `time_seq`.
- Simplified `event_generator` unit tests, because much of the testing
  of the sequences was moved to the `time_seq` tests.
- Added unit tests for `spike_source_cell_group`.

Changes to miniapp
-------------------------

- simplified the miniapp by removing the command line options for using an input spike chain from file.
- updated the miniapp recipe to use `spike_source` cell group instead of `dss_cell_group`.

3082607f

Jun 01, 2018

Runtime distributed context (#485) · 5fde0b00

Benjamin Cumming authored 6 years ago and

Sam Yates committed 6 years ago

Move from choosing the distributed communication model from a compile time choice (the old `arb::communication::communication_policy` type) to a run time decision.

* Add `arb::distributed_context` class that provides the required interface for distributed communication implementations, using type-erasure to provide value semantics.
* Add two implementations for the distributed context: `arb::mpi_context` and `arb::local_context`.
* Allow distribution over a user-supplied MPI communicator by providing it as an argument to `arb::mpi_context`.
* Add `mpi_error` exception type to wrap MPI errors.
* Move contents of the `arb::communication` namespace to the `arb` namespace.
* Add preprocessor for-each utility `ARB_PP_FOREACH`.
* Rewrite all examples and tests to use the new distributed context interface.
* Add documentation for distributed context class and semantics, and update documentation for load balancer and simulation classes accordingly.
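
The "type-erasure to provide value semantics" approach can be sketched as below. The interface shown (`id`, `size`, `name`) is an assumption made for illustration, `fixed_context` is a hypothetical stand-in for a multi-rank implementation, and the real `arb::distributed_context` exposes more operations.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Value type that wraps any implementation offering id(), size() and name().
class distributed_context {
    struct interface {
        virtual int id() const = 0;
        virtual int size() const = 0;
        virtual std::string name() const = 0;
        virtual std::unique_ptr<interface> clone() const = 0;
        virtual ~interface() = default;
    };

    template <typename Impl>
    struct wrap final: interface {
        explicit wrap(Impl impl): impl_(std::move(impl)) {}
        int id() const override { return impl_.id(); }
        int size() const override { return impl_.size(); }
        std::string name() const override { return impl_.name(); }
        std::unique_ptr<interface> clone() const override {
            return std::make_unique<wrap<Impl>>(impl_);
        }
        Impl impl_;
    };

    std::unique_ptr<interface> impl_;

public:
    // Chosen at run time: any suitable implementation can be stored.
    template <typename Impl>
    distributed_context(Impl impl):
        impl_(std::make_unique<wrap<Impl>>(std::move(impl))) {}

    // Deep copy gives the wrapper ordinary value semantics.
    distributed_context(const distributed_context& other): impl_(other.impl_->clone()) {}
    distributed_context(distributed_context&&) = default;
    distributed_context& operator=(distributed_context other) {
        impl_ = std::move(other.impl_);
        return *this;
    }

    int id() const { return impl_->id(); }
    int size() const { return impl_->size(); }
    std::string name() const { return impl_->name(); }
};

// Single-process implementation.
struct local_context {
    int id() const { return 0; }
    int size() const { return 1; }
    std::string name() const { return "local"; }
};

// Hypothetical implementation with a fixed rank and rank count.
struct fixed_context {
    int rank;
    int ranks;
    int id() const { return rank; }
    int size() const { return ranks; }
    std::string name() const { return "fixed"; }
};
```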

Fixes #472

5fde0b00

May 15, 2018

Generic external variables for CUDA printer. (#490) · b07875e1

Sam Yates authored 6 years ago and

Benjamin Cumming committed 6 years ago

Replace hard-coded index variable names in modcc cuda printer with ones derived from the external variable.

Uses `decode_indexed_variable`.

b07875e1

May 11, 2018

Fix small compilation errors on OS X (#482) · f8f169db

Benjamin Cumming authored 6 years ago and

Sam Yates committed 6 years ago

* Use `sys/types.h` instead of `endian.h` for greater portability.
* Avoid use of constructor for `std::vector` in unit tests that is only available from C++14.

f8f169db

Update SIMD developer docs. (#488) · 1cf2df76
Sam Yates authored 6 years ago
```
* Update SIMD developer docs to reflect newly merged mechanism refactor work.
```
1cf2df76

May 09, 2018

CUDA back end for the new mechanism infrastructure (#487) · e0f0b5d7

Benjamin Cumming authored 6 years ago and

Sam Yates committed 6 years ago

Completes CUDA printing in modcc.
* Add CudaPrinter visitor, overriding CPrinter.
* Add `ostream` `operator<<` overloads for `arb::gpu::shared_state` and `device_view` for debugging.
* Fix GPU back-end bugs.

e0f0b5d7

Mechanism Refactor: multicore and simd (#484) · 68135148

Sam Yates authored 6 years ago

First commit of two for mechanism refactor work (refer to PR #484 and PR #483).

FVM/mechanism code:
* Refactor mechanism data structures to decouple backend-specific implementations and mechanism metadata.
* Add mechanism catalogue for managing mechanism metadata and concrete implementation prototypes.
* Add fingerprint-checking to mechanism metadata and implementations to confirm they come from the same NMODL source (fingerprint is not yet computed, but tests are in place).
* Split FVM discretization work out from FVM integrator code.
* Use abstract base class over backend-templated FVM integrator class `fvm_lowered_cell_impl` to allow separate compilation of `mc_cell_group` and to remove the dummy backend code.
* Add a new FVM-specific scalar type `fvm_index_type` that is an alias for `int` to replace
`fvm_size_type` in fvm layouts and mechanisms. This was chosen as an alternative
to making `unsigned` versions of all our SIMD implementation classes.
* Extend `cable1d_neuron` global data to encompass: mechanism catalogue; default ion concentrations and charges; global temperature (only for Nernst); initial membrane potential.

Modcc:
* Collect printer sources in modcc under `printer/`.
* Move common functionality across printers into `printer/printerutil.{hpp,cpp}`.
* Add string to file I/O implemented in routines read_all and write_all in `io/bulkio.hpp`.
* Implement indent-friendly source code generation via a `std::streambuf` filter `io::prefixbuf` defined in `io/prefixbuf.hpp`, together with manipulators and a corresponding std::ostream-derived wrapper.
* Rewrite printers to use new infrastructure: cpu target incorporates SIMD printing options; CUDA printer at this point produces only stubs for CUDA kernel wrappers.
* Modify SIMD printing command line options for modcc: `-s` enables explicit vectorization using the SIMD classes;  `-S <N>` allows a specific data width to be prescribed.
* Fix problem in `test_ca.mod` with uninitialized ion current.
* Add infrastructure support to allow future pre-computation of SIMD index conflict cases for (hopefully) faster scatters and updates.
* Simplify `IndexedVariable` expressions in the AST, making data source explicit via a `sourceKind` enum, and leaving the indexing method and index names up to the printers.
* Allow state variables in the AST to 'shadow' an ion concentration — these are assigned in the
generated `write_ions` method.
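
The idea of a `std::streambuf` filter for indent-friendly output can be sketched as follows. This is not the `io::prefixbuf` from the commit: the class `prefix_buf` and helper `indent` are hypothetical, and leaving empty lines unprefixed is an assumption of this sketch.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>
#include <utility>

// Forwards characters to an inner streambuf, inserting a prefix at the
// start of every non-empty line.
class prefix_buf: public std::streambuf {
    std::streambuf* inner_;
    std::string prefix_;
    bool at_line_start_ = true;

public:
    prefix_buf(std::streambuf* inner, std::string prefix):
        inner_(inner), prefix_(std::move(prefix)) {}

    void set_prefix(std::string prefix) { prefix_ = std::move(prefix); }

protected:
    // No put area is set, so every character written arrives here.
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof())) {
            return traits_type::not_eof(ch);
        }
        const char c = traits_type::to_char_type(ch);
        if (at_line_start_ && c != '\n') {
            const auto n = static_cast<std::streamsize>(prefix_.size());
            if (inner_->sputn(prefix_.data(), n) != n) return traits_type::eof();
        }
        at_line_start_ = (c == '\n');
        return inner_->sputc(c);
    }

    int sync() override { return inner_->pubsync(); }
};

// Convenience wrapper: indent every non-empty line of text.
inline std::string indent(const std::string& text, const std::string& prefix) {
    std::ostringstream out;
    prefix_buf buf(out.rdbuf(), prefix);
    std::ostream os(&buf);
    os << text;
    return out.str();
}
```

A code printer can keep writing to an ordinary `std::ostream` while the nesting level is adjusted through `set_prefix`.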

SIMD classes:
* Add `simd_cast` operation between SIMD value types of the same width, and with `std::array`. (Note: this was tested and used in an early development version of the code, but not in this version. It was still a lacuna in the original SIMD wrappers, so it has been left in.)
* Restructure SIMD gather/scatter API to use a `simd::indirect` expression,  which encapsulates a pointer and SIMD offset.
* Add `simd::index_constraint` scoped enum to describe knowledge of contention in indirect indices, so that we can branch on this to the appropriate implementation.
* Add SIMD concrete implementation routines `reduce_add` for horizontal reduction and `element0` for access to first lane scalar value.
* Add SIMD value method `sum()` that exposes implementation `reduce_add`.
* Add SIMD concrete implementation routine `compound_indexed_add` that provides the implementation for `indirect(p, simd_indices) += simd_value` construction.
* Fix SIMD `implbase` bug where some static methods were using the `implbase` fall-back functions instead of the derived class specialized implementations.
* Move SIMD mathematical functions into friend routines of `simd_impl` in order to resolve implicit conversions from scalars in mixed SIMD-scalar operations.
* Use a templated `tag` class to dispatch on SIMD concrete implementation types, to avoid problems with incomplete types in method signatures.
* Remove old SIMD intrinsics.
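
For reference, the intended semantics of `reduce_add` and of the `indirect(p, simd_indices) += simd_value` construction can be written as scalar loops. This is only a sketch of the semantics using `std::array` as a stand-in for a SIMD value; the function signatures below are assumptions and not those of the SIMD implementation classes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Horizontal reduction: sum of all lanes.
template <typename T, std::size_t N>
T reduce_add(const std::array<T, N>& v) {
    T sum = T(0);
    for (const T& x: v) sum += x;
    return sum;
}

// Lane-by-lane accumulation into p[index[i]]. Processing lanes one at a
// time keeps the result correct even when indices repeat, which is the
// case a plain gather/add/scatter sequence would get wrong.
template <typename T, std::size_t N>
void compound_indexed_add(const std::array<T, N>& value, T* p,
                          const std::array<int, N>& index) {
    for (std::size_t i = 0; i < N; ++i) {
        p[index[i]] += value[i];
    }
}
```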

CMake infrastructure:
* Downcase some variables in `CMakeLists.txt` files to  distinguish them visually from CMake keywords and variables.
* Split arbor modcc vectorization option (now `ARB_VECTORIZE`) and target-architecture optimization (now `ARB_ARCH`).
* For `arbor` and `arbormech` targets, and in particular not the `modcc` target, use `ARB_ARCH` to generate corresponding target-appropriate binaries, including, for example, appropriate SIMD support.
* Extend `CompilerOptions.cmake` to map as best as able between the various target architecture names (we use the gcc names) and the correct option to pass to the compiler based on the compiler and platform.
* Add work-around for misidentification by CMake of XL C as Clang.
* As a temporary work-around, include `arbormech` library twice on link line to resolve circular arbor–arbormech dependencies.

Unit tests:
* Extend repertoire of generic sequence equality/near equality testing support  in `common.hpp`.
* Add warning suppression for icc for the malloc instrumentation code.
* SIMD unit tests for indirect expressions, compound indirect add, reduction.
* Make some exact tests into floating point 'near' tests when comparing computed areas and lengths in swc and fvm layout tests, to account for compiler (e.g. icc) performing semantically inequivalent floating point operation reordering or fusion at `-O3`.
* Split out some of the CUDA tests into separate .cpp/.cu files for  separate-compilation purposes.

Other:
* The `padded_allocator` has been modified to propagate alignment/padding on move and copy (these semantics make their use much easier and safer in the multicore mechanism instantiation code).
* Map/table searching utilities in `util/maputil.hpp`.
* Fixes for correct sequence type categorization and `begin/end` ADL.
* Fixes for type guards for range methods that take universal references.
* Removal of some redundant code in range utilities through the use of universal references.
* Add new range view `reverse_view` for ranges delineated by bidirectional iterators.
* Add single argument form of `make_span` to count up from zero, and associated helper `count_along` that gives a span that indexes a supplied container.
* Moved `prefixbuf` to `modcc` source.
* Make sequence positive and negative tests in algorithms generic.
* Add `private`-subverting helper code/macro to `tests/unit/common.hpp` to reduce the number of public testing-only interfaces in the library code.
* Add virtual destructors for virtual base classes.
* Add new arb::math:: functions: `next_pow2` for unsigned integral types, `round_up` to round a number away from zero to next largest magnitude multiple.
* New `index_into` implementation that supports bidirectional access (moved to `util::` namespace).
* Fix problem in `test_ca.mod` with uninitialized ion current.
* Rework dangerous `memory::array(Iter, Iter)` constructor to be less dangerous (and do the expected thing).
* Allow ranges to be constructed from other ranges if the iterators are compatible.
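
A sketch of the two math helpers named above, under stated assumptions: `next_pow2` is taken to return the smallest power of two greater than or equal to its argument, and `round_up` is written for signed integers only. The actual `arb::math` versions may have different signatures.

```cpp
#include <cassert>
#include <climits>
#include <type_traits>

// Smallest power of two >= x (for x >= 1), by smearing the top set bit
// of x-1 into all lower bits and adding one.
template <typename U>
constexpr U next_pow2(U x) {
    static_assert(std::is_unsigned<U>::value, "next_pow2 requires an unsigned type");
    --x;
    for (unsigned shift = 1; shift < sizeof(U) * CHAR_BIT; shift <<= 1) {
        x |= x >> shift;
    }
    return ++x;
}

// Round v away from zero to the nearest multiple of |b| with magnitude
// at least |v|. Relies on % truncating toward zero (guaranteed since C++11).
template <typename T>
constexpr T round_up(T v, T b) {
    static_assert(std::is_signed<T>::value && std::is_integral<T>::value,
                  "this sketch handles signed integers only");
    T m = v % b;
    T mag = b < 0 ? -b : b;
    return m == 0 ? v : (m > 0 ? v - m + mag : v - m - mag);
}
```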

68135148

Apr 11, 2018

Domain decomposition and simulation C++ API docs (#471) · 4c742a57

Ben Cumming authored 6 years ago and

Sam Yates committed 6 years ago

Add two new documentation pages for the C++ API

* Add domain decomposition page that covers `domain_decomposition`, `node_info` and `partition_load_balance`.
* Add simulation page that describes `arb::simulation` API interface.
* Fix some small typos elsewhere in the docs.
* Use `std::move` when adding spike callbacks to `arb::simulation` (useful if callbacks are stateful).

4c742a57

Fix support for Kepler (K20 & K80) GPUs. (#470) · 6b659a39

Ben Cumming authored 6 years ago and

Sam Yates committed 6 years ago

Fixes issue #467 

* Add GPU synchronization points where required for Kepler to coordinate CPU access of managed memory.
* Use hand-rolled double precision atomic addition for Kepler targets.
* Replace `ARB_WITH_CUDA` build option with `ARB_GPU_MODEL` option that takes one of 'none', 'K20', 'K80' or 'P100', and set up source-code defines accordingly.
* Clean up of redundant compiler flags and defines no longer required now that the project uses separate compilation for CUDA sources.

6b659a39

Apr 06, 2018
- Fix typos in SIMD docs (#469) · e9c45232
  noraabiakar authored 6 years ago
```
Fix some typos in the SIMD documentation.
```
  e9c45232
Apr 05, 2018

Add C++ docs for recipe (#461) · bc6fcffd

Ben Cumming authored 6 years ago and

Sam Yates committed 6 years ago

Add some C++ API documentation.

* Create C++ API section in docs.
* Document `arb::recipe`: a class reference along with more explanatory text and a best practices guide.
* Add some class documentation of basic types required to understand recipe definition.
* Some in-code comment clean up.
* Change `arb::cell_kind` from a vanilla enum to a scoped enum.

bc6fcffd

Mar 29, 2018

rename class 'model' to 'simulation' (#462) · 2b2044a6

Ben Cumming authored 6 years ago and

Sam Yates committed 6 years ago

The name `arb::model` did not clearly describe the role of the class, while `arb::simulation` better captures that this is an instantiation of a model for the purpose of running a simulation, as distinct from the description of a model represented by an `arb::recipe` instance.

* Rename sources `model.{hpp,cpp}` to `simulation.{hpp,cpp}`.
* Rename class `arb::model` to `arb::simulation`.
* Update docs and tests to suit.

2b2044a6

merge all SIMD docs into a single topic (#463) · 3d83af5b
Ben Cumming authored 6 years ago and Sam Yates committed 6 years ago
```
Put all the SIMD docs in a single topic, to simplify the documentation tree.
```
3d83af5b

Mar 27, 2018

Installation Guide (#459) · 0cf65a4c

Ben Cumming authored 6 years ago

* Added an installation guide to the Read The Docs.
* Removed the outdated build/install information from README.md.
* Link from README to Read The Docs.
* Updated the splash page for Read The Docs.

0cf65a4c

wrap warp intrinsics to fix deprecated warnings (#456) · 7e6ea389

Ben Cumming authored 6 years ago

CUDA 9 introduced new, fine-grained, thread synchronization primitives.
In doing so, it introduced new forms of the warp intrinsics like __shfl_up, deprecating the old symbols in the process.

It will be a while before we can use CUDA 9 as the default minimum, so we have to support compilers that expect the new and old behavior.

There are two options: wrap the intrinsics in question, or pass nvcc a flag to not issue warnings about deprecated symbols. I go for the approach of wrapping, because I would rather keep the compiler warning turned on.

Fixes #379.

7e6ea389

Mar 26, 2018

Add padded allocator for aligned and padded vectors. (#460) · 581c4ef3

Sam Yates authored 6 years ago

Padded vectors with run-time padding/alignment guarantees will form the basis of the storage class for the new CPU and SIMD generated mechanisms.

* Add `padded_allocator` that aligns and pads allocations.
* Make microbenchmark for `default_construct_adaptor` that overrides the allocator construct() to default- instead of value-initialization on values.
* Add `with_instrumented_malloc` class for tracking malloc, realloc, etc. calls.
* Add unit tests for `padded_allocator`.
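
A rough sketch of an allocator that aligns and pads allocations to a run-time boundary. This is not the Arbor `padded_allocator`: it over-allocates with `::operator new` and stashes the raw pointer in front of the aligned block, which is a portable C++14 stand-in for a platform aligned-allocation call, and the alias `padded_vector` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <new>
#include <type_traits>
#include <vector>

template <typename T>
class padded_allocator {
    std::size_t alignment_;

public:
    using value_type = T;
    // Carry the run-time alignment along when containers are assigned or swapped.
    using propagate_on_container_copy_assignment = std::true_type;
    using propagate_on_container_move_assignment = std::true_type;
    using propagate_on_container_swap = std::true_type;

    explicit padded_allocator(std::size_t alignment = alignof(T)): alignment_(alignment) {
        // Alignment must be a power of two and sufficient for T.
        assert(alignment >= alignof(T) && (alignment & (alignment - 1)) == 0);
    }

    template <typename U>
    padded_allocator(const padded_allocator<U>& other): alignment_(other.alignment()) {}

    std::size_t alignment() const { return alignment_; }

    T* allocate(std::size_t n) {
        // Pad the request up to a whole number of alignment-sized blocks.
        std::size_t bytes = (n * sizeof(T) + alignment_ - 1) / alignment_ * alignment_;
        // Over-allocate: slack for aligning, plus room to stash the raw pointer.
        std::size_t space = bytes + alignment_;
        void* raw = ::operator new(space + sizeof(void*));
        void* p = static_cast<char*>(raw) + sizeof(void*);
        void* aligned = std::align(alignment_, bytes, p, space);
        assert(aligned);
        std::memcpy(static_cast<char*>(aligned) - sizeof(void*), &raw, sizeof(void*));
        return static_cast<T*>(aligned);
    }

    void deallocate(T* ptr, std::size_t) {
        void* raw;
        std::memcpy(&raw, reinterpret_cast<char*>(ptr) - sizeof(void*), sizeof(void*));
        ::operator delete(raw);
    }
};

template <typename T, typename U>
bool operator==(const padded_allocator<T>& a, const padded_allocator<U>& b) {
    return a.alignment() == b.alignment();
}

template <typename T, typename U>
bool operator!=(const padded_allocator<T>& a, const padded_allocator<U>& b) {
    return !(a == b);
}

template <typename T>
using padded_vector = std::vector<T, padded_allocator<T>>;
```

With this, `padded_vector<double> v(10, 1.0, padded_allocator<double>(64))` yields storage whose start is 64-byte aligned and whose allocated size is a multiple of 64 bytes.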

581c4ef3

Mar 20, 2018
- change project name to arbor in CMakeLists (#455) · 3019ae1e
  Ben Cumming authored 7 years ago
```
fixes #446.
```
  3019ae1e
Mar 19, 2018

Avoid intermediate underflow in expm1 calc with ftz. (#454) · 1499bf1e

Sam Yates authored 7 years ago

Intel compiler with default options does not guarantee correct fp behaviour with subnormals; it presumably sets the fp state to flush to zero.

Reordering a multiply and divide in the expm1 calculation avoids a transient subnormal value that was causing the routine to incorrectly return zero for very small, but normal, arguments.

1499bf1e

Mar 16, 2018

Fix broken namespace renaming in SIMD (#453) · f3be6dff
Sam Yates authored 7 years ago

f3be6dff

SIMD wrappers for Arbor generated mechanisms. (#450) · 2dff9c41

Sam Yates authored 7 years ago

This provides a bunch of SIMD intrinsic wrappers as a precursor to the SIMD printers.

The aim is that the SIMD printer can be agnostic regarding the particular vector architecture.

The design is based rather loosely on the proposal P0214R6 for C++ Parallelism TS 2. The transcendental function implementations are adapted from the existing SIMD architecture-specific code, which in turn are based on the Cephes library algorithms.

The custom CSS for the html documentation has been tweaked.

2dff9c41

update tbb cmake to use check_git_submodule (#452) · 5fe81e83
Ben Cumming authored 7 years ago

5fe81e83