- Nov 13, 2018
-
-
-
This reverts commit be2a8a9f.
-
Add a new Hines matrix solver implementation for the GPU that can solve a single tree in parallel with multiple threads. It replaces the interleaved solver, which used a single thread to solve each matrix. Branches that share a common root in the tree can be solved independently in each of the forward and backward solution passes.
* Add a matrix storage type, `arb::gpu::matrix_state_fine`, that stores the branches of multiple trees for efficient backward and forward substitution.
* Extend the `arb::tree` data structure to support operations for choosing a new root node and for determining a root node that minimises the maximum distance between the root and any of the tree's leaves.
* Implement code for rebalancing a set of matrix trees, a.k.a. a "forest" of trees.
* Add CUDA kernels for efficiently performing matrix assembly and matrix solution steps.
* Add CMake option `ARB_WITH_GPU_FINE_MATRIX` for toggling the new solver (default `on`).
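The root-selection step can be illustrated with a minimal sketch. It is not the `arb::tree` implementation: the names `balanced_root`, `bfs` and `farthest` are hypothetical, and the sketch assumes a single connected tree stored as a parent-index vector in which the root is its own parent. It uses the standard two-pass breadth-first search to find a longest path, then returns a node at the midpoint of that path, which minimises the maximum distance to any leaf.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Breadth-first search from `start`. Returns hop distances and fills `prev`
// with each node's predecessor on the path back to `start`.
static std::vector<int> bfs(const std::vector<std::vector<int>>& adj,
                            int start, std::vector<int>& prev) {
    std::vector<int> dist(adj.size(), -1);
    prev.assign(adj.size(), -1);
    std::queue<int> q;
    dist[start] = 0;
    q.push(start);
    while (!q.empty()) {
        int u = q.front();
        q.pop();
        for (int v: adj[u]) {
            if (dist[v] < 0) {
                dist[v] = dist[u] + 1;
                prev[v] = u;
                q.push(v);
            }
        }
    }
    return dist;
}

// Index of the first node with the largest distance.
static int farthest(const std::vector<int>& dist) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < dist.size(); ++i) {
        if (dist[i] > dist[best]) best = i;
    }
    return static_cast<int>(best);
}

// Hypothetical helper: parent[i] is the parent of node i, and the current
// root satisfies parent[root] == root. Assumes every node is reachable
// from node 0.
int balanced_root(const std::vector<int>& parent) {
    assert(!parent.empty());
    const int n = static_cast<int>(parent.size());

    // Build an undirected adjacency list from the parent relation.
    std::vector<std::vector<int>> adj(n);
    for (int i = 0; i < n; ++i) {
        if (parent[i] != i) {
            adj[i].push_back(parent[i]);
            adj[parent[i]].push_back(i);
        }
    }

    // Two BFS passes find the two ends (a, b) of a longest path.
    std::vector<int> prev;
    const int a = farthest(bfs(adj, 0, prev));
    const std::vector<int> dist = bfs(adj, a, prev);
    const int b = farthest(dist);

    // Walk half of the longest path back from b towards a.
    int node = b;
    for (int step = 0; step < dist[b] / 2; ++step) node = prev[node];
    return node;
}
```

For a five-node chain `{0, 0, 1, 2, 3}` the sketch selects the middle node, which halves the depth of the tree compared with rooting it at node 0. When the longest path has odd length there are two equally good candidates, and the sketch returns one of them.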
-
- Oct 16, 2018
-
-
- Oct 15, 2018
-
-
* Use `Unitful.uconvert` for scalar conversions (Float64 cast apparently does not work at the moment).
* Use `.+` for scalar/array addition.
* Replace `immutable` with `struct`.
* Qualify included modules with `Main.` for using statements.
* Add informational note to FindJulia, as component identification can take a long time because Julia may compile the components from source.
-
Benjamin Cumming authored
fixes #627
-
Fixes #622.
-
- Oct 12, 2018
-
-
Sam Yates authored
* Use Python 3 version of print.
* Use dict update method instead of item concatenation, as in Python 3 dict.items() no longer returns a list.
-
Sam Yates authored
cf. CMake issue 16716: https://gitlab.kitware.com/cmake/cmake/issues/16716
* Bump version post 0.1 for development.
* Read version string from file VERSION.
* Strip suffix to make a numerical, CMake-compatible PROJECT_VERSION.
-
Fixes #618 and fixes #617.
* Add convenience targets: 'examples' for all examples; 'tests' for all tests.
* Add support for component-testing in installed CMake package.
* Allow test for MPI support via find_package via component.
* Remove REQUIRED specification from `find_dependency()` commands in generated config.
* Update `mech_vec.cpp` to match new `fvm_lowered_cell_impl` constructor.
-
- Oct 11, 2018
-
-
Fix potential numeric instabilities in the ring benchmark caused by passing arguments to an event generator in the wrong order.
-
- Oct 10, 2018
-
-
Fixes #612.
* Fix issues with permissions on directories created at install time (at least for CMake 3.11+).
* Add CMake export guff to various targets and install an `arbor-config.cmake` for consumption by other CMake-based projects.
-
- Oct 04, 2018
-
-
Benjamin Cumming authored
Extend the ring benchmark to have an optional number of synapses attached to each cell, instead of a fixed count of one synapse per cell. This doesn't change the behavior of the model: only the first synapse is used for communication. The other synapses' only effect is to increase the per-cell computational overheads, to more effectively mimic real-world performance.
-
- Oct 03, 2018
-
-
Fixes an error in vectorized kernels where the incorrect index was passed to PROCEDURE calls: the loop index variable was being passed instead of the pack of vector indexes. Fixes #609.
-
- Oct 01, 2018
-
-
Add CMake options for V100 support. fixes #605
-
Updates the install docs. Fixes #604
-
changes:
- .travis.yml:
  - added matrix for different osx's, since enumeration style only works for `env` and `compiler`
- scripts/travis/build.sh:
  - changed getting compiler version from ``${CXX} -dumpversion`` to ``${CXX} --version | grep -m1 ""``
  - added `--oversubscribe` flag to `mpiexec` on Mac to allow more processes on a node than processing elements
  - added `--mca btl tcp,self` flag for Open MPI to use the "tcp" and "self" BTLs for transporting MPI messages on Mac
-
Fixes #603.
* Clear exception pointer in exception_state helper class after move of state.
* Rename exception_state::get() method to reset().
* Call std::terminate() if task_group is destroyed before tasks are collected with wait().
* Do not attempt to collect tasks in destructor for task_group.
* Do not attempt to rethrow exception in destructor for exception_state.
* Add unit test to verify correct exception behaviour when a task_group runs and waits on a series of tasks.
* Add unit test for terminate behaviour as above.

Code quality fix ups:
* Remove unused variable warning in threading exception tests.
* Address if-statement spacing in threading.hpp.
* Use ARB_HAVE_MPI in execution_context.cpp instead of introducing a dependency on generated version header via feature macro ARB_MPI_ENABLED.
-
- Sep 26, 2018
-
-
Propagate exceptions generated in `task_group` tasks on different threads in the threading backend, so that they are thrown on the main thread on `task_group.wait()`. Add tests that verify that exceptions are propagated correctly. Fixes #310.
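The general pattern behind this kind of propagation can be sketched with the standard library alone. This is a simplified illustration, not Arbor's threading backend: the class `simple_task_group` and the helper `run_and_collect_error` are hypothetical, and the sketch spawns one `std::thread` per task rather than using a thread pool. The first exception thrown by any task is captured with `std::current_exception()` and rethrown on the calling thread in `wait()`.

```cpp
#include <exception>
#include <functional>
#include <mutex>
#include <stdexcept>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical, simplified task group: one thread per task.
class simple_task_group {
    std::vector<std::thread> threads_;
    std::mutex mutex_;
    std::exception_ptr first_error_;

public:
    // Run `task` on its own thread, recording the first exception it throws.
    void run(std::function<void()> task) {
        threads_.emplace_back([this, task = std::move(task)] {
            try {
                task();
            }
            catch (...) {
                std::lock_guard<std::mutex> guard(mutex_);
                if (!first_error_) first_error_ = std::current_exception();
            }
        });
    }

    // Join all tasks, then rethrow the recorded exception (if any) on the
    // calling thread. The stored pointer is cleared first, so a later
    // wait() does not rethrow the same exception again.
    void wait() {
        for (auto& t: threads_) t.join();
        threads_.clear();
        std::exception_ptr error = std::exchange(first_error_, nullptr);
        if (error) std::rethrow_exception(error);
    }
};

// Usage: returns the message of the exception rethrown by wait(),
// or an empty string if no task failed.
std::string run_and_collect_error() {
    simple_task_group group;
    group.run([] {});
    group.run([] { throw std::runtime_error("task failed"); });
    try {
        group.wait();
    }
    catch (const std::runtime_error& e) {
        return e.what();
    }
    return "";
}
```

Storing a `std::exception_ptr` rather than a copy of the exception preserves the dynamic type of whatever was thrown, so the caller of `wait()` can catch it exactly as if it had been thrown locally.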
-
- Sep 19, 2018
-
-
- Sep 18, 2018
-
- Sep 17, 2018
-
-
Dry-run mode:
* An implementation of distributed_context that is used to mimic the performance of running an MPI distributed simulation with n ranks.
* Verifiable against an MPI run with the same parameters.

Implementation:
* Describe the model on a single domain (tile) and translate it to however many domains we want to mimic using arb::tile and arb::symmetric_recipe. This allows us to know the exact behavior of the entire system by only running the simulation on a single node.
* Mimic communication between domains using arb::dry_run_context.

Example:
* dryrun in example/ is a verifiable example of using dry-run mode with mc_cells.

Other:
* Documentation of dry-run mode.
* Unit test for dry_run_context.
-
- Sep 07, 2018
-
-
fixes #591
-
Benjamin Cumming authored
Turns out that CMake thinks Clang and AppleClang are different things.
-
- Sep 06, 2018
-
-
Fixes #587.
* Eliminate Clang warnings from GCC-tree-optimization bug work-around.
* Error with static-assert if simd type is used with a missing simd abi.
* Clarify install documentation regarding use of ARB_VECTORIZE with ARB_ARCH.
-
- Sep 05, 2018
-
-
Fixes #584.
* Add CUDA compile guard generator expression to architecture options iff CUDA is an enabled language.
-
Benjamin Cumming authored
Fixes #584.
-
- Sep 01, 2018
-
-
Remove redundant profiler calls that caused crashes when using event generators.
-
- Aug 30, 2018
-
-
Benjamin Cumming authored
Make the execution context presented to users an opaque handle, moving all implementation of the gpu, thread and distributed contexts into the back end.
* Move `execution_context` and `distributed_context` definitions to the back end.
* Create `execution_context` handle called `context` in the public API.
* Provide `make_context` helper functions that build different context configurations (default, user-specified local resources, with MPI).
* Update documentation for all parts of the public API that touch contexts.
* Move `distributed_context` docs to the developer documentation (from the public API docs).
-
- Aug 29, 2018
-
-
Fixes #575.
* Guard CPU architecture option for nvcc with generator expression.
-
- Aug 24, 2018
-
-
* Add new ring benchmark to examples.
* Refactor common functionality for reading miniapp parameters from a json file to `aux` (used by both bench and ring).

Fixes #516.
-
Benjamin Cumming authored
Move implementation of `gpu_context` from header to `cpp` file, so that `ARB_WITH_CUDA` doesn't leak from library implementation.
-
- Aug 22, 2018
-
-
* Add gpu_context as part of the execution context, containing information about GPU availability, managed_memory synchronization, and atomic double availability.
* Choose between ON and OFF for ARB_GPU in CMake. If ON, compile for K20, K80, and P100.

Note that we still need compile-time information about the GPU in cuda_atomic.hpp for atomicAdd(double*, double), because the function is only defined when the program is compiled for sm_60 or higher.
-
Fixes #568.
-
Fixes #564
-
Use a compat::fma wrapper for std::fma to avoid a bug in the tree optimizer in GCC version < 8.2. See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87046 Fixes #568.
-
- Aug 20, 2018
-
-
Global temperature for mechanisms.
* Make 'celsius' magic in modcc: now an indexed variable.
* Add a new temperature data source for indexed variables.
* Add support to printers for indexed variables that reference a scalar.
* Check that indexed variables aren't used in PROCEDURE blocks (this is a problem not just for 'celsius').
* Modify built-in mod files to pass celsius as a parameter to rates() procedures.
* Add global temperature to shared_state classes, and initialize through backend mechanism superclasses.
* Add some infrastructure for unit-test only mechanisms.
* Set modcc flags globally in top level CMakeLists.txt.
* Add test mechanism/module for checking celsius setting.
* Add unit test for multicore and gpu mechanism celsius setting.
* Make common mechanism private field data access helper for unit tests.
* Use helper in temperature, synapses tests.
* Fix warning in `distributed_context.hpp` about errant semicolon.
* Fix global scal...
-
- Aug 06, 2018
-
-
Sam Yates authored
Two MacPorts/gcc7 issues:
* std::uint64_t is unsigned long long on OS X, breaking an assumption about size_t in the distributed_context interface.
* Problems with missing errno defines in the standard library headers. With MacPorts gcc7, the installed c++config.h defines _GLIBCXX_HAVE_EOWNERDEAD and _GLIBCXX_HAVE_ENOTRECOVERABLE, but the corresponding errno defines are not provided by sys/errno.h unless __DARWIN_C_SOURCE, which takes its value from _POSIX_C_SOURCE if defined, is greater than or equal to 200809L. Technically a MacPorts configuration bug, but easily worked around.

Use basic integral types for communication collectives interfaces. Define _POSIX_C_SOURCE to be 200809L for glob.cpp. Fixes #562.
-
- Jul 31, 2018
-
-
Sam Yates authored
* Remove dependency on memory library and range utils from `multi_event_stream.cu` source.

Fixes #545
-
* Replace distributed_context with shared_ptr<distributed_context> in execution_context and pass around the shared pointer instead of a raw pointer.
* Fix construction of mpi_context.
* Remove num_threads() from arb and arb::threading. Modify mpi_context so it also returns a shared_ptr. proc_allocation is initialized from execution context to determine available resources.
* Rename threading backend files. Delete useless files.
* Pass execution_context by const reference or value.
* Remove code duplication in thread_system constructors.
-