- Nov 27, 2018
-
-
CMake wants to run a device link pass with nvcc despite there being no CUDA separable compilation enabled anywhere, and then passes `-pthread` on to that unnecessary nvcc invocation when we use the Threads dependency. The latter, at least, is fixed in CMake 3.13. We used the prefer `-pthread` option for compatibility with our earlier build configuration; turning it off will hopefully have no consequence. We also enable device linking on the arbor library, which is not needed, but if CMake is going to insist on doing it, it should be on the library rather than the executable. CMake then goes and does it on the executable anyway. Great. Fixes #645.
-
- Nov 21, 2018
-
-
* Forward CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES to compilation of arbor library and unit tests. Fixes #651
-
- Nov 13, 2018
-
-
-
This reverts commit be2a8a9f.
-
Add a new Hines matrix solver implementation for the GPU that can solve a single tree in parallel with multiple threads. It replaces the interleaved solver, which used a single thread to solve each matrix. Branches with a common root in the tree can be solved independently on each of the forward and backward solution passes.
* Add a matrix storage type, `arb::gpu::matrix_state_fine`, that stores the branches of multiple trees for efficient backward and forward substitution.
* Extend the `arb::tree` data structure to support operations for choosing a new root node and for determining a root node that minimises the maximum distance between the root and any of the tree's leaves.
* Implement code for rebalancing a set of matrix trees, a.k.a. a "forest" of trees.
* Add CUDA kernels for efficiently performing matrix assembly and matrix solution steps.
* Add CMake option `ARB_WITH_GPU_FINE_MATRIX` for toggling the new solver (default `on`).
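The root-selection idea in the commit above (pick the root that minimises the maximum distance to any leaf) can be illustrated with a short Python sketch. This is not the `arb::tree` code: the parent-index representation (root marked with `-1`), the function names, and the two-pass breadth-first search are all assumptions made for illustration. The sketch relies on the standard result that the middle node of a longest path in a tree minimises the maximum distance to every other node.

```python
from collections import deque

def farthest(adj, start):
    """Breadth-first search from start; return (a farthest node, predecessor map)."""
    pred = {start: None}
    queue = deque([start])
    last = start
    while queue:
        last = queue.popleft()  # nodes are popped in order of non-decreasing distance
        for nb in adj[last]:
            if nb not in pred:
                pred[nb] = last
                queue.append(nb)
    return last, pred

def balanced_root(parent):
    """Return a node minimising the maximum distance to any other node.

    parent[i] is the parent index of node i, or -1 for the current root
    (a hypothetical encoding; the tree is assumed to be non-empty).
    """
    n = len(parent)
    adj = [[] for _ in range(n)]
    for child, p in enumerate(parent):
        if p >= 0:
            adj[child].append(p)
            adj[p].append(child)
    u, _ = farthest(adj, 0)      # one end of a longest path
    v, pred = farthest(adj, u)   # the other end, with predecessors leading back to u
    path = []
    node = v
    while node is not None:
        path.append(node)
        node = pred[node]
    return path[len(path) // 2]  # middle of the longest path

# An unbranched chain 0-1-2-3-4 rooted at one end is best re-rooted at its middle.
print(balanced_root([-1, 0, 1, 2, 3]))
```

Re-rooting at such a node roughly halves the depth of a long unbranched tree, which is what makes more branches available to solve independently at each level.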
-
- Oct 16, 2018
-
-
- Oct 15, 2018
-
-
* Use `Unitful.uconvert` for scalar conversions (Float64 cast apparently does not work at the moment).
* Use `.+` for scalar/array addition.
* Replace `immutable` with `struct`.
* Qualify included modules with `Main.` for using statements.
* Add informational note to FindJulia, as component identification can take a long time because Julia may compile the components from source.
-
Benjamin Cumming authored
fixes #627
-
Fixes #622.
-
- Oct 12, 2018
-
-
Sam Yates authored
* Use the Python 3 version of print.
* Use the dict `update` method instead of item concatenation, as in Python 3 `dict.items()` no longer returns a list.
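The dict change above can be illustrated as follows. The dictionary names and contents are made up for this example and do not come from the scripts touched by the commit.

```python
# Hypothetical parameter sets, for illustration only.
defaults = {"cells": 100, "duration": 200}
overrides = {"duration": 50}

# Python 2 idiom that no longer works:
#     params = dict(defaults.items() + overrides.items())
# In Python 3, items() returns a view object, and adding two views
# raises TypeError.

# Portable replacement: copy one dict, then update it with the other.
params = dict(defaults)
params.update(overrides)

print(params)
```

Entries in the second dict take precedence, which matches the behaviour of the old concatenation idiom.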
-
Sam Yates authored
cf. CMake issue 16716: https://gitlab.kitware.com/cmake/cmake/issues/16716
* Bump version post 0.1 for development.
* Read version string from file VERSION.
* Strip suffix to make a numerical, CMake-compatible PROJECT_VERSION.
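The suffix stripping is needed because CMake's `project(... VERSION ...)` accepts only up to four dot-separated integers. The logic can be sketched in Python as below; the function name and the sample version strings are hypothetical, and the actual change is made in the CMake files rather than in Python.

```python
import re

def numeric_version(full_version):
    """Return the leading numeric part of a version string.

    Keeps at most four dot-separated integer components, the form
    CMake accepts for a project version, and drops any suffix.
    """
    match = re.match(r"\d+(?:\.\d+){0,3}", full_version.strip())
    if match is None:
        raise ValueError("no numeric version prefix in %r" % full_version)
    return match.group(0)

# Hypothetical contents of a VERSION file with a development suffix.
print(numeric_version("0.1.1-dev\n"))
```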
-
Fixes #618 and fixes #617.
* Add convenience targets: 'examples' for all examples; 'tests' for all tests.
* Add support for component-testing in installed CMake package.
* Allow testing for MPI support through a `find_package` component.
* Remove REQUIRED specification from `find_dependency()` commands in generated config.
* Update `mech_vec.cpp` to match new `fvm_lowered_cell_impl` constructor.
-
- Oct 11, 2018
-
-
Fix potential numeric instabilities in the ring benchmark caused by passing arguments to an event generator in the wrong order.
-
- Oct 10, 2018
-
-
Fixes #612.
* Fix issues with permissions on directories created at install time (at least for CMake 3.11+).
* Add CMake export guff to various targets and install an `arbor-config.cmake` for consumption by other CMake-based projects.
-
- Oct 04, 2018
-
-
Benjamin Cumming authored
Extend the ring benchmark to have an optional number of synapses attached to each cell, instead of a fixed count of one synapse per cell. This doesn't change the behavior of the model: only the first synapse is used for communication. The only effect of the other synapses is to increase the per-cell computational overhead, to mimic real-world performance more effectively.
-
- Oct 03, 2018
-
-
Fixes an error in vectorized kernels in which the incorrect index was passed to PROCEDURE calls: the loop index variable was being passed instead of the pack of vector indexes. Fixes #609.
-
- Oct 01, 2018
-
-
Add CMake options for V100 support. fixes #605
-
Updates the install docs. Fixes #604
-
changes:
- .travis.yml:
  - added matrix for different OS X versions, since enumeration style only works for `env` and `compiler`
- scripts/travis/build.sh:
  - changed getting compiler version from ``${CXX} -dumpversion`` to ``${CXX} --version | grep -m1 ""``
  - added `--oversubscribe` flag to `mpiexec` on Mac to allow more processes on a node than processing elements
  - added `--mca btl tcp,self` flag for Open MPI to use the "tcp" and "self" BTLs for transporting MPI messages on Mac
-
Fixes #603.
* Clear exception pointer in exception_state helper class after move of state.
* Rename exception_state::get() method to reset().
* Call std::terminate() if task_group is destroyed before tasks are collected with wait().
* Do not attempt to collect tasks in destructor for task_group.
* Do not attempt to rethrow exception in destructor for exception_state.
* Add unit test to verify correct exception behaviour when a task_group runs and waits on a series of tasks.
* Add unit test for terminate behaviour as above.
Code quality fix ups:
* Remove unused variable warning in threading exception tests.
* Address if-statement spacing in threading.hpp.
* Use ARB_HAVE_MPI in execution_context.cpp instead of introducing a dependency on generated version header via feature macro ARB_MPI_ENABLED.
-
- Sep 26, 2018
-
-
Propagate exceptions generated in `task_group` tasks on different threads in the threading backend, so that they are thrown on the main thread on `task_group.wait()`. Add tests that verify that exceptions are propagated correctly. Fixes #310.
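The propagation pattern described above can be sketched in Python. This is an analogy under stated assumptions, not the C++ threading backend: the `TaskGroup` class, its one-thread-per-task design, and the choice to keep only the first exception are all illustrative.

```python
import threading

class TaskGroup:
    """Hypothetical task group: runs each task on its own thread and
    rethrows the first captured exception from wait()."""

    def __init__(self):
        self._threads = []
        self._lock = threading.Lock()
        self._first_error = None

    def run(self, fn):
        def wrapper():
            try:
                fn()
            except BaseException as exc:
                # Record the exception instead of letting it die with the worker.
                with self._lock:
                    if self._first_error is None:
                        self._first_error = exc
        thread = threading.Thread(target=wrapper)
        self._threads.append(thread)
        thread.start()

    def wait(self):
        for thread in self._threads:
            thread.join()
        self._threads.clear()
        # Take the stored exception and clear the slot before rethrowing,
        # so a later wait() does not raise it again.
        error, self._first_error = self._first_error, None
        if error is not None:
            raise error

def failing_task():
    raise ValueError("task failed")

group = TaskGroup()
group.run(lambda: None)
group.run(failing_task)
try:
    group.wait()
    outcome = "no error"
except ValueError as exc:
    outcome = "caught on main thread: " + str(exc)
print(outcome)
```

The exception surfaces only when the caller collects the tasks, so the caller decides where errors from worker threads are handled.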
-
- Sep 19, 2018
-
-
- Sep 18, 2018
-
- Sep 17, 2018
-
-
Dry-run mode:
* An implementation of distributed_context that is used to mimic the performance of running an MPI distributed simulation with n ranks.
* Verifiable against an MPI run with the same parameters.
Implementation:
* Describe the model on a single domain (tile) and translate it to however many domains we want to mimic using arb::tile and arb::symmetric_recipe. This allows us to know the exact behavior of the entire system by only running the simulation on a single node.
* Mimic communication between domains using arb::dry_run_context.
Example:
* dryrun in example/ is a verifiable example of using dry-run mode with mc_cells.
Other:
* Documentation of dry-run mode.
* Unit test for dry_run_context.
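One way to picture the tile translation described above is the Python sketch below. It is not Arbor's implementation: `RingTile`, `SymmetricRecipe`, and `sources_of` are hypothetical names. The assumption is that a tile describes the incoming connections of its own cells using global ids as seen from the first tile, and that every other tile reuses that description shifted by its tile offset.

```python
class RingTile:
    """Hypothetical tile: describes cells_per_tile cells of a global ring."""

    def __init__(self, cells_per_tile, num_tiles):
        self.cells_per_tile = cells_per_tile
        self.num_tiles = num_tiles

    def num_cells(self):
        return self.cells_per_tile

    def sources_of(self, gid):
        # Each cell listens to its predecessor in the global ring.
        total = self.cells_per_tile * self.num_tiles
        return [(gid - 1) % total]

class SymmetricRecipe:
    """Hypothetical wrapper that repeats a tile over all mimicked domains."""

    def __init__(self, tile):
        self.tile = tile

    def num_cells(self):
        return self.tile.num_cells() * self.tile.num_tiles

    def sources_of(self, gid):
        n_local = self.tile.num_cells()
        offset = (gid // n_local) * n_local  # first gid of the tile holding this cell
        local = gid % n_local                # equivalent cell in the first tile
        return [(src + offset) % self.num_cells()
                for src in self.tile.sources_of(local)]

recipe = SymmetricRecipe(RingTile(cells_per_tile=4, num_tiles=3))
for gid in range(recipe.num_cells()):
    print(gid, recipe.sources_of(gid))
```

Because every tile is a shifted copy of the first, describing and running one tile is enough to know what the whole system would do.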
-
- Sep 07, 2018
-
-
fixes #591
-
Benjamin Cumming authored
Turns out that CMake thinks Clang and AppleClang are different things.
-
- Sep 06, 2018
-
-
Fixes #587.
* Eliminate Clang warnings from GCC-tree-optimization bug work-around.
* Error with static-assert if simd type is used with a missing simd abi.
* Clarify install documentation regarding use of ARB_VECTORIZE with ARB_ARCH.
-
- Sep 05, 2018
-
-
Fixes #584.
* Add CUDA compile guard generator expression to architecture options iff CUDA is an enabled language.
-
Benjamin Cumming authored
Fixes #584.
-
- Sep 01, 2018
-
-
Remove redundant profiler calls that caused crashes when using event generators.
-
- Aug 30, 2018
-
-
Benjamin Cumming authored
Make the execution context presented to users an opaque handle, moving all implementation of the gpu, thread and distributed contexts into the back end.
* move `execution_context` and `distributed_context` definitions to the back end
* create `execution_context` handle called `context` in the public API
* provide `make_context` helper functions that build different context configurations (default, user-specified local resources, with MPI)
* update documentation for all parts of the public API that touch contexts
* move `distributed_context` docs to the developer documentation (from the public API docs)
-
- Aug 29, 2018
-
-
Fixes #575.
* Guard CPU architecture option for nvcc with generator expression.
-
- Aug 24, 2018
-
-
* Add new ring benchmark to examples.
* Refactor common functionality for reading miniapp parameters from a json file into `aux` (used by both bench and ring).
Fixes #516.
-
Benjamin Cumming authored
Move implementation of `gpu_context` from header to `cpp` file, so that `ARB_WITH_CUDA` doesn't leak from library implementation.
-
- Aug 22, 2018
-
-
* Add gpu_context as part of execution context containing information about GPU availability, managed_memory synchronization, and atomic double availability.
* Choose between ON and OFF for ARB_GPU in CMake. If ON, compile for K20, K80, and P100.
Note that we still need compile-time information about the GPU in cuda_atomic.hpp for atomicAdd(double*, double), because that function is only defined when the program is compiled for sm_60 or higher.
-
Fixes #568.
-
Fixes #564
-
Use a compat::fma wrapper for std::fma to avoid a bug in the tree optimizer in GCC version < 8.2. See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87046 Fixes #568.
-
- Aug 20, 2018
-
-
Global temperature for mechanisms.
* Make 'celsius' magic in modcc: now an indexed variable.
* Add a new temperature data source for indexed variables.
* Add support to printers for indexed variables that reference a scalar.
* Check that indexed variables aren't used in PROCEDURE blocks (this is a problem not just for 'celsius').
* Modify built-in mod files to pass celsius as a parameter to rates() procedures.
* Add global temperature to shared_state classes, and initialize through backend mechanism superclasses.
* Add some infrastructure for unit-test only mechanisms.
* Set modcc flags globally in top level CMakeLists.txt.
* Add test mechanism/module for checking celsius setting.
* Add unit test for multicore and gpu mechanism celsius setting.
* Make common mechanism private field data access helper for unit tests.
* Use helper in temperature, synapses tests.
* Fix warning in `distributed_context.hpp` about errant semicolon.
* Fix global scal...
-
- Aug 06, 2018
-
-
Sam Yates authored
Two MacPorts/gcc7 issues:
* std::uint64_t is unsigned long long on OS X, breaking an assumption about size_t in the distributed_context interface.
* Problems with missing errno defines in the standard library headers. With MacPorts gcc7, the installed c++config.h defines _GLIBCXX_HAVE_EOWNERDEAD and _GLIBCXX_HAVE_ENOTRECOVERABLE, but the corresponding errno defines are not provided by sys/errno.h unless __DARWIN_C_SOURCE, which takes its value from _POSIX_C_SOURCE if defined, is greater than or equal to 200809L. Technically a MacPorts configuration bug, but easily worked around.
Use basic integral types for communication collectives interfaces. Define _POSIX_C_SOURCE to be 200809L for glob.cpp. Fixes #562.
-