Commit 1c89fbbd authored by Benjamin Cumming, committed by GitHub

Hardware API documentation (#707)

Update Hardware API documentation

* split the domain decomposition and hardware API docs into separate pages
* update hardware API to reflect new *libarbor* and *libarborenv*
* add basic documentation for `optional`, `any` and `unique_any` types.
parent 49d87aba
...@@ -115,3 +115,40 @@ Probes
.. cpp:member:: util::any address

Cell-type specific location info, specific to cell kind of ``id.gid``.
Utility Wrappers and Containers
--------------------------------
.. cpp:namespace:: arb::util
.. cpp:class:: template <typename T> optional
A wrapper around a contained value of type :cpp:type:`T` that may or may not be set.
A faithful copy of the C++17 ``std::optional`` type.
See the online C++ standard documentation
`<https://en.cppreference.com/w/cpp/utility/optional>`_
for more information.
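For example, a function that may fail to produce a result can return an
:cpp:class:`optional` value. The following is a minimal sketch: the header path
``arbor/util/optional.hpp`` is an assumption, and only ``std::optional``-like
behaviour is relied on.

.. container:: example-code

.. code-block:: cpp

#include <iostream>
#include <vector>

#include <arbor/util/optional.hpp> // assumed public header for util::optional

// Return the index of the first negative entry in v, if there is one.
arb::util::optional<unsigned> first_negative(const std::vector<double>& v) {
    for (unsigned i = 0; i < v.size(); ++i) {
        if (v[i] < 0.) return i;
    }
    return {}; // empty optional: no value
}

// ...

if (auto n = first_negative({1., 2., -3.})) {
    std::cout << "first negative entry at index " << n.value() << "\n";
}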
.. cpp:class:: any
A container for a single value of any type that is copy constructible.
Used in the Arbor API where the type of a value passed to or from the API
is decided at run time.
A faithful copy of the C++17 ``std::any`` type.
See the online C++ standard documentation
`<https://en.cppreference.com/w/cpp/utility/any>`_
for more information.
The :cpp:any:`arb::util` namespace also provides implementations of the
:cpp:any:`any_cast`, :cpp:any:`make_any` and :cpp:any:`bad_any_cast`
helper functions and types from C++17.
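For example, a minimal sketch of storing and retrieving a value; the header path
``arbor/util/any.hpp`` is an assumption, and the semantics shown are those of
``std::any``, of which :cpp:class:`any` is a faithful copy.

.. container:: example-code

.. code-block:: cpp

#include <string>

#include <arbor/util/any.hpp> // assumed public header for util::any

using arb::util::any;
using arb::util::any_cast;

any a(std::string("hello"));       // store a std::string
auto s = any_cast<std::string>(a); // retrieve a copy of the stored value

// The pointer overload of any_cast returns nullptr instead of throwing
// bad_any_cast when the requested type does not match the stored type.
if (auto* p = any_cast<int>(&a)) {
    // not reached: a holds a std::string, so p is nullptr
}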
.. cpp:class:: unique_any
Equivalent to :cpp:class:`util::any`, except that:
* it can store any type that is move constructible;
* it is move only, that is, it can't be copied.
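For example, a move-only value such as a ``std::unique_ptr`` can not be stored in a
:cpp:class:`util::any`, but can be moved into a :cpp:class:`unique_any`. A minimal
sketch, in which the header path ``arbor/util/unique_any.hpp`` is an assumption:

.. container:: example-code

.. code-block:: cpp

#include <memory>

#include <arbor/util/unique_any.hpp> // assumed public header for util::unique_any

using arb::util::unique_any;

// std::unique_ptr<int> is move-only, so it can't be copied into util::any,
// but it can be moved into a unique_any.
unique_any u(std::make_unique<int>(42));

// unique_any itself is move-only: moving is allowed, copying is not.
unique_any v = std::move(u);
// unique_any w = v; // error: unique_any can't be copied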
...@@ -3,207 +3,7 @@
Domain Decomposition
====================

The C++ API for partitioning a model over distributed and local hardware is described here.
Arbor provides two library APIs for working with hardware resources:
* The core *libarbor* is used to *describe* the hardware resources
and their contexts for use in Arbor simulations.
* The *libarborenv* provides an API for querying available hardware
resources (e.g. the number of available GPUs), and initializing MPI.
Managing Hardware
-----------------
The *libarborenv* API for querying and managing hardware resources is in the
:cpp:any:`arbenv` namespace. This functionality is in a separate
library because the main Arbor library should only
present an interface for running simulations on hardware resources provided
by the calling application. As such, it should not provide access to how
it manages hardware resources internally, or place restrictions on how
the calling application selects or manages resources such as GPUs and MPI communicators.
However, for the purpose of writing tests, examples, benchmarks and validation
tests, functionality for detecting GPUs, managing MPI lifetimes and the like
is necessary. This functionality is kept in a separate library to ensure
separation of concerns, and to provide examples of quality implementations
of such functionality for users of the library to reuse.
.. cpp:namespace:: arbenv
.. cpp:function:: arb::optional<int> get_env_num_threads()
Tests whether the number of threads to use has been set in an environment variable.
First checks ``ARB_NUM_THREADS``, and if that is not set checks ``OMP_NUM_THREADS``.
Return value:
* no value: the :cpp:any:`optional` return value contains no value if no thread count was specified by an environment variable.
* has value: the number of threads set by the environment variable.
Exceptions:
* throws :cpp:any:`std::runtime_error` if the environment variable is set with an invalid
number of threads.
.. container:: example-code
.. code-block:: cpp
if (auto nt = arbenv::get_env_num_threads()) {
std::cout << "requested " << nt.value() << "threads \n";
}
else {
std::cout << "no enviroment variable set\n";
}
.. cpp:function:: int thread_concurrency()
Attempts to detect the number of available CPU cores. Returns 1 if unable to detect
the number of cores.
.. container:: example-code
.. code-block:: cpp
// Set num_threads to value from environment variable if set,
// otherwise set it to the available number of cores.
int num_threads = 0;
if (auto nt = arbenv::get_env_num_threads()) {
num_threads = nt.value();
}
else {
num_threads = arbenv::thread_concurrency();
}
.. cpp:function:: int default_gpu()
Detects if a GPU is available, and returns the index of the first available GPU.
Return value:
* non-negative value: if a GPU is available, the index of the selected GPU is returned. The index will be in the range ``[0, num_gpus)`` where ``num_gpus`` is the number of GPUs detected using the ``cudaGetDeviceCount`` `CUDA API call <https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html>`_.
* -1: if no GPU available, or if Arbor was built without GPU support.
.. container:: example-code
.. code-block:: cpp
if (arbenv::default_gpu()>-1) {
std::cout << "a GPU is available\n";
}
.. cpp:function:: int find_private_gpu(MPI_Comm comm)
A helper function that assigns a unique GPU to every MPI rank.
.. cpp:class:: with_mpi
Purpose and functionality
Constructor
Usage notes.
The core Arbor library *libarbor* provides an API for describing the hardware resources to be used by a simulation.
.. cpp:namespace:: arb
.. cpp:class:: proc_allocation
Enumerates the computational resources to be used for a simulation, typically a
subset of the resources available on a physical hardware node.
.. container:: example-code
.. code-block:: cpp
// Default construction uses all detected cores/threads, and the first GPU, if available.
arb::proc_allocation resources;
// Remove any GPU from the resource description.
resources.gpu_id = -1;
.. cpp:function:: proc_allocation() = default
Sets the number of threads to the number detected by :cpp:func:`get_local_resources`, and
chooses either the first available GPU, or no GPU if none are available.
.. cpp:function:: proc_allocation(unsigned threads, int gpu_id)
Constructor that sets the number of :cpp:var:`threads` and the :cpp:var:`gpu_id` of the GPU to use.
.. cpp:member:: unsigned num_threads
The number of CPU threads available.
.. cpp:member:: int gpu_id
The identifier of the GPU to use.
The gpu id corresponds to the ``int device`` parameter used by CUDA API calls
to identify gpu devices.
Set to -1 to indicate that no GPU device is to be used.
See ``cudaSetDevice`` and ``cudaDeviceGetAttribute`` provided by the
`CUDA API <https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html>`_.
.. cpp:function:: bool has_gpu() const
Indicates whether a GPU is selected (i.e. whether :cpp:member:`gpu_id` is not ``-1``).
Execution Context
-----------------
The :cpp:class:`proc_allocation` class enumerates the hardware resources on the local hardware
to use for a simulation.
.. cpp:namespace:: arb
.. cpp:class:: context
A handle for the interfaces to the hardware resources used in a simulation.
A :cpp:class:`context` contains the local thread pool, and optionally the GPU state
and MPI communicator, if available. Users of the library do not directly use the functionality
provided by :cpp:class:`context`, instead they configure contexts, which are passed to
Arbor methods and types.
.. cpp:function:: context make_context()
Local context that uses all detected threads and a GPU if any are available.
.. cpp:function:: context make_context(proc_allocation alloc)
Local context that uses the local resources described by :cpp:var:`alloc`.
.. cpp:function:: context make_context(proc_allocation alloc, MPI_Comm comm)
A context that uses the local resources described by :cpp:var:`alloc`, and
uses the MPI communicator :cpp:var:`comm` for distributed calculation.
Here are some examples of how to create a :cpp:class:`arb::context`:
.. container:: example-code
.. code-block:: cpp
#include <arbor/context.hpp>
// Construct a non-distributed context that uses all detected available resources.
auto context = arb::make_context();
// Construct a context that:
// * does not use a GPU, regardless of whether one is available;
// * uses 8 threads in its thread pool.
arb::proc_allocation resources(8, -1);
auto context = arb::make_context(resources);
// Construct a context that:
// * uses all available local hardware resources;
// * uses the standard MPI communicator MPI_COMM_WORLD for distributed computation.
arb::proc_allocation resources; // defaults to all detected local resources
auto context = arb::make_context(resources, MPI_COMM_WORLD);
Load Balancers
--------------

...@@ -217,11 +17,11 @@ distributed with MPI communication. The returned :cpp:class:`domain_decomposition`
describes the cell groups on the local MPI rank.

.. Note::
The :cpp:class:`domain_decomposition` type is
independent of any load balancing algorithm, so users can define a
domain decomposition directly, instead of generating it with a load balancer.
This is useful for cases where the provided load balancers are inadequate,
or when the user has specific insight into running their model on the
target computer.

.. cpp:namespace:: arb
......
...@@ -74,11 +74,11 @@ To support dry-run mode we use the following classes:
.. Note::
While this class inherits from :cpp:class:`arb::recipe`, it breaks one of its implicit
rules: it allows connection from gids greater than the total number of cells in a recipe,
:cpp:any:`ncells`.

:cpp:class:`arb::tile` describes the model on a single domain containing :cpp:expr:`num_cells =
num_cells_per_tile` cells, which is to be duplicated over :cpp:any:`num_ranks`
domains in dry-run mode. It contains information about :cpp:any:`num_ranks` which is provided
by the following function:

.. cpp:function:: cell_size_type num_tiles() const
......
.. _cpphardware:
Hardware Management
===================
Arbor provides two library APIs for working with hardware resources:
* The core *libarbor* is used to *describe* the hardware resources
and their contexts for use in Arbor simulations.
* The *libarborenv* provides an API for querying available hardware
resources (e.g. the number of available GPUs), and initializing MPI.
libarborenv
-------------------
The *libarborenv* API for querying and managing hardware resources is in the
:cpp:any:`arbenv` namespace.
This functionality is kept in a separate library to enforce
separation of concerns, so that users have full control over how hardware resources
are selected, either using the functions and types in *libarborenv*, or writing their
own code for managing MPI, GPUs, and thread counts.
.. cpp:namespace:: arbenv
.. cpp:function:: arb::util::optional<int> get_env_num_threads()
Tests whether the number of threads to use has been set in an environment variable.
First checks ``ARB_NUM_THREADS``, and if that is not set checks ``OMP_NUM_THREADS``.
Return value:
* **no value**: the :cpp:any:`optional` return value contains no value if
no thread count was specified by an environment variable.
* **has value**: the number of threads set by the environment variable.
Throws:
* :cpp:any:`std::runtime_error`: if the environment variable is set with an invalid
number of threads.
.. container:: example-code
.. code-block:: cpp
#include <arborenv/concurrency.hpp>
if (auto nt = arbenv::get_env_num_threads()) {
std::cout << "requested " << nt.value() << "threads \n";
}
else {
std::cout << "no environment variable set\n";
}
.. cpp:function:: int thread_concurrency()
Attempts to detect the number of available CPU cores. Returns 1 if unable to detect
the number of cores.
.. container:: example-code
.. code-block:: cpp
#include <arborenv/concurrency.hpp>
// Set num_threads to value from environment variable if set,
// otherwise set it to the available number of cores.
int num_threads = 0;
if (auto nt = arbenv::get_env_num_threads()) {
num_threads = nt.value();
}
else {
num_threads = arbenv::thread_concurrency();
}
.. cpp:function:: int default_gpu()
Returns the integer identifier of the first available GPU, if a GPU is available.
Return value:
* **non-negative value**: if a GPU is available, the index of the selected GPU is returned. The index will be in the range ``[0, num_gpus)`` where ``num_gpus`` is the number of GPUs detected using the ``cudaGetDeviceCount`` `CUDA API call <https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html>`_.
* **-1**: if no GPU available, or if Arbor was built without GPU support.
.. container:: example-code
.. code-block:: cpp
#include <arborenv/gpu_env.hpp>
if (arbenv::default_gpu()>-1) {
std::cout << "a GPU is available\n";
}
.. cpp:function:: int find_private_gpu(MPI_Comm comm)
A helper function that assigns a unique GPU to every MPI rank.
.. Note::
Arbor allows at most one GPU per MPI rank, and furthermore requires that
an MPI rank has exclusive access to a GPU, i.e. two MPI ranks can not
share a GPU.
This function performs the task of assigning a unique GPU to each rank when more
than one rank has access to the same GPU(s).
An example use case is on systems with "fat" nodes with multiple GPUs
per node, in which case Arbor should be run with multiple MPI ranks
per node.
Uniquely assigning GPUs is quite difficult, and this function provides
what we feel is a robust implementation.
All MPI ranks in the MPI communicator :cpp:any:`comm` must call this function, to
avoid a deadlock.
Return value:
* **non-negative integer**: the identifier of the GPU assigned to this rank.
* **-1**: no GPU was available for this MPI rank.
Throws:
* :cpp:any:`std::runtime_error`: if there was an error in the CUDA runtime
on the local or remote MPI ranks, i.e. if one rank throws, all ranks
will throw.
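For example, the following sketch selects a private GPU on each rank; it assumes
that MPI has already been initialized, for example with the :cpp:class:`with_mpi`
guard described below.

.. container:: example-code

.. code-block:: cpp

#include <mpi.h>

#include <arborenv/gpu_env.hpp>

// Every rank in MPI_COMM_WORLD must make this call to avoid a deadlock.
int gpu_id = arbenv::find_private_gpu(MPI_COMM_WORLD);

if (gpu_id < 0) {
    // No GPU is available for this rank: fall back to CPU-only execution.
}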
.. cpp:class:: with_mpi
The :cpp:class:`with_mpi` type is a simple RAII scoped guard for MPI initialization
and finalization. On creation :cpp:class:`with_mpi` will call :cpp:any:`MPI_Init_thread`
to initialize MPI with the minimum level of thread support required by Arbor, that is
``MPI_THREAD_SERIALIZED``. When it goes out of scope it will automatically call
:cpp:any:`MPI_Finalize`.
.. cpp:function:: with_mpi(int& argcp, char**& argvp, bool fatal_errors = true)
The constructor takes the :cpp:any:`argc` and :cpp:any:`argv` arguments
passed to the main function of the calling application, and an additional flag
:cpp:any:`fatal_errors` that toggles whether errors in MPI API calls
should return error codes or terminate.
.. Warning::
Handling exceptions is difficult in MPI applications, and it is the user's
responsibility to do so.
The :cpp:class:`with_mpi` scope guard attempts to facilitate error reporting of
uncaught exceptions, particularly in the case where one rank throws an exception,
while the other ranks continue executing. In this case there would be a deadlock
if the rank with the exception attempts to call :cpp:any:`MPI_Finalize` and
other ranks are waiting in other MPI calls. If this happens inside a try-catch
block, the deadlock stops the exception from being handled.
For this reason the destructor of :cpp:class:`with_mpi` only calls
:cpp:any:`MPI_Finalize` if there are no uncaught exceptions.
This isn't perfect because the other MPI ranks still deadlock,
however it gives the exception handling code an opportunity to report the error for debugging.
An example workflow that uses the MPI scope guard. Note that this code will
print the exception error message in the case where only one MPI rank threw
an exception, though it would then either deadlock or exit with an error code
indicating that one or more MPI ranks exited without calling :cpp:any:`MPI_Finalize`.
.. container:: example-code
.. code-block:: cpp
#include <exception>
#include <iostream>
#include <arborenv/with_mpi.hpp>
int main(int argc, char** argv) {
try {
// Constructing guard will initialize MPI with a
// call to MPI_Init_thread()
arbenv::with_mpi guard(argc, argv, false);
// Do some work with MPI here
// When leaving this scope, the destructor of guard will
// call MPI_Finalize()
}
catch (std::exception& e) {
std::cerr << "error: " << e.what() << "\n";
return 1;
}
return 0;
}
libarbor
-------------------
The core Arbor library *libarbor* provides an API for:
* prescribing which hardware resources are to be used by a
simulation using :cpp:class:`arb::proc_allocation`.
* creating opaque handles, of type :cpp:class:`arb::context`, to the hardware
resources used by simulations.
.. cpp:namespace:: arb
.. cpp:class:: proc_allocation
Enumerates the computational resources on a node to be used for simulation,
specifically the number of threads and identifier of a GPU if available.
.. Note::
Each MPI rank in a distributed simulation uses a :cpp:class:`proc_allocation`
to describe the subset of resources on its node that it will use.
.. container:: example-code
.. code-block:: cpp
#include <arbor/context.hpp>
// default: 1 thread and no GPU selected
arb::proc_allocation resources;
// 8 threads and no GPU
arb::proc_allocation resources(8, -1);
// 4 threads and the first available GPU
arb::proc_allocation resources(4, 0);
// Construct with thread count and GPU detected using libarborenv helpers
auto num_threads = arbenv::thread_concurrency();
auto gpu_id = arbenv::default_gpu();
arb::proc_allocation resources(num_threads, gpu_id);
.. cpp:function:: proc_allocation() = default
By default selects one thread and no GPU.
.. cpp:function:: proc_allocation(unsigned threads, int gpu_id)
Constructor that sets the number of :cpp:var:`threads` and the id :cpp:var:`gpu_id` of
the GPU to use.
.. cpp:member:: unsigned num_threads
The number of CPU threads available.
.. cpp:member:: int gpu_id
The identifier of the GPU to use.
The gpu id corresponds to the ``int device`` parameter used by CUDA API calls
to identify gpu devices.
Set to -1 to indicate that no GPU device is to be used.
See ``cudaSetDevice`` and ``cudaDeviceGetAttribute`` provided by the
`CUDA API <https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html>`_.
.. cpp:function:: bool has_gpu() const
Indicates whether a GPU is selected (i.e. whether :cpp:member:`gpu_id` is not ``-1``).
.. cpp:namespace:: arb
.. cpp:class:: context
An opaque handle for the hardware resources used in a simulation.
A :cpp:class:`context` contains a thread pool, and optionally the GPU state
and MPI communicator. Users of the library do not directly use the functionality
provided by :cpp:class:`context`; instead they create contexts, which are passed to
Arbor interfaces for domain decomposition and simulation.
Arbor contexts are created by calling :cpp:func:`make_context`, which returns an initialized
context. There are two versions of :cpp:func:`make_context`, for creating contexts
with and without distributed computation with MPI respectively.
.. cpp:function:: context make_context(proc_allocation alloc=proc_allocation())
Create a local :cpp:class:`context`, without distributed computation (no MPI),
that uses the local resources described by :cpp:any:`alloc`.
By default it will create a context with one thread and no GPU.
.. cpp:function:: context make_context(proc_allocation alloc, MPI_Comm comm)
Create a distributed :cpp:class:`context`.
A context that uses the local resources described by :cpp:any:`alloc`, and
uses the MPI communicator :cpp:var:`comm` for distributed calculation.
Helper functions can be used to query a context for information about which features
it has enabled: whether it has a GPU, how many threads are in its thread pool, and so on.
.. cpp:function:: bool has_gpu(const context&)
Query if the context has a GPU.
.. cpp:function:: unsigned num_threads(const context&)
Query the number of threads in a context's thread pool.
.. cpp:function:: bool has_mpi(const context&)
Query if the context has an MPI communicator.
.. cpp:function:: unsigned num_ranks(const context&)
Query the number of distributed ranks. If the context has an MPI
communicator, the result is equivalent to :cpp:any:`MPI_Comm_size`.
If the context has no MPI communicator, returns 1.
.. cpp:function:: unsigned rank(const context&)
Query the rank of the calling process. If the context has an MPI
communicator, the result is equivalent to :cpp:any:`MPI_Comm_rank`.
If the context has no MPI communicator, returns 0.
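For example, these helpers can be used to print a short summary of a context:

.. container:: example-code

.. code-block:: cpp

#include <iostream>

#include <arbor/context.hpp>

auto context = arb::make_context();

std::cout << "threads: " << arb::num_threads(context) << "\n"
          << "gpu:     " << (arb::has_gpu(context)? "yes": "no") << "\n"
          << "mpi:     " << (arb::has_mpi(context)? "yes": "no") << "\n"
          << "ranks:   " << arb::num_ranks(context) << "\n";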
Here are some simple examples of how to create a :cpp:class:`arb::context` using
:cpp:func:`make_context`.
.. container:: example-code
.. code-block:: cpp
#include <arbor/context.hpp>
// Construct a context that uses 1 thread and no GPU or MPI
auto context = arb::make_context();
// Construct a context that:
// * uses 8 threads in its thread pool.
// * does not use a GPU, regardless of whether one is available;
// * does not use MPI
arb::proc_allocation resources(8, -1);
auto context = arb::make_context(resources);
// Construct one that uses:
// * 4 threads and the first GPU.
// * MPI_COMM_WORLD for distributed computation.
arb::proc_allocation resources(4, 0);
auto mpi_context = arb::make_context(resources, MPI_COMM_WORLD);
Here is a more complicated example of creating a :cpp:class:`context` on a
system where GPU and MPI support are conditional.
.. container:: example-code
.. code-block:: cpp
#include <iostream>

#include <arbor/context.hpp>
#include <arbor/version.hpp> // for ARB_MPI_ENABLED
#include <arborenv/concurrency.hpp>
#include <arborenv/gpu_env.hpp>
#ifdef ARB_MPI_ENABLED
#include <arborenv/with_mpi.hpp>
#endif
int main(int argc, char** argv) {
try {
arb::proc_allocation resources;
// try to detect how many threads can be run on this system
resources.num_threads = arbenv::thread_concurrency();
// override thread count if the user set ARB_NUM_THREADS
if (auto nt = arbenv::get_env_num_threads()) {
resources.num_threads = nt.value();
}
#ifdef ARB_MPI_ENABLED
// initialize MPI
arbenv::with_mpi guard(argc, argv, false);
// assign a unique gpu to this rank if available
resources.gpu_id = arbenv::find_private_gpu(MPI_COMM_WORLD);
// create a distributed context
auto context = arb::make_context(resources, MPI_COMM_WORLD);
bool root = arb::rank(context) == 0; // true only on MPI rank 0
#else
resources.gpu_id = arbenv::default_gpu();
// create a local context
auto context = arb::make_context(resources);
#endif
// Print a banner with information about hardware configuration
std::cout << "gpu: " << (has_gpu(context)? "yes": "no") << "\n";
std::cout << "threads: " << num_threads(context) << "\n";
std::cout << "mpi: " << (has_mpi(context)? "yes": "no") << "\n";
std::cout << "ranks: " << num_ranks(context) << "\n" << std::endl;
// run some simulations!
}
catch (std::exception& e) {
std::cerr << "exception caught in ring miniapp: " << e.what() << "\n";
return 1;
}
return 0;
}
Arbor
=====

.. image:: https://travis-ci.org/arbor-sim/arbor.svg?branch=master
    :target: https://travis-ci.org/arbor-sim/arbor

What is Arbor?
--------------
...@@ -26,10 +26,11 @@ Arbor is designed from the ground up for **many core** architectures:
Features
--------

We are actively developing `Arbor <https://github.com/arbor-sim/arbor>`_, improving performance and adding features.
Some key features include:

* Optimized back end for CUDA
* Optimized vector back ends for Intel (KNL, AVX, AVX2) and Arm (ARMv8-A NEON) intrinsics.
* Asynchronous spike exchange that overlaps compute and communication.
* Efficient sampling of voltage and current on all back ends.
* Efficient implementation of all features on GPU.
...@@ -47,6 +48,7 @@ Some key features include:
model_intro
model_common
model_hardware
model_recipe
model_domdec
model_simulation
...@@ -59,6 +61,7 @@ Some key features include:
cpp_intro
cpp_common
cpp_hardware
cpp_recipe
cpp_domdec
cpp_simulation
......
...@@ -299,12 +299,12 @@ and `ARM options <https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html>`_.
cmake -DARB_ARCH=skylake-avx512 # skylake with avx512 (Xeon server)
cmake -DARB_ARCH=knl # Xeon Phi KNL
# ARM Arm8a
cmake -DARB_ARCH=armv8-a
# IBM Power8
cmake -DARB_ARCH=power8
# IBM Arm8a
cmake -DARB_ARCH=armv8-a
.. _vectorize:

Vectorization
...@@ -321,7 +321,7 @@ for the architecture, enabling ``ARB_VECTORIZE`` will lead to a compilation error.
With this flag set, the library will use architecture-specific vectorization intrinsics
to implement these kernels. Arbor currently has vectorization support for x86 architectures
with AVX, AVX2 or AVX512 ISA extensions, and for ARM architectures with support for AArch64 NEON intrinsics (first available on ARMv8-A).
.. _gpu:
......
...@@ -18,17 +18,3 @@ A *load balancer* generates the domain decomposition using the
model recipe and a description of the available computational resources on which the model will run, described by an execution context.
Currently Arbor provides one load balancer, and more will be added over time.
Hardware
--------
*Local resources* are locally available computational resources, specifically the number of hardware threads and the number of GPUs.
An *allocation* enumerates the computational resources to be used for a simulation, typically a subset of the resources available on a physical hardware node.
Execution Context
-----------------
An *execution context* contains the local thread pool, and optionally the GPU state and MPI communicator, if available. Users of the library configure contexts, which are passed to Arbor methods and types.
See :ref:`cppdomdec` for documentation of the C++ interface for domain decomposition.
.. _modelhardware:
Hardware
========
*Local resources* are locally available computational resources, specifically the number of hardware threads and the number of GPUs.
An *allocation* enumerates the computational resources to be used for a simulation, typically a subset of the resources available on a physical hardware node.
.. Note::
New users may find working with contexts a little verbose.
The design is very deliberate, to allow fine-grained control over which
computational resources an Arbor simulation should use.
As a result Arbor is much easier to integrate into workflows that
run multiple applications or libraries on the same node, because
Arbor has a direct API for using on-node resources (threads and GPU)
and distributed resources (MPI) that have been partitioned between
applications/libraries.
Execution Context
-----------------
An *execution context* contains the local thread pool, and optionally the GPU state and MPI communicator, if available. Users of the library configure contexts, which are passed to Arbor methods and types.
See :ref:`cppdomdec` for documentation of the C++ interface for domain decomposition.