diff --git a/doc/cpp_common.rst b/doc/cpp_common.rst index 7b6ef66be3e0d96432ccff1e79f06e1184af1d3d..a7ff50029fbdb330bffc4f015bf925c812661982 100644 --- a/doc/cpp_common.rst +++ b/doc/cpp_common.rst @@ -115,3 +115,40 @@ Probes .. cpp:member:: util::any address Cell-type specific location info, specific to cell kind of ``id.gid``. + +Utility Wrappers and Containers +-------------------------------- + +.. cpp:namespace:: arb::util + + +.. cpp:class:: template <typename T> optional + + A wrapper around a contained value of type :cpp:type:`T`, that may or may not be set. + A faithful copy of the C++17 ``std::optional`` type. + See the online C++ standard documentation + `<https://en.cppreference.com/w/cpp/utility/optional>`_ + for more information. + +.. cpp:class:: any + + A container for a single value of any type that is copy constructible. + Used in the Arbor API where the type of a value passed to or from the API + is decided at run time. + + A faithful copy of the C++17 ``std::any`` type. + See the online C++ standard documentation + `<https://en.cppreference.com/w/cpp/utility/any>`_ + for more information. + + The :cpp:any:`arb::util` namespace also provides implementations of the + :cpp:any:`any_cast`, :cpp:any:`make_any` and :cpp:any:`bad_any_cast` + helper functions and types from C++17. + +.. cpp:class:: unique_any + + Equivalent to :cpp:class:`util::any`, except that: + * it can store any type that is move constructible; + * it is move-only, that is, it can't be copied. + + diff --git a/doc/cpp_domdec.rst b/doc/cpp_domdec.rst index c109292c7cd05566111b805d4940e9b4905b3a86..7f5a5813397a631c73ef12f5e6e954b6eb333ae2 100644 --- a/doc/cpp_domdec.rst +++ b/doc/cpp_domdec.rst @@ -3,207 +3,7 @@ Domain Decomposition ==================== -The C++ API for defining hardware resources and partitioning a model over -distributed and local hardware is described here. 
-Arbor provides two library APIs for working with hardware resources: - -* The core *libarbor* is used to *describe* the hardware resources - and their contexts for use in Arbor simulations. -* The *libarborenv* provides an API for querying available hardware - resources (e.g. the number of available GPUs), and initializing MPI. - - -Managing Hardware ------------------ - -The *libarborenv* API for querying and managing hardware resources is in the -:cpp:any:`arbenv` namespace. This functionality is in a seperate -library because the main Arbor library should only -present an interface for running simulations on hardware resources provided -by the calling application. As such, it should not provide access to how -it manages hardware resources internally, or place restrictions on how -the calling application selects or manages resources such as GPUs and MPI communicators. - -However, for the purpose of writing tests, examples, benchmarks and validation -tests, functionality for detecting GPUs, managing MPI lifetimes and the like -is neccesary. This functionality is kept in a separate library to ensure -separation of concerns, and to provide examples of quality implementations -of such functionality for users of the library to reuse. - -.. cpp:namespace:: arbenv - -.. cpp:function:: arb::optional<int> get_env_num_threads() - - Tests whether the number of threads to use has been set in an environment variable. - First checks ``ARB_NUM_THREADS``, and if that is not set checks ``OMP_NUM_THREADS``. - - Return value: - - * no value: the :cpp:any:`optional` return value contains no value if the - * has value: the number of threads set by the environment variable. - - Exceptions: - - * throws :cpp:any:`std::runtime_error` if environment variable set with invalid - number of threads. - - .. container:: example-code - - .. 
code-block:: cpp - - if (auto nt = arbenv::get_env_num_threads()) { - std::cout << "requested " << nt.value() << "threads \n"; - } - else { - std::cout << "no enviroment variable set\n"; - } - -.. cpp:function:: int thread_concurrency() - - Attempts to detect the number of available CPU cores. Returns 1 if unable to detect - the number of cores. - - .. container:: example-code - - .. code-block:: cpp - - // Set num_threads to value from environment variable if set, - // otherwise set it to the available number of cores. - int num_threads = 0; - if (auto nt = arbenv::get_env_num_threads()) { - num_threads = nt.value(); - } - else { - num_threads = arbenv::thread_concurrency(); - } - -.. cpp:function:: int default_gpu() - - Detects if a GPU is available, and returns the - - Return value: - - * non-negative value: if a GPU is available, the index of the selected GPU is returned. The index will be in the range ``[0, num_gpus)`` where ``num_gpus`` is the number of GPUs detected using the ``cudaGetDeviceCount`` `CUDA API call <https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html>`_. - * -1: if no GPU available, or if Arbor was built without GPU support. - - .. container:: example-code - - .. code-block:: cpp - - if (arbenv::default_gpu()>-1) {} - std::cout << "a GPU is available\n"; - } - -.. cpp:function:: int find_private_gpu(MPI_Comm comm) - - stuff. - -.. cpp:class:: with_mpi - - Purpose and functionality - - Constructor - - Usage notes. - -Blurb for the *libarbor* - -.. cpp:namespace:: arb - -.. cpp:class:: proc_allocation - - Enumerates the computational resources to be used for a simulation, typically a - subset of the resources available on a physical hardware node. - - .. container:: example-code - - .. code-block:: cpp - - // Default construction uses all detected cores/threads, and the first GPU, if available. - arb::proc_allocation resources; - - // Remove any GPU from the resource description. - resources.gpu_id = -1; - - - .. 
cpp:function:: proc_allocation() = default - - Sets the number of threads to the number detected by :cpp:func:`get_local_resources`, and - chooses either the first available GPU, or no GPU if none are available. - - .. cpp:function:: proc_allocation(unsigned threads, int gpu_id) - - Constructor that sets the number of :cpp:var:`threads` and selects :cpp:var:`gpus` available. - - .. cpp:member:: unsigned num_threads - - The number of CPU threads available. - - .. cpp:member:: int gpu_id - - The identifier of the GPU to use. - The gpu id corresponds to the ``int device`` parameter used by CUDA API calls - to identify gpu devices. - Set to -1 to indicate that no GPU device is to be used. - See ``cudaSetDevice`` and ``cudaDeviceGetAttribute`` provided by the - `CUDA API <https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html>`_. - - .. cpp:function:: bool has_gpu() const - - Indicates whether a GPU is selected (i.e. whether :cpp:member:`gpu_id` is ``-1``). - -Execution Context ------------------ - -The :cpp:class:`proc_allocation` class enumerates the hardware resources on the local hardware -to use for a simulation. - -.. cpp:namespace:: arb - -.. cpp:class:: context - - A handle for the interfaces to the hardware resources used in a simulation. - A :cpp:class:`context` contains the local thread pool, and optionally the GPU state - and MPI communicator, if available. Users of the library do not directly use the functionality - provided by :cpp:class:`context`, instead they configure contexts, which are passed to - Arbor methods and types. - -.. cpp:function:: context make_context() - - Local context that uses all detected threads and a GPU if any are available. - -.. cpp:function:: context make_context(proc_allocation alloc) - - Local context that uses the local resources described by :cpp:var:`alloc`. - -.. 
cpp:function:: context make_context(proc_allocation alloc, MPI_Comm comm) - - A context that uses the local resources described by :cpp:var:`alloc`, and - uses the MPI communicator :cpp:var:`comm` for distributed calculation. - - -Here are some examples of how to create a :cpp:class:`arb::context`: - - .. container:: example-code - - .. code-block:: cpp - - #include <arbor/context.hpp> - - // Construct a non-distributed context that uses all detected available resources. - auto context = arb::make_context(); - - // Construct a context that: - // * does not use a GPU, reguardless of whether one is available; - // * uses 8 threads in its thread pool. - arb::proc_allocation resources(8, -1); - auto context = arb::make_context(resources); - - // Construct a context that: - // * uses all available local hardware resources; - // * uses the standard MPI communicator MPI_COMM_WORLD for distributed computation. - arb::proc_allocation resources; // defaults to all detected local resources - auto context = arb::make_context(resources, MPI_COMM_WORLD); +The C++ API for partitioning a model over distributed and local hardware is described here. Load Balancers -------------- @@ -217,11 +17,11 @@ distributed with MPI communication. The returned :cpp:class:`domain_decompositio describes the cell groups on the local MPI rank. .. Note:: - The :cpp:class:`domain_decomposition` type is simple and - independent of any load balancing algorithm, so users can supply their - own domain decomposition without using one of the built-in load balancers. + The :cpp:class:`domain_decomposition` type is + independent of any load balancing algorithm, so users can define a + domain decomposition directly, instead of generating it with a load balancer. This is useful for cases where the provided load balancers are inadequate, - and when the user has specific insight into running their model on the + or when the user has specific insight into running their model on the target computer. .. 
cpp:namespace:: arb diff --git a/doc/cpp_dry_run.rst b/doc/cpp_dry_run.rst index 565e679092f7e3789436c2bd3ca2e203cc7f701b..08b8c7f48b74e3e6b14c8ecc7821033d7cd5ff4d 100644 --- a/doc/cpp_dry_run.rst +++ b/doc/cpp_dry_run.rst @@ -74,11 +74,11 @@ To support dry-run mode we use the following classes: .. Note:: While this class inherits from :cpp:class:`arb::recipe`, it breaks one of its implicit rules: it allows connection from gids greater than the total number of cells in a recipe, - :cpp:var:`ncells`. + :cpp:any:`ncells`. :cpp:class:`arb::tile` describes the model on a single domain containing :cpp:expr:`num_cells = - num_cells_per_tile` cells, which is to be duplicated over :cpp:var:`num_ranks` - domains in dry-run mode. It contains information about :cpp:var:`num_ranks` which is provided + num_cells_per_tile` cells, which is to be duplicated over :cpp:any:`num_ranks` + domains in dry-run mode. It contains information about :cpp:any:`num_ranks` which is provided by the following function: .. cpp:function:: cell_size_type num_tiles() const diff --git a/doc/cpp_hardware.rst b/doc/cpp_hardware.rst new file mode 100644 index 0000000000000000000000000000000000000000..b2a6783c920580f62c7cc592f2d0325dd87f8b03 --- /dev/null +++ b/doc/cpp_hardware.rst @@ -0,0 +1,393 @@ +.. _cpphardware: + +Hardware Management +=================== + +Arbor provides two library APIs for working with hardware resources: + +* The core *libarbor* is used to *describe* the hardware resources + and their contexts for use in Arbor simulations. +* The *libarborenv* provides an API for querying available hardware + resources (e.g. the number of available GPUs), and initializing MPI. + + +libarborenv +------------------- + +The *libarborenv* API for querying and managing hardware resources is in the +:cpp:any:`arbenv` namespace. 
+This functionality is kept in a separate library to enforce +separation of concerns, so that users have full control over how hardware resources +are selected, either by using the functions and types in *libarborenv*, or by writing their +own code for managing MPI, GPUs, and thread counts. + +.. cpp:namespace:: arbenv + +.. cpp:function:: arb::util::optional<int> get_env_num_threads() + + Tests whether the number of threads to use has been set in an environment variable. + First checks ``ARB_NUM_THREADS``, and if that is not set checks ``OMP_NUM_THREADS``. + + Return value: + + * **no value**: the :cpp:any:`optional` return value contains no value if + no thread count was specified by an environment variable. + * **has value**: the number of threads set by the environment variable. + + Throws: + + * :cpp:any:`std::runtime_error`: if the environment variable is set to an invalid + number of threads. + + .. container:: example-code + + .. code-block:: cpp + + #include <arborenv/concurrency.hpp> + + if (auto nt = arbenv::get_env_num_threads()) { + std::cout << "requested " << nt.value() << " threads\n"; + } + else { + std::cout << "no environment variable set\n"; + } + +.. cpp:function:: int thread_concurrency() + + Attempts to detect the number of available CPU cores. Returns 1 if unable to detect + the number of cores. + + .. container:: example-code + + .. code-block:: cpp + + #include <arborenv/concurrency.hpp> + + // Set num_threads to value from environment variable if set, + // otherwise set it to the available number of cores. + int num_threads = 0; + if (auto nt = arbenv::get_env_num_threads()) { + num_threads = nt.value(); + } + else { + num_threads = arbenv::thread_concurrency(); + } + +.. cpp:function:: int default_gpu() + + Returns the integer identifier of the first available GPU, if a GPU is available. + + Return value: + + * **non-negative value**: if a GPU is available, the index of the selected GPU is returned. 
The index will be in the range ``[0, num_gpus)`` where ``num_gpus`` is the number of GPUs detected using the ``cudaGetDeviceCount`` `CUDA API call <https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html>`_. + * **-1**: if no GPU is available, or if Arbor was built without GPU support. + + .. container:: example-code + + .. code-block:: cpp + + #include <arborenv/gpu_env.hpp> + + if (arbenv::default_gpu()>-1) { + std::cout << "a GPU is available\n"; + } + +.. cpp:function:: int find_private_gpu(MPI_Comm comm) + + A helper function that assigns a unique GPU to every MPI rank. + + .. Note:: + + Arbor allows at most one GPU per MPI rank, and furthermore requires that + an MPI rank has exclusive access to a GPU, i.e. two MPI ranks cannot + share a GPU. + This function handles the task of assigning a unique GPU to each rank when more than one rank + has access to the same GPU(s). + An example use case is on systems with "fat" nodes with multiple GPUs + per node, in which case Arbor should be run with multiple MPI ranks + per node. + Uniquely assigning GPUs is quite difficult, and this function provides + what we feel is a robust implementation. + + All MPI ranks in the MPI communicator :cpp:any:`comm` should call this function to + avoid a deadlock. + + Return value: + + * **non-negative integer**: the identifier of the GPU assigned to this rank. + * **-1**: no GPU was available for this MPI rank. + + Throws: + + * :cpp:any:`std::runtime_error`: if there was an error in the CUDA runtime + on the local or remote MPI ranks, i.e. if one rank throws, all ranks + will throw. + +.. cpp:class:: with_mpi + + The :cpp:class:`with_mpi` type is a simple RAII scoped guard for MPI initialization + and finalization. On creation :cpp:class:`with_mpi` will call :cpp:any:`MPI_Init_thread` + to initialize MPI with the minimum level of thread support required by Arbor, that is + ``MPI_THREAD_SERIALIZED``. When it goes out of scope it will automatically call + :cpp:any:`MPI_Finalize`. + + .. 
cpp:function:: with_mpi(int& argcp, char**& argvp, bool fatal_errors = true) + + The constructor takes the :cpp:any:`argc` and :cpp:any:`argv` arguments + passed to ``main`` of the calling application, and an additional flag + :cpp:any:`fatal_errors` that toggles whether errors in MPI API calls + should return error codes or terminate. + + .. Warning:: + + Handling exceptions is difficult in MPI applications, and it is the user's + responsibility to do so. + + The :cpp:class:`with_mpi` scope guard attempts to facilitate error reporting of + uncaught exceptions, particularly in the case where one rank throws an exception, + while the other ranks continue executing. In this case there would be a deadlock + if the rank with the exception attempts to call :cpp:any:`MPI_Finalize` and + other ranks are waiting in other MPI calls. If this happens inside a try-catch + block, the deadlock stops the exception from being handled. + For this reason the destructor of :cpp:class:`with_mpi` only calls + :cpp:any:`MPI_Finalize` if there are no uncaught exceptions. + This isn't perfect because the other MPI ranks still deadlock, + but it gives the exception handling code the opportunity to report the error for debugging. + + The following is an example workflow that uses the MPI scope guard. Note that this code will + print the exception error message in the case where only one MPI rank threw + an exception, though it would either then deadlock or exit with an error code + indicating that one or more MPI ranks exited without calling :cpp:any:`MPI_Finalize`. + + .. container:: example-code + + .. 
code-block:: cpp + + #include <exception> + #include <iostream> + + #include <arborenv/with_mpi.hpp> + + int main(int argc, char** argv) { + try { + // Constructing guard will initialize MPI with a + // call to MPI_Init_thread() + arbenv::with_mpi guard(argc, argv, false); + + // Do some work with MPI here + + // When leaving this scope, the destructor of guard will + // call MPI_Finalize() + } + catch (std::exception& e) { + std::cerr << "error: " << e.what() << "\n"; + return 1; + } + return 0; + } + +libarbor +------------------- + +The core Arbor library *libarbor* provides an API for: + + * prescribing which hardware resources are to be used by a + simulation using :cpp:class:`arb::proc_allocation`. + * opaque handles to hardware resources used by simulations, called + :cpp:class:`arb::context`. + +.. cpp:namespace:: arb + +.. cpp:class:: proc_allocation + + Enumerates the computational resources on a node to be used for simulation, + specifically the number of threads and the identifier of a GPU, if available. + + .. Note:: + + Each MPI rank in a distributed simulation uses a :cpp:class:`proc_allocation` + to describe the subset of resources on its node that it will use. + + .. container:: example-code + + .. code-block:: cpp + + #include <arbor/context.hpp> + + // default: 1 thread and no GPU selected + arb::proc_allocation resources; + + // 8 threads and no GPU + arb::proc_allocation resources(8, -1); + + // 4 threads and the first available GPU + arb::proc_allocation resources(4, 0); + + // Construct with resources detected using libarborenv + auto num_threads = arbenv::thread_concurrency(); + auto gpu_id = arbenv::default_gpu(); + arb::proc_allocation resources(num_threads, gpu_id); + + + .. cpp:function:: proc_allocation() = default + + By default selects one thread and no GPU. + + .. cpp:function:: proc_allocation(unsigned threads, int gpu_id) + + Constructor that sets the number of :cpp:var:`threads` and the id :cpp:var:`gpu_id` of + the GPU to use. + + .. 
cpp:member:: unsigned num_threads + + The number of CPU threads available. + + .. cpp:member:: int gpu_id + + The identifier of the GPU to use. + The GPU id corresponds to the ``int device`` parameter used by CUDA API calls + to identify GPU devices. + Set to -1 to indicate that no GPU device is to be used. + See ``cudaSetDevice`` and ``cudaDeviceGetAttribute`` provided by the + `CUDA API <https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html>`_. + + .. cpp:function:: bool has_gpu() const + + Indicates whether a GPU is selected (i.e. whether :cpp:member:`gpu_id` is not ``-1``). + +.. cpp:namespace:: arb + +.. cpp:class:: context + + An opaque handle for the hardware resources used in a simulation. + A :cpp:class:`context` contains a thread pool, and optionally the GPU state + and MPI communicator. Users of the library do not directly use the functionality + provided by :cpp:class:`context`; instead they create contexts, which are passed to + Arbor interfaces for domain decomposition and simulation. + +Arbor contexts are created by calling :cpp:func:`make_context`, which returns an initialized +context. There are two versions of :cpp:func:`make_context`, for creating contexts +with and without distributed computation with MPI respectively. + +.. cpp:function:: context make_context(proc_allocation alloc=proc_allocation()) + + Create a local :cpp:class:`context`, with no distributed/MPI communication, + that uses local resources described by :cpp:any:`alloc`. + By default it will create a context with one thread and no GPU. + +.. cpp:function:: context make_context(proc_allocation alloc, MPI_Comm comm) + + Create a distributed :cpp:class:`context`. + The context uses the local resources described by :cpp:any:`alloc`, and + uses the MPI communicator :cpp:var:`comm` for distributed calculation. + +Contexts can be queried using helper functions for information about which features are enabled, +such as whether a context has a GPU and how many threads are in its thread pool. 
+ +.. cpp:function:: bool has_gpu(const context&) + + Query if the context has a GPU. + +.. cpp:function:: unsigned num_threads(const context&) + + Query the number of threads in a context's thread pool. + +.. cpp:function:: bool has_mpi(const context&) + + Query if the context has an MPI communicator. + +.. cpp:function:: unsigned num_ranks(const context&) + + Query the number of distributed ranks. If the context has an MPI + communicator, the return value is equivalent to that of :cpp:any:`MPI_Comm_size`. + If the context has no MPI communicator, returns 1. + +.. cpp:function:: unsigned rank(const context&) + + Query the rank of the calling rank. If the context has an MPI + communicator, the return value is equivalent to that of :cpp:any:`MPI_Comm_rank`. + If the context has no MPI communicator, returns 0. + +Here are some simple examples of how to create a :cpp:class:`arb::context` using +:cpp:func:`make_context`. + +.. container:: example-code + + .. code-block:: cpp + + #include <arbor/context.hpp> + + // Construct a context that uses 1 thread and no GPU or MPI + auto context = arb::make_context(); + + // Construct a context that: + // * uses 8 threads in its thread pool; + // * does not use a GPU, regardless of whether one is available; + // * does not use MPI. + arb::proc_allocation resources(8, -1); + auto context = arb::make_context(resources); + + // Construct one that uses: + // * 4 threads and the first GPU; + // * MPI_COMM_WORLD for distributed computation. + arb::proc_allocation resources(4, 0); + auto mpi_context = arb::make_context(resources, MPI_COMM_WORLD); + +Here is a more complicated example of creating a :cpp:class:`context` on a +system where GPU and MPI support are conditional. + +.. container:: example-code + + .. 
code-block:: cpp + + #include <arbor/context.hpp> + #include <arbor/version.hpp> // for ARB_MPI_ENABLED + + #include <arborenv/concurrency.hpp> + #include <arborenv/gpu_env.hpp> + + int main(int argc, char** argv) { + try { + arb::proc_allocation resources; + + // try to detect how many threads can be run on this system + resources.num_threads = arbenv::thread_concurrency(); + + // override thread count if the user set ARB_NUM_THREADS + if (auto nt = arbenv::get_env_num_threads()) { + resources.num_threads = nt.value(); + } + + #ifdef ARB_MPI_ENABLED + // initialize MPI + arbenv::with_mpi guard(argc, argv, false); + + // assign a unique gpu to this rank if available + resources.gpu_id = arbenv::find_private_gpu(MPI_COMM_WORLD); + + // create a distributed context + auto context = arb::make_context(resources, MPI_COMM_WORLD); + bool root = arb::rank(context) == 0; + #else + resources.gpu_id = arbenv::default_gpu(); + + // create a local context + auto context = arb::make_context(resources); + #endif + + // Print a banner with information about hardware configuration + std::cout << "gpu: " << (has_gpu(context)? "yes": "no") << "\n"; + std::cout << "threads: " << num_threads(context) << "\n"; + std::cout << "mpi: " << (has_mpi(context)? "yes": "no") << "\n"; + std::cout << "ranks: " << num_ranks(context) << "\n" << std::endl; + + // run some simulations! + } + catch (std::exception& e) { + std::cerr << "exception caught in ring miniapp: " << e.what() << "\n"; + return 1; + } + + return 0; + } + diff --git a/doc/index.rst b/doc/index.rst index 6cbf8db699b232d05665dacfb47a796fbb5bd8dd..b0cdf6da8ec63aa73be63a5469f267536a6226fd 100644 --- a/doc/index.rst +++ b/doc/index.rst @@ -1,8 +1,8 @@ Arbor ===== -.. image:: https://travis-ci.org/eth-cscs/arbor.svg?branch=master - :target: https://travis-ci.org/eth-cscs/arbor +.. image:: https://travis-ci.org/arbor-sim/arbor.svg?branch=master + :target: https://travis-ci.org/arbor-sim/arbor What is Arbor? 
-------------- @@ -26,10 +26,11 @@ Arbor is designed from the ground up for **many core** architectures: Features -------- -We are actively developing `Arbor <https://github.com/eth-cscs/arbor>`_, improving performance and adding features. +We are actively developing `Arbor <https://github.com/arbor-sim/arbor>`_, improving performance and adding features. Some key features include: - * Optimized back ends for CUDA, KNL and AVX2 intrinsics. + * Optimized back end for CUDA. + * Optimized vector back ends for Intel (KNL, AVX, AVX2) and Arm (ARMv8-A NEON) intrinsics. * Asynchronous spike exchange that overlaps compute and communication. * Efficient sampling of voltage and current on all back ends. * Efficient implementation of all features on GPU. @@ -47,6 +48,7 @@ Some key features include: model_intro model_common + model_hardware model_recipe model_domdec model_simulation @@ -59,6 +61,7 @@ Some key features include: cpp_intro cpp_common + cpp_hardware cpp_recipe cpp_domdec cpp_simulation diff --git a/doc/install.rst b/doc/install.rst index c091f5ab083b45e24b7509f17810a63d0138eb81..6e0162cd0677fe5c9b5e0dd41791f7b8b4eae170 100644 --- a/doc/install.rst +++ b/doc/install.rst @@ -299,12 +299,12 @@ and `ARM options <https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html>`_. cmake -DARB_ARCH=skylake-avx512 # skylake with avx512 (Xeon server) cmake -DARB_ARCH=knl # Xeon Phi KNL + # ARM Arm8a + cmake -DARB_ARCH=armv8-a + # IBM Power8 cmake -DARB_ARCH=power8 - # IBM Arm8a - cmake -DARB_ARCH=armv8-a - .. _vectorize: Vectorization @@ -321,7 +321,7 @@ for the architecture, enabling ``ARB_VECTORIZE`` will lead to a compilation erro With this flag set, the library will use architecture-specific vectorization intrinsics to implement these kernels. Arbor currently has vectorization support for x86 architectures -with AVX, AVX2 or AVX512 ISA extensions. 
+with AVX, AVX2 or AVX512 ISA extensions, and for ARM architectures with support for AArch64 NEON intrinsics (first available on ARMv8-A). .. _gpu: diff --git a/doc/model_domdec.rst b/doc/model_domdec.rst index 254f23364724943d03551c55c7e22c16ee36fa8e..cce522dd57c58ef3ca1ff441146f45c8a76f61eb 100644 --- a/doc/model_domdec.rst +++ b/doc/model_domdec.rst @@ -18,17 +18,3 @@ A *load balancer* generates the domain decomposition using the model recipe and a description of the available computational resources on which the model will run described by an execution context. Currently Arbor provides one load balancer and more will be added over time. - -Hardware --------- - -*Local resources* are locally available computational resources, specifically the number of hardware threads and the number of GPUs. - -An *allocation* enumerates the computational resources to be used for a simulation, typically a subset of the resources available on a physical hardware node. - -Execution Context ------------------ - -An *execution context* contains the local thread pool, and optionally the GPU state and MPI communicator, if available. Users of the library configure contexts, which are passed to Arbor methods and types. - -See :ref:`cppdomdec` for documentation of the C++ interface for domain decomposition. diff --git a/doc/model_hardware.rst b/doc/model_hardware.rst new file mode 100644 index 0000000000000000000000000000000000000000..dff3281be12791906baef56f9060c583f8994ee2 --- /dev/null +++ b/doc/model_hardware.rst @@ -0,0 +1,27 @@ +.. _modelhardware: + +Hardware +======== + +*Local resources* are locally available computational resources, specifically the number of hardware threads and the number of GPUs. + +An *allocation* enumerates the computational resources to be used for a simulation, typically a subset of the resources available on a physical hardware node. + +.. Note:: + + New users can find using contexts a little verbose. 
+ The design is deliberate: it allows fine-grained control over which + computational resources an Arbor simulation should use. + As a result, Arbor is much easier to integrate into workflows that + run multiple applications or libraries on the same node, because + Arbor has a direct API for using on-node resources (threads and GPU) + and distributed resources (MPI) that have been partitioned between + applications/libraries. + + +Execution Context +----------------- + +An *execution context* contains the local thread pool, and optionally the GPU state and MPI communicator, if available. Users of the library configure contexts, which are passed to Arbor methods and types. + +See :ref:`cppdomdec` for documentation of the C++ interface for domain decomposition.