GPU and profiling
In this example, the ring network created in an :ref:`earlier tutorial <tutorialnetworkring>` will be used to run the model with a GPU. In addition, it is shown how to profile the performance difference. Only the differences with that tutorial will be described.
Note
Concepts covered in this example:
- Building a :py:class:`arbor.context` that'll use a GPU. This requires that you have built Arbor with GPU support enabled.
- Build a :class:`arbor.domain_decomposition` and provide a :class:`arbor.partition_hint`.
- Profile an Arbor simulation using :class:`arbor.meter_manager`.
The hardware context
An :ref:`execution context <modelcontext>` describes the hardware resources on which the simulation will run. It contains the thread pool used to parallelise work on the local CPU, and optionally describes GPU resources and the MPI communicator for distributed simulations. In some other examples, the :class:`arbor.single_cell_model` object created the execution context :class:`arbor.context` behind the scenes. The details of the execution context can be customized by the user. We may specify the number of threads in the thread pool; determine the id of the GPU to be used; or create our own MPI communicator.
Step (11) creates a hardware context where we set the
:py:attr:`~arbor.proc_allocation.gpu_id`. This requires that you have built
Arbor manually, with GPU support (see :ref:`here <in_python_adv>` how to do
that). On a regular consumer device with a single GPU, the index you should pass
is 0. Change the value to run the example with and without a GPU. The number
of threads :class:`~arbor.context.threads` is (when no MPI is used) set to
:py:func:`arbor.env.thread_concurrency`. This value corresponds to the number of
locally available threads, as best as Arbor can establish at the start of
the program.
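A minimal sketch of step (11), assuming Arbor was built with GPU support; the exact keyword arguments shown here follow the Python API referenced above and may differ slightly between Arbor versions.

.. code-block:: python

    import arbor

    # Use all locally available threads and the first (and typically only)
    # GPU. Pass gpu_id=None to run the same example without a GPU.
    context = arbor.context(threads=arbor.env.thread_concurrency(), gpu_id=0)
    print(context)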
Note
If you use GPUs in combination with MPI, consider using :py:func:`~arbor.env.find_private_gpu`.
Profiling
Arbor comes with a :class:`arbor.meter_manager` to help you profile your
simulations. In this case, you can run the example with gpu_id=None and
gpu_id=0 and observe the difference with the :class:`~arbor.meter_manager`.
If you are interested in a more detailed report, Arbor also offers a region-based
profiler, which is aimed at developers and must be enabled at build time.
Step (12) sets up the meter manager and starts it using the (only) context. This way, only Arbor related execution is measured, not Python code.
Step (13) instantiates the recipe and sets the first checkpoint on the meter manager. We now have the time it took to construct the recipe.
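A sketch of steps (12) and (13), assuming the ``context`` created above and the ``ring_recipe`` class from the ring network tutorial; the cell count of 4 is illustrative.

.. code-block:: python

    # Start metering; only Arbor-side work is measured from here on.
    meters = arbor.meter_manager()
    meters.start(context)

    # Build the recipe and record how long its construction took.
    ncells = 4
    recipe = ring_recipe(ncells)
    meters.checkpoint("recipe-create", context)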
The domain decomposition
The domain decomposition describes the distribution of the cells over the available computational resources. The :class:`arbor.single_cell_model` also handled that without our knowledge in the previous examples. Now, we have to define it ourselves.
The :class:`arbor.domain_decomposition` class can be manually created by the user, by deciding which cells go on which ranks, or we can use a load balancer that partitions the cells across ranks according to some rules. Arbor provides :class:`arbor.partition_load_balance`, which, using the recipe and execution context, creates the :class:`arbor.domain_decomposition` object for us.
A way to customize :class:`arbor.partition_load_balance` is by providing a :class:`arbor.partition_hint`. Hints let you configure how cells are distributed over the resources in the :class:`~arbor.context`, without requiring you to know the precise configuration of a :class:`~arbor.context` up front. Whether you run your simulation on your laptop CPU, a desktop GPU, a CPU cluster or a GPU cluster, using :class:`partition hints<arbor.partition_hint>` you can just say: use GPUs, if available. You only have to change the :class:`~arbor.context` to actually define which hardware Arbor will execute on.
Step (14) creates a :class:`arbor.partition_hint`, and tells it to put up to 1000 cells in a group allocated to the GPU, and to prefer using the GPU if present. In fact, the default distribution strategy of :class:`arbor.partition_load_balance` already spreads out cells as evenly as possible over CPUs, and groups (up to 1000) on GPUs, so strictly speaking it was not necessary to give that part of the hint. Lastly, a dictionary is created with which hints are assigned to a particular :class:`arbor.cell_kind`. Different kinds may favor different execution, hence the option. In this simulation, there are only :class:`arbor.cell_kind.cable` cells, so we assign the hint to that kind.
Step (15) creates a :class:`arbor.partition_load_balance` with the recipe, context and hints created above. Another checkpoint will help us understand how long creating the load balancer took.
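A sketch of steps (14) and (15), using the hint values from the prose; ``recipe``, ``context``, and ``meters`` are assumed from the previous steps.

.. code-block:: python

    # Prefer the GPU when one is attached, grouping up to 1000 cells.
    hint = arbor.partition_hint()
    hint.prefer_gpu = True
    hint.gpu_group_size = 1000
    hints = {arbor.cell_kind.cable: hint}

    # Let the load balancer produce the domain decomposition for us.
    decomp = arbor.partition_load_balance(recipe, context, hints)
    meters.checkpoint("load-balance", context)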
The simulation
Step (16) creates a :class:`arbor.simulation`, enables spike recording, creates a :term:`handle` to the eventual probe results, and makes another checkpoint.
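A sketch of step (16); probe sampling is left out here and only spike recording is shown. ``decomp`` is the decomposition from step (15).

.. code-block:: python

    # Build the simulation and record spikes from all cells.
    sim = arbor.simulation(recipe, context, decomp)
    sim.record(arbor.spike_recording.all)
    meters.checkpoint("simulation-init", context)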
The execution
Step (17) runs the simulation. Since we have more cells this time, which are
connected in series, it will take some time for the action potential to
propagate. In the :ref:`ring network <tutorialnetworkring>` we could see it
takes about 5 ms for the signal to propagate through one cell, so let's set the
runtime to 5*ncells. Then, another checkpoint, so that we'll know how long
the simulation took.
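A sketch of step (17), assuming ``ncells`` is the number of cells in the ring; recent Arbor versions attach units to the run time, which is assumed here.

.. code-block:: python

    from arbor import units as U

    # Roughly 5 ms per cell for the spike to traverse the ring.
    sim.run(ncells * 5 * U.ms)
    meters.checkpoint("simulation-run", context)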
The results
The scientific results should be similar, other than number of cells, to those in :ref:`ring network <tutorialnetworkring>`, so we'll not discuss them here. Let's turn our attention to the :class:`~arbor.meter_manager`.
Step (18) shows how :class:`arbor.meter_report` can be used to read out the :class:`~arbor.meter_manager`. It generates a table with the time between checkpoints. As an example, the following table is the result of a run on a 2019 laptop CPU:
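A minimal sketch of step (18), assuming ``meters`` and ``context`` from above; printing the report produces a table like the one below.

.. code-block:: python

    # Summarise time and memory between the checkpoints taken above.
    print(arbor.meter_report(meters, context))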
---- meters -------------------------------------------------------------------------------
meter                         time(s)      memory(MB)
-------------------------------------------------------------------------------------------
recipe-create                   0.000           0.059
load-balance                    0.000           0.007
simulation-init                 0.012           0.662
simulation-run                  0.037           0.319
meter-total                     0.049           1.048
The full code
You can find the full code of the example at python/examples/network_ring_gpu.py.