simd_api.rst
    SIMD Classes

    The purpose of the SIMD classes is to abstract and consolidate the use of compiler intrinsics for the manipulation of architecture-specific vector (SIMD) values.

    The implementation is rather loosely based on the data-parallel vector types proposal P0214R6 for the C++ Parallelism TS 2.

    Unless otherwise specified, all classes, namespaces and top-level functions described below reside in the top-level arb::simd namespace.

    Example usage

    The following code performs an element-wise vector product, storing only non-zero values in the resultant array.

    #include <simd/simd.hpp>
    using namespace arb::simd;
    
    void product_nonzero(int n, const double* a, const double* b, double* result) {
        constexpr int N = simd_abi::native_width<double>::value;
        using simd = simd<double, N>;
        using mask = simd::simd_mask;
    
        int i = 0;
        // Full-width vector iterations.
        for (; i+N<=n; i+=N) {
            auto vp = simd(a+i)*simd(b+i);
            where(vp!=0, vp).copy_to(result+i);
        }
    
        // Remaining tail elements, handled with a masked load and store.
        int tail = n-i;
        auto m = mask::unpack((1<<tail)-1);
    
        auto vp = simd(a+i, m)*simd(b+i, m);
        where(m && vp!=0, vp).copy_to(result+i);
    }
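For reference, a plain scalar loop with the same semantics might look like the following sketch (the function name is illustrative, not part of the Arbor API): only non-zero products are written, and other entries of result are left untouched.

```cpp
#include <cstddef>

// Scalar equivalent of product_nonzero above: result[i] is written
// only when a[i]*b[i] is non-zero; other entries are left unchanged.
void product_nonzero_scalar(int n, const double* a, const double* b, double* result) {
    for (int i = 0; i < n; ++i) {
        double p = a[i]*b[i];
        if (p != 0) result[i] = p;
    }
}
```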

    Classes

    Three user-facing template classes are provided:

    1. simd<V, N, I = simd_abi::default_abi>

      N-wide vector type of values of type V, using architecture-specific implementation I. The implementation parameter is itself a template, acting as a type-map, with I<V, N>::type being the concrete implementation class (see below) for N-wide vectors of type V for this architecture.

      The implementation simd_abi::generic provides a std::array-backed implementation for arbitrary V and N, while simd_abi::native maps to the native architecture implementation for V and N, if one is available for the target architecture.

      simd_abi::default_abi will use simd_abi::native if available, or else fall back to the generic implementation.

    2. simd_mask<V, N, I = simd_abi::default_abi>

      The result of performing a lane-wise comparison/test operation on a simd<V, N, I> vector value. simd_mask objects support logical operations and are used as arguments to where expressions.

      simd_mask<V, N, I> is a type alias for simd<V, N, I>::simd_mask.

    3. where_expression<simd<V, N, I>>

      The result of a where expression, used for masked assignment.
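The lane-wise semantics of masked assignment can be sketched in plain C++ (an illustration of what where(m, v) = x means, not Arbor's implementation):

```cpp
#include <array>
#include <cstddef>

// Lane-wise sketch of masked assignment: lanes of v where the mask m is
// true receive the corresponding lane of x; other lanes are unchanged.
template <typename T, std::size_t N>
void masked_assign(std::array<T, N>& v,
                   const std::array<bool, N>& m,
                   const std::array<T, N>& x) {
    for (std::size_t i = 0; i < N; ++i) {
        if (m[i]) v[i] = x[i];
    }
}
```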

    There is, in addition, a templated class detail::indirect_expression that holds the result of an indirect(...) expression. These arise in gather and scatter operations, and are detailed below.

    Implementation typemaps live in the simd_abi namespace, while concrete implementation classes live in detail. A particular specialization for an architecture, for example 4-wide double on AVX, then requires:

    • A concrete implementation class, e.g. detail::avx_double4.
    • A specialization of its ABI map, so that simd_abi::avx<double, 4>::type is an alias for detail::avx_double4.
    • A specialization of the native ABI map, so that simd_abi::native<double, 4>::type is an alias for simd_abi::avx<double, 4>::type.

    The maximum natively supported width for a scalar type V is recorded in simd_abi::native_width<V>::value.
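The type-map pattern described above can be sketched in isolation as follows. This is illustrative only: the class and namespace names mirror the text, but the bodies are stand-ins, not the Arbor sources.

```cpp
#include <type_traits>

namespace detail {
    // Stand-in for a concrete implementation class such as avx_double4.
    struct avx_double4 {};
}

namespace simd_abi {
    // Architecture ABI map: left undefined except where specialized.
    template <typename V, unsigned N> struct avx;

    // Specialization mapping <double, 4> to the concrete implementation.
    template <> struct avx<double, 4> { using type = detail::avx_double4; };

    // Native ABI map forwards to the best architecture-specific map.
    template <typename V, unsigned N> struct native;
    template <> struct native<double, 4> { using type = avx<double, 4>::type; };

    // Maximum natively supported width for a scalar type (values here are
    // placeholders for whatever the target architecture provides).
    template <typename V> struct native_width { static constexpr unsigned value = 1; };
    template <> struct native_width<double> { static constexpr unsigned value = 4; };
}

static_assert(std::is_same<simd_abi::native<double, 4>::type, detail::avx_double4>::value,
              "native map resolves to the concrete implementation class");
```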

    Indirect expressions

    An expression of the form indirect(p, k) or indirect(p, k, constraint) describes a sequence of memory locations based at the pointer p with offsets given by the simd variable k. A constraint of type index_constraint can be provided, which promises certain guarantees on the index values in k:

    Constraint                      Guarantee
    index_constraint::none          No restrictions.
    index_constraint::independent   No indices are repeated, i.e. k[i] == k[j] implies i == j.
    index_constraint::contiguous    Indices are sequential, i.e. k[i] == k[0] + i.
    index_constraint::constant      Indices are all equal, i.e. k[i] == k[j] for all i and j.
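These guarantees can be illustrated with a scalar gather sketch (a hypothetical helper, not part of the Arbor API). Each constraint licenses a cheaper implementation of the same loop: under index_constraint::contiguous, for example, the gather reduces to a single contiguous load from p + k[0].

```cpp
#include <array>
#include <cstddef>

// Scalar sketch of a gather: out[i] = p[k[i]] for each lane i.
// A constraint on k lets an implementation replace this loop, e.g.
// index_constraint::contiguous permits one contiguous load from p + k[0],
// and index_constraint::constant permits a single load broadcast to all lanes.
template <typename T, std::size_t N>
std::array<T, N> gather(const T* p, const std::array<int, N>& k) {
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N; ++i) out[i] = p[k[i]];
    return out;
}
```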