
SIMD Classes

The purpose of the SIMD classes is to abstract and consolidate the use of compiler intrinsics for the manipulation of architecture-specific vector (SIMD) values.

The implementation is rather loosely based on the data-parallel vector types proposal P0214R6 for the C++ Parallelism TS 2.

Unless otherwise specified, all classes, namespaces and top-level functions described below reside in the top-level arb::simd namespace.

Example usage

The following code performs an element-wise vector product, storing only non-zero values in the resultant array.

#include <simd/simd.hpp>
using namespace arb::simd;

void product_nonzero(int n, const double* a, const double* b, double* result) {
    constexpr int N = simd_abi::native_width<double>::value;
    using simd = simd<double, N>;
    using mask = simd::simd_mask;

    int i = 0;
    // Process full N-wide blocks; store only the non-zero products.
    for (; i+N<=n; i+=N) {
        auto vp = simd(a+i)*simd(b+i);
        where(vp!=0, vp).copy_to(result+i);
    }

    // Handle any remaining tail lanes under a mask covering only
    // the first n-i lanes.
    int tail = n-i;
    auto m = mask::unpack((1<<tail)-1);

    auto vp = simd(a+i, m)*simd(b+i, m);      // masked loads
    where(m && vp!=0, vp).copy_to(result+i);  // masked store
}

Classes

Three user-facing template classes are provided:

  1. simd<V, N, I = simd_abi::default_abi>

    N-wide vector type of values of type V, using architecture-specific implementation I. The implementation parameter is itself a template, acting as a type-map, with I<V, N>::type being the concrete implementation class (see below) for N-wide vectors of type V for this architecture.

    The implementation simd_abi::generic provides a std::array-backed implementation for arbitrary V and N, while simd_abi::native maps to the native architecture implementation for V and N, if one is available for the target architecture.

    simd_abi::default_abi will use simd_abi::native if available, or else fall back to the generic implementation.

  2. simd_mask<V, N, I = simd_abi::default_abi>

    The result of performing a lane-wise comparison/test operation on a simd<V, N, I> vector value. simd_mask objects support logical operations and are used as arguments to where expressions.

    simd_mask<V, N, I> is a type alias for simd<V, N, I>::simd_mask.

  3. where_expression<simd<V, N, I>>

    The result of a where expression, used for masked assignment. A short example combining all three classes follows this list.
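
As a minimal sketch of how the three classes interact (the function name clamp_negative_to_zero is illustrative only; a 4-wide double vector is requested here, with the default ABI falling back to the generic implementation where no native one exists):

#include <simd/simd.hpp>
using namespace arb::simd;

void clamp_negative_to_zero(double* x) {
    using simd4 = simd<double, 4>;    // 4-wide double, default ABI
    using mask4 = simd4::simd_mask;   // equivalently simd_mask<double, 4>

    simd4 v(x);            // load four consecutive doubles from x
    mask4 neg = v < 0.;    // lane-wise comparison yields a simd_mask
    where(neg, v) = 0.;    // where expression: masked assignment
    v.copy_to(x);          // store all four lanes back
}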

There is, in addition, a templated class detail::indirect_expression that holds the result of an indirect(...) expression. These arise in gather and scatter operations, and are detailed below.

Implementation typemaps live in the simd_abi namespace, while concrete implementation classes live in detail. A particular specialization for an architecture, for example 4-wide double on AVX, then requires:

  • A concrete implementation class, e.g. detail::avx_double4.
  • A specialization of its ABI map, so that simd_abi::avx<double, 4>::type is an alias for detail::avx_double4.
  • A specialization of the native ABI map, so that simd_abi::native<double, 4>::type is an alias for simd_abi::avx<double, 4>::type.

The maximum natively supported width for a scalar type V is recorded in simd_abi::native_width<V>::value.
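
For illustration, a sketch of how these typemaps might be queried and combined (the explicit AVX alias, shown commented out, applies only when targeting AVX):

#include <simd/simd.hpp>
using namespace arb::simd;

// Widest natively supported double vector on this target (1 if none).
constexpr int W = simd_abi::native_width<double>::value;

using vec_default = simd<double, W>;                     // default_abi: native if available
using vec_generic = simd<double, W, simd_abi::generic>;  // portable std::array-backed version

// On an AVX target, the architecture-specific implementation can be
// requested explicitly:
// using vec_avx = simd<double, 4, simd_abi::avx>;       // detail::avx_double4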

Indirect expressions

An expression of the form indirect(p, k) or indirect(p, k, constraint) describes a sequence of memory locations based at the pointer p with offsets given by the simd variable k. A constraint of type index_constraint can be provided, which promises certain guarantees on the index values in k:

Constraint                      Guarantee
------------------------------  ---------------------------------------------------------
index_constraint::none          No restrictions.
index_constraint::independent   No indices are repeated, i.e. k[i] = k[j] implies i = j.
index_constraint::contiguous    Indices are sequential, i.e. k[i] = k[0] + i.
index_constraint::constant      Indices are all equal, i.e. k[i] = k[j] for all i and j.
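
As a sketch of how an indirect expression combines with a constraint, the following gathers, updates, and scatters back a set of values. It assumes the gather/scatter forms v.copy_from(indirect(...)) and v.copy_to(indirect(...)) detailed later in this document, and the caller is assumed to guarantee that the indices in idx are pairwise distinct, which justifies the independent constraint:

#include <simd/simd.hpp>
using namespace arb::simd;

constexpr int N = simd_abi::native_width<double>::value;

// Double each of the values selected by idx[0..N-1].
void double_selected(double* values, const int* idx) {
    simd<int, N> k(idx);               // load N indices
    simd<double, N> v;

    v.copy_from(indirect(values, k));  // gather values[k[i]] into lane i
    v = v+v;

    // Indices are pairwise distinct by assumption, so the scatter may
    // be annotated with index_constraint::independent.
    v.copy_to(indirect(values, k, index_constraint::independent));
}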