SIMD Classes
The purpose of the SIMD classes is to abstract and consolidate the use of compiler intrinsics for the manipulation of architecture-specific vector (SIMD) values.
The implementation is rather loosely based on the data-parallel vector types proposal P0214R6 for the C++ Parallelism TS 2.
Unless otherwise specified, all classes, namespaces and top-level functions described below are within the top-level `arb::simd` namespace.
Example usage
The following code performs an element-wise vector product, storing only non-zero values in the resultant array.
#include <simd/simd.hpp>
using namespace arb::simd;

void product_nonzero(int n, const double* a, const double* b, double* result) {
    constexpr int N = simd_abi::native_width<double>::value;
    using simd = simd<double, N>;
    using mask = simd::simd_mask;

    int i = 0;
    // Process full N-wide blocks.
    for (; i+N<=n; i+=N) {
        auto vp = simd(a+i)*simd(b+i);
        where(vp!=0, vp).copy_to(result+i);
    }

    // Handle the remaining tail lanes under a mask.
    int tail = n-i;
    auto m = mask::unpack((1<<tail)-1);

    auto vp = simd(a+i, m)*simd(b+i, m);
    where(m && vp!=0, vp).copy_to(result+i);
}
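The tail of the array, comprising the final n−i values, is handled with a mask: `mask::unpack((1<<tail)-1)` sets only the lowest `tail` lanes, so the masked loads and the masked store touch no memory past the end of `a`, `b`, or `result`.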
Classes
Three user-facing template classes are provided:
- `simd<V, N, I = simd_abi::default_abi>`

  An N-wide vector type of values of type V, using architecture-specific implementation I. The implementation parameter is itself a template, acting as a type-map, with `I<V, N>::type` being the concrete implementation class (see below) for N-wide vectors of type V for this architecture.

  The implementation `simd_abi::generic` provides a `std::array`-backed implementation for arbitrary V and N, while `simd_abi::native` maps to the native architecture implementation for V and N, if one is available for the target architecture. `simd_abi::default_abi` will use `simd_abi::native` if available, or else fall back to the generic implementation.

- `simd_mask<V, N, I = simd_abi::default_abi>`

  The result of performing a lane-wise comparison/test operation on a `simd<V, N, I>` vector value. `simd_mask` objects support logical operations and are used as arguments to `where` expressions. `simd_mask<V, N, I>` is a type alias for `simd<V, N, I>::simd_mask`.

- `where_expression<simd<V, N, I>>`

  The result of a `where` expression, used for masked assignment.
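As a brief sketch of how these three types interact, the following clamps negative lanes of an N-wide block to zero. It assumes a scalar broadcast constructor `simd(double)` and masked assignment through a `where` expression; names here are illustrative.

#include <simd/simd.hpp>
using namespace arb::simd;

// Replace negative values in one N-wide block of x with zero.
void clamp_nonnegative(double* x) {
    constexpr int N = simd_abi::native_width<double>::value;
    using simdv = simd<double, N>;

    simdv v(x);                  // load N contiguous values
    auto neg = v < simdv(0.);    // simd_mask: set in lanes holding negatives
    where(neg, v) = simdv(0.);   // masked assignment: only negative lanes change
    v.copy_to(x);                // store the block back
}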
There is, in addition, a templated class `detail::indirect_expression` that holds the result of an `indirect(...)` expression. These arise in gather and scatter operations, and are detailed below.
Implementation typemaps live in the `simd_abi` namespace, while concrete implementation classes live in `detail`. A particular specialization for an architecture, for example 4-wide double on AVX, then requires the following (see the sketch after the list):

- A concrete implementation class, e.g. `detail::avx_double4`.
- A specialization of its ABI map, so that `simd_abi::avx<double, 4>::type` is an alias for `detail::avx_double4`.
- A specialization of the native ABI map, so that `simd_abi::native<double, 4>::type` is an alias for `simd_abi::avx<double, 4>::type`.
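As a rough sketch of this scheme, with illustrative declarations only (not the exact contents of the headers):

// Illustrative sketch following the naming scheme described above.
namespace detail {
    // Concrete implementation class, e.g. wrapping __m256d and AVX intrinsics.
    struct avx_double4;
}

namespace simd_abi {
    // ABI typemap: primary template left undefined.
    template <typename V, unsigned N> struct avx;

    // Specialization mapping 4-wide double to its implementation class.
    template <>
    struct avx<double, 4> { using type = detail::avx_double4; };
}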
The maximum natively supported width for a scalar type V is recorded in `simd_abi::native_width<V>::value`.
Indirect expressions
An expression of the form `indirect(p, k)` or `indirect(p, k, constraint)` describes a sequence of memory locations based at the pointer `p`, with offsets given by the simd variable `k`. A constraint of type `index_constraint` can be provided, which promises certain guarantees on the index values in `k`:
| Constraint | Guarantee |
|---|---|
| `index_constraint::none` | No restrictions. |
| `index_constraint::independent` | No indices are repeated, i.e. kᵢ = kⱼ implies i = j. |
| `index_constraint::contiguous` | Indices are sequential, i.e. kᵢ = k₀ + i. |
| `index_constraint::constant` | Indices are all equal, i.e. kᵢ = kⱼ for all i and j. |
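As an illustration, the following sketch gathers the values of `data` at the positions in `index`, increments them, and scatters them back. It assumes that a simd value can be constructed from an `indirect(...)` expression (a gather) and copied to one (a scatter); the `independent` constraint asserts that no index is repeated.

#include <simd/simd.hpp>
using namespace arb::simd;

// Add delta to data[index[0]], ..., data[index[N-1]].
void add_at(double* data, const int* index, double delta) {
    constexpr int N = simd_abi::native_width<double>::value;
    using simdv = simd<double, N>;
    using simdi = simd<int, N>;

    simdi k(index);  // load N indices

    // Gather: load data[k_i] for each lane i.
    simdv v(indirect(data, k, index_constraint::independent));
    v = v + simdv(delta);

    // Scatter: store each lane back to data[k_i].
    v.copy_to(indirect(data, k, index_constraint::independent));
}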