SIMD Classes
The purpose of the SIMD classes is to abstract and consolidate the use of compiler intrinsics for the manipulation of architecture-specific vector (SIMD) values.
The implementation is rather loosely based on the data-parallel vector types proposal P0214R6 for the C++ Parallelism TS 2.
Unless otherwise specified, all classes, namespaces and top-level functions described below reside in the top-level arb::simd namespace.
Example usage
The following code performs an element-wise vector product, storing only non-zero values in the resultant array.
    #include <simd/simd.hpp>
    using namespace arb::simd;

    void product_nonzero(int n, const double* a, const double* b, double* result) {
        constexpr int N = simd_abi::native_width<double>::value;
        using simd = simd<double, N>;
        using mask = simd::simd_mask;

        // Full-width passes over the inputs.
        int i = 0;
        for (; i+N<=n; i+=N) {
            auto vp = simd(a+i)*simd(b+i);
            where(vp!=0, vp).copy_to(result+i);
        }

        // Masked pass over the remaining tail of fewer than N elements.
        int tail = n-i;
        auto m = mask::unpack((1<<tail)-1);

        auto vp = simd(a+i, m)*simd(b+i, m);
        where(m && vp!=0, vp).copy_to(result+i);
    }
Classes
Three user-facing template classes are provided:

- simd<V, N, I = simd_abi::default_abi>

  An N-wide vector type of values of type V, using the architecture-specific implementation I. The implementation parameter is itself a template, acting as a type-map, with I<V, N>::type being the concrete implementation class (see below) for N-wide vectors of type V on this architecture.

  The implementation simd_abi::generic provides a std::array-backed implementation for arbitrary V and N, while simd_abi::native maps to the native architecture implementation for V and N, if one is available for the target architecture. simd_abi::default_abi will use simd_abi::native if available, or else fall back to the generic implementation.

- simd_mask<V, N, I = simd_abi::default_abi>

  The result of performing a lane-wise comparison/test operation on a simd<V, N, I> vector value. simd_mask objects support logical operations and are used as arguments to where expressions. simd_mask<V, N, I> is a type alias for simd<V, N, I>::simd_mask.

- where_expression<simd<V, N, I>>

  The result of a where expression, used for masked assignment; a short sketch follows this list.
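As an illustration of how these three classes interact, the following sketch clamps negative lanes of a vector to zero. It is a minimal sketch only, assuming broadcast construction from a scalar and assignment through a where expression, in the style of P0214; the names x and neg are illustrative.

    constexpr int N = simd_abi::native_width<double>::value;
    using simdd = simd<double, N>;

    simdd x(-1.5);                         // broadcast construction: every lane holds -1.5
    simdd::simd_mask neg = x < simdd(0.);  // lane-wise comparison yields a mask
    where(neg, x) = simdd(0.);             // masked assignment: only lanes where neg is set change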
There is, in addition, a templated class detail::indirect_expression that holds the result of an indirect(...) expression. These arise in gather and scatter operations, and are detailed below.
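As a sketch of a gather followed by a scatter through indirect expressions: this assumes that copy_from and copy_to accept an indirect expression, analogously to the pointer forms used in the example above; the names data, offsets and k are illustrative.

    constexpr int N = simd_abi::native_width<double>::value;
    double data[1024] = {};
    int offsets[N];
    for (int j = 0; j<N; ++j) offsets[j] = 2*j;  // a strided index pattern

    simd<int, N> k(offsets);           // load the index vector
    simd<double, N> s;
    s.copy_from(indirect(data, k));    // gather:  s[i] = data[k[i]]
    s = s + simd<double, N>(1.);
    s.copy_to(indirect(data, k));      // scatter: data[k[i]] = s[i]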
Implementation typemaps live in the simd_abi namespace, while concrete implementation classes live in detail. A particular specialization for an architecture, for example 4-wide double on AVX, then requires:

- A concrete implementation class, e.g. detail::avx_double4.
- A specialization of its ABI map, so that simd_abi::avx<double, 4>::type is an alias for detail::avx_double4.
- A specialization of the native ABI map, so that simd_abi::native<double, 4>::type is an alias for simd_abi::avx<double, 4>::type.
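Schematically, these pieces might fit together as below. This is a sketch only: the members a concrete implementation class must provide are omitted, and the exact template parameters of the typemaps are an assumption here.

    namespace arb { namespace simd {

    namespace detail {
    struct avx_double4 { /* concrete 4-wide AVX double implementation */ };
    } // namespace detail

    namespace simd_abi {
    template <typename V, unsigned N> struct avx;
    template <>
    struct avx<double, 4> { using type = detail::avx_double4; };

    template <typename V, unsigned N> struct native;
    template <>
    struct native<double, 4> { using type = avx<double, 4>::type; };
    } // namespace simd_abi

    }} // namespace arb::simd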
The maximum natively supported width for a scalar type V is recorded in simd_abi::native_width<V>::value.
Indirect expressions
An expression of the form indirect(p, k) or indirect(p, k, constraint) describes a sequence of memory locations based at the pointer p with offsets given by the simd variable k. A constraint of type index_constraint can be provided, which promises certain guarantees on the index values in k:
Constraint | Guarantee |
---|---|
index_constraint::none | No restrictions. |
index_constraint::independent | No indices are repeated, i.e. kᵢ = kⱼ implies i = j. |
index_constraint::contiguous | Indices are sequential, i.e. kᵢ = k₀ + i. |
index_constraint::constant | Indices are all equal, i.e. kᵢ = kⱼ for all i and j. |
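Such a constraint lets an implementation pick a cheaper code path: for example, when the calling code knows the indices in k are consecutive, a hardware gather can be replaced by a plain contiguous load. A sketch, with p, k and N as in the surrounding text and assuming the transfer interface used above:

    simd<double, N> s;
    s.copy_from(indirect(p, k, index_constraint::contiguous));  // may lower to a contiguous vector load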