Thorsten Hater authored
Add a first round of developer documentation, evolving from the older `internal` part of the docs.
SIMD Classes
The purpose of the SIMD classes is to abstract and consolidate the use of compiler intrinsics for the manipulation of architecture-specific vector (SIMD) values.
The implementation is rather loosely based on the data-parallel vector types proposal P0214R6 for the C++ Parallelism TS 2.
Unless otherwise specified, all classes, namespaces and top-level functions described below are within the top-level arb::simd namespace.
Example usage
The following code performs an element-wise vector product, storing only non-zero values in the resultant array.
    #include <simd/simd.hpp>
    using namespace arb::simd;

    void product_nonzero(int n, const double* a, const double* b, double* result) {
        constexpr int N = simd_abi::native_width<double>::value;
        using simd = simd<double, N>;
        using mask = simd::simd_mask;

        // Full-width blocks of N elements.
        int i = 0;
        for (; i+N<=n; i+=N) {
            auto vp = simd(a+i)*simd(b+i);
            where(vp!=0, vp).copy_to(result+i);
        }

        // Remaining n-i < N elements: enable only the low `tail` lanes.
        int tail = n-i;
        auto m = mask::unpack((1<<tail)-1);
        auto vp = simd(a+i, m)*simd(b+i, m);
        where(m && vp!=0, vp).copy_to(result+i);
    }
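For comparison, the intended result of the vectorized routine above can be stated as a plain scalar loop. This is only a reference sketch and not part of the library; the name product_nonzero_scalar is hypothetical. Entries of result whose product is zero are left untouched, which is what the masked copy_to calls do lane by lane.

```cpp
// Hypothetical scalar reference for the vectorized example: writes a[i]*b[i]
// into result[i] only when the product is non-zero; all other entries of
// result keep their previous contents.
void product_nonzero_scalar(int n, const double* a, const double* b, double* result) {
    for (int i = 0; i<n; ++i) {
        double p = a[i]*b[i];
        if (p!=0) result[i] = p;
    }
}
```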
Classes
Three user-facing template classes are provided:
- simd<V, N, I = simd_abi::default_abi>

  N-wide vector type of values of type V, using architecture-specific implementation I. The implementation parameter is itself a template, acting as a type-map, with I<V, N>::type being the concrete implementation class (see below) for N-wide vectors of type V for this architecture.

  The implementation simd_abi::generic provides a std::array-backed implementation for arbitrary V and N, while simd_abi::native maps to the native architecture implementation for V and N, if one is available for the target architecture.

  simd_abi::default_abi will use simd_abi::native if available, or else fall back to the generic implementation.

- simd_mask<V, N, I = simd_abi::default_abi>

  The result of performing a lane-wise comparison/test operation on a simd<V, N, I> vector value. simd_mask objects support logical operations and are used as arguments to where expressions.

  simd_mask<V, N, I> is a type alias for simd<V, N, I>::simd_mask.

- where_expression<simd<V, N, I>>

  The result of a where expression, used for masked assignment.
There is, in addition, a templated class detail::indirect_expression
that holds the result of an indirect(...) expression. These arise in
gather and scatter operations, and are detailed below.
Implementation typemaps live in the simd_abi namespace, while concrete implementation classes live in detail. A particular specialization for an architecture, for example 4-wide double on AVX, then requires:
- A concrete implementation class, e.g. detail::avx_double4.
- A specialization of its ABI map, so that simd_abi::avx<double, 4>::type is an alias for detail::avx_double4.
- A specialization of the native ABI map, so that simd_abi::native<double, 4>::type is an alias for simd_abi::avx<double, 4>::type.
The maximum natively supported width for a scalar type V is recorded in simd_abi::native_width<V>::value.
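The type-map arrangement described above can be illustrated with a small self-contained sketch. Everything here lives in a hypothetical sketch namespace and is not the library's code: fake_avx_double4 and generic_impl are stand-ins, and the use of void to mark "no native implementation" is an assumption made only for this illustration.

```cpp
#include <array>
#include <type_traits>

namespace sketch {
namespace detail {
    // Hypothetical generic concrete implementation: std::array-backed storage.
    template <typename V, unsigned N>
    struct generic_impl { using storage = std::array<V, N>; };

    // Hypothetical stand-in for an architecture class such as avx_double4.
    struct fake_avx_double4 { using storage = std::array<double, 4>; };
}

namespace simd_abi {
    // An ABI map is a template I with I<V, N>::type naming the concrete class.
    template <typename V, unsigned N>
    struct generic { using type = detail::generic_impl<V, N>; };

    // Assumption for this sketch: void marks an unsupported (V, N) pair.
    template <typename V, unsigned N> struct avx { using type = void; };
    template <> struct avx<double, 4> { using type = detail::fake_avx_double4; };

    // The native map forwards to the architecture-specific map.
    template <typename V, unsigned N>
    struct native { using type = typename avx<V, N>::type; };

    // default_abi: prefer native when it names a real class, else generic.
    template <typename V, unsigned N>
    struct default_abi {
        using type = std::conditional_t<
            std::is_void<typename native<V, N>::type>::value,
            typename generic<V, N>::type,
            typename native<V, N>::type>;
    };
}

// The ABI is a template template parameter that is applied to (V, N).
template <typename V, unsigned N,
          template <typename, unsigned> class I = simd_abi::default_abi>
using impl_t = typename I<V, N>::type;
}
```

With these definitions, sketch::impl_t<double, 4> resolves to the AVX stand-in, while sketch::impl_t<float, 8> falls back to the std::array-backed class.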
Indirect expressions
An expression of the form indirect(p, k) or indirect(p, k, constraint) describes a sequence of memory locations based at the pointer p with offsets given by the simd variable k. A constraint of type index_constraint can be provided, which guarantees certain properties of the index values in k:
Constraint | Guarantee |
---|---|
index_constraint::none | No restrictions. |
index_constraint::independent | No indices are repeated, i.e. k_i = k_j implies i = j. |
index_constraint::contiguous | Indices are sequential, i.e. k_i = k_0 + i. |
index_constraint::constant | Indices are all equal, i.e. k_i = k_j for all i and j. |
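To show why such guarantees matter, the following scalar sketch models a compound scatter (indirect(p, k, constraint) += t) and picks a different strategy per constraint. It is an illustration of what each guarantee makes safe, not a description of what the library does internally; the local enum and the scatter_add function are stand-ins defined only for this example.

```cpp
#include <array>
#include <cstddef>

// Local stand-in for the library's index_constraint enumeration.
enum class index_constraint { none, independent, contiguous, constant };

// Scalar model of indirect(p, k, c) += t over N lanes.
template <std::size_t N>
void scatter_add(double* p, const std::array<int, N>& k,
                 const std::array<double, N>& t, index_constraint c) {
    switch (c) {
    case index_constraint::contiguous:
        // k[i] == k[0]+i: a plain update of one contiguous range.
        for (std::size_t i = 0; i<N; ++i) p[k[0]+i] += t[i];
        break;
    case index_constraint::constant: {
        // All indices equal: reduce the lanes, then do one scalar update.
        double sum = 0;
        for (double x: t) sum += x;
        p[k[0]] += sum;
        break;
    }
    case index_constraint::independent: {
        // No repeated index: gather, add, scatter is safe because no two
        // lanes touch the same location.
        std::array<double, N> g;
        for (std::size_t i = 0; i<N; ++i) g[i] = p[k[i]];
        for (std::size_t i = 0; i<N; ++i) g[i] += t[i];
        for (std::size_t i = 0; i<N; ++i) p[k[i]] = g[i];
        break;
    }
    case index_constraint::none:
        // Indices may repeat: accumulate lane by lane.
        for (std::size_t i = 0; i<N; ++i) p[k[i]] += t[i];
        break;
    }
}
```

Using the gather/add/scatter strategy with repeated indices would lose contributions, which is why the weakest constraint falls back to a serial loop in this sketch.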
Class simd
The class simd<V, N, I> is an alias for detail::simd_impl<I<V, N>::type>; the class detail::simd_impl<C> provides the public interface and arithmetic operators for a concrete implementation class C.
In the following:
- S stands for the class simd<V, N, I>.
- s is a SIMD value of type S.
- m is a mask value of type S::simd_mask.
- t, u and v are const objects of type S.
- w is a SIMD value of type simd<W, N, J>.
- i is an index of type int.
- j is a const object of type simd<U, N, J> where U is an integral type.
- x is a value of type V.
- p is a pointer to V.
- c is a const pointer to V or a length N array of V.

Here and below, the value in lane i of a SIMD vector or mask v is denoted by v_i.
Type aliases and constexpr members
Name | Type | Description |
---|---|---|
S::scalar_type | V | The type of one lane of the SIMD type. |
S::simd_mask | simd_mask<V, N, I> | The simd_mask specialization resulting from comparisons of S SIMD values. |
S::width | unsigned | The SIMD width N. |
Constructors
Expression | Description |
---|---|
S(x) | A SIMD value v with v_i equal to x for i = 0…N-1. |
S(t) | A copy of the SIMD value t. |
S(c) | A SIMD value v with v_i equal to c[i] for i = 0…N-1. |
S(w) | A copy or value-cast of the SIMD value w of a different type but same width. |
S(indirect(p, j)) | A SIMD value v with v_i equal to p[j[i]] for i = 0…N-1. |
S(c, m) | A SIMD value v with v_i equal to c[i] for i where m_i is true. |
Member functions
Expression | Type | Description |
---|---|---|
t.copy_to(p) | void | Set p[i] to t_i for i = 0…N-1. |
t.copy_to(indirect(p, j)) | void | Set p[j[i]] to t_i for i = 0…N-1. |
s.copy_from(c) | void | Set s_i to c[i] for i = 0…N-1. |
s.copy_from(indirect(c, j)) | void | Set s_i to c[j[i]] for i = 0…N-1. |
s.sum() | V | Sum of s_i for i = 0…N-1. |
Expressions
Expression | Type | Description |
---|---|---|
t+u | S | Lane-wise sum. |
t-u | S | Lane-wise difference. |
t*u | S | Lane-wise product. |
t/u | S | Lane-wise quotient. |
fma(t, u, v) | S | Lane-wise FMA t * u + v. |
s<t | S::simd_mask | Lane-wise less-than comparison. |
s<=t | S::simd_mask | Lane-wise less-than-or-equals comparison. |
s>t | S::simd_mask | Lane-wise greater-than comparison. |
s>=t | S::simd_mask | Lane-wise greater-than-or-equals comparison. |
s==t | S::simd_mask | Lane-wise equality test. |
s!=t | S::simd_mask | Lane-wise inequality test. |
s=t | S& | Lane-wise assignment. |
s+=t | S& | Equivalent to s=s+t. |
s-=t | S& | Equivalent to s=s-t. |
s*=t | S& | Equivalent to s=s*t. |
s/=t | S& | Equivalent to s=s/t. |
s=x | S& | Equivalent to s=S(x). |
indirect(p, j)=t | decltype(indirect(p, j))& | Equivalent to t.copy_to(indirect(p, j)). |
indirect(p, j)+=t | decltype(indirect(p, j))& | Compound indirect assignment: p[j[i]]+=t[i] for i = 0…N-1. |
indirect(p, j)-=t | decltype(indirect(p, j))& | Compound indirect assignment: p[j[i]]-=t[i] for i = 0…N-1. |
t[i] | V | Value t_i. |
s[i]=x | S::reference | Set value s_i to x. |
The (non-const) index operator operator[] returns a proxy object of type S::reference, which writes the corresponding lane in the SIMD value on assignment, and has an implicit conversion to scalar_type.
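The proxy-reference idea can be sketched independently of any vector hardware. The class below is a hypothetical model named lanes, not the library's simd_impl: the non-const operator[] hands back a small object that remembers the owner and lane index, writes the lane on assignment, and converts implicitly to the scalar type on read.

```cpp
#include <array>
#include <cstddef>

// Hypothetical N-wide value; std::array stands in for a vector register
// whose lanes cannot be addressed directly.
template <typename V, std::size_t N>
class lanes {
    std::array<V, N> data_{};
public:
    class reference {
        lanes& owner_;
        std::size_t i_;
    public:
        reference(lanes& owner, std::size_t i): owner_(owner), i_(i) {}
        // Assignment writes lane i of the owning value.
        reference& operator=(V x) { owner_.data_[i_] = x; return *this; }
        // Implicit conversion reads lane i as a scalar.
        operator V() const { return owner_.data_[i_]; }
    };

    // Const access returns the lane value directly.
    V operator[](std::size_t i) const { return data_[i]; }
    // Non-const access returns the write-through proxy.
    reference operator[](std::size_t i) { return reference(*this, i); }
};
```

A proxy is needed because a real vector register offers no addressable per-lane storage to which an ordinary V& could refer.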
Class simd_mask
simd_mask<V, N, I> is an alias for simd<V, N, I>::simd_mask, which in turn will be an alias for a class detail::simd_mask_impl<D>, where D is a concrete implementation class for the SIMD mask representation. simd_mask_impl<D> inherits from, and is implemented in terms of, detail::simd_impl<D>, but note that the concrete implementation class D may or may not be the same as the concrete implementation class I<V, N>::type used by simd<V, N, I>.
Mask values are read and written as bool values of 0 or 1, which may differ from the internal representation in each lane of the SIMD implementation.
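As an illustration of the distinction, the sketch below stores each mask lane as a 64-bit word that is all ones when true, a layout commonly produced by vector comparison instructions, and converts to and from bool at the interface. The wide_mask type and its layout are assumptions for this example only; the actual representation depends on the concrete implementation class.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical mask storage: a true lane has every bit set, a false lane is 0.
template <std::size_t N>
struct wide_mask {
    std::array<std::uint64_t, N> lanes{};

    // Read bool values (0 or 1) and widen them to the internal pattern.
    void copy_from(const bool* y) {
        for (std::size_t i = 0; i<N; ++i) {
            lanes[i] = y[i]? ~std::uint64_t(0): std::uint64_t(0);
        }
    }

    // Write the lanes back out as bool values.
    void copy_to(bool* q) const {
        for (std::size_t i = 0; i<N; ++i) q[i] = lanes[i]!=0;
    }
};
```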
In the following:
- M stands for the class simd_mask<V, N, I>.
- m and q are const objects of type simd_mask<V, N, I>.
- u is an object of type simd_mask<V, N, I>.
- b is a boolean value.
- As the argument of copy_to, q instead denotes a pointer to bool.
- y is a const pointer to bool or a length N array of bool.
- i is of type int.
- k is of type unsigned long long.
Constructors
Expression | Description |
---|---|
M(b) | A SIMD mask u with u_i equal to b for i = 0…N-1. |
M(m) | A copy of the SIMD mask m. |
M(y) | A SIMD mask u with u_i equal to y[i] for i = 0…N-1. |
Note that simd_mask does not (currently) offer a masked pointer/array constructor.
Member functions
Expression | Type | Description |
---|---|---|
m.copy_to(q) | void | Write the boolean value m_i to q[i] for i = 0…N-1. |
u.copy_from(y) | void | Set u_i to the boolean value y[i] for i = 0…N-1. |
Expressions
Expression | Type | Description |
---|---|---|
!m | M | Lane-wise negation. |
m&&q | M | Lane-wise logical and. |
m\|\|q | M | Lane-wise logical or. |
m==q | M | Lane-wise equality (equivalent to m!=!q). |
m!=q | M | Lane-wise logical xor. |
m=q | M& | Lane-wise assignment. |
m[i] | bool | Boolean value m_i. |
m[i]=b | M::reference | Set m_i to boolean value b. |
Static member functions
Expression | Type | Description |
---|---|---|
M::unpack(k) | M | Mask with value m_i equal to the ith bit of k. |
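The bit-to-lane correspondence of unpack can be modelled with a std::array of bool. This is a scalar sketch of the semantics stated in the table, not the library's implementation; the free functions unpack and tail_bits are hypothetical helpers. The second helper builds the kind of tail mask used in the example at the top of this page.

```cpp
#include <array>
#include <cstddef>

// Scalar model of M::unpack(k): lane i is true when bit i of k is set.
template <std::size_t N>
std::array<bool, N> unpack(unsigned long long k) {
    std::array<bool, N> m{};
    for (std::size_t i = 0; i<N; ++i) m[i] = (k>>i) & 1u;
    return m;
}

// Bit pattern with the lowest `tail` bits set; valid for tail < 64.
inline unsigned long long tail_bits(unsigned tail) {
    return (1ull<<tail) - 1;
}
```

For example, unpack<4>(tail_bits(3)) enables lanes 0 to 2 and disables lane 3, and a tail of zero yields an all-false mask.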
Class where_expression
where_expression<S> represents a masked subset of the lanes of a SIMD value of type S, used for conditional assignment, masked scatter, and masked gather. It is a type alias for S::where_expression, and is the result of an expression of the form where(mask, simdvalue).
In the following:
- W stands for the class where_expression<simd<V, N, I>>.
- s is a reference to a SIMD value of type simd<V, N, I>&.
- t is a SIMD value of type simd<V, N, I>.
- m is a mask of type simd<V, N, I>::simd_mask.
- j is a const object of type simd<U, N, J> where U is an integral type.
- x is a scalar of type V.
- p is a pointer to V.
- c is a const pointer to V or a length N array of V.
Expression | Type | Description |
---|---|---|
where(m, s) | W | A proxy for masked-assignment operations. |
where(m, s)=t | void | Set s_i to t_i for i where m_i is true. |
where(m, s)=x | void | Set s_i to x for i where m_i is true. |
where(m, s).copy_to(p) | void | Set p[i] to s_i for i where m_i is true. |
where(m, s).copy_to(indirect(p, j)) | void | Set p[j[i]] to s_i for i where m_i is true. |
where(m, s).copy_from(c) | void | Set s_i to c[i] for i where m_i is true. |
where(m, s).copy_from(indirect(c, j)) | void | Set s_i to c[j[i]] for i where m_i is true. |
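The non-indirect rows of the table can be modelled with a small proxy over std::array. This is a scalar sketch under the assumption that a where expression simply holds references to the mask and the value; the type where_sketch and this free function where are hypothetical and unrelated to the library's own definitions.

```cpp
#include <array>
#include <cstddef>

// Hypothetical proxy holding references to a mask and a mutable value.
template <typename V, std::size_t N>
struct where_sketch {
    const std::array<bool, N>& mask;
    std::array<V, N>& value;

    // where(m, s) = t: overwrite only the lanes selected by the mask.
    void operator=(const std::array<V, N>& t) {
        for (std::size_t i = 0; i<N; ++i) if (mask[i]) value[i] = t[i];
    }
    // where(m, s) = x: broadcast a scalar into the selected lanes.
    void operator=(V x) {
        for (std::size_t i = 0; i<N; ++i) if (mask[i]) value[i] = x;
    }
    // where(m, s).copy_to(p): masked store; unselected p[i] are untouched.
    void copy_to(V* p) const {
        for (std::size_t i = 0; i<N; ++i) if (mask[i]) p[i] = value[i];
    }
    // where(m, s).copy_from(c): masked load; unselected lanes are untouched.
    void copy_from(const V* c) {
        for (std::size_t i = 0; i<N; ++i) if (mask[i]) value[i] = c[i];
    }
};

// Builds the proxy; both arguments must outlive the returned object.
template <typename V, std::size_t N>
where_sketch<V, N> where(const std::array<bool, N>& m, std::array<V, N>& s) {
    return {m, s};
}
```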
Top-level functions
Lane-wise mathematical operations abs(x), min(x, y) and max(x, y) are offered for all SIMD value types, while the transcendental functions are only usable for SIMD floating point types.
Vectorized implementations of some of the transcendental functions are provided: refer to the vector transcendental functions documentation for details.
In the following: