diff --git a/doc/index.rst b/doc/index.rst index bbd64a5f67e52ea3a8fb8e933f32835350dcb843..b8d8e03e67ac059139481649a784cce7aa99e7a7 100644 --- a/doc/index.rst +++ b/doc/index.rst @@ -55,7 +55,6 @@ Some key features include: library simd_api - simd_maths profiler sampling_api diff --git a/doc/simd_api.rst b/doc/simd_api.rst index be33e7a040ebaa0693adaad8af0ca926b4207750..f88c8f0859bc1bfd80307b34be14dfdf0d92ed87 100644 --- a/doc/simd_api.rst +++ b/doc/simd_api.rst @@ -1,5 +1,5 @@ -SIMD classes for Arbor -====================== +SIMD Classes +============ The purpose of the SIMD classes is to abstract and consolidate the use of compiler intrinsics for the manipulation of architecture-specific vector @@ -456,7 +456,7 @@ all SIMD value types, while the transcendental functions are only usable for SIMD floating point types. Vectorized implementations of some of the transcendental functions are provided: -refer to :doc:`simd_maths` for details. +refer to the `vector transcendental functions documentation <simd_maths_>`_ for details. In the following: @@ -948,3 +948,191 @@ of our generated mechanisms. implementation API to support this would allow, for example, efficient use of AVX512 masked arithmetic instructions. +.. _simd_maths: + +Implementation of vector transcendental functions +------------------------------------------------- + +When building with the Intel C++ compiler, transcendental +functions on SIMD values in ``simd<double, 8, simd_detail::avx512>`` +wrap calls to the Intel scalar vector mathematics library (SVML). + +Outside of this case, the functions *exp*, *log*, *expm1* and +*exprelr* use explicit approximations as detailed below. The +algortihms follow those used in the +`Cephes library <http://www.netlib.org/cephes/>`_, with +some accommodations. + +.. default-role:: math + +Exponentials +^^^^^^^^^^^^ + +`\operatorname{exp}(x)` +~~~~~~~~~~~~~~~~~~~~~~~ + +The exponential is computed as + +.. math:: + + e^x = 2^n · e^g, + +with `|g| ≤ 0.5` and `n` an integer. The power of two +is computed via direct manipulation of the exponent bits of the floating +point representation, while `e^g` is approximated by a rational polynomial. + +`n` and `g` are computed by: + +.. math:: + + n &= \left\lfloor \frac{x}{\log 2} + 0.5 \right\rfloor + + g &= x - n·\log 2 + +where the subtraction in the calculation of `g` is performed in two stages, +to limit cancellation error: + +.. math:: + + g &\leftarrow \operatorname{fl}(x - n · c_1) + + g &\leftarrow \operatorname{fl}(g - n · c_2) + +where `c_1+c_2 = \log 2`, `c_1` comprising the first 32 bits of the mantissa. +(In principle `c_1` might contain more bits of the logarithm, but this +particular decomposition matches that used in the Cephes library.) This +decomposition gives `|g|\leq \frac{1}{2}\log 2\approx 0.347`. + +The rational approximation for `e^g` is of the form + +.. math:: + + e^g \approx \frac{R(g)}{R(-g)} + +where `R(g)` is a polynomial of order 6. The coefficients are again those +used by Cephes, and probably are derived via a Remez algorithm. +`R(g)` is decomposed into even and odd terms + +.. math:: + + R(g) = Q(x^2) + xP(x^2) + +so that the ratio can be calculated by: + +.. math:: + + e^g \approx 1 + \frac{2gP(g^2)}{Q(g^2)-gP(g^2)}. + +Randomized testing indicates the approximation is accurate to 2 ulp. + + +`\operatorname{expm1}(x)` +~~~~~~~~~~~~~~~~~~~~~~~~~ + +A similar decomposition of `x = g + n·\log 2` is performed so that +`g≤0.5`, with the exception that `n` is always taken to +be zero for `|x|≤0.5`, i.e. + +.. math:: + + n = \begin{cases} + 0&\text{if $|x|≤0.5$,}\\ + \left\lfloor \frac{x}{\log 2} + 0.5 \right\rfloor + &\text{otherwise.} + \end{cases} + + +`\operatorname{expm1}(x)` is then computed as + +.. math:: + + e^x - 1 = 2^n·(e^g - 1)+(2^n-1). + +and the same rational polynomial is used to approximate `e^g-1`, + +.. math:: + + e^g - 1 \approx \frac{2gP(g^2)}{Q(g^2)-gP(g^2)}. + +The scaling by step for `n≠0` is in practice calculated as + +.. math:: + + e^x - 1 = 2·(2^{n-1}·(e^g - 1)+(2^{n-1}-0.5)). + +in order to avoid overflow at the upper end of the range. + +The order 6 rational polynomial approximation for small `x` +is insufficiently accurate to maintain 1 ulp accuracy; randomized +testing indicates a maximum error of up to 3 ulp. + + +`\operatorname{exprelr}(x)` +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The function is defined as + +.. math:: + + \operatorname{exprelr}(x) = x/(e^x-1), + +and is the reciprocal of the relative exponential function, + +.. math:: + + \operatorname{exprel}(x) &= {}_1F_1(1; 2; x)\\ + &= \frac{e^x-1}{x}. + +This is computed in terms of expm1 by: + +.. math:: + + \operatorname{exprelr}(x) := + \begin{cases} + 1&\text{if $\operatorname{fl}(1+x) = 1$,}\\ + x/\operatorname{expm1}(x)&\text{otherwise.} + \end{cases} + +With the approximation for `\operatorname{expm1}` used above, +randomized testing demonstrates a maximum error on the order +of 4 ulp. + + +Logarithms +^^^^^^^^^^ + +The natural logarithm is computed as + +.. math:: + + \log x = \log u + n·log 2 + +where `n` is an integer and `u` is in the interval +`[ \frac{1}{2}\sqrt 2, \sqrt 2]`. The logarithm of +`u` is then approximated by the rational polynomial +used in the Cephes implementation, + +.. math:: + + \log u &\approx R(u-1) + + R(z) &= z - \frac{z^2}{2} + z^3·\frac{P(z)}{Q(z)}, + +where `P` and `Q` are polynomials of degree 5, with +`Q` monic. + +Cancellation error is minimized by computing the sum for +`\log x` as: + +.. math:: + + s &\leftarrow \operatorname{fl}(z^3·P(z)/Q(z))\\ + s &\leftarrow \operatorname{fl}(s + n·c_4)\\ + s &\leftarrow \operatorname{fl}(s - 0.5·z^2)\\ + s &\leftarrow \operatorname{fl}(s + z)\\ + s &\leftarrow \operatorname{fl}(s + n·c_3) + +where `z=u-1` and `c_3+c_4=\log 2`, `c_3` comprising +the first 9 bits of the mantissa. + + diff --git a/doc/simd_maths.rst b/doc/simd_maths.rst deleted file mode 100644 index b93b07e663973100949c6f885bea4dff6863bd78..0000000000000000000000000000000000000000 --- a/doc/simd_maths.rst +++ /dev/null @@ -1,185 +0,0 @@ -Implementation of vector transcendental functions -================================================= - -When building with the Intel C++ compiler, transcendental -functions on SIMD values in ``simd<double, 8, simd_detail::avx512>`` -wrap calls to the Intel scalar vector mathematics library (SVML). - -Outside of this case, the functions *exp*, *log*, *expm1* and -*exprelr* use explicit approximations as detailed below. The -algortihms follow those used in the -`Cephes library <http://www.netlib.org/cephes/>`_, with -some accommodations. - -.. default-role:: math - -Exponentials ------------- - -`\operatorname{exp}(x)` -^^^^^^^^^^^^^^^^^^^^^^^ - -The exponential is computed as - -.. math:: - - e^x = 2^n · e^g, - -with `|g| ≤ 0.5` and `n` an integer. The power of two -is computed via direct manipulation of the exponent bits of the floating -point representation, while `e^g` is approximated by a rational polynomial. - -`n` and `g` are computed by: - -.. math:: - - n &= \left\lfloor \frac{x}{\log 2} + 0.5 \right\rfloor - - g &= x - n·\log 2 - -where the subtraction in the calculation of `g` is performed in two stages, -to limit cancellation error: - -.. math:: - - g &\leftarrow \operatorname{fl}(x - n · c_1) - - g &\leftarrow \operatorname{fl}(g - n · c_2) - -where `c_1+c_2 = \log 2`,`c_1` comprising the first 32 bits of the mantissa. -(In principle `c_1` might contain more bits of the logarithm, but this -particular decomposition matches that used in the Cephes library.) This -decomposition gives `|g|\leq \frac{1}{2}\log 2\approx 0.347`. - -The rational approximation for `e^g` is of the form - -.. math:: - - e^g \approx \frac{R(g)}{R(-g)} - -where `R(g)` is a polynomial of order 6. The coefficients are again those -used by Cephes, and probably are derived via a Remez algorithm. -`R(g)` is decomposed into even and odd terms - -.. math:: - - R(g) = Q(x^2) + xP(x^2) - -so that the ratio can be calculated by: - -.. math:: - - e^g \approx 1 + \frac{2gP(g^2)}{Q(g^2)-gP(g^2)}. - -Randomized testing indicates the approximation is accurate to 2 ulp. - - -`\operatorname{expm1}(x)` -^^^^^^^^^^^^^^^^^^^^^^^^^ - -A similar decomposition of `x = g + n·\log 2` is performed so that -`g≤0.5`, with the exception that `n` is always taken to -be zero for `|x|≤0.5`, i.e. - -.. math:: - - n = \begin{cases} - 0&\text{if $|x|≤0.5$,}\\ - \left\lfloor \frac{x}{\log 2} + 0.5 \right\rfloor - &\text{otherwise.} - \end{cases} - - -`\operatorname{expm1}(x)` is then computed as - -.. math:: - - e^x - 1 = 2^n·(e^g - 1)+(2^n-1). - -and the same rational polynomial is used to approximate `e^g-1`, - -.. math:: - - e^g - 1 \approx \frac{2gP(g^2)}{Q(g^2)-gP(g^2)}. - -The scaling by step for `n≠0` is in practice calculated as - -.. math:: - - e^x - 1 = 2·(2^{n-1}·(e^g - 1)+(2^{n-1}-0.5)). - -in order to avoid overflow at the upper end of the range. - -The order 6 rational polynomial approximation for small `x` -is insufficiently accurate to maintain 1 ulp accuracy; randomized -testing indicates a maximum error of up to 3 ulp. - - -`\operatorname{exprelr}(x)` -^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -The function is defined as - -.. math:: - - \operatorname{exprelr}(x) = x/(e^x-1), - -and is the reciprocal of the relative exponential function, - -.. math:: - - \operatorname{exprel}(x) &= {}_1F_1(1; 2; x)\\ - &= \frac{e^x-1}{x}. - -This is computed in terms of expm1 by: - -.. math:: - - \operatorname{exprelr}(x) := - \begin{cases} - 1&\text{if $\operatorname{fl}(1+x) = 1$,}\\ - x/\operatorname{expm1}(x)&\text{otherwise.} - \end{cases} - -With the approximation for `\operatorname{expm1}` used above, -randomized testing demonstrates a maximum error on the order -of 4 ulp. - - -Logarithms ----------- - -The natural logarithm is computed as - -.. math:: - - \log x = \log u + n·log 2 - -where `n` is an integer and `u` is in the interval -`[ \frac{1}{2}\sqrt 2, \sqrt 2]`. The logarithm of -`u` is then approximated by the rational polynomial -used in the Cephes implementation, - -.. math:: - - \log u &\approx R(u-1) - - R(z) &= z - \frac{z^2}{2} + z^3·\frac{P(z)}{Q(z)}, - -where `P` and `Q` are polynomials of degree 5, with -`Q` monic. - -Cancellation error is minimized by computing the sum for -`\log x` as: - -.. math:: - - s &\leftarrow \operatorname{fl}(z^3·P(z)/Q(z))\\ - s &\leftarrow \operatorname{fl}(s + n·c_4)\\ - s &\leftarrow \operatorname{fl}(s - 0.5·z^2)\\ - s &\leftarrow \operatorname{fl}(s + z)\\ - s &\leftarrow \operatorname{fl}(s + n·c_3) - -where `z=u-1` and `c_3+c_4=\log 2`, `c_3` comprising -the first 9 bits of the mantissa. -