From 0ded25a68266f691497311e3052d5f5fdd5a6686 Mon Sep 17 00:00:00 2001 From: Ivan Martinez <ivan.martinez@bsc.es> Date: Fri, 4 Nov 2016 12:00:08 +0100 Subject: [PATCH] Omp (#38) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * first version of openmp threading back end * adding openmp parallel sort implementation * OpenMP sort working * Support for units syntax within state block. * Add soma-less cable cell to test cells. Also: * Ensure intrinsic and passive properties properly set on test cells. * Change bulk resistivity default. * Align defaults with values used in most of the NEURON validation scripts. * Use consistent 100 Ω·m bulk resistivity across both NEURON test models and basic validation cells. * OpenMP back end working * Add Extrae+paraver support, needs to fix compilation warnings * Reorganize validation data generation * Move generation and data to top-level validation directory. * Make BUILD_VALIDATION_DATA and VALIDATION_DATA_DIR cache vars. * Add helper CMake functions for data generation. Note `validation/ref/numeric/foo.sh` is just a placeholder. * Bugfix: hh_soma.jl * Use consistent scaling for y[1] scalar voltage in hh_soma.jl * Also: add more reserved target names to CMakeLists.txt helper function. * Refactor convergence tests; add numeric soma ref. * Amend data directory path in validation tests. * Enmodulate `hh_soma.jl` * Add HH channel reference data generations script. * Switch `validate_soma.cpp` to numeric reference data. * Consolidate common code in `validate_ball_and_stick.cpp` * Add (nearly) Rallpack1 validation test (see below). * Gentle failure on absence of reference data in `validate_ball_and_stick.cpp` Can't yet override mechanism default parameter values, so the cable cell model added to `test_common_cells.hpp` lets the default stand; validation script will have to use the default membrane conductance rather than that given by Rallpack1. * Add Rallpack1 validation, plus bugfix, clean * Implement Rallpack1 validation test (with a workaround for inability to set membrane conductance). * Fix bug in L≠1 case in PassiveCable.jl (this may still be wrong). * Fix bug in peak delta computation in trace analysis when both traces have no local maxima. * Gentle failure on missing `numeric_soma.json` * Allow multiple `-s` selection operations for `tsplot`, acting disjunctively. * Remove errant test file. * file's cleanup * Remove tabs * Use correct routine in numeric_rallpack1.jl x0.3 * Configure-time test for julia * `math::infinity<>()` wrapper for infinity * Use name `i_e` for Stim current density * Use `math::infinity<>()` for infinite value * Adds unit tests for the STATE block. * Add "lib" to search prefixes for libtbb * Fix quoting error in library search. * Add "lib" to prefixes when system is "Linux". * Address deprecated use of 'symbol' warning. Julia 0.5 deprecates use of `symbol` instead of `Symbol`. This patch just substitutes the correct call. * Address deprecated use of 'symbol' warning. Julia 0.5 deprecates use of `symbol` instead of `Symbol`. This patch just substitutes the correct call. * Addresses PR comments. * Unit tests for math.hpp * Tests for `math::pi`, `math::lerp`, `math::area_frustrum` and `math::volume_frustrum` * Fix `math:pi<long double>()`. * Extend range, view functionality. * New `filter` view: lazily selects based on predicate. * Generic `front` and `back` for sequences. * New rangeutil STL wrappers `stable_sort_by`, `all_of`, `any_of`. * Consolidate common utility unit testing structures into `tests/unit/common.hpp` * Add `ball_and_squiggle` model; fix `ball_and_taper`. * Make `test_common_cells.hpp` and `ball_and_taper.py` agree. * Add `ball_and_squiggle` model that has a tapering undulating profile. * Address PR#46 review comments. * Add documentation of template parameters for `filter_iterator`. * Document use of `uninitalized<F>` for holding functional objects in `filter_iterator` and `transform_iterator` * Consolidate validation test code (issue #41) * Simplify trace analysis and reporting code in `trace_analysis.hpp` * Consolidate convergence test run procedures into new class `convergence_test_runner`. * New compartment info structure for FVM. * Make `algorithm::sum`, `algorithm::mean` more generic, allowing use with array types. * Add `div_compartment` compartment representation, that holds geometric information for each half of a compartment that will then be used in calculating control volumes. * Add three compartmentalisation schemes/policies that discretize a segment into `div_compartment` objects: * `div_compartment_by_ends` divides based only on the segment end points and radii. * `div_compartment_sampler` forms frusta by sampling the segment radius at each compartment boundary * `div_compartment_integrator` computes the compartment areas and volumes exactly by summing all frustra in the intersection of the segment and the compartmnet span. * Extrae linked at execution time * cleaning project * Complex compartments * Use divided compartments to determine FVM coefficients. * Pick correct control volume in FVM from sgement position (avoids off-by-half error.) * Add colour override functionality to tsplot: `--colour` option. * Add const accessor for cell soma. * Source formatting, comments in `math.hpp` * Fix `range_view`: was using incorrectly named type trait. * Add unit test for `range_view`. * Allow points of discontinuity to be omitted from L-infinity norm calculations. * Add `-d, --min-dt` option to `validate.exe` to control time step in validation convergence tests. * Add validation test: confirm divided compartment policy does not effect results on simple frustrum dendrites. * Change default max compartments on validation tests to 100 (ad hoc observed convergence limit at dt circa 0.001 ms; finder spatial division would required much finer dt.) * Make NEURON validation data generation scripts use CVODE by default, and with `secondorder=2` when non-zero `dt` is given. * Remove division policy type parameter. * Use only `div_compartment_integrator` for compartmentalization in `fvm_multicell`. The policy will later be moved to a backend policy class. * For now, disable validation tests that test different division policies (see above). * Tweak comments and remove redundant `using`, following comments on PR#54. * Minor twicks and corrections --- CMakeLists.txt | 24 +++- scripts/extrae/extrae.xml | 81 +++++++++++ scripts/extrae/run_instrumented_omp.sh | 38 +++++ scripts/extrae/run_instrumented_serial.sh | 19 +++ scripts/extrae/run_once.sh | 15 ++ src/connection.hpp | 2 + src/threading/omp.hpp | 163 ++++++++++++++++++++++ src/threading/parallel_stable_sort.h | 127 +++++++++++++++++ src/threading/pss_common.h | 110 +++++++++++++++ src/threading/threading.hpp | 2 + tests/unit/test_spike_store.cpp | 1 + 11 files changed, 578 insertions(+), 4 deletions(-) create mode 100644 scripts/extrae/extrae.xml create mode 100755 scripts/extrae/run_instrumented_omp.sh create mode 100755 scripts/extrae/run_instrumented_serial.sh create mode 100755 scripts/extrae/run_once.sh create mode 100644 src/threading/omp.hpp create mode 100644 src/threading/parallel_stable_sort.h create mode 100644 src/threading/pss_common.h diff --git a/CMakeLists.txt b/CMakeLists.txt index 86bd6f78..dfec55ac 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -31,12 +31,25 @@ if(WITH_TRACE) add_definitions("-DWITH_TRACE") endif() -# TBB support -set(WITH_TBB OFF CACHE BOOL "use TBB for on-node threading" ) -if(WITH_TBB) +#threading model selection +set(THREADING_MODEL "serial" CACHE STRING "set the threading model, one of serial/tbb/omp") +if(THREADING_MODEL MATCHES "tbb") + # TBB support find_package(TBB REQUIRED) set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TBB_DEFINITIONS}") add_definitions(-DWITH_TBB) + +elseif(THREADING_MODEL MATCHES "omp") + # OpenMP support + find_package(OpenMP REQUIRED) + set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}") + add_definitions(-DWITH_OMP) + +elseif(THREADING_MODEL MATCHES "serial") + #setup previously done + +else() + message( FATAL_ERROR "-- Threading model '${THREADING_MODEL}' not supported, use one of serial/tbb/omp") endif() # MPI support @@ -50,12 +63,15 @@ if(WITH_MPI) set_property(DIRECTORY APPEND_STRING PROPERTY COMPILE_OPTIONS "${MPI_C_COMPILE_FLAGS}") endif() -# Profiler support + +# Internal profiler support set(WITH_PROFILING OFF CACHE BOOL "use built-in profiling of miniapp" ) if(WITH_PROFILING) add_definitions(-DWITH_PROFILING) endif() + + # Cray systems set(SYSTEM_CRAY OFF CACHE BOOL "add flags for compilation on Cray systems") if(SYSTEM_CRAY) diff --git a/scripts/extrae/extrae.xml b/scripts/extrae/extrae.xml new file mode 100644 index 00000000..fb3e7311 --- /dev/null +++ b/scripts/extrae/extrae.xml @@ -0,0 +1,81 @@ +<?xml version='1.0'?> + +<trace enabled="yes" + home="/apps/CEPBATOOLS/extrae/3.3.0/openmpi+libgomp4.2/64" + initial-mode="detail" + type="paraver" + xml-parser-id="Id: xml-parse.c 3918 2016-03-11 14:59:01Z harald $" +> + + <openmp enabled="yes"> + <locks enabled="no" /> + <counters enabled="yes" /> + </openmp> + + <pthread enabled="no"> + <locks enabled="no" /> + <counters enabled="yes" /> + </pthread> + + <counters enabled="yes"> + <cpu enabled="yes" starting-set-distribution="1"> + <set enabled="yes" domain="all" changeat-time="0"> + PAPI_TOT_INS,PAPI_TOT_CYC,PAPI_L1_DCM,PAPI_L2_DCM,PAPI_L3_TCM,PAPI_FP_INS,PAPI_BR_MSP + </set> + <set enabled="yes" domain="all" changeat-time="0"> + PAPI_TOT_INS,PAPI_TOT_CYC,PAPI_LD_INS,PAPI_SR_INS,PAPI_BR_UCN,PAPI_BR_CN,PAPI_VEC_SP,RESOURCE_STALLS + <sampling enabled="no" period="1000000000">PAPI_TOT_CYC</sampling> + </set> + </cpu> + + <network enabled="no" /> + + <resource-usage enabled="no" /> + + <memory-usage enabled="no" /> + </counters> + + <storage enabled="no"> + <trace-prefix enabled="yes">TRACE</trace-prefix> + <size enabled="no">5</size> + <temporal-directory enabled="yes">/scratch</temporal-directory> + <final-directory enabled="yes">/gpfs/scratch/bsc41/bsc41273</final-directory> + </storage> + + <buffer enabled="yes"> + <size enabled="yes">500000</size> + <circular enabled="no" /> + </buffer> + + <trace-control enabled="yes"> + <file enabled="no" frequency="5M">/gpfs/scratch/bsc41/bsc41273/control</file> + <global-ops enabled="no"></global-ops> + </trace-control> + + <others enabled="yes"> + <minimum-time enabled="no">10M</minimum-time> + <finalize-on-signal enabled="yes" + SIGUSR1="no" SIGUSR2="no" SIGINT="yes" + SIGQUIT="yes" SIGTERM="yes" SIGXCPU="yes" + SIGFPE="yes" SIGSEGV="yes" SIGABRT="yes" + /> + <flush-sampling-buffer-at-instrumentation-point enabled="yes" /> + </others> + + <sampling enabled="no" type="virtual" period="50m" variability="10m" /> + + <dynamic-memory enabled="no" /> + + <input-output enabled="no" /> + + <merge enabled="yes" + synchronization="default" + tree-fan-out="16" + max-memory="512" + joint-states="yes" + keep-mpits="yes" + sort-addresses="yes" + overwrite="yes" + /> + +</trace> diff --git a/scripts/extrae/run_instrumented_omp.sh b/scripts/extrae/run_instrumented_omp.sh new file mode 100755 index 00000000..7e0601a1 --- /dev/null +++ b/scripts/extrae/run_instrumented_omp.sh @@ -0,0 +1,38 @@ +#!/bin/bash + +display_usage() { + echo -e "\nUsage:\n$0 [num_threads] \n" +} + +# if less than one arguments supplied, display usage +if [ $# -le 0 ] + then + display_usage + exit 1 +fi + +# check whether user had supplied -h or --help . If yes display usage +if [[ ( $# == "--help") || $# == "-h" ]] + then + display_usage + exit 0 +fi + +if [ -z ${EXTRAE_HOME+x} ] + then + echo -e "\nSpecify EXTRAE_HOME variable before executing this script\n" + echo " export EXTRAE_HOME=path/to/directory" + exit 1 + else + echo -e "\nEXTRAE_HOME is set to '$EXTRAE_HOME'" +fi + +export OMP_NUM_THREADS=$1 +export EXTRAE_CONFIG_FILE=extrae.xml +source ${EXTRAE_HOME}/etc/extrae.sh +export LD_PRELOAD=$(find $EXTRAE_HOME -name "libomptrace.so") + + +./../../build/miniapp/miniapp.exe + +unset LD_PRELOAD \ No newline at end of file diff --git a/scripts/extrae/run_instrumented_serial.sh b/scripts/extrae/run_instrumented_serial.sh new file mode 100755 index 00000000..f8787e8d --- /dev/null +++ b/scripts/extrae/run_instrumented_serial.sh @@ -0,0 +1,19 @@ +#!/bin/bash + +if [ -z ${EXTRAE_HOME+x} ] + then + echo -e "\nSpecify EXTRAE_HOME variable before executing this script\n" + echo " export EXTRAE_HOME=path/to/directory" + exit 1 + else + echo -e "\nEXTRAE_HOME is set to '$EXTRAE_HOME'" +fi + + +export EXTRAE_CONFIG_FILE=extrae.xml +source ${EXTRAE_HOME}/etc/extrae.sh +export LD_PRELOAD=$(find $EXTRAE_HOME -name "libseqtrace.so") + +./../../build/miniapp/miniapp.exe + +unset LD_PRELOAD \ No newline at end of file diff --git a/scripts/extrae/run_once.sh b/scripts/extrae/run_once.sh new file mode 100755 index 00000000..b8638418 --- /dev/null +++ b/scripts/extrae/run_once.sh @@ -0,0 +1,15 @@ +#!/bin/bash +# @ job_name = miniapp +# @ partition = debug +## @ reservation = +# @ initialdir = . +# @ output = miniapp_%j.out +# @ error = miniapp_%j.err +# @ total_tasks = 1 +# @ cpus_per_task = 12 +# @ node_usage = not_shared +# @ wall_clock_limit = 00:15:00 + +export OMP_NUM_THREADS=16 +./run_instrumented.sh + diff --git a/src/connection.hpp b/src/connection.hpp index b9e40fb5..361aee20 100644 --- a/src/connection.hpp +++ b/src/connection.hpp @@ -15,6 +15,8 @@ public: using id_type = cell_member_type; using time_type = Time; + connection()=default; + connection(id_type src, id_type dest, float w, time_type d) : source_(src), destination_(dest), diff --git a/src/threading/omp.hpp b/src/threading/omp.hpp new file mode 100644 index 00000000..ad43e82b --- /dev/null +++ b/src/threading/omp.hpp @@ -0,0 +1,163 @@ +#pragma once + +#if !defined(WITH_OMP) + #error "this header can only be loaded if WITH_OMP is set" +#endif + +#include <omp.h> +#include "parallel_stable_sort.h" + +#include <algorithm> +#include <array> +#include <chrono> +#include <string> +#include <vector> + +namespace nest { +namespace mc { +namespace threading { + +/////////////////////////////////////////////////////////////////////// +// types +/////////////////////////////////////////////////////////////////////// +template <typename T> +class enumerable_thread_specific { + using storage_class = std::vector<T>; + storage_class data; +public : + using iterator = typename storage_class::iterator; + using const_iterator = typename storage_class::const_iterator; + + enumerable_thread_specific() { + data = std::vector<T>(omp_get_max_threads()); + } + + enumerable_thread_specific(const T& init) { + data = std::vector<T>(omp_get_max_threads(), init); + } + + T& local() { return data[omp_get_thread_num()]; } + const T& local() const { return data[omp_get_thread_num()]; } + + auto size() -> decltype(data.size()) const { return data.size(); } + + iterator begin() { return data.begin(); } + iterator end() { return data.end(); } + + const_iterator begin() const { return data.begin(); } + const_iterator end() const { return data.end(); } + + const_iterator cbegin() const { return data.cbegin(); } + const_iterator cend() const { return data.cend(); } +}; + +/////////////////////////////////////////////////////////////////////// +// algorithms +/////////////////////////////////////////////////////////////////////// +struct parallel_for { + template <typename F> + static void apply(int left, int right, F f) { + #pragma omp parallel for + for(int i=left; i<right; ++i) { + f(i); + } + } +}; + +template <typename RandomIt> +void sort(RandomIt begin, RandomIt end) { + pss::parallel_stable_sort(begin, end); +} + +template <typename RandomIt, typename Compare> +void sort(RandomIt begin, RandomIt end, Compare comp) { + pss::parallel_stable_sort(begin, end ,comp); +} + +template <typename Container> +void sort(Container& c) { + pss::parallel_stable_sort(c.begin(), c.end()); +} + + +template <typename T> +class parallel_vector { + using value_type = T; + std::vector<value_type> data_; +public: + parallel_vector() = default; + using iterator = typename std::vector<value_type>::iterator; + using const_iterator = typename std::vector<value_type>::const_iterator; + + iterator begin() { return data_.begin(); } + iterator end() { return data_.end(); } + + const_iterator begin() const { return data_.begin(); } + const_iterator end() const { return data_.end(); } + + const_iterator cbegin() const { return data_.cbegin(); } + const_iterator cend() const { return data_.cend(); } + + void push_back (const value_type& val) { + #pragma omp critical + data_.push_back(val); + } + + void push_back (value_type&& val) { + #pragma omp critical + data_.push_back(std::move(val)); + } +}; + +inline std::string description() { + return "OpenMP"; +} + +struct timer { + using time_point = std::chrono::time_point<std::chrono::system_clock>; + + static inline time_point tic() { + return std::chrono::system_clock::now(); + } + + static inline double toc(time_point t) { + return std::chrono::duration<double>(tic() - t).count(); + } + + static inline double difference(time_point b, time_point e) { + return std::chrono::duration<double>(e-b).count(); + } +}; + +constexpr bool multithreaded() { return true; } + + +class task_group { +public: + task_group() = default; + + template<typename Func> + void run(const Func& f) { + f(); + } + + template<typename Func> + void run_and_wait(const Func& f) { + f(); + } + + void wait() + {} + + bool is_canceling() { + return false; + } + + void cancel() + {} +}; + +} // threading +} // mc +} // nest + diff --git a/src/threading/parallel_stable_sort.h b/src/threading/parallel_stable_sort.h new file mode 100644 index 00000000..caf010dd --- /dev/null +++ b/src/threading/parallel_stable_sort.h @@ -0,0 +1,127 @@ +/* + Copyright (C) 2014 Intel Corporation + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, + INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, + BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS + OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED + AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY + WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + POSSIBILITY OF SUCH DAMAGE. +*/ +#include <algorithm> +#include <omp.h> + +#include "pss_common.h" + +namespace pss { + +namespace internal { + +// Merge sequences [xs,xe) and [ys,ye) to output sequence [zs,zs+(xe-xs)+(ye-ys)) +// Destroy input sequence iff destroy==true +template<typename RandomAccessIterator1, typename RandomAccessIterator2, typename RandomAccessIterator3, typename Compare> +#if __INTEL_COMPILER<=1500 +// Work around bug where firstprivate applied to formal parameter does not work. +void parallel_move_merge( RandomAccessIterator1 xs_, RandomAccessIterator1 xe, RandomAccessIterator2 ys_, RandomAccessIterator2 ye, RandomAccessIterator3 zs_, bool destroy, Compare comp ) { + RandomAccessIterator1 xs = xs_; + RandomAccessIterator2 ys = ys_; + RandomAccessIterator3 zs = zs_; +#else +void parallel_move_merge( RandomAccessIterator1 xs, RandomAccessIterator1 xe, RandomAccessIterator2 ys, RandomAccessIterator2 ye, RandomAccessIterator3 zs, bool destroy, Compare comp ) { +#endif + const int MERGE_CUT_OFF = 2000; + while( (xe-xs) + (ye-ys) > MERGE_CUT_OFF ) { + RandomAccessIterator1 xm; + RandomAccessIterator2 ym; + if( xe-xs < ye-ys ) { + ym = ys+(ye-ys)/2; + xm = std::upper_bound(xs,xe,*ym,comp); + } else { + xm = xs+(xe-xs)/2; + ym = std::lower_bound(ys,ye,*xm,comp); + } +#pragma omp task untied mergeable firstprivate(xs,xm,ys,ym,zs,destroy,comp) + parallel_move_merge( xs, xm, ys, ym, zs, destroy, comp ); + zs += (xm-xs) + (ym-ys); + xs = xm; + ys = ym; + } + serial_move_merge( xs, xe, ys, ye, zs, comp ); + if( destroy ) { + serial_destroy( xs, xe ); + serial_destroy( ys, ye ); + } +#pragma omp taskwait +} + +// Sorts [xs,xe), where zs[0:xe-xs) is temporary buffer supplied by caller. +// Result is in [xs,xe) if inplace==true, otherwise in [zs,zs+(xe-xs)) +template<typename RandomAccessIterator1, typename RandomAccessIterator2, typename Compare> +void parallel_stable_sort_aux( RandomAccessIterator1 xs, RandomAccessIterator1 xe, RandomAccessIterator2 zs, int inplace, Compare comp ) { + //typedef typename std::iterator_traits<RandomAccessIterator2>::value_type T; + const int SORT_CUT_OFF = 500; + if( xe-xs<=SORT_CUT_OFF ) { + stable_sort_base_case(xs, xe, zs, inplace, comp); + } else { + RandomAccessIterator1 xm = xs + (xe-xs)/2; + RandomAccessIterator2 zm = zs + (xm-xs); + RandomAccessIterator2 ze = zs + (xe-xs); +#pragma omp task + parallel_stable_sort_aux( xs, xm, zs, !inplace, comp ); + parallel_stable_sort_aux( xm, xe, zm, !inplace, comp ); +#pragma omp taskwait + if( inplace ) + parallel_move_merge( zs, zm, zm, ze, xs, inplace==2, comp ); + else + parallel_move_merge( xs, xm, xm, xe, zs, false, comp ); + } +} + +} // namespace internal + + + +template<typename RandomAccessIterator, typename Compare> +void parallel_stable_sort( RandomAccessIterator xs, RandomAccessIterator xe, Compare comp ) { + typedef typename std::iterator_traits<RandomAccessIterator>::value_type T; + if( internal::raw_buffer z = internal::raw_buffer( sizeof(T)*(xe-xs) ) ) + if( omp_get_num_threads() > 1 ) + internal::parallel_stable_sort_aux( xs, xe, (T*)z.get(), 2, comp ); + else + #pragma omp parallel + #pragma omp master + internal::parallel_stable_sort_aux( xs, xe, (T*)z.get(), 2, comp ); + else + // Not enough memory available - fall back on serial sort + std::stable_sort( xs, xe, comp ); +} + +template<class RandomAccessIterator> +void parallel_stable_sort( RandomAccessIterator xs, RandomAccessIterator xe ) { + typedef typename std::iterator_traits<RandomAccessIterator>::value_type T; + parallel_stable_sort( xs, xe, std::less<T>() ); +} + + +} // namespace pss diff --git a/src/threading/pss_common.h b/src/threading/pss_common.h new file mode 100644 index 00000000..d49f6dbf --- /dev/null +++ b/src/threading/pss_common.h @@ -0,0 +1,110 @@ +/* + Copyright (C) 2014 Intel Corporation + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, + INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, + BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS + OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED + AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY + WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + POSSIBILITY OF SUCH DAMAGE. +*/ +#include <utility> +#include <iterator> +#include <algorithm> + +namespace pss { + +namespace internal { + +//! Destroy sequence [xs,xe) +template<class RandomAccessIterator> +void serial_destroy( RandomAccessIterator zs, RandomAccessIterator ze ) { + typedef typename std::iterator_traits<RandomAccessIterator>::value_type T; + while( zs!=ze ) { + --ze; + (*ze).~T(); + } +} + +//! Merge sequences [xs,xe) and [ys,ye) to output sequence [zs,(xe-xs)+(ye-ys)), using std::move +template<class RandomAccessIterator1, class RandomAccessIterator2, class RandomAccessIterator3, class Compare> +void serial_move_merge( RandomAccessIterator1 xs, RandomAccessIterator1 xe, RandomAccessIterator2 ys, RandomAccessIterator2 ye, RandomAccessIterator3 zs, Compare comp ) { + if( xs!=xe ) { + if( ys!=ye ) + for(;;) + if( comp(*ys,*xs) ) { + *zs = std::move(*ys); + ++zs; + if( ++ys==ye ) break; + } else { + *zs = std::move(*xs); + ++zs; + if( ++xs==xe ) goto movey; + } + ys = xs; + ye = xe; + } +movey: + std::move( ys, ye, zs ); +} + +template<typename RandomAccessIterator1, typename RandomAccessIterator2, typename Compare> +void stable_sort_base_case( RandomAccessIterator1 xs, RandomAccessIterator1 xe, RandomAccessIterator2 zs, int inplace, Compare comp) { + std::sort( xs, xe, comp ); + if( inplace!=2 ) { + RandomAccessIterator2 ze = zs + (xe-xs); + typedef typename std::iterator_traits<RandomAccessIterator2>::value_type T; + if( inplace ) + // Initialize the temporary buffer + for( ; zs<ze; ++zs ) + new(&*zs) T; + else + // Initialize the temporary buffer and move keys to it. + for( ; zs<ze; ++xs, ++zs ) + new(&*zs) T(std::move(*xs)); + } +} + +//! Raw memory buffer with automatic cleanup. +class raw_buffer { + void* ptr; +public: + //! Try to obtain buffer of given size. + raw_buffer( size_t bytes ) : ptr( operator new(bytes,std::nothrow) ) {} + //! True if buffer was successfully obtained, zero otherwise. + operator bool() const {return ptr;} + //! Return pointer to buffer, or NULL if buffer could not be obtained. + void* get() const {return ptr;} + //! Destroy buffer + ~raw_buffer() {operator delete(ptr);} +}; + +} // namespace internal + + +/*template<typename T> +void parallel_stable_sort(typename std::iterator<T> i,typename std::iterator<T> j){ + parallel_stable_sort( i, j, std::less<T>() ); +}*/ +} // namespace pss diff --git a/src/threading/threading.hpp b/src/threading/threading.hpp index 76431d7d..26b2bace 100644 --- a/src/threading/threading.hpp +++ b/src/threading/threading.hpp @@ -2,6 +2,8 @@ #if defined(WITH_TBB) #include "tbb.hpp" +#elif defined(WITH_OMP) + #include "omp.hpp" #else #define WITH_SERIAL #include "serial.hpp" diff --git a/tests/unit/test_spike_store.cpp b/tests/unit/test_spike_store.cpp index dd075f87..cf5e5290 100644 --- a/tests/unit/test_spike_store.cpp +++ b/tests/unit/test_spike_store.cpp @@ -1,5 +1,6 @@ #include "gtest.h" +#include <threading/threading.hpp> #include <thread_private_spike_store.hpp> TEST(spike_store, insert) -- GitLab