Initial microbenchmark builds. (#227)

* Use git submodule for incorporating Google benchmark library.
* Add one microbenchmark for comparing `util::transform_view` performance.

Note that the microbenchmarks are not built by default; they can be built with `make ubenches`, and then run individually. The microbenchmarks will be built in `tests/ubench/`, relative to the build directory.

Commit c579fa19 authored 8 years ago by Sam Yates; committed by Ben Cumming 8 years ago.
--- a/.gitmodules
+++ b/.gitmodules
+[submodule "tests/ubench/google-benchmark"]
+	path = tests/ubench/google-benchmark
+	url = https://github.com/google/benchmark
--- a/tests/CMakeLists.txt
+++ b/tests/CMakeLists.txt
@@ -13,6 +13,10 @@ add_subdirectory(global_communication)
 # Tests for performance: This could include stand-alone tests. These are not necessarily run automatically.
 add_subdirectory(performance)

+# Microbenchmarks.
+add_subdirectory(ubench)
+
+
 # modcc tests
 if(NOT use_external_modcc)
    add_subdirectory(modcc)

--- a/tests/ubench/CMakeLists.txt
+++ b/tests/ubench/CMakeLists.txt
+include(ExternalProject)
+
+# List of micro benchmarks to build.
+
+set(bench_sources
+    accumulate_functor_values.cpp)
+
+# Set up google benchmark as an external project.
+
+set(gbench_src_dir "${CMAKE_CURRENT_SOURCE_DIR}/google-benchmark")
+set(gbench_install_dir "${PROJECT_BINARY_DIR}/gbench")
+
+set(gbench_cmake_args
+    "-DCMAKE_BUILD_TYPE=Release"
+    "-DCMAKE_INSTALL_PREFIX=${gbench_install_dir}"
+    "-DCMAKE_C_COMPILER=${CMAKE_C_COMPILER}"
+    "-DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}")
+
+
+# Attempt to update git submodule if required.
+find_package(Git)
+if(NOT EXISTS "${gbench_src_dir}/.git")
+    if(GIT_FOUND)
+        execute_process(COMMAND "${GIT_EXECUTABLE}" submodule update --init google-benchmark
+            WORKING_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}")
+    else()
+        message(WARNING "Unable to update the google-benchmark submodule: git not found.")
+    endif()
+endif()
+
+ExternalProject_Add(gbench
+    SOURCE_DIR "${gbench_src_dir}"
+    CMAKE_ARGS "${gbench_cmake_args}"
+    INSTALL_DIR "${gbench_install_dir}"
+)
+set_target_properties(gbench PROPERTIES EXCLUDE_FROM_ALL TRUE)
+
+# Build benches.
+
+foreach(bench_src ${bench_sources})
+    string(REGEX REPLACE "\\.[^.]*$" "" bench_exe "${bench_src}")
+    add_executable("${bench_exe}" EXCLUDE_FROM_ALL "${bench_src}")
+    add_dependencies("${bench_exe}" gbench)
+    target_include_directories("${bench_exe}" PRIVATE "${gbench_install_dir}/include")
+    target_link_libraries("${bench_exe}" "${gbench_install_dir}/lib/libbenchmark.a")
+
+    list(APPEND bench_exe_list ${bench_exe})
+endforeach()
+
+add_custom_target(ubenches DEPENDS ${bench_exe_list})
+
--- a/tests/ubench/README.md
+++ b/tests/ubench/README.md
+# Library microbenchmarks
+
+The benchmarks here are intended to:
+* answer questions regarding choices of implementation in the library where performance is a concern;
+* track the performance behaviour of isolated bits of library functionality across different platforms.
+
+
+## Building and running
+
+The microbenchmarks are not built by default. After configuring CMake, they can be built with
+`make ubenches`. Each benchmark is provided by a stand-alone C++ source file in `tests/ubench`;
+the resulting executables are found in `tests/ubench` relative to the build directory.
+
+[Google benchmark](https://github.com/google/benchmark) is used as a harness. It is included
+in the repository via a git submodule, and the provided CMake scripts will attempt to
+run `git submodule update --init` on the submodule if it does not appear to have been initialized.
+
+
+## Adding new benchmarks
+
+New benchmarks are added by placing the corresponding implementation as a stand-alone
+`.cpp` file in `tests/ubench` and adding the name of this file to the list `bench_sources`
+in `tests/ubench/CMakeLists.txt`.
+
+Each new benchmark should also have a corresponding entry in this `README.md`, describing
+the motivation for the test and summarising at least one benchmark result.
+
+Results in this file will inevitably become out of date; should the number of benchmarks grow
+unwieldy, we should consider some form of semi-automated recording of results in a database.
+
+
+## Benchmarks
+
+### `accumulate_functor_values`
+
+#### Motivation
+
+This problem arises when constructing the partition of an integral range where the size of each
+sub-interval is given by a function of the index. This requires the computation of the partition
+points
+> d<sub><i>i</i></sub> = Σ<sub><i>j</i>&lt;<i>i</i></sub> <i>f</i>(<i>j</i>).
+
+One approach using the provided range utilities is to use `std::partial_sum` with
+`util::transform_view` and `util::span`; the other is to simply write a loop that
+performs the accumulation directly. What is the extra cost, if any, of the
+transform-based approach?
+
+The microbenchmark compares the two implementations, where the function is a simple
+integer square operation, called either via a function pointer or via a function object.
+
+#### Results
+
+Results here are presented only for vector size _n_ equal to 1024. Times are per benchmark
+iteration, that is, per computation of all _n_ partial sums.
+
+Platform:
+*  Xeon E3-1220 v2 with base clock 3.1 GHz and max clock 3.5 GHz
+*  Linux 4.4.34
+*  gcc version 6.2.0
+*  clang version 3.8.1
+
+| Compiler    | direct/function | transform/function | direct/object | transform/object |
+|:------------|----------------:|-------------------:|--------------:|-----------------:|
+| g++ -O3     |  907 ns | 2090 ns |  907 ns | 614 ns |
+| clang++ -O3 | 1063 ns |  533 ns | 1051 ns | 532 ns |
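The two implementations compared by this benchmark can be sketched without the library headers. In the minimal example below, the hypothetical `transform_counter_iterator` stands in for iterating a `util::transform_view` over a `util::span`: it walks the integers and applies `f` lazily on dereference, so `std::partial_sum` accumulates f(1) through f(n) without an intermediate array of values. This is an illustration of the technique under that assumption, not the library's implementation.

```cpp
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical stand-in for util::transform_view over util::span:
// an input iterator over integers i that yields f(i) when dereferenced.
template <typename Func>
class transform_counter_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = long long;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = long long;

    transform_counter_iterator(long long i, Func f): i_(i), f_(f) {}

    // The function is applied lazily, only when the value is read.
    long long operator*() const { return f_(i_); }
    transform_counter_iterator& operator++() { ++i_; return *this; }
    transform_counter_iterator operator++(int) { auto tmp = *this; ++i_; return tmp; }

    bool operator==(const transform_counter_iterator& other) const { return i_==other.i_; }
    bool operator!=(const transform_counter_iterator& other) const { return i_!=other.i_; }

private:
    long long i_;
    Func f_;
};

struct square_object {
    long long operator()(long long x) const { return x*x; }
};

// Direct approach: accumulate in an explicit loop.
template <typename Func>
std::vector<long long> partial_sums_direct(Func f, int upto) {
    std::vector<long long> psum(upto);
    long long sum = 0;
    for (int i = 1; i<=upto; ++i) {
        sum += f(i);
        psum[i-1] = sum;
    }
    return psum;
}

// Transform-based approach: std::partial_sum over the lazily transformed range [1, upto].
template <typename Func>
std::vector<long long> partial_sums_transform(Func f, int upto) {
    std::vector<long long> psum(upto);
    transform_counter_iterator<Func> b(1, f), e(upto+1, f);
    std::partial_sum(b, e, psum.begin());
    return psum;
}
```

For example, with the squaring function object both versions yield the partial sums 1, 5, 14, 30 for _n_ = 4. Whether the extra iterator layer costs anything depends on how well the compiler sees through it, which is what the benchmark measures.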
+
--- a/tests/ubench/accumulate_functor_values.cpp
+++ b/tests/ubench/accumulate_functor_values.cpp
+// Compare implementations of partial summation of f(i) for i=1..n,
+// for a simple square function.
+
+// Explicitly undef NDEBUG for assert below.
+#undef NDEBUG
+
+#include <cassert>
+#include <numeric>
+#include <vector>
+
+#include <benchmark/benchmark.h>
+
+#include <util/span.hpp>
+#include <util/transform.hpp>
+
+#define NOINLINE __attribute__((noinline))
+
+using namespace nest::mc;
+
+inline long long square_function(long long x) { return x*x; }
+
+struct square_object {
+    long long operator()(long long x) const { return x*x; }
+};
+
+using result_vec = std::vector<long long>;
+
+template <typename Func>
+void partial_sums_direct(Func f, int upto, result_vec& psum) {
+    long long sum = 0;
+    for (int i=1; i<=upto; ++i) {
+        sum += f(i);
+        psum[i-1] = sum;
+    }
+}
+
+template <typename Func>
+void partial_sums_transform(Func f, int upto, result_vec& psum) {
+    auto nums = util::span<long long>(1, upto+1);
+    auto values = util::transform_view(nums, f);
+    std::partial_sum(values.begin(), values.end(), psum.begin());
+}
+
+template <typename Impl>
+void bench_generic(benchmark::State& state, const Impl& impl) {
+    int upto = state.range(0);
+    result_vec psum(upto);
+
+    while (state.KeepRunning()) {
+        impl(upto, psum);
+        benchmark::ClobberMemory();
+    }
+
+    // validate result
+    auto sum_squares_to = [](long long x) {return (2*x*x*x+3*x*x+x)/6; };
+    for (int i = 0; i<upto; ++i) {
+        assert(sum_squares_to(i+1)==psum[i]);
+    }
+}
+
+void accum_direct_function(benchmark::State& state) {
+    bench_generic(state,
+        [](int upto, result_vec& psum) { partial_sums_direct(square_function, upto, psum); });
+}
+
+void accum_direct_object(benchmark::State& state) {
+    bench_generic(state,
+        [](int upto, result_vec& psum) { partial_sums_direct(square_object{}, upto, psum); });
+}
+
+void accum_transform_function(benchmark::State& state) {
+    bench_generic(state,
+        [](int upto, result_vec& psum) { partial_sums_transform(square_function, upto, psum); });
+}
+
+void accum_transform_object(benchmark::State& state) {
+    bench_generic(state,
+        [](int upto, result_vec& psum) { partial_sums_transform(square_object{}, upto, psum); });
+}
+
+BENCHMARK(accum_direct_function)->Range(64, 1024);
+BENCHMARK(accum_transform_function)->Range(64, 1024);
+BENCHMARK(accum_direct_object)->Range(64, 1024);
+BENCHMARK(accum_transform_object)->Range(64, 1024);
+
+BENCHMARK_MAIN();
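In `accumulate_functor_values.cpp`, the same templates `partial_sums_direct` and `partial_sums_transform` are instantiated with either `square_function` or `square_object{}`, and template argument deduction treats these differently. A function passed by value decays to a function pointer, so `Func` becomes `long long (*)(long long)` and the call target is carried as a runtime value; a function object keeps its own class type, so the call target is fixed by `Func` itself and is straightforward for the compiler to inline. The sketch below makes the deduced types visible; `deduces_function_pointer` is a hypothetical helper written only for illustration.

```cpp
#include <type_traits>

inline long long square_function(long long x) { return x*x; }

struct square_object {
    long long operator()(long long x) const { return x*x; }
};

// Hypothetical helper: reports whether Func is deduced as a plain function pointer.
template <typename Func>
constexpr bool deduces_function_pointer(Func) {
    return std::is_same<Func, long long (*)(long long)>::value;
}

// Passing a function by value decays it to a pointer: every function with this
// signature shares one instantiation, and the callee is a runtime value.
static_assert(deduces_function_pointer(square_function), "function decays to pointer");

// Each function object type gets its own instantiation, with the call
// target determined by the type itself.
static_assert(!deduces_function_pointer(square_object{}), "object keeps its own type");
```

A compiler may still inline a call through a function pointer when it can propagate the constant pointer into the loop. In the g++ results in the README, the direct loop runs equally fast with either form, while the transform-based version is markedly slower with a function pointer (2090 ns versus 614 ns).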
+
--- a/google-benchmark @ 9a5072d1
+++ b/google-benchmark @ 9a5072d1
+Subproject commit 9a5072d1bf9187b32ce9a88842dffa31ef416442
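The validation step in `bench_generic` relies on the closed form for the sum of the first _n_ squares, n(n+1)(2n+1)/6 = (2n³ + 3n² + n)/6. The sketch below checks that identity against a brute-force sum; `sum_squares_to` mirrors the lambda in `bench_generic`, while `brute_force_sum_squares` and `closed_form_matches` are hypothetical helpers for illustration.

```cpp
// Closed form used by bench_generic: sum of k*k for k = 1..x equals (2x^3 + 3x^2 + x)/6.
long long sum_squares_to(long long x) {
    return (2*x*x*x + 3*x*x + x)/6;
}

// Hypothetical brute-force reference for checking the closed form.
long long brute_force_sum_squares(long long x) {
    long long sum = 0;
    for (long long k = 1; k<=x; ++k) sum += k*k;
    return sum;
}

// Returns true if the closed form agrees with the brute-force sum for every x in [0, n].
bool closed_form_matches(long long n) {
    for (long long x = 0; x<=n; ++x) {
        if (sum_squares_to(x)!=brute_force_sum_squares(x)) return false;
    }
    return true;
}
```

For the sizes used by the benchmark (up to 1024), the intermediate term 2x³ is about 2.1 × 10⁹, well within the range of `long long`.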