Commit eac5253a authored by Benjamin Cumming

update README with KNL build instructions

parent 5271e619
@@ -77,3 +77,143 @@ cmake <path to CMakeLists.txt> -DWITH_TBB=ON -DSYSTEM_CRAY=ON
cmake <path to CMakeLists.txt> -DWITH_TBB=ON -DWITH_MPI=ON -DSYSTEM_CRAY=ON
```
# targeting KNL
## build modparser
The source-to-source compiler "modparser", which generates the C++/CUDA kernels for the ion channels and synapses, lives in a separate repository.
It is included in our project as a git submodule, and by default it is built with the same compiler and flags that are used to build the miniapp and tests.
This can cause problems when cross compiling, e.g. for KNL, because a modparser executable built for the target architecture might not run on the node where compilation happens.
CMake looks for the source-to-source compiler executable, `modcc`, in the directories listed in the `PATH` environment variable, and uses the version it finds instead of building its own.
Modparser requires a C++11 compiler, and has been tested with the GCC, Intel, and Clang compilers.
- If the default compiler on your system is an old version of GCC, you might need to load a module or set the `CC` and `CXX` environment variables.
```bash
git clone git@github.com:eth-cscs/modparser.git
cd modparser
# example of setting a C++11 compiler
export CXX=`which g++-4.8`
cmake .
make -j
# set path and test that you can see modcc
export PATH=`pwd`/bin:$PATH
which modcc
```
## set up environment
- source the Intel compilers
- source the TBB environment variables
  - I have only tested with the latest stable release of TBB downloaded online, not the version that is sometimes installed with the Intel compilers.
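Before configuring the miniapp, it can help to confirm that everything from the previous steps is visible in the current shell. The sketch below is a hypothetical helper, not part of the repository: it checks that `icc`, `icpc`, `modcc` and `cmake` are on `PATH`, and it assumes that the TBB environment script (for example `tbbvars.sh`) exports `TBBROOT` when sourced.

```bash
# Hypothetical helper (not part of the repository): report which of the
# named tools cannot be found on PATH. Returns non-zero if any are missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Assumption: sourcing the TBB environment script exports TBBROOT.
check_tbb() {
    if [ -z "${TBBROOT:-}" ]; then
        echo "missing: TBBROOT is not set (were the TBB vars sourced?)" >&2
        return 1
    fi
    return 0
}

# Check the tools used by the KNL build.
if check_tools icc icpc modcc cmake && check_tbb; then
    echo "environment looks ready for the KNL build"
else
    echo "fix the missing items above before running cmake" >&2
fi
```

If `modcc` is reported missing, CMake will fall back to building modparser itself, which is exactly the situation the cross-compilation note above warns about.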
## build miniapp
```bash
# clone the repo and set up the submodules
git clone TODO
cd cell_algorithms
git submodule init
git submodule update
# make a path for out of source build
mkdir build_knl
cd build_knl
# run cmake with all the magic flags
export CC=`which icc`
export CXX=`which icpc`
cmake .. -DCMAKE_BUILD_TYPE=release -DWITH_TBB=ON -DWITH_PROFILING=ON -DVECTORIZE_TARGET=KNL -DUSE_OPTIMIZED_KERNELS=ON
make -j
```
The flags passed to CMake are:
- `-DCMAKE_BUILD_TYPE=release` : build in release mode with `-O3`.
- `-DWITH_TBB=ON` : use TBB for multithreading on multicore.
- `-DWITH_PROFILING=ON` : use the internal profilers, which print a profiling report at the end of the run.
- `-DVECTORIZE_TARGET=KNL` : generate AVX512 instructions; alternatively you can use:
  - `AVX2` for Haswell and Broadwell
  - `AVX` for Sandy Bridge and Ivy Bridge
- `-DUSE_OPTIMIZED_KERNELS=ON` : tell the source-to-source compiler to generate optimized kernels that use Intel extensions.
  - Without these, vectorized code will not be generated.
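When it is not obvious which `VECTORIZE_TARGET` a machine supports, the CPU feature flags that Linux reports in `/proc/cpuinfo` can suggest one. The function below is a hypothetical helper, not part of the build system. It relies on the fact that the AVX-512ER extension is only implemented on Xeon Phi processors, and it assumes that other AVX-512 CPUs should fall back to `AVX2`, since only the `KNL`, `AVX2` and `AVX` targets are documented here.

```bash
# Hypothetical helper: suggest a value for -DVECTORIZE_TARGET from a CPU
# feature string such as the "flags" line of /proc/cpuinfo.
pick_vectorize_target() {
    flags=" $1 "   # pad with spaces so patterns match whole words only
    case "$flags" in
        # AVX-512ER only exists on Xeon Phi, so it identifies KNL.
        *" avx512er "*) echo KNL ;;
        *" avx2 "*)     echo AVX2 ;;
        *" avx "*)      echo AVX ;;
        *)              echo "" ;;  # no documented target; omit the flag
    esac
}

# On Linux, inspect the CPU of the machine this script runs on.
if [ -r /proc/cpuinfo ]; then
    target=$(pick_vectorize_target "$(grep -m1 '^flags' /proc/cpuinfo)")
    echo "suggested: -DVECTORIZE_TARGET=${target:-<none>}"
fi
```

When cross compiling, run this on a compute node rather than on the node used for compilation, because the two may have different CPUs.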
## run tests
Run the unit tests:
```bash
cd tests
./test.exe
cd ..
```
## run miniapp
The miniapp is the target for benchmarking.
First, we can run a small problem to check the build.
For the small test run, the parameters have the following meanings:
- `-n 1000` : 1000 cells
- `-s 200` : 200 synapses per cell
- `-t 20` : simulate 20 ms
- `-p 0` : no file output of voltage traces

The number of cells is the number of discrete tasks that are distributed to the threads in each large time integration period.
The number of synapses per cell determines the amount of computational work per cell/task.
Realistic cells have between 1,000 and 10,000 synapses each.
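Because the number of synapses per cell sets the work per task, the total synapse count (`-n` times `-s`) multiplied by the simulated time (`-t`) gives a rough sense of how much synapse work a run does. A minimal sketch of that arithmetic for the small check (`-n 1000 -s 200 -t 20`) and the larger benchmark (`-n 2000 -s 2000 -t 100`), with `run_size` as a hypothetical helper:

```bash
# Hypothetical helper (not part of the miniapp): print the total number of
# synapses (cells x synapses per cell), then that total multiplied by the
# simulated time in ms, as a rough measure of synapse work.
run_size() {
    cells=$1
    syn_per_cell=$2
    t_ms=$3
    synapses=$((cells * syn_per_cell))
    echo "$synapses $((synapses * t_ms))"
}

small=$(run_size 1000 200 20)    # -n 1000 -s 200 -t 20
large=$(run_size 2000 2000 100)  # -n 2000 -s 2000 -t 100
echo "small run: $small"
echo "large run: $large"
# ratio of the second fields (synapse work)
echo "work ratio: $(( ${large#* } / ${small#* } ))"
```

By this measure the benchmark run does about 100 times the synapse work of the small check (20 times the synapses, simulated 5 times longer); parts of a time step that do not depend on the synapse count need not grow by the same factor.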
```bash
cd miniapp
# a small run to check that everything works
./miniapp.exe -n 1000 -s 200 -t 20 -p 0
# a larger run for generating meaningful benchmarks
./miniapp.exe -n 2000 -s 2000 -t 100 -p 0
```
These two runs generate the following profiler output (reformatted so that both runs fit in one table):
```
-------------------------------------------------------
                  |      small      |      large      |
region            |    time       % |    time       % |
------------------|-----------------|-----------------|
total             |   0.791   100.0 |  38.593   100.0 |
  stepping        |   0.738    93.3 |  36.978    95.8 |
    matrix        |   0.406    51.3 |   6.034    15.6 |
      solve       |   0.308    38.9 |   4.534    11.7 |
      setup       |   0.082    10.4 |   1.260     3.3 |
      other       |   0.016     2.0 |   0.240     0.6 |
    state         |   0.194    24.5 |  23.235    60.2 |
      expsyn      |   0.158    20.0 |  22.679    58.8 |
      hh          |   0.014     1.7 |   0.215     0.6 |
      pas         |   0.003     0.4 |   0.053     0.1 |
      other       |   0.019     2.4 |   0.287     0.7 |
    current       |   0.107    13.5 |   7.106    18.4 |
      expsyn      |   0.047     5.9 |   6.118    15.9 |
      pas         |   0.028     3.5 |   0.476     1.2 |
      hh          |   0.006     0.7 |   0.096     0.2 |
      other       |   0.026     3.3 |   0.415     1.1 |
    events        |   0.005     0.6 |   0.125     0.3 |
    sampling      |   0.003     0.4 |   0.051     0.1 |
    other         |   0.024     3.0 |   0.428     1.1 |
  other           |   0.053     6.7 |   1.614     4.2 |
-------------------------------------------------------
```
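The percentages in the profiler output are relative to `total` (e.g. 0.738 / 0.791 gives the 93.3% for `stepping` in the small run), and one mechanism can appear under more than one region. As a worked example, the `expsyn` synapse mechanism appears under both `state` and `current`; the sketch below adds the two rows and divides by the total time. `expsyn_share` is a hypothetical helper, not part of the miniapp.

```bash
# Hypothetical helper: percentage of total time spent in a mechanism that
# appears under both "state" and "current".
# Arguments: total time, state time, current time.
expsyn_share() {
    awk -v total="$1" -v state="$2" -v curr="$3" \
        'BEGIN { printf "%.1f\n", 100 * (state + curr) / total }'
}

# Numbers taken from the expsyn rows of the table above.
echo "small run: $(expsyn_share 0.791 0.158 0.047)% of total time in expsyn"
echo "large run: $(expsyn_share 38.593 22.679 6.118)% of total time in expsyn"
```

This gives about 25.9% for the small run and 74.6% for the large run, so with 2000 synapses per cell the synapse kernels dominate the run time.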