Major refactoring of five Python algorithms
Created by: jassak
- Creation of mipframework to simplify writing Python algorithms.
- All SQL queries in the framework are now handled by SQLAlchemy (no more vulnerable SQL string manipulation).
- Creation of a small Python API for HighCharts, as well as a HighCharts server in Flask for quickly testing charts.
- Creation of a runner, written in Python, for running and debugging algorithms locally.
- Refactor of Pearson, PCA, LogisticRegression, CalibrationBelt and DescriptiveStatistics to work with the framework.
- Adds very detailed logging (with log file rotation) for the above-mentioned algorithms.
- Complete test suite for each of the above-mentioned algorithms. Each test suite contains:
  - Thousands of automatically generated unit tests covering all algorithm methods. These are so-called property-based tests, where random inputs with many corner cases are generated en masse and some property (such as no NaNs or infs in the result) is verified. The library used is hypothesis.
  - 80 algorithm correctness tests. These are pure Python and thus run much faster than the dockerized versions. Only one worker node is used, with no privacy constraints.
  - 15 federated tests. These are also pure Python and run on 10 worker nodes with no privacy constraints. They are meant to test whether the aggregation of local results on the master node works properly.
  - 5 integration tests. These run in dockerized form together with Exareme. They are meant to test that the algorithms and Exareme integrate correctly. They should also run without privacy constraints.
- Additionally, we created a separate suite of tests for the privacy mechanism of each algorithm when it runs in a production environment (Exareme + Docker + privacy).
- New version of the Descriptive Statistics algorithm in which a single request is made even for multiple datasets. This will speed up loading time in the Analysis tab. The old version is still there, as we decided that the two should coexist for some time.
- Fix bug in LogisticRegression where a huge matrix was causing a memory error.
- Many more bugs were caught during testing.
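
For reviewers unfamiliar with log file rotation, the following is a minimal sketch of the idea using only the standard library. The function name `make_rotating_logger`, the format string and the size limits are illustrative assumptions, not the configuration used in this PR.

```python
import logging
from logging.handlers import RotatingFileHandler


def make_rotating_logger(name, log_path, max_bytes=1_000_000, backup_count=3):
    """Return a logger that writes to log_path and rotates the file by size.

    Hypothetical helper for illustration only; names and defaults are assumed.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    # Attach the handler only once, so repeated calls do not duplicate output.
    if not logger.handlers:
        handler = RotatingFileHandler(
            log_path, maxBytes=max_bytes, backupCount=backup_count
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Once the active file would exceed `max_bytes`, it is renamed to `<log_path>.1` (older backups shift to `.2`, `.3`, ...) and at most `backup_count` backups are kept.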
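
To illustrate what the property-based tests look like, here is a small sketch with hypothesis. The function under test, `sample_variance`, is a hypothetical stand-in and not one of the algorithm methods in this PR. The property checked is of the same kind as described above: the result is finite (no NaNs or infs) and non-negative.

```python
import math

from hypothesis import given, settings, strategies as st


def sample_variance(values):
    """Population variance of a non-empty list (hypothetical method under test)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Bounded floats keep the squared deviations far from overflow.
bounded_floats = st.floats(
    min_value=-1e6, max_value=1e6, allow_nan=False, allow_infinity=False
)


@settings(max_examples=200, deadline=None)
@given(st.lists(bounded_floats, min_size=1, max_size=50))
def test_variance_is_finite_and_non_negative(values):
    result = sample_variance(values)
    assert math.isfinite(result)
    assert result >= 0.0
```

Running the test (directly or through pytest) makes hypothesis generate many input lists, including corner cases such as single-element lists and repeated values, and shrink any failing input to a minimal example.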
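
The idea behind the federated tests can be sketched as follows, using Pearson correlation as the example. This is an assumed, simplified formulation and not the code in this PR: each worker returns local sufficient statistics, the master sums them, and the result is compared with the same computation on the pooled data. All function names are hypothetical.

```python
import math
import random


def local_stats(xs, ys):
    """Sufficient statistics one worker would send to the master (assumed form)."""
    return (
        len(xs),
        sum(xs),
        sum(ys),
        sum(x * x for x in xs),
        sum(y * y for y in ys),
        sum(x * y for x, y in zip(xs, ys)),
    )


def aggregate(stats_list):
    """Master step: element-wise sum of the workers' statistics."""
    return tuple(sum(column) for column in zip(*stats_list))


def pearson_from_stats(stats):
    """Pearson correlation coefficient computed from aggregated statistics."""
    n, sx, sy, sxx, syy, sxy = stats
    numerator = n * sxy - sx * sy
    denominator = math.sqrt(n * sxx - sx**2) * math.sqrt(n * syy - sy**2)
    return numerator / denominator


# Synthetic data split across 10 workers, mirroring the federated test setup.
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(1000)]
ys = [2 * x + rng.gauss(0, 1) for x in xs]
workers = [(xs[i::10], ys[i::10]) for i in range(10)]

federated_r = pearson_from_stats(
    aggregate([local_stats(wx, wy) for wx, wy in workers])
)
pooled_r = pearson_from_stats(local_stats(xs, ys))
assert math.isclose(federated_r, pooled_r, rel_tol=1e-9)
```

The two values agree up to floating-point rounding because the statistics are plain sums, so the aggregated result does not depend on how the rows are partitioned across workers.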