algorithms: reduce memory footprint of local UDFs
Created by: jassak
Memory-expensive lines in local UDFs have been identified and replaced with equivalent, memory-efficient ones.
All changes fall into three categories:
- Avoid memory copies when an operation can be done in place
- Use `numpy.einsum`, when applicable, for summing over arrays
- Avoid `sklearn` methods when re-implementing them is easy, because `sklearn` hasn't been written with the above two considerations in mind
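The three categories can be illustrated in plain NumPy. This is a minimal sketch, not code from this PR: the function names are hypothetical, and `binary_confusion_matrix` assumes labels already encoded as 0/1, standing in for a call such as `sklearn.metrics.confusion_matrix`.

```python
import numpy as np


def center_inplace(x: np.ndarray) -> np.ndarray:
    """Subtract the column means from x without allocating a copy of x.

    `x = x - x.mean(axis=0)` would allocate a second array of x's size;
    the augmented assignment writes the result back into x's own buffer.
    Hypothetical helper; x must be a floating-point array.
    """
    x -= x.mean(axis=0)
    return x


def column_sum_of_squares(x: np.ndarray) -> np.ndarray:
    """Sum of squares of each column of a 2-D array.

    `(x * x).sum(axis=0)` first materializes x * x, a temporary as large
    as x. einsum multiplies and accumulates without materializing that
    n_rows x n_cols product, so only the (n_cols,) result is allocated.
    """
    return np.einsum("ij,ij->j", x, x)


def binary_confusion_matrix(y_true, y_pred) -> np.ndarray:
    """2x2 confusion matrix for 0/1 labels, without sklearn.

    Hypothetical re-implementation: each (true, predicted) pair is encoded
    as 2 * true + predicted and counted with np.bincount.
    Rows are true labels, columns are predicted labels.
    """
    y_true = np.asarray(y_true, dtype=np.intp)
    y_pred = np.asarray(y_pred, dtype=np.intp)
    return np.bincount(2 * y_true + y_pred, minlength=4).reshape(2, 2)
```

For example, with `x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])`, `column_sum_of_squares(x)` returns `[35.0, 56.0]`, and `binary_confusion_matrix([0, 1, 1, 0, 1], [0, 1, 0, 0, 1])` returns `[[2, 0], [1, 2]]`, the same layout `sklearn.metrics.confusion_matrix` uses.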
There is also one change in `udfio.py`. When a `pandas.DataFrame` is constructed from separate columns for relational tables, the flag `copy=False` is now used. This stops pandas from consolidating the columns into a single 2-dimensional array at construction time. However, when a later operation in the UDF requires the columns to be consolidated, `pandas` will still do it in the background, copying their data into a single contiguous block of memory. This is unavoidable: many `numpy` operations need contiguous blocks of memory to run efficiently, because they exploit the CPU cache and the CPU's vectorization capabilities.
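A minimal sketch of the `copy=False` behaviour, assuming pandas 1.3 or later; the column names and arrays are hypothetical stand-ins for the columns `udfio.py` receives. The frame's columns keep referencing the original arrays, while a whole-frame `to_numpy()` call is one of the operations that forces the copy into a single block.

```python
import numpy as np
import pandas as pd

# Hypothetical columns, standing in for the separate column arrays of a
# relational table.
age = np.arange(1_000, dtype=np.float64)
weight = np.ones(1_000, dtype=np.float64)

# With copy=False the constructor keeps each column as its own block instead
# of consolidating them into one 2-D array, so no column data is copied here.
df = pd.DataFrame({"age": age, "weight": weight}, copy=False)

# An operation that needs a single 2-D array (here, to_numpy() on the whole
# frame) copies the columns into one contiguous block at that point.
matrix = df.to_numpy()
```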
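In most cases the memory reduction is accompanied by a reduction in execution time as well; the exceptions are `local2` in `pca.py` and `_fit_local_step` in `logistic_regression.py`, which became slower.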
The results are presented in the following table.
file | method | data size | mem before | mem after | mem after/before | time before | time after | time after/before |
---|---|---|---|---|---|---|---|---|
pca.py | local1 | 160MB | 172MB | 2KB | 0.001% | 250ms | 30ms | 12% |
pca.py | local2 | 160MB | 305MB | 153MB | 50% | 300ms | 400ms | 133% |
pearson.py | local1 | 320MB | 153MB | 5KB | 0.003% | 330ms | 300ms | 90% |
ttest_independent.py | local_independent | 16MB | 24MB | 3KB | 0.012% | 360ms | 190ms | 53% |
ttest_onesample.py | local_one_sample | 8MB | 16MB | 3KB | 0.018% | 380ms | 800μs | 0.21% |
ttest_paired.py | local_paired | 16MB | 24MB | 8MB | 33% | 600ms | 4ms | 0.67% |
metrics.py | _confusion_matrix_local | 16MB | 44MB | 2MB | 4.5% | 140ms | 6ms | 4.3% |
metrics.py | _roc_curve_local | 16MB | 1.6GB | 2MB | 0.12% | 2.7s | 600ms | 22% |
logistic_regression.py | LogisticRegression._fit_local_step | 24MB | 24MB | 8KB | 0.32% | 23ms | 233ms | 1000% |
linear_regression.py | LinearRegression._compute_summary_local | 16MB | 8MB | 8MB | 100% | 360ms | 5ms | 1.3% |
anova_oneway.py | local1 | 16MB | 106MB | 74MB | 70% | 200ms | 100ms | 50% |
descriptive_stats.py | local | 168MB | 424MB | 358MB | 84% | 6s | 5.8s | 97% |
udfio.py | from_relational_table | 2GB | 2GB | 8KB | ~0% | 1.6s | 150μs | 0.009% |