MIP-588 cross validation prototype
Created by: jassak
- Implementation of CV for Linear Regression. This is inefficient for two reasons:
  - The implementation of KFold makes many calls to `run_udf_on_local_nodes` due to limitations in the current UDF generator.
  - Cross-validation is completely parallelizable and should be done asynchronously, but it is currently done synchronously.
- UDF generator new feature: when a UDF takes a relational table as input, the `row_id` is passed as well and it is used as the index of the corresponding dataframe.
- UDF generator new feature: a new constant `DEFERRED` can now be passed instead of an output relation's schema. This allows the user to defer the declaration of the schema to runtime. She is then required to pass the desired schema as an extra argument to `run_udf_on_local_nodes` or `run_udf_on_global_node`. The implementation is temporary and should be redesigned once the UDF generator is refactored.
- UDF generator enhancement: remove the variable name prefix in column names of relations/dataframes.
- Test case generator enhancement: when a generated input makes no sense, it can be skipped by the implementer of `compute_expected_output` by simply returning `None`.
- Various small fixes
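
To illustrate the point about cross-validation being parallelizable, here is a minimal, self-contained sketch of evaluating all folds concurrently instead of one after another. It is not the code in this PR and does not use the engine's API: `kfold_indices`, `fit_line`, `evaluate_fold` and `cross_validate` are hypothetical names, and the local least-squares fit only stands in for the remote UDF calls that each fold would make.

```python
from concurrent.futures import ThreadPoolExecutor


def kfold_indices(n_rows, n_splits):
    """Yield (train, test) index lists for contiguous, unshuffled folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [
        n_rows // n_splits + (1 if i < n_rows % n_splits else 0)
        for i in range(n_splits)
    ]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_rows) if i < start or i >= stop]
        yield train, test
        start = stop


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def evaluate_fold(xs, ys, train, test):
    """Fit on the training rows, return mean squared error on the test rows."""
    intercept, slope = fit_line([xs[i] for i in train], [ys[i] for i in train])
    errors = [(ys[i] - (intercept + slope * xs[i])) ** 2 for i in test]
    return sum(errors) / len(errors)


def cross_validate(xs, ys, n_splits, max_workers=None):
    """Submit every fold to a thread pool; results are returned in fold order."""
    folds = list(kfold_indices(len(xs), n_splits))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(evaluate_fold, xs, ys, train, test)
            for train, test in folds
        ]
        return [future.result() for future in futures]


# Toy data lying exactly on y = 2x + 1, so every fold's error is close to zero.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
scores = cross_validate(xs, ys, n_splits=5)
```

Threads are used here on the assumption that each fold mostly waits on remote work. For CPU-bound folds running locally, a process pool would likely be the better fit.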