
MIP-588 cross validation prototype

Created by: jassak

  • Implementation of cross-validation (CV) for Linear Regression. The implementation is inefficient for two reasons:
    1. The KFold implementation makes many calls to run_udf_on_local_nodes because of limitations in the current UDF generator.
    2. Cross-validation is completely parallelizable and should run asynchronously, but it currently runs synchronously.
  • UDF generator new feature: when a UDF takes a relational table as input, the row_id is passed as well and it is used as the index of the corresponding dataframe.
  • UDF generator new feature: a new constant DEFERRED can now be passed instead of an output relation's schema. This lets the user defer the declaration of the schema to runtime. The user must then pass the desired schema as an extra argument to run_udf_on_local_nodes or run_udf_on_global_node. The implementation is temporary and should be redesigned once the UDF generator is refactored.
  • UDF generator enhancement: remove the variable name prefix in column names of relations/dataframes.
  • Test case generator enhancement: when a generated input makes no sense, the implementer of compute_expected_output can skip it by simply returning None.
  • Various small fixes
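
To illustrate the second inefficiency listed above, here is a minimal sketch of how the folds could be evaluated concurrently instead of one after another. It is not the code in this merge request: it runs on plain Python lists, uses a one-variable least-squares fit as a stand-in for Linear Regression, and does not touch the UDF generator or run_udf_on_local_nodes. All function names (kfold_indices, fit_simple_linear, evaluate_fold, cross_validate) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def kfold_indices(n_rows, n_splits):
    """Yield (train_idx, test_idx) pairs for contiguous, unshuffled folds."""
    # The first (n_rows % n_splits) folds get one extra row.
    sizes = [
        n_rows // n_splits + (1 if i < n_rows % n_splits else 0)
        for i in range(n_splits)
    ]
    start = 0
    for size in sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_rows))
        yield train_idx, test_idx
        start = stop


def fit_simple_linear(xs, ys):
    """Closed-form least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def evaluate_fold(xs, ys, train_idx, test_idx):
    """Fit on the training rows and return the mean squared error on the test rows."""
    slope, intercept = fit_simple_linear(
        [xs[i] for i in train_idx], [ys[i] for i in train_idx]
    )
    return mean((ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test_idx)


def cross_validate(xs, ys, n_splits=5):
    """Submit every fold at once; the folds are independent of each other."""
    folds = list(kfold_indices(len(xs), n_splits))
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(evaluate_fold, xs, ys, train_idx, test_idx)
            for train_idx, test_idx in folds
        ]
        # Collect results in fold order, regardless of completion order.
        return [future.result() for future in futures]
```

Calling cross_validate(xs, ys, n_splits=5) returns one mean squared error per fold. In the actual setting each submitted task would presumably wrap the per-fold UDF calls rather than a local fit, so the thread pool here only shows the dispatch pattern, not a measured speedup.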
