Release 2.2.0-k8 · splicemachine/ml-workflow

What's New?

Stronger AWS SageMaker deployment support using k8s ServiceAccounts
Model metadata tracking for in-db deployed models using the MODEL_METADATA and LIVE_MODEL_STATUS table and view
Support for in-db deployment of Keras linear models (LSTMs/RNNs/CNNs are not yet supported)
Support for in-db deployment of XGBoost models using the H2O/SKLearn implementations
SKLearn bug fix with fastnumbers
Better SKLearn support for non-double return types
Upgrade from pickle to cloudpickle for SKLearn model serialization, adding support for both external and lambda functions inside SKLearn Pipelines
Merge in-db deployment from a 2-table design into a single-table design. All features and model prediction(s) are stored in a single table
Support for deploying models to an existing table
Support for selecting which columns from a table are used in the model prediction. This allows you to deploy models to a "subset" of a table.
Better in-db deployment support for SKLearn Pipelines that have predict parameters
deploy_db API cleanup: removed the model parameter and made run_id required. The model is pulled behind the scenes. The DF parameter is optional and not required when deploying a model to an existing table.
General code cleanup
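
As a rough illustration of why the move from pickle to cloudpickle matters for Pipelines that contain lambdas, the sketch below tries to serialize a lambda with the standard-library pickle and, if cloudpickle happens to be installed, round-trips the same lambda through it. The `scale` lambda and the `try_pickle` helper are made up for this example and are not part of ml-workflow.

```python
import pickle

# Hypothetical lambda, standing in for a function wrapped in a Pipeline step.
scale = lambda x: x * 2


def try_pickle(obj):
    """Return True if the standard-library pickle can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# The standard pickle stores functions by name, so a lambda cannot be serialized.
stdlib_ok = try_pickle(scale)

# cloudpickle serializes the function by value; its output loads with plain pickle.
try:
    import cloudpickle

    restored = pickle.loads(cloudpickle.dumps(scale))
    cloudpickle_result = restored(21)
except ImportError:
    cloudpickle_result = None  # cloudpickle is not installed in this environment

print("stdlib pickle handled the lambda:", stdlib_ok)
print("cloudpickle round-trip result:", cloudpickle_result)
```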

deploy_db will no longer work with the old parameters; the new parameter set and order are required.
createTable from the PySpliceContext now takes its parameters in the order dataframe, schema_table_name (previously the reverse) to match all other APIs in the module.
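
One way to make call sites robust to this kind of reordering is to pass arguments by keyword. The sketch below uses a hypothetical stand-in function, `create_table`, that only mirrors the parameter order described above; it is not the real PySpliceContext method, and a plain dict stands in for a DataFrame.

```python
def create_table(dataframe, schema_table_name):
    """Hypothetical stand-in mirroring the new order: dataframe, schema_table_name."""
    return {"table": schema_table_name, "columns": list(dataframe)}


# A plain dict stands in for a DataFrame in this sketch.
rows = {"ID": [1, 2], "FEATURE_A": [0.5, 0.7]}

# Positional call: must follow the new order (dataframe first).
positional = create_table(rows, "MY_SCHEMA.MY_TABLE")

# Keyword call: unaffected by parameter order.
by_keyword = create_table(schema_table_name="MY_SCHEMA.MY_TABLE", dataframe=rows)

print(positional == by_keyword)
```

Code that still passes the table name first positionally would need to be updated, whereas keyword calls keep working as long as the parameter names match.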

This release is in tandem with the PySplice release.

Upgrade scripts from 2.1.0 are attached below

Please see the patch release for an important fix.