
Improved industrialization support: persisting of model (configuration), repeated model scoring, model monitoring #127

Open
sandervh14 opened this issue Apr 1, 2022 · 2 comments · May be fixed by #134
Labels
data engineering enhancement New feature or request

Comments

@sandervh14
Contributor

Add functionality to write away model metadata

It would be nice to have a function for writing out, for each new modeling attempt, the date and time, which variables were selected, which preprocessing was done, and what the resulting score was.

Task Description

This could comprise:

  • store model metadata (scores, datetime, version, etc.) in a table
  • store the files involved (the model and preprocessor pickles, and potentially the data) on some file storage or as database blobs

Provide the code for extracting this metadata, but allow a data scientist/engineer to write a plugin function that does the actual writing of the metadata to the database/file store of choice.
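
A minimal sketch of this plugin idea, assuming a hypothetical `MetadataWriter` callable and a SQLite example backend (none of these names exist in Cobra today; they are purely illustrative):

```python
# Cobra collects the metadata; the data scientist/engineer supplies the
# function that actually persists it. All names here are hypothetical.
import json
import sqlite3
from datetime import datetime, timezone
from typing import Callable

ModelMetadata = dict  # e.g. {"timestamp": ..., "selected_variables": [...], "score": ...}
MetadataWriter = Callable[[ModelMetadata], None]


def collect_model_metadata(selected_variables, preprocessing_steps, score) -> ModelMetadata:
    """Assemble the metadata of one modeling attempt into a plain dict."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "selected_variables": list(selected_variables),
        "preprocessing": list(preprocessing_steps),
        "score": float(score),
    }


def write_metadata_to_sqlite(metadata: ModelMetadata) -> None:
    """Example plugin: append the metadata as a JSON row to a local SQLite table."""
    conn = sqlite3.connect("model_metadata.db")
    with conn:  # commits on success
        conn.execute("CREATE TABLE IF NOT EXISTS model_runs (payload TEXT)")
        conn.execute("INSERT INTO model_runs (payload) VALUES (?)", (json.dumps(metadata),))
    conn.close()


# Cobra would call whichever writer the user passes in.
writer: MetadataWriter = write_metadata_to_sqlite
writer(collect_model_metadata(["var_a", "var_b"], ["target_encoding"], 0.81))
```

Swapping SQLite for PostgreSQL, BigQuery, a file store, etc. would then only mean passing a different writer function.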

@sandervh14 sandervh14 changed the title Add functionality to write away model metadata Add functionality to store model score and configuration (to model metadata database) Apr 27, 2022
@sandervh14 sandervh14 changed the title Add functionality to store model score and configuration (to model metadata database) Improved industrialization support: persisting of model (configuration), repeated model scoring, model monitoring Jun 2, 2022
@sandervh14
Contributor Author

sandervh14 commented Jun 2, 2022

Hi @sborms, @ZlaTanskY, @nicolasmorandi and @c-morey!

Before we can finish implementing this issue entirely and have better support for industrializing Cobra models, I think we need to discuss a few things.

Proposed use cases (and processes) throughout the industrialization & model usage phase that we could support better in Cobra:

  1. persisting the trained model (see the persistence sketch after this list):
  • the model has been trained with the existing Cobra API
  • persist the preprocessor pipeline configuration as a JSON file (1)
  • persist the model scores achieved (2)
  • persist the data (the basetable, containing all splits, for reproducibility) (3)
  2. model monitoring:
  • score the model periodically on new data (basetables), reusing persisted Cobra models
  • detect data drift: rather than reinventing the wheel, integrate with an existing solution; Nicolas and I thought of trying out NannyML
  • persist the findings (model scores, data metrics) of the above model monitoring steps (2)
  • persist the data (basetables) used for the above model monitoring steps (3)
  • visualize the model monitoring, preferably in a dashboard (4)
  3. retraining the model (proactive/reactive maintenance of the model): same steps as use case 1.

  4. facilitate easy deployment & running: provide example Python scripts for running production-grade Cobra models after the notebook phase, example Docker images/docker-compose files, ...
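
For use case 1, a rough sketch of what the persistence step could look like, assuming a JSON-serializable preprocessor configuration, a picklable model and a pandas basetable (function and file names are illustrative, not existing Cobra API):

```python
import json
import pickle
from pathlib import Path


def persist_run(run_dir: Path, preprocessor_config: dict, model, scores: dict, basetable) -> None:
    """Persist one trained Cobra run to a local directory (illustrative layout)."""
    run_dir.mkdir(parents=True, exist_ok=True)

    # (1) preprocessor pipeline configuration as JSON
    (run_dir / "preprocessor.json").write_text(json.dumps(preprocessor_config, indent=2))

    # the fitted model itself as a pickle
    with open(run_dir / "model.pkl", "wb") as f:
        pickle.dump(model, f)

    # (2) the achieved model scores
    (run_dir / "scores.json").write_text(json.dumps(scores, indent=2))

    # (3) the basetable, with all splits, for reproducibility (requires pyarrow)
    basetable.to_parquet(run_dir / "basetable.parquet")
```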

Integrations necessary for the above:

  • persisting files - see (1) above: we support writing to the local file system at the moment, but should consider supporting uploading to a server location, or storing as a blob in a database, etc. (a backend sketch follows this list)
  • persisting model scores - see (2) above
  • persisting data - see (3) above: support databases both locally (MySQL, PostgreSQL) and in the cloud (BigQuery etc.)
  • visualizing model monitoring findings - see (4) above: integrate with Power BI or other dashboarding software
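
For the file-persistence integration, one possible shape is a small storage-backend protocol so that writing locally versus uploading to a server or blob store stays interchangeable; the class and method names below are assumptions, not existing Cobra API:

```python
from pathlib import Path
from typing import Protocol


class FileStore(Protocol):
    def save(self, name: str, content: bytes) -> str:
        """Persist `content` under `name` and return its location/URI."""


class LocalFileStore:
    """Default backend: write artifacts to the local file system."""

    def __init__(self, root: str = "artifacts"):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, name: str, content: bytes) -> str:
        path = self.root / name
        path.write_bytes(content)
        return str(path)


# A cloud or database-blob backend would implement the same `save` signature,
# so Cobra code that persists artifacts never needs to know where they end up.
def persist_model_pickle(store: FileStore, pickled_model: bytes) -> str:
    return store.save("model.pkl", pickled_model)
```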

Additional task: documenting the thoughts above
Also document any additions of yours, of course, as well as which of the above use cases and integrations we support at the moment and how to use them. While doing so, also fix the points listed in #133 (either fix them here and close #133, or fix #133 separately and mention this issue as linked).

Am I missing interesting use cases or integrations above? Feel free to suggest.

Also: we cannot implement everything right now, nor even in the coming years. We will have to pick the most interesting items each time and add use cases and integrations on the go as we industrialize Cobra for clients with different demands and infrastructure.

I've also gathered the files from the Brico pull request and started structuring them a bit, so we can integrate their efforts into Cobra; see the draft pull request linked on this page (FYI only). But I'd like to discuss the above thoughts first before proceeding with the gathered code, so we agree on what we want to do.

@sandervh14
Contributor Author

Nicolás is interested in investigating MLflow; that investigation could fit in this issue. See the details on MLflow's GitHub: all four MLflow components (Tracking for storing model parameters, Projects for reproducible runs, Models for easy deployment, and Model Registry for tracking model evolution throughout the model's lifecycle) are very interesting to build integrations for within Cobra. Up to you to decide, @nicolasmorandi @pietrodantuono.
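
To make the MLflow idea concrete, here is a sketch of what a Tracking integration could log for one Cobra run (only a possible shape, not an agreed design; the experiment name, parameters and metric values are illustrative):

```python
import mlflow

mlflow.set_experiment("cobra-model")

with mlflow.start_run():
    # model configuration (selected variables, preprocessing settings, ...)
    mlflow.log_params({"selected_variables": "var_a,var_b", "encoder": "target"})

    # achieved scores
    mlflow.log_metric("auc_selection", 0.81)
    mlflow.log_metric("auc_validation", 0.79)

    # persisted files (must exist locally when logging): preprocessor config,
    # basetable, pickled model, ...
    mlflow.log_artifact("preprocessor.json")
    mlflow.log_artifact("basetable.parquet")
    mlflow.log_artifact("model.pkl")
```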

@sandervh14 sandervh14 modified the milestones: 2023-03, 2023-05 Mar 9, 2023
@sandervh14 sandervh14 removed their assignment Mar 9, 2023