Commit

clean up docs
gfursin committed Sep 4, 2022
1 parent ecbe9b2 commit 7e21eef
Showing 7 changed files with 130 additions and 117 deletions.
45 changes: 33 additions & 12 deletions README.md
@@ -1,28 +1,43 @@
# Collective Knowledge approach
# Collective Knowledge concept

Research, development and deployment of novel technologies
is becoming increasingly [challenging, time-consuming, costly and messy](https://www.mihaileric.com/posts/mlops-is-a-mess).
We often have to spend lots of time on exciting development of ad-hoc automation scripts
to connect numerous incompatible tools to build, test, optimize and deploy complex applications and manage all related artifacts
is becoming increasingly [challenging, time-consuming and costly](https://www.mihaileric.com/posts/mlops-is-a-mess).
We have to spend lots of time developing many ad-hoc automation scripts
to connect many great but often incompatible tools to build, test, optimize and deploy complex applications and manage all related artifacts
across rapidly evolving software and hardware from the cloud to the edge.

The Collective Knowledge approach (CK) is to automatically turn ad-hoc scripts and artifacts from the community
into an open database of [reusable, portable, customizable and deterministic components](cm/docs/tutorial-scripts.md)
The Collective Knowledge concept (CK) is to automatically turn ad-hoc scripts and artifacts from the community
into [reusable, portable, customizable and deterministic components](https://arxiv.org/pdf/2011.01149.pdf)
with no or minimal effort from a user.

All such components have a unified API, human-readable CLI and extensible JSON/YAML meta description
making it possible to reuse them in different projects and chain them together
into powerful, efficient and portable automation workflows, applications and web services
adaptable to continuously changing software and hardware.

The following example demonstrates how to use the Collective Mind toolkit (CM - the 2nd generation of the CK framework)
to run the [modular image classification workflow](https://github.com/mlcommons/ck/blob/master/cm-mlops/script/app-image-classification-onnx-py/_cm.json)
assembled from [shared components called portable CM scripts](https://github.com/mlcommons/ck/blob/master/cm-mlops/script)
that will automatically detect, download, install and build all related artifacts and tools to adapt this workflow to a user platform
with Linux, Windows or MacOS:

```bash
python3 -m pip install cmind
cm pull repo mlcommons@ck
cm run script --tags=app,image-classification,onnx,python --quiet
```
It may take a few minutes to run this workflow for the first time and adapt it to your platform, depending on your Internet speed.
Note that all subsequent runs will be much faster because CM automatically caches the output of all components to be quickly reused
in this and other CM workflows.
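In the command above, `--tags` selects a portable CM script by a comma-separated list of tags. As a minimal sketch, assuming Bash and that the `cm` command installed by `cmind` is on `PATH`, the helpers below build that tag string from separate words and call only the `cm run script` command shown above. `cm_tags` and `cm_run_by_tags` are hypothetical names used for illustration; they are not part of CM.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of CM): join separate words into the
# comma-separated value expected by `cm run script --tags=...`.
cm_tags() {
  local IFS=,            # "$*" joins positional arguments with the first char of IFS
  printf '%s\n' "$*"
}

# Hypothetical helper (not part of CM): run the CM script selected by the
# given tags, or fail with status 127 if the `cm` CLI is not installed.
cm_run_by_tags() {
  if ! command -v cm >/dev/null 2>&1; then
    echo "cm not found: install it with 'python3 -m pip install cmind'" >&2
    return 127
  fi
  cm run script --tags="$(cm_tags "$@")" --quiet
}
```

For example, `cm_run_by_tags app image-classification onnx python` issues the same `cm run script --tags=app,image-classification,onnx,python --quiet` command as above. Keeping the tag joining in its own function makes it easy to test without running CM.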

We originally developed CK to automate [reproducibility initiatives and artifact evaluation at conferences](https://cTuning.org/ae)
and make it easier for researchers and engineers to [validate their ideas in the real world](https://learning.acm.org/techtalks/reproducibility).
However, it turned out that the CK approach also helped [multiple organizations](https://cKnowledge.org/partners.html)
modularize complex ML and AI Systems and automate their benchmarking, optimization and deployment.

That's why we have decided to donate CK to [MLCommons](https://mlcommons.org) to continue developing
this technology, modularize AI Systems and support reproducible research as a community effort
within the [public workgroup](docs/mlperf-education-workgroup.md).
That's why we have decided to donate CK to [MLCommons](https://mlcommons.org) to develop
the 2nd generation of this technology, modularize AI Systems and support reproducible research
as a community effort within the [public workgroup](docs/mlperf-education-workgroup.md).

Everyone is welcome to join our open workgroup to develop an open-source toolkit that can help everyone
share their knowledge, experience, artifacts and automation scripts in such a way
@@ -74,12 +89,11 @@ Go to the [CK project page](ck1) to get the legacy CK framework v2.6.1 or check
(C)opyright 2021-2022 [MLCommons](https://mlcommons.org)<br>
(C)opyright 2014-2021 [Grigori Fursin](https://cKnowledge.io/@gfursin) and the [cTuning foundation](https://cTuning.org)

## Our community projects
## Community projects

* [MLPerf education workgroup to modularize AI and ML Systems](docs/mlperf-education-workgroup.md)
* [Artifact evaluation and reproducibility initiatives at ML and Systems conferences](https://cTuning.org/ae)


## Contributing

The best way to contribute to this project is to join our [open workgroup](docs/mlperf-education-workgroup.md)
@@ -92,9 +106,16 @@ and improve the core CM functionality.
* [Grigori Fursin](https://cKnowledge.io/@gfursin)
* [Arjun Suresh](https://www.linkedin.com/in/arjunsuresh)

## References

* [Journal article with CK/CM concepts and our long-term vision](https://arxiv.org/pdf/2011.01149.pdf)
* [ACM TechTalk with CK/CM intro moderated by Peter Mattson (MLCommons president)](https://www.youtube.com/watch?v=7zpeIVwICa4)
* [HPCA'22 presentation "MLPerf design space exploration and production deployment"](https://doi.org/10.5281/zenodo.6475385)

## Acknowledgments

We would like to thank all [contributors](https://github.com/mlcommons/ck/blob/master/CONTRIBUTING.md)
We would like to thank [MLCommons](https://mlcommons.org),
[OctoML](https://octoml.ai), all [contributors](https://github.com/mlcommons/ck/blob/master/CONTRIBUTING.md)
and [collaborators](https://cKnowledge.org/partners.html) for their support, fruitful discussions,
and useful feedback! See more acknowledgments in the [CK journal article](https://arxiv.org/abs/2011.01149)
and our [ACM TechTalk](https://www.youtube.com/watch?v=7zpeIVwICa4).
76 changes: 15 additions & 61 deletions cm-mlops/README.md
@@ -1,68 +1,22 @@
# CM repository to enable more deterministic, portable and reproducible MLOps
# CM MLOps repository

[![CM repository](https://img.shields.io/badge/Collective%20Mind-compatible-blue)](https://github.com/mlcommons/ck/tree/master/cm)
[![CM artifact](https://img.shields.io/badge/Artifact-automated%20and%20reusable-blue)](https://github.com/mlcommons/ck/tree/master/cm)

This repository contains [portable scripts](https://github.com/mlcommons/ck/tree/master/cm-mlops/script)
in the [CM format](https://github.com/mlcommons/ck) to unify and interconnect
different MLOps and DevOps tools.

It is becoming very challenging to co-design, optimize and deploy efficient AI Systems in the real world:
["MLOps Is a Mess But That's to be Expected"](https://www.mihaileric.com/posts/mlops-is-a-mess).
All such components have a unified API, human-readable CLI and extensible JSON/YAML meta description
making it possible to reuse them in different projects and chain them together
into powerful, efficient and portable automation workflows, applications and web services
adaptable to continuously changing software and hardware.

However, [our experience](https://doi.org/10.5281/zenodo.6475385)
suggests that it is possible to [apply DevOps principles to MLOps](https://www.datanami.com/2022/03/30/birds-arent-real-and-neither-is-mlops/)
if we treat all AI, ML and Systems artifacts including models, data sets, frameworks, libraries and scripts as "code" meta packages
with dependencies on other artifacts, operating systems and hardware.
We use and extend this repository in the [open education workgroup](../docs/mlperf-education-workgroup.md)
as a common playground and a common language to help researchers and engineers
learn how to modularize complex software systems (such as AI and ML)
and automate their benchmarking, optimization, co-design and deployment.

We use this [CM-based repository](https://github.com/mlcommons/cm-mlops)
as a common playground and a common language to learn with the community
how to automate benchmarking, optimization, co-design and deployment
of complex ML Systems and make it more deterministic, portable and reproducible
across continuously changing software and hardware stacks.


# How to use

## Install CM toolkit and dependencies

Install the CM toolkit as described [here](https://github.com/mlcommons/ck/blob/master/cm/docs/installation.md).

## Install this CM repository

Use CM to install this repository on your system:

```bash
$ cm pull repo mlcommons@ck
```

You can see this and other CM-compatible repositories installed on your system as follows:
```bash
$ cm list repo
```

You can list reusable automations as follows:
```bash
$ cm find automation
```

You can now list available MLOps automation scripts as follows:
```bash
$ cm list script
```

You can run any portable and reusable MLOps automation script as follows:
```bash
$ cm run script {CM script alias or UID}
```
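The steps above can be collected into one Bash function. This is a minimal sketch assuming the CM toolkit is already installed; `cm_setup` and the `CM_BIN` variable are hypothetical names used only for illustration (not CM features). Setting `CM_BIN=echo` turns the function into a dry run that prints the commands instead of executing them.

```bash
#!/usr/bin/env bash
# Sketch only: cm_setup and CM_BIN are illustrative names, not part of CM.
# Runs the commands listed above in order; the && chain stops at the first
# command that fails and the function returns that command's exit status.
cm_setup() {
  local cm="${CM_BIN:-cm}"       # use the real `cm` CLI unless overridden
  "$cm" pull repo mlcommons@ck &&  # install this CM repository
    "$cm" list repo &&             # show installed CM-compatible repositories
    "$cm" find automation &&       # list reusable automations
    "$cm" list script              # list available MLOps automation scripts
}
```

For example, `CM_BIN=echo cm_setup` prints the four commands without running them, while a plain `cm_setup` executes them. Running a specific script still needs its own `cm run script {CM script alias or UID}` call.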


*More to come soon ...*


## Check CM tutorials

TBD


# Contacts

* [Grigori Fursin](https://cKnowledge.io/@gfursin)
* [Arjun Suresh](https://www.linkedin.com/in/arjunsuresh)
Read about the CM concept [here](https://github.com/mlcommons/ck)
and follow [this tutorial](../cm/docs/tutorial-scripts.md)
to install the CM framework and understand CM concepts.
91 changes: 60 additions & 31 deletions cm/README.md
@@ -5,15 +5,28 @@
[![Python Version](https://img.shields.io/badge/python-3+-blue.svg)](https://github.com/mlcommons/ck/tree/master/cm)
[![License](https://img.shields.io/badge/License-Apache%202.0-green)](https://github.com/mlcommons/ck/tree/master/cm)

[![Documentation Status](https://img.shields.io/badge/docs-passing-green)](https://cKnowledge.org/docs/cm)
[![Documentation](https://img.shields.io/badge/Documentation-available%20online-green)](https://cKnowledge.org/docs/cm)
[![CM(CK2) test](https://github.com/mlcommons/ck/actions/workflows/test-cm.yml/badge.svg)](https://github.com/mlcommons/ck/actions/workflows/test-cm.yml)

The Collective Mind toolkit helps you to add and share [simple, human-readable
and platform-independent CLI and JSON API](https://github.com/mlcommons/ck/tree/master/cm-mlops/script)
to existing DevOps and MLOps automation scripts and artifacts to make them more understandable, portable, reusable, interoperable, deterministic and reproducible.

There are [many great automation tools and workflow frameworks](https://www.mihaileric.com/posts/mlops-is-a-mess) -
some are good for researchers and some for engineers.
The Collective Mind toolkit (CM) is [our community effort](../docs/mlperf-education-workgroup.md)
to develop a portable meta-framework that is good for both.

CM helps researchers and engineers wrap ad-hoc DevOps and MLOps
automation scripts and artifacts with a simple, human-readable
and platform-independent CLI, Python API and JSON/YAML meta description
to make them more understandable, portable, reusable, interoperable, deterministic and reproducible
across continuously changing hardware, software and data with minimal or no changes to existing projects.

See an example of CM-based image classification that can run natively on any user platform with Linux, Windows and MacOS
while automatically adapting to a given software, hardware and data:
Such wrappers can be automatically connected together into powerful and portable workflows, applications and web services
to abstract developers and scientists from the rapidly evolving world of technology.

See an example of a modular image classification workflow assembled from such components
([portable CM scripts](https://github.com/mlcommons/ck/tree/master/cm-mlops/script))
that will automatically detect, download, install and build all related artifacts and
tools to adapt this workflow to a user platform with Linux, Windows or MacOS:

```bash
python3 -m pip install cmind
@@ -23,8 +36,9 @@ cm pull repo mlcommons@ck
cm run script --tags=app,image-classification,onnx,python --quiet
```

Normally, it will take just a few minutes to adapt this task to your platform (depending on your internet speed)
and run image classification.
It may take a few minutes to run this workflow for the first time and adapt it to your platform (depending on your Internet speed).
Note that all subsequent runs will be much faster because CM automatically caches the output of all portable CM scripts to be quickly reused
in this and other CM workflows.

You can also force CM to install specific versions of ML artifacts
(models, data sets, engines, libraries, tools, etc.)
@@ -57,27 +71,35 @@ cm run script --tags=get,cuda-devices

CM is [motivated](docs/motivation.md) by our tedious and interesting experience
[reproducing 150+ ML and systems papers and validating them in the real world](https://learning.acm.org/techtalks/reproducibility)
during so-called [artifact evaluation](https://cTuning.org/ae).
during various [reproducibility initiatives and artifact evaluations](https://cTuning.org/ae).

The CM toolkit helps users gradually transform their existing projects, Git repositories, Docker containers,
The CM toolkit helps researchers and engineers transform their existing projects, Git repositories, Docker containers,
Jupyter notebooks and internal directories into an [open database of portable CM scripts](https://github.com/mlcommons/ck/tree/master/cm-mlops/script)
with a common API, extensible meta descriptions and a simple portability and interoperability layer
written in Python or shell scripts.

Such an evolutionary approach makes it easier to share ML, AI and other artifacts, knowledge and experience in a more unified, automated, portable,
reusable and reproducible way while simplifying and automating the development and deployment of complex applications
across rapidly evolving software and hardware stacks from the cloud to the edge.
Such an evolutionary approach helps the community share their knowledge, experience, artifacts and scripts
in a more unified, automated, portable, reusable and reproducible way while simplifying and automating
the development and deployment of complex applications across rapidly evolving software and hardware stacks
from the cloud to the edge.

The CM toolkit is the 2nd generation of the [Collective Knowledge framework (CK)](https://arxiv.org/abs/2011.01149)
that was [originally validated in academia and industry in the past few years](https://cKnowledge.org/partners.html)
that was [originally developed in collaboration with companies and universities](https://cKnowledge.org/partners.html)
to enable collaborative and reproducible development, optimization and deployment
of Pareto-efficient ML Systems in terms of accuracy, latency, throughput, energy, size and costs
across continuously changing software, hardware, user environments, settings, models and data.


# Copyright

[MLCommons](https://mlcommons.org) 2022


# News

* **2022 September 1:** We developed a CM workflow to automate and modularize the [MLPerf inference benchmark](docs/tutorial-modular-mlperf.md).
We continue these developments within a public [MLPerf education workgroup](../docs/mlperf-education-workgroup.md).

* **2022 July 25:** We updated the tutorial about CM scripts: https://github.com/mlcommons/ck/blob/master/cm/docs/tutorial-scripts.md .

* **2022 July 21:** We have pre-released relatively stable scripts for portable DevOps and MLOps at https://github.com/mlcommons/ck/tree/master/cm-mlops/script .
@@ -95,11 +117,6 @@ across continuously changing software, hardware, user environments, settings, mo



# License

Apache 2.0



# Documentation

@@ -125,20 +142,18 @@ prefixed with *[CK2/CM core]* to improve and enhance the CM core
that helps to organize projects as a collective database
of reusable artifacts and automation scripts:



## CM automation scripts

CM provides a common playground and a common language to help researchers and engineers
discuss and learn how to [make benchmarking, optimization, co-design and deployment
of complex ML Systems](https://www.mihaileric.com/posts/mlops-is-a-mess) more deterministic, portable and reproducible across
continuously changing software and hardware stacks:
discuss and learn how to connect numerous incompatible tools together and make them
more deterministic, portable and reproducible across continuously changing software and hardware stacks.
We continue these discussions and developments within our [open workgroup](../docs/mlperf-education-workgroup.md):

* [CM scripts for portable MLOps and DevOps](https://github.com/mlcommons/ck/tree/master/cm-mlops/script)
* [CM automations](https://github.com/mlcommons/ck/tree/master/cm-mlops/automation)


## Development meetings
# Development meetings

* [Public notes](meetings/)
* [Regular conf-calls](meetings/conf-calls.md)
@@ -148,15 +163,29 @@ continuously changing software and hardware stacks:

* [MLOps projects, articles and tools](docs/KB/MLOps.md)

# Contributing

# Acknowledgments
The best way to contribute to this project is to join our [open workgroup](../docs/mlperf-education-workgroup.md)
to help the community modularize AI, ML and other complex systems,
share your ML artifacts and automations as [reusable CM scripts](https://github.com/mlcommons/ck/tree/master/cm-mlops/script)
and improve the core CM functionality.

We thank the [users and partners of the original CK framework](https://cKnowledge.org/partners.html),
[OctoML](https://octoml.ai), [MLCommons](https://mlcommons.org)
and all our colleagues for their valuable feedback and support!
# References

* [Journal article with CK/CM concepts and our long-term vision](https://arxiv.org/pdf/2011.01149.pdf)
* [ACM TechTalk with CK/CM intro moderated by Peter Mattson (MLCommons president)](https://www.youtube.com/watch?v=7zpeIVwICa4)
* [HPCA'22 presentation "MLPerf design space exploration and production deployment"](https://doi.org/10.5281/zenodo.6475385)

# Contacts
# Acknowledgments

We would like to thank [MLCommons](https://mlcommons.org),
[OctoML](https://octoml.ai), all [contributors](https://github.com/mlcommons/ck/blob/master/CONTRIBUTING.md)
and [collaborators](https://cKnowledge.org/partners.html) for their support, fruitful discussions,
and useful feedback! See more acknowledgments in the [CK journal article](https://arxiv.org/abs/2011.01149)
and our [ACM TechTalk](https://www.youtube.com/watch?v=7zpeIVwICa4).

* [Grigori Fursin](https://cKnowledge.io/@gfursin)
# Maintainers

* [Grigori Fursin](https://cKnowledge.io/@gfursin)
* [Arjun Suresh](https://www.linkedin.com/in/arjunsuresh)

13 changes: 9 additions & 4 deletions cm/docs/motivation.md
@@ -12,14 +12,19 @@ software, hardware, user environments, settings and data sets.

![](https://cKnowledge.org/images/cm-gap-beween-mlsys-research-and-production2a.png)

We have noticed that while there are many great automation tools and workflow automation frameworks out there,
some of them are good only for researchers and some are good only for engineers.

## Community effort

The open-source Collective Mind toolkit (CM aka CK2) is our community effort to solve the above problems
by providing a unified CLI, API and extensible meta descriptions to existing artifacts and automation scripts for DevOps and MLOps
to make them more portable, interoperable, deterministic, reusable, reproducible and understandable
The open-source Collective Mind toolkit (CM aka CK2) is our community effort to develop a simple meta-framework
that can solve the above problems and make existing tools easier to use for both researchers and engineers.

CM provides a unified CLI, API and extensible meta descriptions to existing artifacts and automation scripts
for DevOps and MLOps to make them more portable, interoperable, deterministic, reusable, reproducible and understandable
with minimal or no changes to existing projects!

The CM toolkit helps users to gradually transform their existing projects, Git repositories, Docker containers,
The CM toolkit helps users gradually transform their existing projects, Git repositories, Docker containers,
Jupyter notebooks and internal directories into an [open database of portable CM scripts](https://github.com/mlcommons/ck/tree/master/cm-mlops/script)
with a common API, extensible meta descriptions and a simple portability and interoperability layer
written in Python or shell scripts.