clean up docs

mlcommons · Sep 4, 2022 · 7e21eef
1 parent ecbe9b2

Showing 7 changed files with 130 additions and 117 deletions.
diff --git a/README.md b/README.md
@@ -1,28 +1,43 @@
-# Collective Knowledge approach
+# Collective Knowledge concept
 
 Research, development and deployment of novel technologies 
-is becoming increasingly [challenging, time consuming, costly and messy](https://www.mihaileric.com/posts/mlops-is-a-mess).
-We often have to spend lots of time on exciting development of ad-hoc automation scripts 
-to connect numerous incompatible tools to build, test, optimize and deploy complex applications and manage all related artifacts 
+is becoming increasingly [challenging, time consuming and costly](https://www.mihaileric.com/posts/mlops-is-a-mess).
+We have to spend lots of time developing many ad-hoc automation scripts 
+to connect many great but often incompatible tools to build, test, optimize and deploy complex applications and manage all related artifacts 
 across rapidly evolving software and hardware from the cloud to the edge.
 
-The Collective Knowledge approach (CK) is to automatically turn ad-hoc scripts and artifacts from the community
-into an open database of [reusable, portable, customizable and deterministic components](cm/docs/tutorial-scripts.md)
+The Collective Knowledge concept (CK) is to automatically turn ad-hoc scripts and artifacts from the community
+into [reusable, portable, customizable and deterministic components](https://arxiv.org/pdf/2011.01149.pdf)
 with no or minimal effort from a user.
 
 All such components have a unified API, human readable CLI and extensible JSON/YAML meta description
 making it possible to reuse them in different projects and chain them together 
 into powerful, efficient and portable automation workflows, applications and web services
 adaptable to continuously changing software and hardware.
 
+The following example demonstrates how to use the Collective Mind toolkit (CM - the 2nd generation of the CK framework) 
+to run the [modular image classification workflow](https://github.com/mlcommons/ck/blob/master/cm-mlops/script/app-image-classification-onnx-py/_cm.json) 
+assembled from [such shared components called portable CM scripts](https://github.com/mlcommons/ck/blob/master/cm-mlops/script) 
+that will automatically detect, download, install and build all related artifacts and tools to adapt this workflow to a user platform 
+with Linux, Windows or MacOS:
+
+```bash
+python3 -m pip install cmind
+cm pull repo mlcommons@ck
+cm run script --tags=app,image-classification,onnx,python --quiet
+```
+It may take a few minutes to run this workflow for the first time and adapt it to your platform depending on the internet speed.
+Note that all the subsequent runs will be much faster because CM automatically caches the output of all components to be quickly reused
+in this and other CM workflows.
+
 Originally, we have developed CK to automate [reproducibility initiatives and artifact evaluation at conferences](https://cTuning.org/ae)
 and make it easier for researchers and engineers to [validate their ideas in the real world](https://learning.acm.org/techtalks/reproducibility).
 However, it turned out that the CK approach also helped [multiple organizations](https://cKnowledge.org/partners.html) 
 modularize complex ML and AI Systems and automate their benchmarking, optimization and deployment.
 
-That's why we have decided to donate CK to [MLCommons](https://mlcommons.org) to continue developing 
-this technology, modularize AI Systems and support reproducible research as a community effort 
-within the [public workgroup](docs/mlperf-education-workgroup.md).
+That's why we have decided to donate CK to [MLCommons](https://mlcommons.org) to develop
+the 2nd generation of this technology, modularize AI Systems and support reproducible research 
+as a community effort within the [public workgroup](docs/mlperf-education-workgroup.md).
 
 Everyone is welcome to join our open workgroup to develop an open-source toolkit that can help everyone
 share their knowledge, experience, artifacts and automation scripts in such a way 
@@ -74,12 +89,11 @@ Go to the [CK project page](ck1) to get the legacy CK framework v2.6.1 or check
 (C)opyright 2021-2022 [MLCommons](https://mlcommons.org)<br>
 (C)opyright 2014-2021 [Grigori Fursin](https://cKnowledge.io/@gfursin) and the [cTuning foundation](https://cTuning.org)
 
-## Our community projects
+## Community projects
 
 * [MLPerf education workgroup to modularize AI and ML Systems](docs/mlperf-education-workgroup.md)
 * [Artifact evaluation and reproducibility initiatives at ML and Systems conferences](https://cTuning.org/ae)
 
-
 ## Contributing
 
 The best way to contribute to this project is to join our [open workgroup](docs/mlperf-education-workgroup.md)
@@ -92,9 +106,16 @@ and improve the core CM functionality.
 * [Grigori Fursin](https://cKnowledge.io/@gfursin)
 * [Arjun Suresh](https://www.linkedin.com/in/arjunsuresh)
 
+## References
+
+* [Journal article with CK/CM concepts and our long-term vision](https://arxiv.org/pdf/2011.01149.pdf)
+* [ACM TechTalk with CK/CM intro moderated by Peter Mattson (MLCommons president)](https://www.youtube.com/watch?v=7zpeIVwICa4)
+* [HPCA'22 presentation "MLPerf design space exploration and production deployment"](https://doi.org/10.5281/zenodo.6475385)
+
 ## Acknowledgments
 
-We would like to thank all [contributors](https://github.com/mlcommons/ck/blob/master/CONTRIBUTING.md) 
+We would like to thank [MLCommons](https://mlcommons.org), 
+[OctoML](https://octoml.ai), all [contributors](https://github.com/mlcommons/ck/blob/master/CONTRIBUTING.md) 
 and [collaborators](https://cKnowledge.org/partners.html) for their support, fruitful discussions, 
 and useful feedback! See more acknowledgments in the [CK journal article](https://arxiv.org/abs/2011.01149)
 and our [ACM TechTalk](https://www.youtube.com/watch?v=7zpeIVwICa4).
diff --git a/cm-mlops/README.md b/cm-mlops/README.md
@@ -1,68 +1,22 @@
-# CM repository to enable more determinstic, portable and reproducible MLOps
+# CM MLOps repository 
 
 [![CM repository](https://img.shields.io/badge/Collective%20Mind-compatible-blue)](https://github.com/mlcommons/ck/tree/master/cm)
 [![CM artifact](https://img.shields.io/badge/Artifact-automated%20and%20reusable-blue)](https://github.com/mlcommons/ck/tree/master/cm)
 
+This repository contains [portable scripts](https://github.com/mlcommons/ck/tree/master/cm-mlops/script) 
+in the [CM format](https://github.com/mlcommons/ck) to unify and interconnect 
+different MLOps and DevOps tools.
 
-It is becoming very challenging to co-design, optimize and deploy efficient AI Systems in the real world:
-["MLOps Is a Mess But That's to be Expected"](https://www.mihaileric.com/posts/mlops-is-a-mess).
+All such components have a unified API, human readable CLI and extensible JSON/YAML meta description
+making it possible to reuse them in different projects and chain them together 
+into powerful, efficient and portable automation workflows, applications and web services
+adaptable to continuously changing software and hardware.
 
-However, [our experience](https://doi.org/10.5281/zenodo.6475385) 
-suggests that it is possible to [apply DevOps principles to MLOps](https://www.datanami.com/2022/03/30/birds-arent-real-and-neither-is-mlops/)
-if we treat all AI, ML and Systems artifacts including models, data sets, frameworks, libraries and scripts as "code" meta packages 
-with dependencies on other artifacts, operating systems and hardware.
+We use and extend this repository in the [open education workgroup](../docs/mlperf-education-workgroup.md) 
+as a common playground and a common language to help researchers and engineers
+learn how to modularize complex software systems (such as AI and ML) 
+and automate their benchmarking, optimization, co-design and deployment.
 
-We use this [CM-based repository](https://github.com/mlcommons/cm-mlops) 
-as a common playground and a common language to learn with the community
-how to automate benchmarking, optimization, co-design and deployment
-of complex ML Systems and make it more deterministic, portable and reproducible 
-across continusly changing software and hardware stacks.
-
-
-# How to use
-
-## Install CM toolkit and dependencies
-
-Install the CM toolkit as described [here](https://github.com/mlcommons/ck/blob/master/cm/docs/installation.md).
-
-## Install this CM repository
-
-Use CM to install this repository on your system:
-
-```bash
-$ cm pull repo mlcommons@ck
-```
-
-You can see this and other CM-compatible repositories installed on your system as follows:
-```bash
-$ cm list repo
-```
-
-You can list reusable automations as follows:
-```bash
-$ cm find automation
-```
-
-You can now list available MLOps automation scripts as follows:
-```bash
-$ cm list script
-```
-
-You can run any portable and reusable MLOps automation script as follows:
-```bash
-$ cm run script {CM script alias or UID}
-```
-
-
-*More to come soon ...*
-
-
-## Check CM tutorials
-
-TBD
-
-
-# Contacts
-
-* [Grigori Fursin](https://cKnowledge.io/@gfursin)
-* [Arjun Suresh](https://www.linkedin.com/in/arjunsuresh)
+Read about the CM concept [here](https://github.com/mlcommons/ck) 
+and follow [this tutorial](../cm/docs/tutorial-scripts.md) 
+to install the CM framework and understand CM concepts.
diff --git a/cm/README.md b/cm/README.md
@@ -5,15 +5,28 @@
 [![Python Version](https://img.shields.io/badge/python-3+-blue.svg)](https://github.com/mlcommons/ck/tree/master/cm)
 [![License](https://img.shields.io/badge/License-Apache%202.0-green)](https://github.com/mlcommons/ck/tree/master/cm)
 
-[![Documentation Status](https://img.shields.io/badge/docs-passing-green)](https://cKnowledge.org/docs/cm)
+[![Documentation](https://img.shields.io/badge/Documentation-available%20online-green)](https://cKnowledge.org/docs/cm)
+[![CM(CK2) test](https://github.com/mlcommons/ck/actions/workflows/test-cm.yml/badge.svg)](https://github.com/mlcommons/ck/actions/workflows/test-cm.yml)
 
-The Collective Mind toolkit helps you to add and share [simple, human-readable  
-and platform-independent CLI and JSON API](https://github.com/mlcommons/ck/tree/master/cm-mlops/script) 
-to existing DevOps and MLOps automation scripts and artifacts to make them more understandable, portable, reusable, interoperable, deterministic and reproducible
+
+There are [many great automation tools and workflow frameworks](https://www.mihaileric.com/posts/mlops-is-a-mess) - 
+some are good for researchers and some for engineers. 
+The Collective Mind toolkit (CM) is [our community effort](../docs/mlperf-education-workgroup.md) 
+to develop a portable meta-framework that is good for both.
+
+CM helps researchers and engineers wrap ad-hoc DevOps and MLOps 
+automation scripts and artifacts with a simple, human-readable
+and platform-independent CLI, Python API and JSON/YAML meta description
+to make them more understandable, portable, reusable, interoperable, deterministic and reproducible
 across continuously changing hardware, software and data with minimal or no changes to existing projects.
 
-See an example of CM-based image classification that can run natively on any user platform with Linux, Windows and MacOS
-while automatically adapting to a given software, hardware and data:
+Such wrappers can be automatically connected together into powerful and portable workflows, applications and web-services
+to abstract developers and scientists from the rapidly evolving world of technology.
+
+See an example of a modular image classification workflow assembled from such components 
+([portable CM scripts](https://github.com/mlcommons/ck/tree/master/cm-mlops/script)) 
+that will automatically detect, download, install and build all related artifacts and 
+tools to adapt this workflow to a user platform with Linux, Windows or MacOS:
 
 ```bash
 python3 -m pip install cmind
@@ -23,8 +36,9 @@ cm pull repo mlcommons@ck
 cm run script --tags=app,image-classification,onnx,python --quiet
 ```
 
-Normally, it will take just a few minutes to adapt this task to your platform (depending on your internet speed)
-and run image classification.
+It may take a few minutes to run this workflow for the first time and adapt it to your platform (depending on the Internet speed).
+Note that all the subsequent runs will be much faster because CM automatically caches the output of all portable CM scripts to be quickly reused
+in this and other CM workflows.
 
 You can also force to install specific versions of ML artifacts 
 (models, data sets, engines, libraries, tools, etc) 
@@ -57,27 +71,35 @@ cm run script --tags=get,cuda-devices
 
 CM is [motivated](docs/motivation.md) by our tedious and interesting experience
 [reproducing 150+ ML and systems papers and validating them in the real world](https://learning.acm.org/techtalks/reproducibility)
-during so-called [artifact evaluation](https://cTuning.org/ae).
+during different [reproducibility initiatives and artifact evaluation](https://cTuning.org/ae).
 
-The CM toolkit helps users to gradually transform their existing projects, Git repositories, Docker containers,
+The CM toolkit helps researchers and engineers transform their existing projects, Git repositories, Docker containers,
 Jupyter notebooks and internal directories into an [open database of portable CM scripts](https://github.com/mlcommons/ck/tree/master/cm-mlops/script)
 with a common API, extensible meta descriptions and a simple portability and interoperability layer
 written in Python or shell scripts.
 
-Such an evolutionary approach makes it easier to share ML, AI and other artifacts, knowledge and experience in a more unified, automated, portable, 
-reusable and reproducible way while simplifying and automating the development and deployment of complex applications
-across rapidly evolving software and hardware stacks from the cloud to the edge.
+Such an evolutionary approach helps the community share their knowledge, experience, artifacts and scripts 
+in a more unified, automated, portable, reusable and reproducible way while simplifying and automating 
+the development and deployment of complex applications across rapidly evolving software and hardware stacks 
+from the cloud to the edge.
 
 The CM toolkit is the 2nd generation of the [Collective Knowledge framework (CK)]( https://arxiv.org/abs/2011.01149 )
-that was [originally validated in academia and industry in the past few years]( https://cKnowledge.org/partners.html )
+that was [originally developed in collaboration with companies and universities]( https://cKnowledge.org/partners.html )
 to enable collaborative and reproducible development, optimization and deployment
 of Pareto-efficient ML Systems in terms of accuracy, latency, throughput, energy, size and costs
 across continuously changing software, hardware, user environments, settings, models and data.
 
 
+# Copyright
+
+[MLCommons](https://mlcommons.org) 2022
+
 
 # News
 
+* **2022 September 1:** We developed a CM workflow to automate and modularize [MLPerf inference benchmark](docs/tutorial-modular-mlperf.md). 
+  We continue these developments within a public [MLPerf education workgroup](../docs/mlperf-education-workgroup.md).
+
 * **2022 July 25:** We updated tutorial about CM scripts: https://github.com/mlcommons/ck/blob/master/cm/docs/tutorial-scripts.md .
 
 * **2022 July 21:** We have pre-released relatively stable scripts for portable DevOps and MLOps at https://github.com/mlcommons/ck/tree/master/cm-mlops/script .
@@ -95,11 +117,6 @@ across continuously changing software, hardware, user environments, settings, mo
 
 
 
-# License
-
-Apache 2.0
-
-
 
 # Documentation
 
@@ -125,20 +142,18 @@ prefixed with *[CK2/CM core]* to improve and enhance the CM core
 that helps to organize projects as a collective database 
 of reusable artifacts and automation scripts:
 
-
-
 ## CM automation scripts
 
 CM provides a common playground and a common language to help researchers and engineers
-discuss and learn how to [make benchmarking, optimization, co-design and deployment 
-of complex ML Systems](https://www.mihaileric.com/posts/mlops-is-a-mess) more deterministic, portable and reproducible across
-continuously changing software and hardware stacks:
+discuss and learn how to connect numerous incompatible tools together and make them 
+more deterministic, portable and reproducible across continuously changing software and hardware stacks.
+We continue these discussions and developments within our [open workgroup](../docs/mlperf-education-workgroup.md):
 
 * [CM scripts for portable MLOps and DevOps](https://github.com/mlcommons/ck/tree/master/cm-mlops/script)
 * [CM automations](https://github.com/mlcommons/ck/tree/master/cm-mlops/automation)
 
 
-## Development meetings
+# Development meetings
 
 * [Public notes](meetings/)
 * [Regular conf-calls](meetings/conf-calls.md)
@@ -148,15 +163,29 @@ continuously changing software and hardware stacks:
 
 * [MLOps projects, articles and tools](docs/KB/MLOps.md)
 
+# Contributing
 
-# Acknowledgments
+The best way to contribute to this project is to join our [open workgroup](docs/mlperf-education-workgroup.md)
+to help the community modularize AI, ML and other complex systems, 
+share your ML artifacts and automations as [reusable CM scripts](https://github.com/mlcommons/ck/tree/master/cm-mlops/script)
+and improve the core CM functionality.
 
-We thank the [users and partners of the original CK framework](https://cKnowledge.org/partners.html), 
-[OctoML](https://octoml.ai), [MLCommons](https://mlcommons.org) 
-and all our colleagues for their valuable feedback and support!
+# References
 
+* [Journal article with CK/CM concepts and our long-term vision](https://arxiv.org/pdf/2011.01149.pdf)
+* [ACM TechTalk with CK/CM intro moderated by Peter Mattson (MLCommons president)](https://www.youtube.com/watch?v=7zpeIVwICa4)
+* [HPCA'22 presentation "MLPerf design space exploration and production deployment"](https://doi.org/10.5281/zenodo.6475385)
 
-# Contacts
+# Acknowledgments
+
+We would like to thank [MLCommons](https://mlcommons.org), 
+[OctoML](https://octoml.ai), all [contributors](https://github.com/mlcommons/ck/blob/master/CONTRIBUTING.md) 
+and [collaborators](https://cKnowledge.org/partners.html) for their support, fruitful discussions, 
+and useful feedback! See more acknowledgments in the [CK journal article](https://arxiv.org/abs/2011.01149)
+and our [ACM TechTalk](https://www.youtube.com/watch?v=7zpeIVwICa4).
 
-* [Grigori Fursin](https://cKnowledge.io/@gfursin)
+# Maintainers
+
+* [Grigori Fursin](https://cKnowledge.io/@gfursin)
 * [Arjun Suresh](https://www.linkedin.com/in/arjunsuresh)
+
diff --git a/cm/docs/motivation.md b/cm/docs/motivation.md
@@ -12,14 +12,19 @@ software, hardware, user environments, settings and data sets.
 
 ![](https://cKnowledge.org/images/cm-gap-beween-mlsys-research-and-production2a.png)
 
+We have noticed that while there are many great automation tools and workflow automation frameworks out there,
+some of them are good only for researchers and some are good only for engineers.
+
 ## Community effort
 
-The open-source Collective Mind toolkit (CM aka CK2) is our community effort to solve above problems 
-by providing a unified CLI, API and extensible meta descriptions to existing artifacts and automation scripts for DevOps and MLOps 
-to make them more portable, interoperable, deterministic, reusable, reproducible and understandable
+The open-source Collective Mind toolkit (CM aka CK2) is our community effort to develop a simple meta-framework 
+that can solve the above problems and make existing tools easier to use for both researchers and engineers.
+
+CM provides a unified CLI, API and extensible meta descriptions to existing artifacts and automation scripts 
+for DevOps and MLOps to make them more portable, interoperable, deterministic, reusable, reproducible and understandable
 with minimal or no changes to existing projects!
 
-The CM toolkit helps users to gradually transform their existing projects, Git repositories, Docker containers,
+The CM toolkit helps users gradually transform their existing projects, Git repositories, Docker containers,
 Jupyter notebooks and internal directories into an [open database of portable CM scripts](https://github.com/mlcommons/ck/tree/master/cm-mlops/script)
 with a common API, extensible meta descriptions and a simple portability and interoperability layer
 written in Python or shell scripts.
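
The updated READMEs in this commit select a CM script by a comma-separated tag list (`cm run script --tags=app,image-classification,onnx,python --quiet`). As a minimal sketch of how that command line is composed, the hypothetical helper below builds the argument list from individual tags. The function name `build_cm_run_command` is an assumption made for illustration and is not part of the CM toolkit; only the `cm run script`, `--tags=` and `--quiet` pieces come from the README text above.

```python
import shlex
from typing import List, Sequence


def build_cm_run_command(tags: Sequence[str], quiet: bool = True) -> List[str]:
    """Build the argv list for `cm run script --tags=... [--quiet]`.

    Hypothetical helper: it only assembles the command shown in the README
    and does not call or depend on the CM toolkit itself.
    """
    if not tags:
        raise ValueError("at least one tag is required")
    for tag in tags:
        # Tags are joined with commas, so a tag must not contain one.
        if "," in tag or not tag.strip():
            raise ValueError(f"invalid tag: {tag!r}")

    command = ["cm", "run", "script", "--tags=" + ",".join(tags)]
    if quiet:
        command.append("--quiet")
    return command


if __name__ == "__main__":
    argv = build_cm_run_command(["app", "image-classification", "onnx", "python"])
    # Print the command as a shell-safe string instead of executing it.
    print(shlex.join(argv))
```

The resulting list could be passed to `subprocess.run` on a machine where `cmind` is installed and the `mlcommons@ck` repository has been pulled; the sketch prints the command rather than running it so that it works without CM present.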