From 2ff381d3091d4387a4dd31f0ac2b36f356d25f35 Mon Sep 17 00:00:00 2001
From: Francis Charette Migneault
Date: Wed, 6 Nov 2024 14:04:51 -0500
Subject: [PATCH] add more explicit indication about Item-only vs shared Item/Asset MLM fields

---
 README.md | 63 ++++++++++++++++++++++++++++++-------------------------
 1 file changed, 35 insertions(+), 28 deletions(-)

diff --git a/README.md b/README.md
index f427c99..1e82e73 100644
--- a/README.md
+++ b/README.md
@@ -116,34 +116,41 @@ The fields in the table below can be used in these parts of STAC documents:
 
 [item-assets]: https://github.com/stac-extensions/item-assets
 
-| Field Name | Type | Description |
-|-----------------------------|---------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| mlm:name | string | **REQUIRED** A name for the model. This can include, but must be distinct, from simply naming the model architecture. If there is a publication or other published work related to the model, use the official name of the model. |
-| mlm:architecture | [Model Architecture](#model-architecture) string | **REQUIRED** A generic and well established architecture name of the model. |
-| mlm:tasks | \[[Task Enum](#task-enum)] | **REQUIRED** Specifies the Machine Learning tasks for which the model can be used for. If multi-tasks outputs are provided by distinct model heads, specify all available tasks under the main properties and specify respective tasks in each [Model Output Object](#model-output-object). |
-| mlm:framework | string | Framework used to train the model (ex: PyTorch, TensorFlow). |
-| mlm:framework_version | string | The `framework` library version. Some models require a specific version of the machine learning `framework` to run. |
-| mlm:memory_size | integer | The in-memory size of the model on the accelerator during inference (bytes). |
-| mlm:total_parameters | integer | Total number of model parameters, including trainable and non-trainable parameters. |
-| mlm:pretrained | boolean | Indicates if the model was pretrained. If the model was pretrained, consider providing `pretrained_source` if it is known. |
-| mlm:pretrained_source | string \| null | The source of the pretraining. Can refer to popular pretraining datasets by name (i.e. Imagenet) or less known datasets by URL and description. If trained from scratch (i.e.: `pretrained = false`), the `null` value should be set explicitly. |
-| mlm:batch_size_suggestion | integer | A suggested batch size for the accelerator and summarized hardware. |
-| mlm:accelerator | [Accelerator Type Enum](#accelerator-type-enum) \| null | The intended computational hardware that runs inference. If undefined or set to `null` explicitly, the model does not require any specific accelerator. |
-| mlm:accelerator_constrained | boolean | Indicates if the intended `accelerator` is the only `accelerator` that can run inference. If undefined, it should be assumed `false`. |
-| mlm:accelerator_summary | string | A high level description of the `accelerator`, such as its specific generation, or other relevant inference details. |
-| mlm:accelerator_count | integer | A minimum amount of `accelerator` instances required to run the model. |
-| mlm:input | \[[Model Input Object](#model-input-object)] | **REQUIRED** Describes the transformation between the EO data and the model input. |
-| mlm:output | \[[Model Output Object](#model-output-object)] | **REQUIRED** Describes each model output and how to interpret it. |
-| mlm:hyperparameters | [Model Hyperparameters Object](#model-hyperparameters-object) | Additional hyperparameters relevant for the model. |
-
-To decide whether above fields should be applied under Item `properties` or under respective Assets, the context of
-each field must be considered. For example, the `mlm:name` should always be provided in the Item `properties`, since
-it relates to the model as a whole. In contrast, some models could support multiple `mlm:accelerator`, which could be
-handled by distinct source code represented by different Assets. In such case, `mlm:accelerator` definitions should be
-nested under their relevant Asset. If a field is defined both at the Item and Asset level, the value at the Asset level
-would be considered for that specific Asset, and the value at the Item level would be used for other Assets that did
-not override it for their respective reference. For some of the fields, further details are provided in following
-sections to provide more precisions regarding some potentially ambiguous use cases.
+| Field Name | Type | Description |
+|-----------------------------------------|---------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| mlm:name [[1]][1] | string | **REQUIRED** A name for the model. This can include, but must be distinct, from simply naming the model architecture. If there is a publication or other published work related to the model, use the official name of the model. |
+| mlm:architecture | [Model Architecture](#model-architecture) string | **REQUIRED** A generic and well established architecture name of the model. |
+| mlm:tasks | \[[Task Enum](#task-enum)] | **REQUIRED** Specifies the Machine Learning tasks for which the model can be used for. If multi-tasks outputs are provided by distinct model heads, specify all available tasks under the main properties and specify respective tasks in each [Model Output Object](#model-output-object). |
+| mlm:framework | string | Framework used to train the model (ex: PyTorch, TensorFlow). |
+| mlm:framework_version | string | The `framework` library version. Some models require a specific version of the machine learning `framework` to run. |
+| mlm:memory_size | integer | The in-memory size of the model on the accelerator during inference (bytes). |
+| mlm:total_parameters | integer | Total number of model parameters, including trainable and non-trainable parameters. |
+| mlm:pretrained | boolean | Indicates if the model was pretrained. If the model was pretrained, consider providing `pretrained_source` if it is known. |
+| mlm:pretrained_source | string \| null | The source of the pretraining. Can refer to popular pretraining datasets by name (i.e. Imagenet) or less known datasets by URL and description. If trained from scratch (i.e.: `pretrained = false`), the `null` value should be set explicitly. |
+| mlm:batch_size_suggestion | integer | A suggested batch size for the accelerator and summarized hardware. |
+| mlm:accelerator | [Accelerator Type Enum](#accelerator-type-enum) \| null | The intended computational hardware that runs inference. If undefined or set to `null` explicitly, the model does not require any specific accelerator. |
+| mlm:accelerator_constrained | boolean | Indicates if the intended `accelerator` is the only `accelerator` that can run inference. If undefined, it should be assumed `false`. |
+| mlm:accelerator_summary | string | A high level description of the `accelerator`, such as its specific generation, or other relevant inference details. |
+| mlm:accelerator_count | integer | A minimum amount of `accelerator` instances required to run the model. |
+| mlm:input [[1]][1] | \[[Model Input Object](#model-input-object)] | **REQUIRED** Describes the transformation between the EO data and the model input. |
+| mlm:output [[1]][1] | \[[Model Output Object](#model-output-object)] | **REQUIRED** Describes each model output and how to interpret it. |
+| mlm:hyperparameters [[1]][1] | [Model Hyperparameters Object](#model-hyperparameters-object) | Additional hyperparameters relevant for the model. |
+
+[1]: #sup1sup-allowed-only-in-item-properties
+
+##### [1] Allowed Only in Item `properties`
+
+> [!NOTE]
+> Unless stated otherwise by [[1]][1] in the table, fields can be used at either the Item or Asset level.
+>
+> To decide whether above fields should be applied under Item `properties` or under respective Assets, the context of
+> each field must be considered. For example, the `mlm:name` should always be provided in the Item `properties`, since
+> it relates to the model as a whole. In contrast, some models could support multiple `mlm:accelerator`, which could be
+> handled by distinct source code represented by different Assets. In such case, `mlm:accelerator` definitions should be
+> nested under their relevant Asset. If a field is defined both at the Item and Asset level, the value at the Asset
+> level would be considered for that specific Asset, and the value at the Item level would be used for other Assets that
+> did not override it for their respective reference. For some of the fields, further details are provided in following
+> sections to provide more precisions regarding some potentially ambiguous use cases.
 
 In addition, fields from the multiple relevant extensions should be defined as applicable. See
 [Best Practices - Recommended Extensions to Compose with the ML Model Extension](best-practices.md#recommended-extensions-to-compose-with-the-ml-model-extension)
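
The Item/Asset precedence described in the added note can be sketched in Python as follows. This is only an illustration of how a client might resolve a field, not part of the patch or of any STAC library: the helper `resolve_mlm_field`, the constant `ITEM_ONLY_FIELDS`, and the sample Item (asset keys, `href` values, accelerator values) are all hypothetical. The Item-only set is taken from the fields marked [1] in the table.

```python
from typing import Any, Optional

# Fields marked [1] in the table: allowed only in Item `properties`.
ITEM_ONLY_FIELDS = {"mlm:name", "mlm:input", "mlm:output", "mlm:hyperparameters"}


def resolve_mlm_field(item: dict, asset_key: str, field: str) -> Optional[Any]:
    """Return the effective value of an MLM field for one Asset.

    Hypothetical helper: the Asset-level value wins when the Asset defines
    the field; otherwise the Item-level value applies.
    """
    properties = item.get("properties", {})
    if field in ITEM_ONLY_FIELDS:
        # Item-only fields are never read from an Asset.
        return properties.get(field)
    asset = item.get("assets", {}).get(asset_key, {})
    if field in asset:
        # Key presence is checked (not truthiness), so an explicit
        # `null` (None) set on the Asset still overrides the Item value.
        return asset[field]
    return properties.get(field)


# Illustrative Item with two Assets; only one overrides `mlm:accelerator`.
item = {
    "properties": {"mlm:name": "example-model", "mlm:accelerator": "cuda"},
    "assets": {
        "model-gpu": {"href": "./model-gpu.pt"},
        "model-cpu": {"href": "./model-cpu.pt", "mlm:accelerator": "amd64"},
    },
}

print(resolve_mlm_field(item, "model-gpu", "mlm:accelerator"))  # Item-level value
print(resolve_mlm_field(item, "model-cpu", "mlm:accelerator"))  # Asset-level override
```

Checking for key presence rather than a non-empty value is a design choice made here so that an Asset can explicitly declare `null` for `mlm:accelerator` (no specific accelerator required) even when the Item sets one.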