Add first scratch of implementation for maestro CLI #35

PawelPeczek-Roboflow · 2024-09-05T20:24:54Z

Description

I did not have a chance to fully test it e2e due to a lack of GPU machines.

I could not make the CLI as generic as I wanted:

typer is quite sophisticated at generating command hints, but it does not support some type annotations, such as Unions or Literals (and I am not sure whether other CLI libs do)
unfortunately, it seems the only way to get the list of args for the train / val commands is to declare them explicitly. My initial plan was to create a function that assembles the typings from a dataclass (or, even better, a pydantic model), so that you would create a training config and we would generate the command args from the config class
so I decided to declare the training config in the CLI entrypoint
alternatively, we could accept a path to a config plus kwargs to override it, but then people would not see the nice command help that we have now. To cover for that, we could add a command that lists all config fields (which could be generated automatically by introspecting the dataclass), so you would have maestro <model> train and maestro <model> list-train-parameters, the latter giving hints on which params the train command accepts. I am not sure that is better than explicit params
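
To make the list-train-parameters alternative concrete, here is a minimal sketch of dataclass introspection plus kwargs overrides. It is not code from this PR: the TrainingConfiguration below is a hypothetical subset of fields named after the config dump further down, and list_parameters / with_overrides are illustrative helper names. Only the standard library is used.

```python
from dataclasses import MISSING, dataclass, fields, replace
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class TrainingConfiguration:
    # Hypothetical subset of fields, not the real maestro config class.
    dataset_location: str
    model_id_or_path: str = "microsoft/Florence-2-base-ft"
    training_epochs: int = 10
    learning_rate: float = 1e-5
    transformers_cache_dir: Optional[str] = None


def list_parameters(config_cls: type) -> List[Tuple[str, str, Any]]:
    """Return (name, type, default) for every field of a dataclass."""
    rows = []
    for field in fields(config_cls):
        # Plain classes have a readable __name__; typing constructs do not.
        if isinstance(field.type, type):
            type_name = field.type.__name__
        else:
            type_name = str(field.type)
        has_default = (
            field.default is not MISSING or field.default_factory is not MISSING
        )
        if not has_default:
            default = "<required>"
        elif field.default is not MISSING:
            default = field.default
        else:
            default = field.default_factory()
        rows.append((field.name, type_name, default))
    return rows


def with_overrides(config: Any, overrides: Dict[str, Any]) -> Any:
    """Return a copy of config with overrides applied, rejecting unknown keys."""
    known = {field.name for field in fields(config)}
    unknown = set(overrides) - known
    if unknown:
        raise ValueError(f"Unknown config fields: {sorted(unknown)}")
    return replace(config, **overrides)


if __name__ == "__main__":
    for name, type_name, default in list_parameters(TrainingConfiguration):
        print(f"--{name}  {type_name}  [default: {default}]")
```

The sketch does not convert string values coming from the command line into the field types; a real override path would need that step, which is part of why explicit params may still be the simpler choice.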

What I managed to do:

laid the foundation of the CLI structure: I imagine the maestro CLI has some generic commands, like maestro info, which are implemented at the top level of the CLI
each model recipe then implements a sub-CLI app that exposes its own custom commands (we do not put any constraints on what those commands can be)
an introspection mechanism at the top level of the CLI dynamically loads recipes, currently based on what can be imported
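
The "load what can be imported" idea can be sketched roughly as below. This is an assumption about the shape of the mechanism, not the code in this PR: discover_recipes and the module paths in the usage comment are hypothetical. In the real CLI each discovered module would presumably expose a typer app that the top-level app attaches with add_typer.

```python
import importlib
from types import ModuleType
from typing import Dict


def discover_recipes(candidates: Dict[str, str]) -> Dict[str, ModuleType]:
    """Import each candidate module and keep only those that load.

    candidates maps a command name (e.g. "florence2") to a module path.
    """
    available: Dict[str, ModuleType] = {}
    for name, module_path in candidates.items():
        try:
            available[name] = importlib.import_module(module_path)
        except ImportError:
            # The recipe or one of its dependencies is not installed,
            # so its sub-command is simply not registered.
            continue
    return available


# Hypothetical usage; these module paths are illustrative only:
# recipes = discover_recipes({
#     "florence2": "maestro.cli.recipes.florence2",
#     "paligemma": "maestro.cli.recipes.paligemma",
# })
```

Catching only ImportError keeps genuine bugs inside a recipe module visible instead of silently hiding the command.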

Examples of usage:

❯ python -m maestro.cli.main info
Welcome to maestro CLI. Let's train some VLM! 🏋

❯ python -m maestro.cli.main --help

 Usage: python -m maestro.cli.main [OPTIONS] COMMAND [ARGS]...

╭─ Options ────────────────────────────────────────────────────────────────────────────────────╮
│ --install-completion          Install completion for the current shell.                      │
│ --show-completion             Show completion for the current shell, to copy it or customize │
│                               the installation.                                              │
│ --help                        Show this message and exit.                                    │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────────────────────╮
│ florence2   Fine-tune and evaluate Florence 2 model                                          │
│ info        Display information about maestro                                                │
│ paligemma   Fine-tune and evaluate PaliGemma model                                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯

❯ python -m maestro.cli.main florence2 --help

 Usage: python -m maestro.cli.main florence2 [OPTIONS] COMMAND [ARGS]...

 Fine-tune and evaluate Florence 2 model

╭─ Options ────────────────────────────────────────────────────────────────────────────────────╮
│ --help          Show this message and exit.                                                  │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────────────────────╮
│ evaluate   Evaluate Florence 2 model                                                         │
│ train      Train Florence 2 model                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯

❯ python -m maestro.cli.main florence2 train --help

 Usage: python -m maestro.cli.main florence2 train [OPTIONS]

 Train Florence 2 model

╭─ Options ────────────────────────────────────────────────────────────────────────────────────╮
│ *  --dataset_location                               TEXT     Path to directory with dataset  │
│                                                              [default: None]                 │
│                                                              [required]                      │
│    --model_id_or_path                               TEXT     Model to be used or path to     │
│                                                              your checkpoint                 │
│                                                              [default:                       │
│                                                              microsoft/Florence-2-base-ft]   │
│    --revision                                       TEXT     Revision of Florence2 HF        │
│                                                              repository                      │
│                                                              [default: refs/pr/20]           │
│    --device                                         TEXT     CUDA device ID to be used (in   │
│                                                              format: 'cuda:0')               │
│                                                              [default: cuda:0]               │
│    --transformers_cache_dir                         TEXT     Cache dir for HF weights        │
│                                                              [default: None]                 │
[...]

❯ python -m maestro.cli.main florence2 train --dataset_location some
Training configuration
{
    'dataset_location': 'some',
    'model_id_or_path': 'microsoft/Florence-2-base-ft',
    'revision': 'refs/pr/20',
    'device': device(type='cuda', index=0),
    'transformers_cache_dir': None,
    'training_epochs': 10,
    'optimiser': 'adamw',
    'learning_rate': 1e-05,
    'lr_scheduler': 'linear',
[...]

Type of change

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
This change requires a documentation update

How has this change been tested, please provide a testcase or example of how you tested the change?

YOUR_ANSWER

Any specific deployment considerations

For example, documentation changes, usability, usage/costs, secrets, etc.

Docs

Docs updated? What were the changes:

PawelPeczek-Roboflow and others added 15 commits September 5, 2024 22:24

Add first scratch of implementation for maestro CLI

ee1a6fa

Merge branch 'feature/foundations_of_training' into feature/foundatio…

3e00b40

…ns_of_cli # Conflicts: # maestro/trainer/common/utils/metrics_tracing.py # maestro/trainer/models/florence_2/entities.py # maestro/trainer/models/florence_2/training.py

TrainingConfiguration filed names refactoer

dad39ba

final tests before plugging in CLI

672f27e

initial tests of CLI mode

4a339a4

fix

c7c63b7

fix No such option: --mode

5cc4220

fix 2 No such option: --mode

518323c

fix 3 No such option: --mode

fb212ea

fix 4 No such option: --mode

f15b7a9

fix 5 No such option: --mode

566d9ca

fix 6 No such option: --mode

fb1c826

bring back Pawel's code with improvements

d556a88

remove Literal from command definitions

f46049e

remove Union from command definitions

a2850ac

SkalskiP marked this pull request as ready for review September 10, 2024 22:41

SkalskiP merged commit 278918c into feature/foundations_of_training Sep 10, 2024
1 check passed
