[Feature request]: flepimop sync CLI Tool #450

Closed
TimothyWillard opened this issue Jan 8, 2025 · 1 comment · Fixed by #501
Labels
cli Relating to command line interfaces enhancement Request for improvement or addition of new feature(s). high priority High priority.

Comments

@TimothyWillard
Contributor

Label

enhancement

Priority Label

low priority

Is your feature request related to a problem? Please describe.

flepiMoP runs in general, and inference runs in particular, produce a large volume of output, both in number of files and in file size. This makes the files hard to work with, especially when moving them from HPC environments to a local machine for development/analysis or to AWS for long-term storage.

Is your feature request related to a new application, scenario round, pathogen? Please describe.

No response

Describe the solution you'd like

A general utility that can handle moving files locally on the filesystem (directory to directory), moving files between computers directly (local to HPC or vice versa), and moving files to and from cloud storage (local to AWS or vice versa). A (rough) outline of the man page:

Usage: flepimop sync [OPTIONS] [LOCATION]...

  Sync configuration, manifest, and model input/output files across destinations.

Options:
  --content [all|config|manifest|input|output] The type of files to sync. [default: all]
  --pull                                       A flag indicating if this sync should be a 
                                               pull instead of a push.
  -v, --verbose                                The verbosity level to use for this command.
  --dry-run                                    Should this command be run using dry run?
  --help                                       Show this message and exit.
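
For discussion, here is a minimal sketch of how the options outlined above might be declared. It uses the standard library's `argparse` only so that it is self-contained; that is not a claim about which CLI framework flepiMoP uses or about how the linked PR implements this. The function name `build_sync_parser` is hypothetical, and treating `--verbose` as a repeatable counter is an assumption.

```python
import argparse

# Content types taken from the outline above.
CONTENT_CHOICES = ("all", "config", "manifest", "input", "output")


def build_sync_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the rough `flepimop sync` outline (hypothetical)."""
    parser = argparse.ArgumentParser(
        prog="flepimop sync",
        description=(
            "Sync configuration, manifest, and model input/output "
            "files across destinations."
        ),
    )
    parser.add_argument(
        "--content",
        choices=CONTENT_CHOICES,
        default="all",
        help="The type of files to sync.",
    )
    parser.add_argument(
        "--pull",
        action="store_true",
        help="Pull from LOCATION instead of pushing to it.",
    )
    # Assumption: verbosity is a counter, so -vv means "more verbose".
    parser.add_argument(
        "-v",
        "--verbose",
        action="count",
        default=0,
        help="The verbosity level to use for this command.",
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Report what would be transferred without transferring it.",
    )
    parser.add_argument(
        "location",
        nargs="*",
        metavar="LOCATION",
        help="One or more locations to sync with.",
    )
    return parser
```

Calling `build_sync_parser().parse_args(["--pull", "--dry-run", "s3://bucket/run"])` would yield a namespace with `pull=True`, `dry_run=True`, and `location=["s3://bucket/run"]`.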

This outline is very loose: options will likely be needed for pointing to credentials stored in non-standard locations, as well as for the flepi/project paths. There are some open questions about this utility:

  1. How should the destination be interpreted? I'm assuming the order of operations would be to check for an s3:// prefix for AWS, then <cluster name>:// for HPCs, and otherwise default to the local filesystem.
  2. I've structured this so that the default is a push (location = destination) rather than a pull (location = source), but it's worth thinking about whether that's the preferred usage, or whether that's even the right framing (maybe source and destination should always be required so there's no confusion).
  3. I think it would be extremely useful if there was a way to filter the content being transferred, for example moving a subset of a large inference run from an HPC to a local machine for development/testing purposes. What would the semantics of filtering be: by subpopulation, chain, iteration, etc.? This one is complicated enough that it might warrant splitting into its own issue after a first pass at this.
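
To make questions 1 and 2 concrete, here is a rough sketch of one possible interpretation, not a settled design. It checks for an `s3://` prefix first, then a `<cluster name>://` prefix, and otherwise falls back to the local filesystem; it also shows the push-by-default framing, where the location is the destination unless `--pull` is given. All names here (`Location`, `parse_location`, `resolve_endpoints`, and the cluster name `examplecluster`) are hypothetical, and falling back to local for an unrecognized prefix is an assumption; raising an error may be preferable.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

# Hypothetical cluster names; the issue only specifies "<cluster name>://".
KNOWN_CLUSTERS: FrozenSet[str] = frozenset({"examplecluster"})


@dataclass(frozen=True)
class Location:
    kind: str  # one of "s3", "hpc", or "local"
    host: Optional[str]  # bucket name for s3, cluster name for hpc
    path: str


def parse_location(raw: str, clusters: FrozenSet[str] = KNOWN_CLUSTERS) -> Location:
    """Classify a location string using the order proposed in question 1."""
    scheme, sep, rest = raw.partition("://")
    if sep and scheme == "s3":
        # s3://<bucket>/<key prefix>
        bucket, _, key = rest.partition("/")
        return Location("s3", bucket, key)
    if sep and scheme in clusters:
        # <cluster name>://<path on the cluster>
        return Location("hpc", scheme, rest)
    # Assumption: anything else is treated as a local filesystem path.
    return Location("local", None, raw)


def resolve_endpoints(
    project_path: str, location: str, pull: bool = False
) -> Tuple[Location, Location]:
    """Return (source, destination) under the push-by-default framing."""
    project = parse_location(project_path)
    other = parse_location(location)
    return (other, project) if pull else (project, other)
```

Under this framing, `resolve_endpoints(".", "s3://my-bucket/runs")` treats the project directory as the source, and passing `pull=True` swaps the two. Requiring both source and destination explicitly, as question 2 suggests, would make `resolve_endpoints` unnecessary.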

In general, this can be thought of as the next step beyond GH-192/GH-296, but with more functionality. I think that PR provides some of the infrastructure needed for this issue. I also think this becomes much easier to do after standardizing inference outputs.

@TimothyWillard TimothyWillard added enhancement Request for improvement or addition of new feature(s). low priority Low priority. cli Relating to command line interfaces labels Jan 8, 2025
@pearsonca pearsonca added high priority High priority. and removed low priority Low priority. labels Jan 29, 2025
@TimothyWillard TimothyWillard linked a pull request Feb 19, 2025 that will close this issue
@TimothyWillard
Contributor Author

GH-501


2 participants