[RFC] Configuration & Environment
:Subject: Configuration & Environment
:Authors: CW Andrews & R. Dorgueil
:Created: Sep 7, 2017
:Modified: Oct 7, 2017
:Target: 0.5
:Status: First bits released, draft needs cleanup.
THIS IS A DRAFT
ETL jobs need to be parametrizable by the end user.
- For the simplest needs, the system environment is sufficient and one can read from os.environ (easy to do since 0.5).
- There may be a need for "validation". For example, a variable may be required (API key...) or need to be an int (number of queries...). This is not yet possible but should enhance the developer experience (target: future).
- There should be some way to change the graph topology depending on configuration. For example, the "slack api" branch in the graph may be added only if SLACK_KEY is present.
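The last point can be illustrated with a minimal sketch. It deliberately uses plain Python tuples as stand-ins for chains instead of the Bonobo graph API, and every name in it (`build_chains`, `extract`, `transform`, `load`, `notify_slack`) is hypothetical.

```python
import os


def extract():
    # Hypothetical source node.
    yield "hello"


def transform(row):
    # Hypothetical transformation node.
    return row.upper()


def load(row):
    # Hypothetical sink node.
    print(row)


def notify_slack(row):
    # Placeholder for a node that would post to Slack.
    print("slack:", row)


def build_chains(environ=None):
    """Return the chains to add to a graph, depending on configuration."""
    environ = os.environ if environ is None else environ
    chains = [(extract, transform, load)]
    # Only add the Slack branch when a non-empty key is configured.
    if environ.get("SLACK_KEY"):
        chains.append((extract, transform, notify_slack))
    return chains
```

Passing the mapping explicitly (rather than always reading `os.environ`) is only a convenience for testing; a real job would presumably read the process environment.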
Runtime configuration should be done using environment variables.
In the future (0.7+), there may be a way to add validation, but not for now.
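Until something like that exists, validation can be done by hand in user code. The following is only a sketch of what such a helper might look like; `require_env` and `ConfigError` are hypothetical names and are not part of Bonobo.

```python
import os


class ConfigError(ValueError):
    """Raised when an environment variable is missing or malformed."""


def require_env(name, cast=str, environ=None):
    """Read a required variable and convert it with `cast` (for example int)."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        raise ConfigError(f"{name} is required but not set")
    try:
        return cast(raw)
    except ValueError as exc:
        # Re-raise with the variable name so the user knows what to fix.
        raise ConfigError(f"{name}={raw!r} is not a valid {cast.__name__}") from exc
```

A job could then call, for instance, `require_env("SLACK_KEY")` or `require_env("QUERIES", int)` at startup and fail early with a readable message.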
Order of priority should be, from lower to higher (higher wins, if set):
1. Default values: os.getenv("VARNAME", default_value)
2. --default-env-file values (not yet implemented)
   - Specify a file to read default env values from. Each env var in the file is used only if no corresponding value is already set in the system environment (system environment vars are not overwritten).
3. --default-env values (not yet implemented)
   - Works like #2, but the default NAME=var pairs are passed individually, with one key=value pair for each --default-env flag, rather than gathered from a specified file.
4. System environment values
   - Env vars already set at the system level. It is worth noting that env vars passed via NAME=value bonobo run ... fall here in the order of priority.
5. --env-file values (not yet implemented)
   - Env vars specified here are set like those in #2, except that these values take priority over those set at the system level.
6. --env values
   - Env vars set using the --env / -e flag work like #3 but take priority over all other env vars.
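The order of priority above can be sketched as a series of dictionary updates, lowest priority first. This is an illustration only: `resolve_env` and its parameters are hypothetical, and each source is assumed to have already been parsed into a dict (most of the corresponding flags are not implemented yet).

```python
def resolve_env(system_env, default_env_file=None, default_env=None,
                env_file=None, env=None):
    """Merge the configuration sources; later updates win."""
    resolved = {}
    # Lowest priority: defaults from a file, then individually passed defaults.
    resolved.update(default_env_file or {})
    resolved.update(default_env or {})
    # The system environment wins over both kinds of defaults.
    resolved.update(system_env)
    # --env-file overrides the system environment; --env overrides everything.
    resolved.update(env_file or {})
    resolved.update(env or {})
    return resolved
```

The in-code default (first level of the list) is applied at read time, for example `resolve_env(os.environ).get("VARNAME", default_value)`.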
- Way to go for runtime configuration.
- Reading a value from the environment is done using the standard os.getenv("VARNAME", default_value).
- There is no way to "validate" those options yet. It is not clear whether this is needed, so for now, let's do nothing.
If you have a bash-like shell, you can override variables in the shell:
FOO=bar bonobo run ...
Some shells apparently make it harder to override env from the command line. Bonobo now includes the --env / -e flag to pass vars in a shell-agnostic way.
bonobo run --env FOO=bar ...
bonobo run -e FOO=bar ...
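For readers curious how a repeatable flag of this kind can be collected, here is a minimal standalone sketch using argparse. It is not Bonobo's actual parser; `parse_env_args` and the `example` program name are hypothetical.

```python
import argparse


def parse_env_args(argv):
    """Collect repeated --env / -e NAME=VALUE flags into a dict (last one wins)."""
    parser = argparse.ArgumentParser(prog="example")
    # action="append" gathers every occurrence of the flag into a list.
    parser.add_argument("--env", "-e", action="append", default=[],
                        metavar="NAME=VALUE")
    args = parser.parse_args(argv)
    env = {}
    for pair in args.env:
        # Split on the first "=" only, so values may themselves contain "=".
        name, sep, value = pair.partition("=")
        if not sep or not name:
            parser.error(f"expected NAME=VALUE, got {pair!r}")
        env[name] = value
    return env
```

The resulting dict could then be applied with `os.environ.update(env)` before the job runs, which works the same way in any shell.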
Not implemented yet.
.env file should be possible
1. Perhaps .env.pub (public), which can be safely included in online git repos because it just contains general settings. I think this might be a good idea in anticipation of users being able to share graphs with each other, or use bonobo for their projects, without everyone working on the project having to rewrite the same settings in their private .env files. .env.loc (local), or some such, would be the private counterpart to .env.pub and might contain individual API keys, environment-specific settings, etc. (keep in mind that the file names I used are really just placeholders). I am not totally sure how this would work but definitely think it is worth contemplating.
2. I think that these two would be used in combination, with the private settings overriding the public ones if/when the same variables are set in both the public and private files. I don't think this would be too hard to implement.
3. Going off of #2: as it stands, the implementation of --env I went with has argparse collect whatever args are passed in as a list, and bonobo.execute just iterates through the list, setting each in turn. For example, bonobo run ... -e MY_NUM=3 -e MY_NUM=5 wouldn't cause an issue; the last one set would simply be the one used for the graph, in this case MY_NUM being 5. Following this, a public and a private env file would just need to be collected and set in order for this to work as proposed. As an example, the vars in .env.pub would be collected into list_1 and those in .env.loc into list_2; then a simple env_vars = list_1 + list_2 would join the lists in the desired order (list_1.extend(list_2) modifies list_1 in place and returns None, so its result must not be assigned), and iterating through the list would have the desired effect. Extending this example, the collected CLI args (--env), having already been collected into a list, would then be added to the env_vars list via env_vars.extend(passed_env_vars).
- The one issue I foresee with this is that passing vars at runtime via MY_NUM=5 bonobo run ... would no longer work as expected, because there is no way to differentiate MY_NUM set in this manner from any other environment variable at the system level. However, in this instance the simplest solution would be to ask users to use --env to pass args at runtime rather than MY_NUM=5 bonobo run ..., as not only is --env shell-agnostic (which has its own benefits), but I simply don't see a real advantage to setting variables via MY_NUM=5 bonobo run ... as opposed to --env.
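The collect-then-iterate ordering proposed above (.env.pub, then .env.loc, then --env) might look roughly like the sketch below. The file format (one NAME=value per line, with # comments) is an assumption, since no format has been decided, and `parse_env_lines` and `merge_env` are hypothetical helpers.

```python
def parse_env_lines(lines):
    """Turn NAME=value lines into an ordered list of (name, value) pairs."""
    pairs = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and comments (assumed file format).
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            pairs.append((name.strip(), value.strip()))
    return pairs


def merge_env(public_lines, local_lines, cli_pairs):
    """Public file first, private file second, --env pairs last (last wins)."""
    env_vars = parse_env_lines(public_lines)
    env_vars.extend(parse_env_lines(local_lines))  # extend() mutates in place
    env_vars.extend(cli_pairs)
    resolved = {}
    for name, value in env_vars:
        resolved[name] = value  # later entries overwrite earlier ones
    return resolved
```

With an open file object, usage would be something like `merge_env(open(".env.pub"), open(".env.loc"), passed_env_vars)`; the lists of strings used here just keep the sketch independent of the filesystem.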
This needs complete documentation.