Improve the project API and documentation. #668

Merged
merged 8 commits into from
Sep 14, 2023

Contributor

@nsthorat nsthorat commented Sep 12, 2023

Docs:

  • Add a section for projects

API:

  • Add lilac init and lilac load
  • Add set_project_dir to set the environment variable globally.

Details:

  • Deprecate LILAC_DATA_PATH in favor of LILAC_PROJECT_DIR.
  • Add unit tests for load()
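
As a rough illustration of the deprecation described above, here is a minimal sketch of how a project directory could be resolved while LILAC_PROJECT_DIR replaces LILAC_DATA_PATH. It does not import Lilac and is not the code from this PR: `get_project_dir`, the fallback to the current directory, and the warning text are assumptions made for illustration. Only the two variable names and the `set_project_dir` name come from the description.

```python
import os
import warnings

PROJECT_DIR_VAR = 'LILAC_PROJECT_DIR'
LEGACY_DATA_PATH_VAR = 'LILAC_DATA_PATH'  # Deprecated name, per the PR description.


def set_project_dir(project_dir: str) -> None:
  """Stand-in sketch: store the absolute project path in the environment."""
  os.environ[PROJECT_DIR_VAR] = os.path.abspath(os.path.expanduser(project_dir))


def get_project_dir() -> str:
  """Hypothetical helper: prefer the new variable, fall back to the old one."""
  project_dir = os.environ.get(PROJECT_DIR_VAR)
  if project_dir:
    return project_dir
  legacy = os.environ.get(LEGACY_DATA_PATH_VAR)
  if legacy:
    warnings.warn(
      f'{LEGACY_DATA_PATH_VAR} is deprecated; use {PROJECT_DIR_VAR} instead.',
      DeprecationWarning,
      stacklevel=2)
    return legacy
  # Assumed default when neither variable is set.
  return os.path.abspath('.')
```

Reading the environment on every call, rather than caching the value at import time, means a later `set_project_dir` call takes effect immediately.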

@nsthorat nsthorat changed the title from "[not ready] Improve the project API and documentation." to "Improve the project API and documentation." Sep 12, 2023
@nsthorat nsthorat added the enhancement (New feature or request) label Sep 12, 2023
@nsthorat nsthorat requested a review from dsmilkov September 12, 2023 15:44
Collaborator

@dsmilkov dsmilkov left a comment

Amazing cleanup

From CLI:

```sh
lilac init ~/my_project
```
Collaborator

Curious what the use case is for having 'lilac init', which doesn't start the web server, versus 'lilac start'. Asking since rilldata doesn't have init.

Contributor Author

Good question -- rill is basically UI-only (and configs). With the Python API, it would be really weird to have to start a server just to start a project.

lilac.yml
datasets/
  open-orca/
concepts/
Collaborator

Rename to concept/, or change CONCEPTS_DIR to concepts (though we will have to deal with backward compatibility).

Contributor Author

Renamed; we can deal with that later.

@@ -46,8 +49,8 @@ class GmailSource(Source):
   name = 'gmail'

   credentials_file: str = PydanticField(
-    description='Path to the OAuth credentials file.',
-    default=os.path.join(_GMAIL_CONFIG_DIR, _CREDS_FILENAME))
+    description=f'Path to the OAuth credentials file. Defaults to {_CREDS_FILENAME} in your Lilac '
Collaborator

Defaults to os.path.join(gmail_config_dir, _CREDS_FILENAME)

Contributor Author

Done

-    description='Path to the OAuth credentials file.',
-    default=os.path.join(_GMAIL_CONFIG_DIR, _CREDS_FILENAME))
+    description=f'Path to the OAuth credentials file. Defaults to {_CREDS_FILENAME} in your Lilac '
+    'project directory.')
Collaborator

Since credentials_file is marked as required in line 59 below, we should set a default

Contributor Author

Done. I made this a default_factory so that the default is recomputed whenever environment variables change.
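
To show why a factory matters here, the following is a minimal sketch using the standard library's `dataclasses.field(default_factory=...)` in place of pydantic, whose `Field` accepts the same `default_factory` keyword. It is not the PR's code: the class name `GmailSourceSketch`, the `credentials.json` value, and placing the file directly under the project directory are assumptions for illustration.

```python
import os
from dataclasses import dataclass, field

_CREDS_FILENAME = 'credentials.json'  # Assumed value, for illustration only.


def _default_credentials_file() -> str:
  """Compute the default path from the environment at instantiation time."""
  project_dir = os.environ.get('LILAC_PROJECT_DIR', '.')
  return os.path.join(project_dir, _CREDS_FILENAME)


@dataclass
class GmailSourceSketch:
  # A plain `default=` would be evaluated once, when the class is defined.
  # The factory runs for every new instance, so it sees the current environment.
  credentials_file: str = field(default_factory=_default_credentials_file)
```

With a plain default, changing LILAC_PROJECT_DIR after import would have no effect on new instances. With the factory, each instance picks up the current value.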

config: A Lilac config or the path to a json or yml file describing the configuration. The
contents should be an instance of `lilac.Config` or `lilac.DatasetConfig`. When not defined,
uses `LILAC_PROJECT_DIR`/lilac.yml.
overwrite: When True, runs all data from scratch, overwriting existing data. When false, only
Collaborator

curious why we no longer pass overwrite to _compute_embedding?

Contributor Author

We were checking for existing signals/embeddings in two places: outside the worker and inside it. Now I just check overwrite outside the worker, before scheduling a task. I have a unit test to make sure overwrite actually overwrites.
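
The idea of checking once, before scheduling, can be sketched as below. This is a generic illustration and not Lilac's scheduler: `schedule_missing`, its parameters, and the use of plain strings for signal/embedding names are all hypothetical.

```python
from typing import Callable, Iterable, List, Set


def schedule_missing(
  names: Iterable[str],
  existing: Set[str],
  overwrite: bool,
  schedule: Callable[[str], None],
) -> List[str]:
  """Schedule work for each name, skipping existing ones unless overwrite is set.

  The existence check happens only here, so the worker itself never needs
  to know about `overwrite`.
  """
  scheduled: List[str] = []
  for name in names:
    if name in existing and not overwrite:
      continue  # Already computed and we were not asked to redo it.
    schedule(name)
    scheduled.append(name)
  return scheduled
```

Keeping the decision in one place means the worker does unconditional work, and a single test on the scheduling function can cover both the skip and the overwrite paths.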

@nsthorat nsthorat merged commit a43503f into main Sep 14, 2023
4 checks passed
@nsthorat nsthorat deleted the nik-projects branch September 14, 2023 13:24
Labels: backend, enhancement (New feature or request)