-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add exclude_signals to select_rows, and include_signals to export methods. #1139
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
lilac/data/dataset_duckdb.py
Outdated
@@ -2640,7 +2666,7 @@ def _column_to_duckdb_paths( | |||
is_petal = schema.get_field(path).dtype is not None | |||
|
|||
# NOTE: The order of this array matters as we check the source and map manifests for fields | |||
# before reading signal manifests, via source_or_map_has_path. | |||
# before reading signal manifests, via source_or_map_has_path.s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
length_signal = LengthSignal() | ||
dataset.compute_signal(length_signal, 'text') | ||
|
||
result = dataset.select_rows(combine_columns=True, exclude_signals=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a test where you select explicitly just a single sub-field of a signal (via columns, but exclude_signals=True)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
The motivation here is for export methods to quickly allow disabling signals for exporting.
select_rows
now hasexclude_signals
, which automatically disables signals.to_*
now haveinclude_signals
, which by default removes signals. If you set this to true, you will get signals on the other side.Also add a cluster sampling notebook that shows this workflow.