perf(weave): add pre-groupby conditions to heavy query filter in calls query #3781
Conversation
Walkthrough

This pull request introduces new test functions and updates to query builders. Two new test functions in the client trace tests and three in the trace server tests simulate heavy condition aggregations and batch querying. In the production code under `weave/trace_server`, the calls query builder gains pre-groupby predicate filtering for heavy conditions.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Test
    participant QueryBuilder
    participant Field
    participant Condition
    participant SQLGen
    Test->>QueryBuilder: Initiate heavy condition query
    QueryBuilder->>Field: Call as_sql(use_agg_fn=True)
    Field-->>QueryBuilder: Return SQL snippet
    QueryBuilder->>Condition: Invoke is_feedback() check
    Condition-->>QueryBuilder: Return feedback status
    QueryBuilder->>SQLGen: Evaluate should_add_predicate_filters()
    SQLGen-->>QueryBuilder: Return decision on predicate filters
    alt Predicate filters required
        QueryBuilder->>SQLGen: Call make_call_parts_predicate_filters_sql()
        SQLGen-->>QueryBuilder: Return predicate filters SQL
    end
    QueryBuilder-->>Test: Return complete SQL query
```
Preview this PR with FeatureBee: https://beta.wandb.ai/?betaVersion=6d23ce38de7ba38443b626f75986f36fe8046294
Actionable comments posted: 2
🧹 Nitpick comments (2)
weave/trace_server/calls_query_builder.py (2)
76-76: Clarify default usage of `use_agg_fn`.

The new boolean attribute defaults to `True`, which might not always be intuitive. Consider adding a docstring or inline comment explaining in which scenarios the aggregator function should be activated or skipped.
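The effect of the flag can be sketched in isolation (a hypothetical stand-in for the field's `as_sql` logic, not the actual implementation):

```python
def as_sql_value(inner: str, agg_fn: str = "any", use_agg_fn: bool = True) -> str:
    """Illustrative stand-in: when use_agg_fn is False (e.g. for
    pre-groupby predicate filters), emit the raw column expression
    instead of wrapping it in the aggregate function."""
    if not use_agg_fn:
        return inner
    return f"{agg_fn}({inner})"

print(as_sql_value("calls_merged.inputs_dump"))                    # any(calls_merged.inputs_dump)
print(as_sql_value("calls_merged.inputs_dump", use_agg_fn=False))  # calls_merged.inputs_dump
```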
289-294: Use a streamlined approach to detect feedback fields.

You could simplify the method definition with the built-in `any()` function:

```diff
 def is_feedback(self) -> bool:
-    for field in self._get_consumed_fields():
-        if isinstance(field, CallsMergedFeedbackPayloadField):
-            return True
-    return False
+    return any(
+        isinstance(field, CallsMergedFeedbackPayloadField)
+        for field in self._get_consumed_fields()
+    )
```
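The suggested `any()` form can be exercised standalone; the classes below are minimal stand-ins for the real field types, not the actual weave classes:

```python
class CallsMergedField:
    """Stand-in base class for query builder fields."""

class CallsMergedFeedbackPayloadField(CallsMergedField):
    """Stand-in marker subclass for feedback-payload fields."""

class Condition:
    def __init__(self, fields: list) -> None:
        self._fields = fields

    def _get_consumed_fields(self) -> list:
        return self._fields

    def is_feedback(self) -> bool:
        # any() short-circuits on the first feedback field, so the
        # behavior matches the explicit loop exactly
        return any(
            isinstance(f, CallsMergedFeedbackPayloadField)
            for f in self._get_consumed_fields()
        )

print(Condition([CallsMergedField()]).is_feedback())                 # False
print(Condition([CallsMergedFeedbackPayloadField()]).is_feedback())  # True
```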
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)

- tests/trace/test_client_trace.py (1 hunks)
- tests/trace_server/test_calls_query_builder.py (2 hunks)
- weave/trace_server/calls_query_builder.py (9 hunks)
🧰 Additional context used
📓 Path-based instructions (1)

`**/*.py`:
- Focus on pythonic code patterns.
- Check for proper exception handling.
- Verify type hints usage where applicable.
- Look for potential performance improvements.
- Don't comment on formatting if black/isort is configured.
- Check for proper dependency injection patterns.
- Verify proper async handling if applicable.

Applied to:
- weave/trace_server/calls_query_builder.py
- tests/trace/test_client_trace.py
- tests/trace_server/test_calls_query_builder.py
⏰ Context from checks skipped due to timeout of 90000ms (40)
- GitHub Check: Trace nox tests (3, 13, trace)
- GitHub Check: Trace nox tests (3, 12, trace)
- GitHub Check: Trace nox tests (3, 11, trace)
- GitHub Check: Trace nox tests (3, 10, trace)
🔇 Additional comments (11)

tests/trace/test_client_trace.py (2)

3305-3377: Great implementation of test to verify heavy condition aggregation!

This test effectively validates querying of call parts with heavy conditions. It tests both the initial state after call start and the post-end state, while also ensuring that data merging in the database doesn't affect query results. The conditional logic for the SQLite client is particularly good, as it acknowledges the different behaviors between database backends.

3379-3477: Well-structured test for batch query handling with heavy conditions

The test creates a good simulation of a batch scenario with multiple calls and verifies both input and output field filtering. I appreciate the thorough validation of different query combinations and the special handling for ClickHouse's asynchronous merging behavior.
tests/trace_server/test_calls_query_builder.py (3)

737-796: Excellent test for SQL predicate filter generation

This test correctly validates that the query builder generates SQL with proper NULL handling for heavy conditions. The predicate filter ensures that queries work correctly even when dealing with unmerged call parts by including `OR calls_merged.inputs_dump IS NULL` in the condition.
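The NULL-widening the test asserts can be sketched as a small helper (the function name here is hypothetical; the real query builder inlines this logic):

```python
def widen_heavy_predicate(column: str, predicate_sql: str) -> str:
    """Widen a heavy-field predicate so unmerged call parts survive the
    pre-groupby filter: a part whose heavy column is NULL (e.g. an
    end-only part with no inputs) must never be filtered out here."""
    return f"(({predicate_sql}) OR ({column} IS NULL))"

print(widen_heavy_predicate(
    "calls_merged.inputs_dump",
    "JSON_VALUE(calls_merged.inputs_dump, '$.a') = 'bob'",
))
```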
798-868: Comprehensive test for multiple heavy conditions

Good test that verifies the behavior when dealing with multiple heavy conditions on both input and output fields. The SQL generation properly handles complex combinations by creating appropriate predicate filters for each condition while maintaining proper NULL handling.
870-921: Clear validation of OR condition between start and end fields

This test validates an important special case: predicate filters are not created when there is an OR between start and end fields. The docstring clearly explains the test's purpose, and the assertions verify the expected behavior, where the conditions are instead moved to the HAVING clause.
weave/trace_server/calls_query_builder.py (6)

85-86: Consider honoring the `cast` parameter for non-aggregated calls.

When `use_agg_fn` is `False`, the method ignores the inbound `cast` parameter. If this is not the desired behavior, consider applying:

```diff
 if not self.use_agg_fn:
-    return clickhouse_cast(inner)
+    return clickhouse_cast(inner, cast)
 return clickhouse_cast(f"{self.agg_fn}({inner})", cast)
```
609-609: No issues detected for initialization.

Setting `call_parts_predicate_filters_sql` to an empty string at this point looks fine.
709-710: Verify the overlap of `attributes_dump` in both sets.

`attributes_dump` is declared as both start-only and end-only. This may lead to ambiguous or unexpected filtering logic. Ensure this duplication is intentional or adjust accordingly.
739-739: Parameter addition approved.

The new `use_agg_fn` parameter in `process_query_to_conditions()` gives fine-grained control over aggregation. Looks good to proceed.
925-974: Confirm handling of nested OR logic.

`should_add_predicate_filters` checks top-level OR operations for conflicting start/end fields. If nested or combined logic is possible, ensure those cases are correctly handled, or confirm that this limited scope is sufficient.
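A simplified model of the top-level check (the field sets and the flat `or_groups` representation are illustrative assumptions; the real code walks a query operand tree):

```python
# Illustrative field sets; the real start-only/end-only sets live in
# the query builder module.
START_ONLY = {"inputs_dump", "attributes_dump"}
END_ONLY = {"output_dump", "summary_dump"}

def should_add_predicate_filters(or_groups: list) -> bool:
    """Each inner list is the set of fields joined by one top-level OR.
    Nested OR expressions are not modeled here, which is exactly the
    limitation the review comment raises."""
    for fields in or_groups:
        touches_start = any(f in START_ONLY for f in fields)
        touches_end = any(f in END_ONLY for f in fields)
        if touches_start and touches_end:
            # An OR across start-only and end-only fields cannot be
            # safely pushed below the group-by.
            return False
    return True

print(should_add_predicate_filters([["inputs_dump"]]))                 # True
print(should_add_predicate_filters([["inputs_dump", "output_dump"]]))  # False
```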
976-1036: Reassess null allowance for start-only and end-only fields.

Permitting rows with NULL values for heavy fields in an OR-based condition can inflate partial matches. Validate that these broader matching rules meet performance and correctness needs.
```diff
@@ -65,6 +65,9 @@ def as_sql(
     def as_select_sql(self, pb: ParamBuilder, table_alias: str) -> str:
         return f"{self.as_sql(pb, table_alias)} AS {self.field}"

+    def is_heavy(self) -> bool:
```
seems like this is a property of `CallsMergedField`, not sure why it needed to be hoisted up to `QueryBuilderField`
you resolved this conversation without any comment or code change?
gdi cursor keeps adding that back, if you check my first commit I removed it. There is a linter error that Cursor is obsessed with
```diff
     if len(having_conditions_sql) > 0:
         having_filter_sql = "HAVING " + combine_conditions(
             having_conditions_sql, "AND"
         )

+    if should_add_predicate_filters(self.query_conditions):
```
I would slightly adjust this to be more pluggable:

```python
call_parts_predicate_filters_sql = build_call_parts_predicate_filter(self.query_conditions)
```

I think that the implementation of `should_add_predicate_filters` is coupled with the implementation of `make_call_parts_predicate_filters_sql` and therefore should be abstracted behind a common method. Looking at your algorithm, I can imagine some alternatives which would make this easier to modify.
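One way to read the suggestion as code (everything here is a hypothetical sketch of the proposed abstraction, with dict-based conditions standing in for the real `Condition` objects):

```python
def build_call_parts_predicate_filter(query_conditions: list) -> str:
    """Single entry point: decides whether predicate filters apply and,
    if so, builds them. Returns "" when no pre-groupby filter is safe,
    so the caller no longer needs a separate should-add check."""
    heavy = [c for c in query_conditions if c.get("heavy")]
    if not heavy or any(c.get("mixes_start_end") for c in heavy):
        return ""
    return " AND ".join(c["sql"] for c in heavy)

conds = [{"heavy": True, "sql": "calls_merged.inputs_dump LIKE '%bob%'"}]
print(build_call_parts_predicate_filter(conds))  # calls_merged.inputs_dump LIKE '%bob%'
```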
```diff
     if len(having_conditions_sql) > 0:
         having_filter_sql = "HAVING " + combine_conditions(
             having_conditions_sql, "AND"
         )

+    heavy_non_feedback_conditions = [
```
why not just have this inside `make_call_parts_predicate_filters_sql`?
```diff
+    def is_heavy(self) -> bool:
+        return False
```
lol @tssweeney
```diff
@@ -893,6 +926,80 @@ def process_calls_filter_to_conditions(
     return conditions


+def make_call_parts_predicate_filters_sql(
+    pb: ParamBuilder, table_alias: str, heavy_conditions: list[Condition]
+) -> str:
```
I would add the comment here that the condition returned MUST account for unmatched parts and NEVER filter out more than the heavy filter would
```diff
+    Returns an empty string if:
+    1. There are no heavy conditions
+    2. There are OR operations between start-only and end-only fields, which would
```
I actually think #2 is safe, just not using this algorithm
Description

Add additional filters to the calls query when querying by heavy fields. Because we must account for unmerged calls, all conditions must take individual call parts into account. Example: instead of adding `inputs_dump.a = 'bob'`, we must write `inputs_dump.a = 'bob' OR inputs_dump IS NULL` to handle the case where we are only looking at an end call part.

Prod:
Branch:
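The start/end-part problem from the description can be demonstrated with a toy in-memory model (the row shapes here are illustrative, not the real schema):

```python
# A single call stored as two unmerged parts: the start part carries
# inputs, the end part carries output but no inputs.
parts = [
    {"call_id": "c1", "inputs_dump": '{"a": "bob"}', "output_dump": None},
    {"call_id": "c1", "inputs_dump": None, "output_dump": '{"r": 1}'},
]

def naive(p):
    # inputs_dump.a = 'bob' -- drops the end part entirely
    return p["inputs_dump"] is not None and '"a": "bob"' in p["inputs_dump"]

def widened(p):
    # inputs_dump.a = 'bob' OR inputs_dump IS NULL -- keeps the end part
    return p["inputs_dump"] is None or '"a": "bob"' in p["inputs_dump"]

print(len([p for p in parts if naive(p)]))    # 1: the call's output data would be lost
print(len([p for p in parts if widened(p)]))  # 2: both parts survive to the merge
```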
Testing
Summary by CodeRabbit
New Features
Tests