Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SQL conversion of conditions #3252

Merged
merged 6 commits into from
Jan 3, 2024

Conversation

jnsrnhld
Copy link
Collaborator

@jnsrnhld jnsrnhld commented Dec 14, 2023

First commit contains only renaming Filters -> WhereClauses and FilterCondition -> WhereCondition.

Second commit contains the conversion of all CTConditions except for the GroovyCondition.

  • Converted CTConditions are applied in PREPROCESSING CTE step.
  • For conversion, the connector's table and column are always passed down to CTConditions, but if the CTCondition has a column itself, the given connector column is ignored.

(An der Stelle ist mir eingefallen, dass wir ja hier auf deutsch schreiben, deswegen hier der Wechsel 😄 )
Bei einer Sache war ich mir noch unsicher: aktuell gucke ich im CQConcept nach den ConceptElements und im Falle von ConceptTreeChilds konvertiere ich die entsprechende CTCondition. Zusätzlich konvertiere ich auch IMMER die CTCondition des Connectors (falls vorhanden). Passt das so oder wird die Condition vom Connector nur angewandt, wenn das top-level TreeConcept ausgewählt ist?

@awildturtok
Copy link
Collaborator

Bei einer Sache war ich mir noch unsicher: aktuell gucke ich im CQConcept nach den ConceptElements und im Falle von ConceptTreeChilds konvertiere ich die entsprechende CTCondition. Zusätzlich konvertiere ich auch IMMER die CTCondition des Connectors (falls vorhanden). Passt das so oder wird die Condition vom Connector nur angewandt, wenn das top-level TreeConcept ausgewählt ist?

Das hast du richtig verstanden, die conditions werden verundet.

Field<String> field = DSL.field(DSL.name(context.getConnectorTable().getName(), context.getConnectorColumn().getName()), String.class);
LikeEscapeStep condition = Arrays.stream(prefixes)
.map(prefix -> field.like(prefix + "%"))
.reduce((like1, like2) -> (LikeEscapeStep) like1.or(like2))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gibt es keine bessere Möglichkeit das zu lösen? Hier wird das für psql skizziert: https://dba.stackexchange.com/questions/236627/how-to-use-like-operator-with-multiple-conditions-at-once-in-postgresql

Das sieht besser aus, einfach weil es echt große Listen werden können. Aber natürlich fragwürdig wie das über unterschiedliche DBs hinweg funktioniert.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sehe weiter unten likeRegex, vlt im regex die Veroderung machen und benchmarken? Das können nämlich locker 100 Werte sein

@awildturtok awildturtok self-requested a review December 20, 2023 15:29
Copy link
Collaborator

@awildturtok awildturtok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Über negation müssen wir separat reden. Der rest sieht gut aus

@@ -40,4 +48,28 @@ public boolean matches(String value, CalculatedValue<Map<String, Object>> rowMap
return false;
}

@Override
public WhereCondition convertToSqlCondition(CTConditionContext context) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oha

@@ -47,13 +52,13 @@ public Class<CQConcept> getConversionClass() {
}

@Override
public ConversionContext convert(CQConcept node, ConversionContext context) {
public ConversionContext convert(CQConcept cqConcept, ConversionContext context) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warum ist der CQConceptConverter (noch) nicht in CQConcept inlined?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wir hatten uns erstmal (noch) dagegen entschieden, weil dann relativ viel Code im CQConverter liegt. Aber da können wir ja auch nochmal drüber sprechen, ob wir auch die verbleibenden Converter mit in die jeweiligen Klassen ziehen wollen. @torbsto

public WhereClauses negated() {
return new WhereClauses(
preprocessingConditions.stream().map(WhereCondition::negate).toList(),
eventFilters.stream().map(WhereCondition::negate).toList(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

da hab ich mir schon länger gedanken zu gemacht und immer noch keine Eindeutige Antwort gefunden, aber ich glaube event-filters werden nicht negated, das macht sonst aggregationen etwas weird.

"description": " ",
"condition": {
"type": "OR",
"conditions": [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Durschnschnittliche Condition 🧛

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😄 Ich benenne den Fall mal um.

{
"label": "F22.8",
"description": "Sonstige anhaltende wahnhafte Störungen",
"additionalInfos": [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kannst du die additionalInfos und die descriptions bitte entfernen?

@jnsrnhld
Copy link
Collaborator Author

@awildturtok Habe das mit dem Regex mal getestet, deine Vermutung war wohl zutreffend 😄
Hier die Auswertung:

TL;DR

cost of 86316.64 (with Regex) < cost of 270699.83 (with LIKE 'foo' OR LIKE 'bar' etc.)
1,5 seconds (with Regex) < 1,9 seconds (with LIKE 'foo' OR LIKE 'bar' etc.)

I used this dataset: https://postgrespro.com/docs/postgrespro/10/demodb-bookings-installation (demo-big, about 2,5 GB data), with 2.9 million tickets entries.

With LIKE 'foo' OR LIKE 'bar' etc.

EXPLAIN ANALYSE select *
from "tickets"
where "passenger_name" LIKE 'A%'
or "passenger_name" LIKE 'B%'
or "passenger_name" LIKE 'C%'
or "passenger_name" LIKE 'D%'
or "passenger_name" LIKE 'E%'
or "passenger_name" LIKE 'F%'
or "passenger_name" LIKE 'G%'
or "passenger_name" LIKE 'H%'
or "passenger_name" LIKE 'I%'
or "passenger_name" LIKE 'J%'
or "passenger_name" LIKE 'K%'
or "passenger_name" LIKE 'L%'
or "passenger_name" LIKE 'M%'
or "passenger_name" LIKE 'N%'
or "passenger_name" LIKE 'O%'
or "passenger_name" LIKE 'P%'
or "passenger_name" LIKE 'Q%'
or "passenger_name" LIKE 'R%'
or "passenger_name" LIKE 'S%'
or "passenger_name" LIKE 'T%'
or "passenger_name" LIKE 'U%'
or "passenger_name" LIKE 'V%'
or "passenger_name" LIKE 'W%'
or "passenger_name" LIKE 'X%'
or "passenger_name" LIKE 'Y%'
or "passenger_name" LIKE 'Z%'

image

Cost: (cost=0.00..270699.83 rows=1919327 width=104)
Execution time: ~1.9 seconds

With Regex

EXPLAIN ANALYSE SELECT *
FROM "tickets"
WHERE "passenger_name" SIMILAR TO '[A-Z]%';
# or more realistic
SELECT *
FROM "tickets"
WHERE "passenger_name" SIMILAR TO '(A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z)%';

image

Cost: (cost=0.00..86316.64 rows=2949862 width=104)
Execution time: ~1.5 seconds

@jnsrnhld jnsrnhld requested a review from awildturtok December 21, 2023 15:17
Copy link
Collaborator

@awildturtok awildturtok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Danke für das ausführliche Benchmark!

@jnsrnhld jnsrnhld merged commit 2bed2b7 into develop Jan 3, 2024
6 checks passed
@jnsrnhld jnsrnhld deleted the sql/feature/tree-concept-conversion branch January 3, 2024 07:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants