-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Return a CTE expression to allow for chaining #13
Comments
Thanks @korenmiklos. If you omit the cte = @chain db_table(db, :mtcars) begin
@filter(!starts_with(model, "M"))
@group_by(cyl)
@summarize(mpg = mean(mpg))
# @query
end
df = @chain cte begin
@mutate(mpg_squared = mpg^2,
mpg_rounded = round(mpg),
mpg_efficiency = case_when(
mpg >= cyl^2 , "efficient",
mpg < 15.2 , "inefficient",
"moderate"))
@filter(mpg_efficiency in ("moderate", "efficient"))
@arrange(desc(mpg_rounded))
@collect
end |
If the ideal/goal of chaining from a prior cte, is to be able to do multiple queries from it, then we're are halfway there. In the third chain, you'll notice it contains the contents from the second chain which changed the content of the underlying CTE/SQLQuery struct/metadata. The third chain shows the new columns that were not present in the saved initial One way to enable this (and im open to others if you/others have ideas), would perhaps be to add a or even a
|
@korenmiklos as a quick update/option, the function below would allow you to take the cte
there may be a more memory efficient way to do it without a deepcopy, however. |
Thanks for the clarification @drizk1 and @kdpsingh. I see now that Reusing SQL expressions could be a great use case for TidierDB. To exploit the performance of DuckDB, I write mostly SQL these days, but it's not easy to link pieces of SQL together. dbt does this, but is too heavy machinery and a different language. TidierDB could serve as the glue in Julia. I guess this would need Is this within scope? |
As long as only the |
To clarify, Regarding the If the method above works, I think I know away to do it without using |
Sorry, wasn't clear. I mean that the structs could store the table to operate on not as string, but as a reference to another struct. When the actual query has to be sent to the DB engine, a recursive function would evaluate these references to the actual query string. In select * from {{ ref('other_table') }} limit 10 where Maybe none of this is necessary. I don't yet sufficiently understand the architecture. Why does the second query modify the first? I understand that each command is a CTE and these are chained together. But why is the incoming CTE mutable? |
Thanks for the additional dialog. I think dbt and TidierDB have some similarities but also some crucial differences. Will weigh in soon with thoughts once I have some time to write them down. |
I will have to do some more reading on dbt, because I am a bit unfamiliar with it. The reason the second query above modified the The main reason the incoming cte struct is mutable is that when I was building the package, I added the cte struct later as I realized I needed it to build more complex queries, so I followed the pattern of the original SQL query struct. It is possible that it does not need to be mutable, but I am not 100% sure. |
Hi @korenmiklos , so I have put together a more memory friendly version that only uses If this fits your workflow need (and then likely that of others as well), I think it is worth having as part of the TidierDB ecosystem, I will add docs and put in the next release, but first it needs a name. I think perhaps
|
I am going to close this for now now that |
I understand the @show_query macro only shows the query, does not return anything. It would be great to return the SQL query to be used in CTE expressions in future queries. Something like:
I'd be happy to work on this if you give me some pointers.
The text was updated successfully, but these errors were encountered: