Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow_partials option is updating last completed interval too #3859

Open
mpcarter opened this issue Feb 18, 2025 · 0 comments
Open

Allow_partials option is updating last completed interval too #3859

mpcarter opened this issue Feb 18, 2025 · 0 comments

Comments

@mpcarter
Copy link
Contributor

When using the allow_partials true option in an INCREMENTAL_BY_TIME_RANGE model, SQLMesh is refreshing the most recent completed interval along with the current incomplete (partial) interval.

I have not fully vetted if this happens generally, but I observed this happen with a scenario similar to this model:

MODEL (
    name dw.monthly_stats ,
    kind INCREMENTAL_BY_TIME_RANGE (
        time_column as_of_month ,
        batch_size 1 
    ), 
    cron '@daily' ,
    interval_unit month ,
    allow_partials true 
);

The intent is to have the current month interval refresh daily.

As of writing this, 2/18/2025 using SQLMesh v0.156.0, here is the sequence of events I observed:

  1. Initially load the model- For each month before 1/1/2025, SQLMesh runs a query using the first and last date of the month, as expected e.g.
    1/1/2024 - 1/31/2024, ... , 11/1/2024 - 11/30/2024, 12/1/2024 - 12/31/2024.
  2. During initial load still- For Jan and Feb 2025, the last complete interval and current interval, SQLMesh runs one query using 1/1/2025 - 2/xx/2025 (xx being dependent on what day in Feb initial load is done).
  3. Running sqlmesh plan --restate-model -s 2025-01-01 -e 2025-02-xx results in same effect. One query runs to populate both intervals.
  4. Running sqlmesh plan --restate-model -s 2025-01-01 -e 2025-01-31 results in correct effect of populating just the Jan 2025 interval.
  5. Running sqlmesh run (the day after initial load) results in SQLMesh again running one query using 1/1/2025 - 2/xx+1/2025 to populate both intervals despite the fact that the Jan 2025 interval is already loaded

What I would expect to occur is that the initial load would populate Jan 2025 separately from Feb 2025 (the current partial interval) using 1/1/2025 - 1/31/2025 for Jan, and 2/1/2025 - 2/xx/2025 for Feb.
For subsequent sqlmesh run I would expect that only Feb 2025 would be refreshed using 2/1/2025 - 2/xx/2025.

Actually, on that last point, it would be preferable if the end date was always based on the boundaries of the intervals. E.g. in this case where interval_unit month, the current partial interval Feb 2025 would always use 2/1/2025 - 2/28/2025 each day it is refreshed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant