Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ADAP-1006] [CT-3321] DBT Models are not using Snowflake Result Cache/Query Cache #825

Closed
2 tasks done
ankitsr92 opened this issue Nov 6, 2023 · 6 comments
Closed
2 tasks done
Labels
bug Something isn't working wontfix This will not be worked on

Comments

@ankitsr92
Copy link

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Snowflake Query Cache states that if the a query is run multiple time within 24 hrs window across any warehouse result would be produced through result cache. Given the source table data has not changed.

When it comes to DBT Models, I executed my models once. Upon executing the models again in next 5 min I still see model sql's being executed and taking the same amount of time as Run 1. The base tables again have not changed this time.

Is it not contrary to Snowflake behaviour? Is it something with DBT backend code which is restricting this?

Expected Behavior

Upon re-run dbt models again within 24 hrs time and with unchanged base tables, Snowflake query cache should be used and it should take considerably less amount of time compared to first run.

Steps To Reproduce

Run any DBT Model on Snowflake
Run the model again post Run 1 completion

  • Make sure base tables are not changed

Relevant log output

No response

Environment

- OS: Ubuntu 20.04
- Python: 3.9.12
- dbt-core: 1.5

Which database adapter are you using with dbt?

snowflake

Additional Context

No response

@ankitsr92 ankitsr92 added bug Something isn't working triage labels Nov 6, 2023
@github-actions github-actions bot changed the title DBT Models are not using Snowflake Result Cache/Query Cache [CT-3321] DBT Models are not using Snowflake Result Cache/Query Cache Nov 6, 2023
@b-per
Copy link
Contributor

b-per commented Nov 6, 2023

Did your Snowflake Warehouse turn off between the 2 calls? (e.g. you might have a setting to auto suspend it after 1 min of inactivity). I believe that the cache is only valid if the Warehouse was not turned off.

@ankitsr92
Copy link
Author

@b-per No the warehouse didn't suspend between the runs. Even if it would have, Result Cache is not dependent on Warehouse activity but only the 24 hrs time window and base table micro partitions.

https://docs.snowflake.com/en/user-guide/querying-persisted-results

@b-per
Copy link
Contributor

b-per commented Nov 6, 2023

Thanks for the update. I got confused with the Warehouse cache which is only available for the time a warehouse is on. Warehouse caching can definitely help get some following queries run faster.

I don't think that there is anything that dbt can do with what you see.

I went directly in Snowflake (so, not using dbt), and ran multiple times in a row create or replace table xyz as <my_sql_code> and Snowflake didn't use the result cache.

If I just run <my_sql_code> without the create table as, then the results get cached after the first run.

So, reusing persisted results doesn't seem to be possible when used as part of DDL statements.

@dbeatty10 dbeatty10 transferred this issue from dbt-labs/dbt-core Nov 6, 2023
@github-actions github-actions bot changed the title [CT-3321] DBT Models are not using Snowflake Result Cache/Query Cache [ADAP-1006] [CT-3321] DBT Models are not using Snowflake Result Cache/Query Cache Nov 6, 2023
@dbeatty10
Copy link
Contributor

Thanks for reaching out about this @ankitsr92.

Based on your description, it sounds like you are doing the following?

  1. Execute all the models (presumably each of the underlying queries is cached by Snowflake)
  2. Execute all the models again
  3. Notice that none of queries resulted in a cache hit

If so, then what happens if you do the following instead?

  1. Execute only a single model
    • i.e., dbt run --select my_model
  2. Execute only that same model again
    • i.e., dbt run --select my_model

Does it result in a cache hit or a cache miss?

@Dikootje
Copy link

Adding a test scenario:

-- create some dummy data
create or replace table test as 
select dateadd(day, 
            row_number() 
                over (order by seq4()),
                '1899-12-31'::date) as d_date 
from table(generator(rowcount => 109573));

select * from test; -- uses warehouse to compute
select * from test; -- no warehouse because of result cache

create or replace table test2 as 
select * from test; -- uses warehouse

create or replace table test2 as 
select * from test; -- uses warehouse

Looking in the query history:
image

The first select query (second to last one in the screenshot) is using the warehouse to return data. The third to last one doesnt use a warehouse because the query is returned from the result cache.

Adding a create or replace always triggers the usage of a warehouse. So I dont think this is a bug in dbt-snowflake

@dbeatty10
Copy link
Contributor

Based on the insights and research by @b-per and @Dikootje, it doesn't look like this is a bug in dbt-snowflake but rather just how "create table as" (CTAS) statements interact with the Snowflake query cache.

So I'm going to close this as "not planned".

@dbeatty10 dbeatty10 closed this as not planned Won't fix, can't repro, duplicate, stale Nov 16, 2023
@dbeatty10 dbeatty10 added wontfix This will not be worked on and removed triage labels Nov 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working wontfix This will not be worked on
Projects
None yet
Development

No branches or pull requests

4 participants