Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Incremental compilation error not handling column with quotes #4422

Closed
1 task done
ChristopheDuong opened this issue Dec 3, 2021 · 5 comments
Closed
1 task done
Labels
bug Something isn't working stale Issues that have gone stale Team:Adapters Issues designated for the adapter area of the code

Comments

@ChristopheDuong
Copy link

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

I've created a repository with multiple tests that summarizes compilation errors with SQL queries generated by dbt when running incremental models at: https://github.com/airbytehq/testing_dbt

One of the issues occurs when the models contain columns which identifiers include quoting characters and need to be referred to with quoting characters in the SQL queries:

column`_'with"_quotes

The model file is:

select
    'a column with tricky name with quotes' as  {{ adapter.quote('column`_\'with""_quotes') }},
    {{ current_timestamp() }} as insert_time

see https://github.com/airbytehq/testing_dbt/blob/2d8f87b721e13122dd5fd9fa8088ee7734f9017e/first_project/models/column_with_quotes.sql

  • Table creation works properly while quoting identifiers.
create  table "integrationtests"."test_dbt"."column_with_quotes__dbt_tmp"  as (
    select
    'a column with tricky name with quotes' as  "column`_'with""_quotes",
    getdate() as insert_time
);

see https://github.com/airbytehq/testing_dbt/blob/2d8f87b721e13122dd5fd9fa8088ee7734f9017e/output/postgres-1-full-refresh-SUCCESS-dbt.log#L454

  • But insert into queries fails when using incremental materialization
     insert into "integrationtests"."test_dbt"."column_with_quotes" ("column`_'with"_quotes", "insert_time")
    (
         select "column`_'with"_quotes", "insert_time"
         from "column_with_quotes__dbt_tmp205201621301"
    );

Postgres error: syntax error at or near "_quotes"
LINE 5: ..."."test_dbt"."column_with_quotes" ("column`_'with"_quotes", ...

see https://github.com/airbytehq/testing_dbt/blob/2d8f87b721e13122dd5fd9fa8088ee7734f9017e/output/postgres-2-incremental-FAILED-dbt.log#L2082

Expected Behavior

The expected SQL query should be:

insert into "integrationtests"."test_dbt"."column_with_quotes" ("column`_'with""_quotes", "insert_time")
    (
         select "column`_'with""_quotes", "insert_time"
         from "column_with_quotes__dbt_tmp205201621301"
    );

Steps To Reproduce

  1. in https://github.com/airbytehq/testing_dbt/
  2. ./run_test.sh

Relevant log output

No response

Environment

- OS: mac Os & docker (fishtownanalytics/dbt)
- Python: 3.8.12
- dbt: 0.21.0

What database are you using dbt with?

postgres, redshift, snowflake, other (mention it in "Additional Context")

Additional Context

The error occurs with:

  • postgres adapter
  • redshift
  • snowflake
  • sqlserver
@ChristopheDuong ChristopheDuong added bug Something isn't working triage labels Dec 3, 2021
@jtcohen6 jtcohen6 added the Team:Adapters Issues designated for the adapter area of the code label Dec 13, 2021
@jtcohen6 jtcohen6 removed the triage label May 3, 2022
@jtcohen6
Copy link
Contributor

jtcohen6 commented May 3, 2022

@ChristopheDuong Thanks for opening the thoughtful issues, here and in dbt-labs/dbt-adapters#160, with clear reproduction cases! Sorry for the long delay on our end—we've been re-organizing our collective triage, and a few got lost in the shuffle.

This is definitely an edge case, and that's all the more reason to handle it. The tighter that dbt can be at handling these types of edge cases, the better for other tools (such as Airbyte) who wish to reliably perform the same operations across databases.

This edge case crops up if a database's quoting characters (most often "" or ``) appear within the identifier name itself. We need to escape the quote character in the column identifier with yet another quote character.

(On some databases, including a special character in the identifier name just isn't allowed to begin with: BigQuery, Spark. On others, such as Snowflake, we highly highly discourage it.)

To be more precise, this is cropping up in the get_quoted_csv macro ("get quoted columns as comma-separated values"). That doesn't do anything to add an escape character in this edge case.

{%- set dest_cols_csv = get_quoted_csv(dest_columns | map(attribute="name")) -%}

{% macro get_quoted_csv(column_names) %}
{% set quoted = [] %}
{% for col in column_names -%}
{%- do quoted.append(adapter.quote(col)) -%}
{%- endfor %}
{%- set dest_cols_csv = quoted | join(', ') -%}
{{ return(dest_cols_csv) }}
{% endmacro %}

I think we have two options here:

  1. Add logic to get_quoted_csv, in Jinja, that checks for and escapes any quote characters in the column name. While it might be gnarly to write, the appeal of this approach is that it requires a change only to this macro, in dbt-core. It works so long as all databases escape quote characters the same (extra " before ", extra backtick before backtick, and so on).
  2. Update the get_quoted_csv macro to use the quoted attribute of each column (instead of name). Then, update the quoted attribute to handle this escaping:

@property
def quoted(self) -> str:
return '"{}"'.format(self.column)

Reasons to prefer option 2:

  • This string-munging logic is written in Python, instead of Jinja, so it's much easier to sense-check and unit-test
  • If different databases do this type of quote escaping differently, they can reflect that in their quoted attribute logic

Main downside: This would require extension to any databases that have implemented their own Column.quote. (I know BigQuery + Spark both have, but they don't support this type of identifier naming to begin with!)

(If the distinction between adapter.quote and Column.quoted seems confusing, and potentially duplicative — I agree: #2986)

In any case, the test case for this is definitely something we'd want to have in the "adapter zone" of test cases, for extension and validation on other adapters.

Is this something you might be interested in helping us sort out?

@github-actions
Copy link
Contributor

github-actions bot commented Nov 1, 2022

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Nov 1, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Nov 9, 2022

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest; add a comment to notify the maintainers.

@github-actions github-actions bot closed this as completed Nov 9, 2022
@bm1549
Copy link

bm1549 commented Feb 2, 2023

I'm seeing this issue on Redshift as well. Has any of the pre-work been done for this yet?

@dbeatty10
Copy link
Contributor

Re-labeling to "closed as stale".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working stale Issues that have gone stale Team:Adapters Issues designated for the adapter area of the code
Projects
None yet
Development

No branches or pull requests

4 participants