Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

quoting of column names in nested fields #71

Closed
philipp-heinrich opened this issue Nov 19, 2021 · 6 comments
Closed

quoting of column names in nested fields #71

philipp-heinrich opened this issue Nov 19, 2021 · 6 comments
Labels
Stale type:bug Something isn't working

Comments

@philipp-heinrich
Copy link

Describe the bug

the column type is overridden as a BigQueryColumn in BigQuery dbt projects

However, column names in nested fields are not quoted, which is ok for most coloums. However, in cases where reserved keywords are used as column names, the SQL code becomes invalid.

In order to circumvent this issue, all columns should be quoted.

There is already a function called "quoted".

In my opinion, line 86 should be changed, so that column names are quoted.

    @property
    def quoted(self):
        return '`{}`'.format(self.column)

    def literal(self, value):
        return "cast({} as {})".format(value, self.dtype)

    @property
    def data_type(self) -> str:
        if self.dtype.upper() == 'RECORD':
            subcols = [
                "{} {}".format(col.name, col.data_type) for col in self.fields
            ]
            field_type = 'STRUCT<{}>'.format(", ".join(subcols))

        else:
            field_type = self.dtype

        if self.mode.upper() == 'REPEATED':
            return 'ARRAY<{}>'.format(field_type)

        else:
            return field_type

Steps To Reproduce

We encoutered the issue, when running the dbt_utils.union_relations makro, which joined multiple nested tables, which had coloum names such as from and where deeply in their nested field structure.

Expected behavior

All column names (including the ones from nested fields) should be quoted, in order to prevent issues with reserved Keywords

Screenshots and log output

System information

The output of dbt --version:

installed version: 0.19.1
   latest version: 0.21.0

Plugins:
  - redshift: 0.19.1
  - snowflake: 0.19.1
  - bigquery: 0.19.1
  - postgres: 0.19.1

The operating system you're using:
Ubuntu

The output of python --version:
Python 3.8.10

Additional context

Add any other context about the problem here.

@philipp-heinrich philipp-heinrich added type:bug Something isn't working triage:product labels Nov 19, 2021
@philipp-heinrich
Copy link
Author

example of rendered sql code, which has no quoting:
Screenshot 2021-11-19 at 14 03 40

@jtcohen6
Copy link
Contributor

@philipp-heinrich Sure, this makes sense to me! There's no downside (that I'm aware of) to quoting quite aggressively on BigQuery.

It sounds like you already found the spot in the code where the change would need to happen. I'm going to mark this a good first issue. Would you be interested in giving it a go? You could start by forking the repo, cloning + installing in a local virtualenv (pip install -e .), making the change, and confirming that it works in your local project. From there, we'd want to add an integration test case, to be sure that we don't accidentally regress from the fix in the future.

@DigUpTheHatchet
Copy link
Contributor

DigUpTheHatchet commented Nov 20, 2021

@philipp-heinrich Sure, this makes sense to me! There's no downside (that I'm aware of) to quoting quite aggressively on BigQuery.

It sounds like you already found the spot in the code where the change would need to happen. I'm going to mark this a good first issue. Would you be interested in giving it a go? You could start by forking the repo, cloning + installing in a local virtualenv (pip install -e .), making the change, and confirming that it works in your local project. From there, we'd want to add an integration test case, to be sure that we don't accidentally regress from the fix in the future.

I'd be happy to contribute this change as well @philipp-heinrich (if you weren't planning to, let me know) :)

@jtcohen6 Quick question for you. I found instances of adapter.quote() in the core project, e.g. https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/include/global_project/macros/materializations/snapshots/snapshot.sql#L64

However it seems like in this BigQuery adapter (and I also checked the spark adapter) there is only a quoted function. I'm very new to contributing and I might be missing something simple with how the adapters work. Is this quoted function completely unrelated to the adapter.quote() calls in the core project?

@jtcohen6
Copy link
Contributor

jtcohen6 commented Nov 20, 2021

The quote method is defined on the BigQueryAdapter object here:

@classmethod
def quote(cls, identifier: str) -> str:
return '`{}`'.format(identifier)

Unfortunately, the naming here is pretty confusing: quote/quoted/quote_character + QuotePolicy/quoting are all needed for subtly different things (dbt-labs/dbt-core#2986), so there ends up being quite a bit of duplication:

quote_character: str = '`'

@property
def quoted(self):
return '`{}`'.format(self.column)

@philipp-heinrich
Copy link
Author

thanks @DigUpTheHatchet - i haven't had time yet, feel free to contribute. Let me know when you have something, so i can test it locally too.

@github-actions
Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Stale type:bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants