Exclude UCX jobs from crawling #3733

Open
wants to merge 12 commits into
base: main
from

Conversation

JCZuurmond
Member

@JCZuurmond JCZuurmond commented Feb 21, 2025

Changes

Exclude UCX jobs from crawling to avoid confusing users when they see UCX jobs in their assessment report.
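
For illustration only, the idea can be sketched as a filter over the listed jobs. This is a minimal sketch, assuming the installation knows the IDs of its own jobs; `JobRecord`, `exclude_jobs`, and the sample job names are hypothetical and are not the actual `JobsCrawler` code from this PR.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class JobRecord:
    """Hypothetical stand-in for a job listed in the workspace."""

    job_id: int
    name: str


def exclude_jobs(jobs: Iterable[JobRecord], exclude_job_ids: set[int]) -> Iterator[JobRecord]:
    """Yield only the jobs whose ID is not in the exclusion set."""
    for job in jobs:
        if job.job_id in exclude_job_ids:
            # Skip jobs that belong to the tool itself (assumed to be known by ID).
            continue
        yield job


# Sample data: job 2 plays the role of a UCX-owned job.
listed = [
    JobRecord(1, "nightly-etl"),
    JobRecord(2, "[UCX] assessment"),
    JobRecord(3, "reporting"),
]
ucx_job_ids = {2}
kept = [job.name for job in exclude_jobs(listed, ucx_job_ids)]
```

Filtering by ID rather than by job name is a design assumption in this sketch: names can be edited by users, while IDs recorded at install time stay stable.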

Linked issues

Fixes #3656
Resolves #3722
Follow up on #3732
Relates to #3731

Functionality

  • modified JobsCrawler
  • modified existing workflow: assessment

Tests

  • added unit tests
  • added integration tests

@JCZuurmond JCZuurmond added the step/assessment (go/uc/upgrade - Assessment Step) and migrate/jobs (Step 5 - Upgrading Jobs for External Tables) labels Feb 21, 2025
@JCZuurmond JCZuurmond self-assigned this Feb 21, 2025
@JCZuurmond JCZuurmond requested a review from a team as a code owner February 21, 2025 13:59

❌ 81/83 passed, 3 flaky, 2 failed, 10 skipped, 5h33m19s total

❌ test_all_grant_types: AssertionError: assert {('CATALOG', ...dummy_txxiv')} == {('ANONYMOUS ..._fchlb'), ...} (11m51.03s)
AssertionError: assert {('CATALOG', ...dummy_txxiv')} == {('ANONYMOUS ..._fchlb'), ...}
  
  Extra items in the right set:
  ('ANONYMOUS FUNCTION', None)
  ('ANY FILE', None)
  
  Full diff:
    {
  -     (
  -         'ANONYMOUS FUNCTION',
  -         None,
  -     ),
  -     (
  -         'ANY FILE',
  -         None,
  -     ),
        (
            'CATALOG',
            'hive_metastore',
        ),
        (
            'DATABASE',
            'hive_metastore.dummy_s50rl',
        ),
        (
            'TABLE',
            'hive_metastore.dummy_s50rl.dummy_tjaoq',
        ),
        (
            'UDF',
            'hive_metastore.dummy_s50rl.dummy_fchlb',
        ),
        (
            'VIEW',
            'hive_metastore.dummy_s50rl.dummy_txxiv',
        ),
    }
14:05 INFO [databricks.labs.ucx.install] Creating ucx schemas...
[gw5] linux -- Python 3.10.16 /home/runner/work/ucx/ucx/.venv/bin/python
14:05 INFO [databricks.labs.ucx.install] Creating ucx schemas...
14:06 DEBUG [databricks.labs.ucx.framework.crawlers] [hive_metastore.dummy_se7xb.grants] fetching grants inventory
14:06 DEBUG [databricks.labs.ucx.framework.crawlers] [hive_metastore.dummy_se7xb.grants] crawling new set of snapshot data for grants
14:06 DEBUG [databricks.labs.ucx.framework.crawlers] [hive_metastore.dummy_se7xb.tables] fetching tables inventory
14:06 DEBUG [databricks.labs.ucx.framework.crawlers] [hive_metastore.dummy_se7xb.tables] crawling new set of snapshot data for tables
14:06 DEBUG [databricks.labs.ucx.hive_metastore.tables] [hive_metastore.dummy_s50rl] listing tables and views
14:06 DEBUG [databricks.labs.ucx.hive_metastore.tables] [hive_metastore.dummy_s50rl.dummy_tjaoq] fetching table metadata
14:06 DEBUG [databricks.labs.ucx.hive_metastore.tables] [hive_metastore.dummy_s50rl.dummy_txxiv] fetching table metadata
14:06 DEBUG [databricks.labs.ucx.framework.crawlers] [hive_metastore.dummy_se7xb.tables] found 2 new records for tables
14:06 DEBUG [databricks.labs.ucx.framework.crawlers] [hive_metastore.dummy_se7xb.udfs] fetching udfs inventory
14:06 DEBUG [databricks.labs.ucx.framework.crawlers] [hive_metastore.dummy_se7xb.udfs] crawling new set of snapshot data for udfs
14:06 DEBUG [databricks.labs.ucx.hive_metastore.udfs] [hive_metastore.dummy_s50rl] listing udfs
14:06 DEBUG [databricks.labs.ucx.hive_metastore.udfs] [hive_metastore.dummy_s50rl.dummy_fchlb] fetching udf metadata
14:06 DEBUG [databricks.labs.ucx.framework.crawlers] [hive_metastore.dummy_se7xb.udfs] found 1 new records for udfs
14:16 ERROR [databricks.labs.ucx.hive_metastore.grants] Couldn't fetch grants for object ANONYMOUS FUNCTION : TEMPORARILY_UNAVAILABLE: The service at /api/2.0/sql-acl/get-permissions is taking too long to process your request. Please try again later or try a faster operation. [TraceId: 00-fcf07fe18a72808e83f6222802abff8f-1cf94c48724ce585-00]
14:17 ERROR [databricks.labs.ucx.hive_metastore.grants] Couldn't fetch grants for object ANY FILE : TEMPORARILY_UNAVAILABLE: The service at /api/2.0/sql-acl/get-permissions is taking too long to process your request. Please try again later or try a faster operation. [TraceId: 00-4cf5cd06c9bb18539409e973256886ff-99a027d6abdf5c05-00]
14:17 DEBUG [databricks.labs.ucx.framework.crawlers] [hive_metastore.dummy_se7xb.grants] found 41 new records for grants
❌ test_all_grants_for_other_objects: AssertionError: assert {'MODIFY', 'SELECT'} == set() (11m35.205s)
AssertionError: assert {'MODIFY', 'SELECT'} == set()
  
  Extra items in the left set:
  'MODIFY'
  'SELECT'
  
  Full diff:
  - set()
  + {
  +     'MODIFY',
  +     'SELECT',
  + }
[gw7] linux -- Python 3.10.16 /home/runner/work/ucx/ucx/.venv/bin/python
14:07 DEBUG [databricks.labs.ucx.framework.crawlers] [hive_metastore.dummy_sjgpd.grants] fetching grants inventory
14:07 DEBUG [databricks.labs.ucx.framework.crawlers] Inventory table not found
Traceback (most recent call last):
  File "/home/runner/work/ucx/ucx/src/databricks/labs/ucx/framework/crawlers.py", line 152, in _snapshot
    cached_results = list(fetcher())
  File "/home/runner/work/ucx/ucx/src/databricks/labs/ucx/hive_metastore/grants.py", line 239, in _try_fetch
    for row in self._fetch(f"SELECT * FROM {escape_sql_identifier(self.full_name)}"):
  File "/home/runner/work/ucx/ucx/.venv/lib/python3.10/site-packages/databricks/labs/lsql/core.py", line 344, in fetch_all
    execute_response = self.execute(
  File "/home/runner/work/ucx/ucx/.venv/lib/python3.10/site-packages/databricks/labs/lsql/core.py", line 268, in execute
    self._raise_if_needed(status)
  File "/home/runner/work/ucx/ucx/.venv/lib/python3.10/site-packages/databricks/labs/lsql/core.py", line 478, in _raise_if_needed
    raise NotFound(error_message)
databricks.sdk.errors.platform.NotFound: [TABLE_OR_VIEW_NOT_FOUND] The table or view `hive_metastore`.`dummy_sjgpd`.`grants` cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS. SQLSTATE: 42P01; line 1 pos 14
14:07 DEBUG [databricks.labs.ucx.framework.crawlers] [hive_metastore.dummy_sjgpd.grants] crawling new set of snapshot data for grants
14:07 DEBUG [databricks.labs.ucx.framework.crawlers] [hive_metastore.dummy_sjgpd.tables] fetching tables inventory
14:07 DEBUG [databricks.labs.ucx.framework.crawlers] Inventory table not found
Traceback (most recent call last):
  File "/home/runner/work/ucx/ucx/src/databricks/labs/ucx/framework/crawlers.py", line 152, in _snapshot
    cached_results = list(fetcher())
  File "/home/runner/work/ucx/ucx/src/databricks/labs/ucx/hive_metastore/tables.py", line 458, in _try_fetch
    for row in self._fetch(f"SELECT * FROM {escape_sql_identifier(self.full_name)}"):
  File "/home/runner/work/ucx/ucx/.venv/lib/python3.10/site-packages/databricks/labs/lsql/core.py", line 344, in fetch_all
    execute_response = self.execute(
  File "/home/runner/work/ucx/ucx/.venv/lib/python3.10/site-packages/databricks/labs/lsql/core.py", line 268, in execute
    self._raise_if_needed(status)
  File "/home/runner/work/ucx/ucx/.venv/lib/python3.10/site-packages/databricks/labs/lsql/core.py", line 478, in _raise_if_needed
    raise NotFound(error_message)
databricks.sdk.errors.platform.NotFound: [TABLE_OR_VIEW_NOT_FOUND] The table or view `hive_metastore`.`dummy_sjgpd`.`tables` cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS. SQLSTATE: 42P01; line 1 pos 14
14:07 DEBUG [databricks.labs.ucx.framework.crawlers] [hive_metastore.dummy_sjgpd.tables] crawling new set of snapshot data for tables
14:07 DEBUG [databricks.labs.ucx.hive_metastore.tables] [hive_metastore.dummy_ss3mo] listing tables and views
14:07 DEBUG [databricks.labs.ucx.framework.crawlers] [hive_metastore.dummy_sjgpd.tables] found 0 new records for tables
14:07 DEBUG [databricks.labs.ucx.framework.crawlers] [hive_metastore.dummy_sjgpd.udfs] fetching udfs inventory
14:07 DEBUG [databricks.labs.ucx.framework.crawlers] Inventory table not found
Traceback (most recent call last):
  File "/home/runner/work/ucx/ucx/src/databricks/labs/ucx/framework/crawlers.py", line 152, in _snapshot
    cached_results = list(fetcher())
  File "/home/runner/work/ucx/ucx/src/databricks/labs/ucx/hive_metastore/udfs.py", line 63, in _try_fetch
    for row in self._fetch(f"SELECT * FROM {escape_sql_identifier(self.full_name)}"):
  File "/home/runner/work/ucx/ucx/.venv/lib/python3.10/site-packages/databricks/labs/lsql/core.py", line 344, in fetch_all
    execute_response = self.execute(
  File "/home/runner/work/ucx/ucx/.venv/lib/python3.10/site-packages/databricks/labs/lsql/core.py", line 268, in execute
    self._raise_if_needed(status)
  File "/home/runner/work/ucx/ucx/.venv/lib/python3.10/site-packages/databricks/labs/lsql/core.py", line 478, in _raise_if_needed
    raise NotFound(error_message)
databricks.sdk.errors.platform.NotFound: [TABLE_OR_VIEW_NOT_FOUND] The table or view `hive_metastore`.`dummy_sjgpd`.`udfs` cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS. SQLSTATE: 42P01; line 1 pos 14
14:07 DEBUG [databricks.labs.ucx.framework.crawlers] [hive_metastore.dummy_sjgpd.udfs] crawling new set of snapshot data for udfs
14:07 DEBUG [databricks.labs.ucx.hive_metastore.udfs] [hive_metastore.dummy_ss3mo] listing udfs
14:07 DEBUG [databricks.labs.ucx.framework.crawlers] [hive_metastore.dummy_sjgpd.udfs] found 0 new records for udfs
14:18 ERROR [databricks.labs.ucx.hive_metastore.grants] Couldn't fetch grants for object ANONYMOUS FUNCTION : TEMPORARILY_UNAVAILABLE: The service at /api/2.0/sql-acl/get-permissions is taking too long to process your request. Please try again later or try a faster operation. [TraceId: 00-17322d0c21629b638c0a303c92627b05-0ad058ab5db287f5-00]
14:18 ERROR [databricks.labs.ucx.hive_metastore.grants] Couldn't fetch grants for object ANY FILE : TEMPORARILY_UNAVAILABLE: The service at /api/2.0/sql-acl/get-permissions is taking too long to process your request. Please try again later or try a faster operation. [TraceId: 00-c646cd165c259c37485a4e164464f0b8-95a142cb1c76f3f1-00]
14:18 DEBUG [databricks.labs.ucx.framework.crawlers] [hive_metastore.dummy_sjgpd.grants] found 2 new records for grants

Flaky tests:

  • 🤪 test_table_migration_job_refreshes_migration_status[hiveserde-migrate-external-hiveserde-tables-in-place-experimental] (8m36.302s)
  • 🤪 test_hiveserde_table_ctas_migration_job (6m21.629s)
  • 🤪 test_hiveserde_table_in_place_migration_job[migrate-external-tables-ctas] (3m31.742s)

Running from acceptance #8368
