Currently, the xmin pseudo system column is used as the watermark column for the initial load of a table. When the table is very large and ingesting the data takes more than 6 hours, one of the following problems occurs:
The job gets stuck on an xmin value because it can never process all rows sharing that xmin within a 6-hour window. Depending on the number of columns in the table, this happens at around 50M rows with the same xmin.
The job does get past an xmin with many rows, but it typically takes two attempts. The first attempt starts near the end of the runtime window and fails. The second attempt passes because more runtime is available when that xmin is the first one processed in the window. However, the first attempt will already have loaded rows into the destination table, so the destination table ends up with duplicate data that has to be cleaned up manually.
Adding support for a secondary watermark, either a timestamp column or an ID field, would prevent both problems.
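To illustrate the idea, here is a minimal sketch of a composite checkpoint on (xmin, id), assuming the table has a unique, orderable ID column. The names `read_chunk` and `ingest`, the row layout, and the chunk sizes are all hypothetical, and an in-memory list stands in for the source table. It is not how the connector works today, only one way the secondary watermark could let a run resume in the middle of a single xmin.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, int, str]   # (xmin, id, payload); illustrative layout
Cursor = Tuple[int, int]     # checkpoint: (last xmin, last id) already loaded


def read_chunk(rows: List[Row], cursor: Optional[Cursor], limit: int) -> List[Row]:
    """Return up to `limit` rows strictly after `cursor` in (xmin, id) order.

    Stands in for a keyset-style query on the source, e.g. filtering on
    (xmin, id) > (last_xmin, last_id) and ordering by xmin, id.
    """
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    if cursor is not None:
        ordered = [r for r in ordered if (r[0], r[1]) > cursor]
    return ordered[:limit]


def ingest(rows: List[Row], destination: List[Row], cursor: Optional[Cursor],
           chunk_size: int, max_chunks: int) -> Optional[Cursor]:
    """Load at most `max_chunks` chunks and return the checkpoint to resume from.

    `max_chunks` is a stand-in for the 6-hour runtime window.
    """
    for _ in range(max_chunks):
        chunk = read_chunk(rows, cursor, chunk_size)
        if not chunk:
            break
        destination.extend(chunk)
        # Checkpoint after every chunk, even when the xmin has not changed.
        cursor = (chunk[-1][0], chunk[-1][1])
    return cursor


# Ten rows that all share xmin 100, as after one large bulk insert.
source = [(100, i, f"row-{i}") for i in range(1, 11)]
dest: List[Row] = []

# Window 1 runs out of budget partway through xmin 100.
checkpoint = ingest(source, dest, None, chunk_size=3, max_chunks=2)
# Window 2 resumes from the (xmin, id) checkpoint instead of restarting xmin 100.
checkpoint = ingest(source, dest, checkpoint, chunk_size=3, max_chunks=10)
```

Because the checkpoint advances within an xmin, the second window neither reprocesses the rows already loaded nor needs to finish the whole xmin in one go, which addresses both the stuck job and the duplicate rows in this simplified model.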