-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#374 Incremental Ingestion #487
Merged
Merged
Changes from all commits
Commits
Show all changes
55 commits
Select commit
Hold shift + click to select a range
f5211b4
#374 Add a table in bookeeping for storing offsets.
yruslan d6d1a13
#374 Add bookeeping interfaces for offset management.
yruslan 9b09f15
#374 Implement offset management DB operations.
yruslan 5bda4c5
#374 Improve offset management DB operations and add test suites.
yruslan 45fec57
#374 Bump up the minor version number since breaking changes are to b…
yruslan 9d79f40
#374 Add the notion of 'batchId' and 'getCurrentBatch' for the metast…
yruslan 86df4b5
#374 Remove parenthesis of several get methods.
yruslan 71201ac
#374 Add interfaces for sources to fetch data based on offsets.
yruslan 533bd93
#374 Add table reader interfaces for incremental processing.
yruslan 27ab3cd
#374 Add an end to end test for incremental processing.
yruslan 7329801
Fixup
yruslan ba63e16
#374 Add initial support for incremental transformers.
yruslan 0232e9c
#374 Make normal transformers compatible with incremental ingestion, …
yruslan 239e1f1
#374 Add support for reruns for incremental ingestion with offsets an…
yruslan 307c0f1
#374 Add support for historical runs, and for re-committing uncommitt…
yruslan a8a77dc
#374 Implement the offset type: 'datetime' for incremental ingestion.
yruslan d6fa73c
Update Jacoco.
yruslan fe246e5
#374 Add integrations tests missing from the last fixup.
yruslan 91fb48a
#374 Another fixup from compile warnings.
yruslan 1f1d7a9
#374 Fixed Spark 2.4.8 support in integration tests.
yruslan edd90b3
Update Jacoco report version
yruslan 78a0675
Make Jacoco take into account integration tests by including them in …
yruslan 567973a
#374 Add offset configuration for JDBC sources.
yruslan 3f59329
Update Scala, Spark versions for sbt builds.
yruslan 537b85d
#374 Fix a corner case of running incremental ingestion out of order …
yruslan 83b3c7a
#374 Simplify the incremental ingestion interface for data sources.
yruslan 8fe53f1
#374 Implement offset queries generation in SQL generators.
yruslan 5b8983c
#374 Add implementation for incremental ingestion from JDBC, and adde…
yruslan 8b4bfb1
#374 Add server timezone to JDBC reader configuration to accommodate …
yruslan 8b3ed71
#374 Fix the logic of incremental ingestion when information date is …
yruslan 858138f
#374 Improve email notifications for incremental operations.
yruslan 7da2627
#374 Calculate throughput based on appended records for incremental j…
yruslan 6f13121
#374 Fix the way retrospective updates are determined.
yruslan 451f556
#374 Add number of appended records to journal.
yruslan 1ec4946
Improve performance of querying timestamp fields for PostgreSQL and M…
yruslan aeac279
Fix the check for retrospective updates.
yruslan 64e47b1
#421 Allow transient non-cached jobs not return record count.
yruslan f861bf9
#374 Improve the description of the getCurrentBatch() metastore inter…
yruslan f3704b8
#374 Update README with the new feature.
yruslan f8d9d6e
#374 Add more tests for SQL generation related to offsets.
yruslan 7bce016
#374 Implement offset management based on inclusive intervals.
yruslan 99b3689
#374 Fix new incremental ingestion and integration tests to match inc…
yruslan fdea270
#374 Fix new incremental ingestion and integration tests to match inc…
yruslan 745315b
Fix a timing dependency of a unit test.
yruslan 30ee28b
#374 Remove minimum values for offset types.
yruslan d44a8db
#374 Fix a scenario when uncommitted offsets are not properly handled.
yruslan b3f91b1
#374 Refactor validation for the incremental ingestion job.
yruslan 7df79cd
Add unit tests for the incremental scheduling strategy
yruslan 9e86134
Remove the nasty 'var' and possible error related to it.
yruslan f6676f2
Update pramen/api/src/main/scala/za/co/absa/pramen/api/Source.scala
yruslan d5f3267
Update pramen/core/src/test/scala/za/co/absa/pramen/core/tests/utils/…
yruslan f3a8b35
Apply suggestions from code review
yruslan df88ce5
Fix PR suggestions.
yruslan aecd5e5
Fix imports removed by IDE.
yruslan ba5e5de
Fix more PR suggestions regarding splitting complex identifiers.
yruslan File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it offsetFrom an exclusive interval if offsetTo is not given, but inclusive if also given?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we want to query new data the caller would specify only
offsetFrom
, and the query is going to look like:(exclusive)
When we want to do a rerun for a day, we get the minimum and maximum offsets for the day and run:
(inclusive)
The last case, when only
offsetTo
is available might not be used in practice. Added it for completion. Potentially it can be used to query the old database for all data that was already loaded. Maybe for reconciliation or something like this. In this case only inclusive interval makes sense for such queries.(inclusive)
I've could have added a flag, to make the caller choose, or split this method in 2, just for top 2 cases. But then it would require the implementer of a source to implement 2 very similar methods.
Added this reasoning to the method documentation