-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: expose peek next commit to python #1
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
{"commitInfo":{"timestamp":1587968586154,"operation":"WRITE","operationParameters":{"mode":"ErrorIfExists","partitionBy":"[]"},"isBlindAppend":true}} | ||
{"protocol":{"minReaderVersion":1,"minWriterVersion":2}} | ||
{"metaData":{"id":"5fba94ed-9794-4965-ba6e-6ee3c0d22af9","format":{"provider":"parquet","options":{}},"schemaString":"{\"type\":\"struct\",\"fields\":[{\"name\":\"id\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}}]}","partitionColumns":[],"configuration":{},"createdTime":1587968585495}} | ||
{"add":{"path":"part-00000-a72b1fb3-f2df-41fe-a8f0-e65b746382dd-c000.snappy.parquet","partitionValues":{},"size":262,"modificationTime":1587968586000,"dataChange":true}} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
{"commitInfo":{"timestamp":1587968596254,"operation":"MERGE","operationParameters":{"predicate":"(oldData.`id` = newData.`id`)"},"readVersion":0,"isBlindAppend":false}} | ||
{"add":{"path":"part-00004-95c9bc2c-ac85-4581-b3cc-84502b0c314f-c000.snappy.parquet","partitionValues":{},"size":429,"modificationTime":1587968596000,"dataChange":true}} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
{"commitInfo":{"timestamp":1587968614187,"operation":"UPDATE","operationParameters":{"predicate":"((id#697L % cast(2 as bigint)) = cast(0 as bigint))"},"readVersion":2,"isBlindAppend":false}} | ||
{"add":{"path":"part-00001-bb70d2ba-c196-4df2-9c85-f34969ad3aa9-c000.snappy.parquet","partitionValues":{},"size":429,"modificationTime":1587968614000,"dataChange":true}} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -493,6 +493,29 @@ def test_writer_fails_on_protocol(): | |
dt.to_pandas() | ||
|
||
|
||
@pytest.mark.parametrize("version, expected", [(2, (5, 3))]) | ||
def test_peek_next_commit(version, expected): | ||
table_path = "../crates/deltalake-core/tests/data/simple_table" | ||
dt = DeltaTable(table_path) | ||
actions, next_version = dt.peek_next_commit(version=version) | ||
assert (len(actions), next_version) == expected | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. maybe we could add a test to cover DeltaLogNotFound. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess there's no way to test the loop? I'm thinking. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm a little confused here, do you mean use version=-1 as input or something else? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was thinking there'd be a unit test in which the underlying delta table has actual latest version
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added data that missed a commit and a test for it. |
||
|
||
def test_delta_log_not_found(): | ||
table_path = "../crates/deltalake-core/tests/data/simple_table" | ||
dt = DeltaTable(table_path) | ||
latest_version = dt.get_latest_version() | ||
_, version = dt.peek_next_commit(version=latest_version) | ||
assert version == latest_version | ||
|
||
|
||
def test_delta_log_missed(): | ||
table_path = "../crates/deltalake-core/tests/data/simple_table_missing_commit" | ||
dt = DeltaTable(table_path) | ||
_, version = dt.peek_next_commit(version=1) | ||
assert version == 3 # Missed commit version 2, should return version 3 | ||
|
||
|
||
class ExcPassThroughThread(Thread): | ||
"""Wrapper around `threading.Thread` that propagates exceptions.""" | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why return
Ok(Err(err))
instead ofErr(err)
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, where is
Ok(Err(err))
?Here we use the same method as
peek_next_commit
below to get thecommit_log_bytest
, then we return the fullcommit_log_bytes
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this line, we assign
commit_log_bytes
toErr(err)
. Then on line 470, we returnOk(commit_log_bytes)
. I believe as a result we will returnOk(Err(err))
if we reach line 468.Am I interpreting this correctly? If so, this seems wrong. I think you'd want to
return Err(err)
here and on line 462 you'd want to assigncommit_log_bytes
tobytes
, notOk(bytes)
(which would thus returnOk(Ok(bytes))
which is weird as well).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes seems weird we need to fix this, also need to add a test for error case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as mentioned by ohan, rust
?
seems to tidy the results, https://doc.rust-lang.org/rust-by-example/std/result/question_mark.htmlThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flat out missed the question mark at the end, ok great.