[Iceberg] Enable affinity scheduling on file sections #24598
base: master
Conversation
LGTM! (docs)
Pull branch, local doc build, looks good. Thanks!
Mostly looks good. One minor correction on "splits not being scheduled to enough nodes": it's not necessarily that they were not scheduled to enough nodes; in general the distribution had more skew than Hive, even when the splits were scheduled to the same number of nodes. Scheduling to fewer nodes happened non-deterministically when I ran the queries multiple times. More than half the time the splits were scheduled to all nodes, but even in those cases the load was not as balanced as Hive's.
...rg/src/main/java/com/facebook/presto/iceberg/equalitydeletes/EqualityDeletesSplitSource.java
Thanks for the feedback @yingsu00 - I updated the PR description to be a bit more accurate
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergSplitSource.java
{
    Session maxIdentifiers = Session.builder(getSession())
            .setCatalogSessionProperty("iceberg", AFFINITY_SCHEDULING_FILE_SECTION_SIZE, "1B")
            .setCatalogSessionProperty("iceberg", TARGET_SPLIT_SIZE, "1")
Should we make TARGET_SPLIT_SIZE a DataSize instead of referring to bytes? We do this for MAX_SPLIT_SIZE. Can punt this to a new PR.
@aaneja Good catch!
@ZacBlanco Could you please address Anant's comment in this PR? I'll merge once it's done.
This was done intentionally because the Iceberg library expects this configuration in bytes, directly on the "read.target.split-size" table property. Iceberg has no notion of DataSize or unit-based shorthand for byte-typed values. To keep setting the table property and the session property consistent for the target split size, I opted not to make this a DataSize-typed property.
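The consistency argument can be seen in a small sketch (hypothetical class and parser, not Presto's actual DataSize implementation): a DataSize-typed session property would accept unit shorthand like "64MB", while Iceberg's byte-typed "read.target.split-size" table property only accepts a raw number, so keeping the session property byte-typed means both are set the same way.

```java
public class SplitSizeSketch {
    // Hypothetical unit parser, similar in spirit to a DataSize-typed
    // session property ("1B", "128kB", "64MB" -> bytes).
    static long parseDataSize(String value) {
        String v = value.trim();
        long multiplier = 1;
        if (v.endsWith("MB")) { multiplier = 1024L * 1024; v = v.substring(0, v.length() - 2); }
        else if (v.endsWith("kB")) { multiplier = 1024L; v = v.substring(0, v.length() - 2); }
        else if (v.endsWith("B")) { v = v.substring(0, v.length() - 1); }
        return Long.parseLong(v.trim()) * multiplier;
    }

    public static void main(String[] args) {
        // DataSize-style shorthand vs. the raw byte count that a byte-typed
        // table property such as Iceberg's "read.target.split-size" expects.
        System.out.println(parseDataSize("64MB"));      // 67108864
        System.out.println(Long.parseLong("67108864")); // table-property form
    }
}
```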
Description
This change moves the affinity scheduling file section size configuration from HiveClientConfig and HiveSessionProperties to HiveCommonClientConfig and HiveCommonSessionProperties so that the iceberg connector can benefit from this scheduling strategy when tables have a small number of files but a large number of splits.
Motivation and Context
On tables with a small number of large files, queries may perform poorly because the distribution of split scheduling is skewed. This is more likely to occur when there is a limited number of values being hashed to determine the preferred nodes to schedule to. By changing the identifier used for selecting the preferred nodes, we increase the probability that splits are scheduled more evenly across the cluster.
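The effect can be sketched with a toy simulation (hypothetical identifiers, file sizes, and a simple hash-mod node assignment; this is not Presto's actual scheduler logic): splitting each file into fixed-size sections multiplies the number of distinct identifiers being hashed, so splits tend to land on more nodes.

```java
import java.util.*;

public class AffinitySketch {
    // Toy model: preferred node = hash(identifier) mod nodeCount.
    // Returns how many splits each node received.
    static Map<Integer, Integer> schedule(List<String> identifiers, int nodeCount) {
        Map<Integer, Integer> loadPerNode = new HashMap<>();
        for (String id : identifiers) {
            int node = Math.floorMod(id.hashCode(), nodeCount);
            loadPerNode.merge(node, 1, Integer::sum);
        }
        return loadPerNode;
    }

    public static void main(String[] args) {
        int nodes = 10;
        long fileSize = 1L << 30;      // two hypothetical 1 GiB files
        long sectionSize = 128L << 20; // 128 MiB file sections

        List<String> files = List.of("s3://bucket/a.parquet", "s3://bucket/b.parquet");

        // Whole-file identifiers: at most 2 distinct hash targets.
        System.out.println("whole-file nodes used: " + schedule(files, nodes).size());

        // Section identifiers (path + section index): up to 16 distinct hash targets.
        List<String> sections = new ArrayList<>();
        for (String path : files) {
            for (long offset = 0; offset < fileSize; offset += sectionSize) {
                sections.add(path + "#" + (offset / sectionSize));
            }
        }
        System.out.println("sectioned nodes used: " + schedule(sections, nodes).size());
    }
}
```

With only two whole-file identifiers, at most two of the ten nodes can ever be preferred; sectioning the same files yields sixteen identifiers and a much better chance of covering the cluster.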
Impact
Test Plan
Added a unit test to verify that the number of unique identifiers changes as we scale up the file section size
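The property that test checks can be sketched as follows (hypothetical helper name; the real test lives in the Iceberg connector test suite): the number of distinct scheduling identifiers for a file is the number of sections it is divided into, ceil(fileSize / sectionSize), so scaling the section size up shrinks the identifier count.

```java
public class IdentifierCountSketch {
    // Hypothetical helper: distinct scheduling identifiers for one file
    // = number of file sections, with a minimum of one.
    static long identifierCount(long fileSizeBytes, long sectionSizeBytes) {
        return Math.max(1, (fileSizeBytes + sectionSizeBytes - 1) / sectionSizeBytes);
    }

    public static void main(String[] args) {
        long fileSize = 1024;
        System.out.println(identifierCount(fileSize, 256));  // 4 sections
        System.out.println(identifierCount(fileSize, 512));  // 2 sections
        System.out.println(identifierCount(fileSize, 2048)); // 1 section
    }
}
```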
Contributor checklist
Release Notes