-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: tune heuristics for choosing the next package to solve #10146
Conversation
Reviewer's Guide by SourceryThis pull request improves the heuristics for selecting the next package to resolve during dependency resolution. It prioritizes packages with dependencies, especially those with upper-bound constraints, as these are more likely to cause conflicts. The changes also remove a less effective criterion based on specific markers. Class diagram for updated Preference classclassDiagram
class Preference {
+enum Priority
-_provider
-_solution
-_dependency_cache
+_get_min(dependency) tuple
}
class Priority {
<<enumeration>>
NO_CHOICE
DIRECT_ORIGIN
USE_LATEST
LOCKED
DEFAULT
}
note for Preference "Modified _get_min to consider:
- Number of dependencies
- Upper bound constraints
- Removed marker specificity check"
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @radoering - I've reviewed your changes - here's some feedback:
Overall Comments:
- Please add tests to verify the new package selection heuristics, particularly around packages with upper bound constraints being prioritized.
Here's what I looked at during the review
- 🟢 General issues: all looks good
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟡 Complexity: 1 issue found
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
365263e
to
add03e5
Compare
Interesting idea! It makes some sort of intuitive sense, which is nice. If what you have makes things better, and you have had enough of trying things out here, then feel free to ignore the rest of this - which is mostly speculation / curiosity.
If what you have makes things better - or sometimes much better and mostly not worse - then perhaps it is not worth trying too hard to optimize further. But if you feel like experimenting I guess I am curious about the effect of these three changes (either independently or together): (i) drop the count of upper-bound-dependencies (ii) |
This feels like something you have never tried enough. I have considered some of the above points, but not others - and did not measure/document it systematically. I think I will do some further tests. |
After having done some additional measurements and analysis, it still looks like the current version is the best version considering the known examples:
It seems to be a worse signal than the number of available versions (especially for #9956 (comment)). The only thing that is sure is that no dependencies is a clear sign that it should be chosen last. I think it is quite the same as with "number of available versions": Only one available version is a clear sign that it should be chosen first, but otherwise it is unclear.
That is correct.
Good point. I had not thought about that. However, the original algorithm still seems to be worse for #9956 (comment).
Even if it does not come into play often it is still worth for determinism. See poetry/src/poetry/mixology/version_solver.py Lines 438 to 440 in fa5c26e
But as it seems the boolean has-dependencies (so that the fourth value comes into play more often) does better anyway.
"has dependencies" is sufficient to handle boto3/urllib3, but it is not sufficient to handle other similar cases:
To put it in a nutshell, I would keep the current state of the PR since it seems to be the best considering the currently known examples. |
Interesting stuff, thanks for investigating! I still wonder whether simplification - or at least compactification - is possible. eg if we think that the actual count of dependencies is uninteresting: perhaps we do not care either about the actual count of has-upper-bound dependencies. Just dealing first with the packages that have any upper bound dependency may be enough? (and then with the packages that have any sort of dependency, and finally with leaf packages). then the tuple could be simplified to a triple with middle element 0/1/2 eg as and I confess I am surprised if it remains favourable to go most-available-versions-first. At the time I thought that making that swap was likely giving up a little performance, in most cases, in order to get it back on the known terrible case. But data trumps intuition (though I wonder whether we have enough examples to call it compelling...) |
It think it had to be I tried and it makes no difference in many cases, but it makes #9956 (comment) slightly slower (28 s vs 31 s) and #9956 (comment) significantly slower (480 s vs 620 s).
I assume that there are more smaller boto3 vs. urllib3 cases and the gain in these cases is much higher than the loss in the standard case.
We surely not have enough examples but as long as we have no counter-examples, I think it is the best to work with what we have:
|
Yeah. I feel slighly uncomfortable about the possibility of overfitting to a few examples, or to the state of the world as it is exactly today. But not enough to disagree with you. |
- Do not consider anymore if a dependency has a specific marker or not, since there is no good explanation for this criterion. - Consider if a package has dependencies and especially how many dependencies with a constraint with an upper bound, because these are more likely to cause conflicts later.
add03e5
to
f23ec65
Compare
Pull Request Check List
Related to: #9956
Requires: python-poetry/poetry-core#833
This reduces the time to lock the first example from #9956 (comment) from over 400 s to less than 30 s while not increasing the time for any other example. See #9956 (comment) for alternative (simpler but less explainable) variants, which all had a negative influence on at least one example.
Summary by Sourcery
Enhancements: