perf: tune heuristics for choosing the next package to solve #10146

radoering · 2025-02-04T18:17:50Z

Do not consider anymore if a dependency has a specific marker or not, since there is no good explanation for this criterion.
Consider if a package has dependencies and especially how many dependencies with a constraint with an upper bound, because these are more likely to cause conflicts later.

Pull Request Check List

Related to: #9956
Requires: python-poetry/poetry-core#833

Added tests for changed code.
Updated documentation for changed code.

This reduces the time to lock the first example from #9956 (comment) from over 400 s to less than 30 s while not increasing the time for any other example. See #9956 (comment) for alternative (simpler but less explainable) variants, which all had a negative influence on at least one example.

Summary by Sourcery

Enhancements:

Prioritize packages with dependencies, especially those with upper-bound constraints, when selecting the next package to resolve.

sourcery-ai · 2025-02-04T18:17:54Z

Reviewer's Guide by Sourcery

This pull request improves the heuristics for selecting the next package to resolve during dependency resolution. It prioritizes packages with dependencies, especially those with upper-bound constraints, as these are more likely to cause conflicts. The changes also remove a less effective criterion based on specific markers.

Class diagram for updated Preference class

classDiagram
    class Preference {
        +enum Priority
        -_provider
        -_solution
        -_dependency_cache
        +_get_min(dependency) tuple
    }

    class Priority {
        <<enumeration>>
        NO_CHOICE
        DIRECT_ORIGIN
        USE_LATEST
        LOCKED
        DEFAULT
    }

    note for Preference "Modified _get_min to consider:
    - Number of dependencies
    - Upper bound constraints
    - Removed marker specificity check"

File-Level Changes

Change	Details	Files
Modified the package selection logic to prioritize dependencies with upper bound constraints and the existence of dependencies.	Removed the check for specific markers on dependencies. Added logic to count dependencies with upper bound constraints. Prioritized packages with more dependencies having upper bounds. Prioritized packages with dependencies over those without. Modified the return tuple of `_get_min` to include the new criteria.	`src/poetry/mixology/version_solver.py`
Fixed an incompatibility message in a test case.	Reordered the dependencies in the error message to match the actual conflict.	`tests/mixology/version_solver/test_unsolvable.py`
Cached the `package` method of the pool.	Cached the `_pool.package` method using `functools.cache` and renamed it to `get_package_from_pool`. Used the cached method `get_package_from_pool` when completing a package.	`src/poetry/puzzle/provider.py`
Updated the poetry-core dependency to a specific branch.	Changed the poetry-core dependency to a specific branch on a fork.	`pyproject.toml`

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!
Generate a plan of action for an issue: Comment @sourcery-ai plan on
an issue to generate a plan of action for it.

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

sourcery-ai

Hey @radoering - I've reviewed your changes - here's some feedback:

Overall Comments:

Please add tests to verify the new package selection heuristics, particularly around packages with upper bound constraints being prioritized.

Here's what I looked at during the review

🟢 General issues: all looks good
🟢 Security: all looks good
🟢 Testing: all looks good
🟡 Complexity: 1 issue found
🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

src/poetry/mixology/version_solver.py

dimbleby · 2025-02-05T00:03:50Z

Consider if a package has dependencies and especially how many dependencies with a constraint with an upper bound, because these are more likely to cause conflicts later.

Interesting idea! It makes some sort of intuitive sense, which is nice.

If what you have makes things better, and you have had enough of trying things out here, then feel free to ignore the rest of this - which is mostly speculation / curiosity.

I wonder why you are only checking whether a package has any dependencies and not the count? If this is a good signal: then I would expect that having 20 dependencies is a better signal than having 2?
Previously we used "has many versions available" as a sort of proxy for "is boto3" but without having to be quite so specific about it. I guess now it is "has no dependencies" that handles this case without quite saying so: presumably we now do not select urllib3 until very late
but in that case, does our reversal of "how many versions are available" still make sense? The original fewest-versions-first still sounds likely to find conflicts sooner. So if we now have found a different way of avoiding the known pathological cases then perhaps our reversal of that original algorithm choice should be undone
though I wonder how often the fourth value of this tuple is ever used as a tie-break anyway. Especially if we were to count dependencies, rather than only use a boolean has-dependencies, perhaps it rarely comes into play?
also: if it is "has dependencies" that is sufficient to handle the boto3 / urllib3 case - well then I wonder when "has upper bound dependencies" is even making a difference?

If what you have makes things better - or sometimes much better and mostly not worse - then perhaps it is not worth trying too hard to optimize further.

But if you feel like experimenting I guess I am curious about the effect of these three changes (either independently or together): (i) drop the count of upper-bound-dependencies (ii) len() rather than bool() of all dependencies (iii) un-reverse our ordering based on count of available versions

radoering · 2025-02-05T05:06:01Z

you have had enough of trying things out here

This feels like something you have never tried enough. I have considered some of the above points, but not others - and did not measure/document it systematically. I think I will do some further tests.

radoering · 2025-02-06T18:50:37Z

After having done some additional measurements and analysis, it still looks like the current version is the best version considering the known examples:

I wonder why you are only checking whether a package has any dependencies and not the count? If this is a good signal: then I would expect that having 20 dependencies is a better signal than having 2?

It seems to be a worse signal than the number of available versions (especially for #9956 (comment)). The only thing that is sure is that no dependencies is a clear sign that it should be chosen last. I think it is quite the same as with "number of available versions": Only one available version is a clear sign that it should be chosen first, but otherwise it is unclear.

Previously we used "has many versions available" as a sort of proxy for "is boto3" but without having to be quite so specific about it. I guess now it is "has no dependencies" that handles this case without quite saying so: presumably we now do not select urllib3 until very late

That is correct.

but in that case, does our reversal of "how many versions are available" still make sense? The original fewest-versions-first still sounds likely to find conflicts sooner. So if we now have found a different way of avoiding the known pathological cases then perhaps our reversal of that original algorithm choice should be undone

Good point. I had not thought about that. However, the original algorithm still seems to be worse for #9956 (comment).

though I wonder how often the fourth value of this tuple is ever used as a tie-break anyway. Especially if we were to count dependencies, rather than only use a boolean has-dependencies, perhaps it rarely comes into play?

Even if it does not come into play often it is still worth for determinism. See

poetry/src/poetry/mixology/version_solver.py

Lines 438 to 440 in fa5c26e

    
                   In order to provide results that are as deterministic as possible 
        
                   and consistent between `poetry lock` and `poetry update`, the return value 
        
                   of two different dependencies should not be equal if possible.

But as it seems the boolean has-dependencies (so that the fourth value comes into play more often) does better anyway.

also: if it is "has dependencies" that is sufficient to handle the boto3 / urllib3 case - well then I wonder when "has upper bound dependencies" is even making a difference?

"has dependencies" is sufficient to handle boto3/urllib3, but it is not sufficient to handle other similar cases:

boto3/botocore vs. urllib3
- boto3: 3 deps, 3 upper bound
- botocore: 4 deps, 4 upper bound
- urllib3: no deps
sphinx/sphinx-rtd-theme vs. docutils
- sphinx: 17 deps, 1 upper bound
- sphinx-rtd-theme: 3 deps, 3 upper bound
- docutils: no deps
dbt-semantic-interfaces/opentelemetry-api vs. importlib-metadata
- dbt-semantic-interfaces: 9 deps, 9 upper bound
- opentelemetry-api: 2 deps, 1 upper bound
- importlib-metadata: 2 deps, 0 upper bound

To put it in a nutshell, I would keep the current state of the PR since it seems to be the best considering the currently known examples.

dimbleby · 2025-02-06T19:10:02Z

Interesting stuff, thanks for investigating!

I still wonder whether simplification - or at least compactification - is possible. eg if we think that the actual count of dependencies is uninteresting: perhaps we do not care either about the actual count of has-upper-bound dependencies. Just dealing first with the packages that have any upper bound dependency may be enough? (and then with the packages that have any sort of dependency, and finally with leaf packages).

then the tuple could be simplified to a triple with middle element 0/1/2 eg as int(has-upper-bound-dependencies) + int(has-any-dependencies)

and I confess I am surprised if it remains favourable to go most-available-versions-first. At the time I thought that making that swap was likely giving up a little performance, in most cases, in order to get it back on the known terrible case. But data trumps intuition (though I wonder whether we have enough examples to call it compelling...)

radoering · 2025-02-07T05:43:11Z

then the tuple could be simplified to a triple with middle element 0/1/2 eg as int(has-upper-bound-dependencies) + int(has-any-dependencies)

It think it had to be 0 if has_upper_bound_deps else (1 if has_deps else 2) or the inverse of your version.

I tried and it makes no difference in many cases, but it makes #9956 (comment) slightly slower (28 s vs 31 s) and #9956 (comment) significantly slower (480 s vs 620 s).

and I confess I am surprised if it remains favourable to go most-available-versions-first.

I assume that there are more smaller boto3 vs. urllib3 cases and the gain in these cases is much higher than the loss in the standard case.

though I wonder whether we have enough examples to call it compelling...

We surely not have enough examples but as long as we have no counter-examples, I think it is the best to work with what we have:

It definitively makes sense to consider if there are dependencies with an upper bound.
It definitively makes sense to consider if there are any dependencies at all.
All the other decisions highest/lowest number of versions first, bool or integer number of dependencies are based on 1-2 examples but as long as we do not have a counter-example I think it makes sense to stick with what is best for these.

dimbleby · 2025-02-07T11:38:19Z

Yeah. I feel slighly uncomfortable about the possibility of overfitting to a few examples, or to the state of the world as it is exactly today. But not enough to disagree with you.

- Do not consider anymore if a dependency has a specific marker or not, since there is no good explanation for this criterion. - Consider if a package has dependencies and especially how many dependencies with a constraint with an upper bound, because these are more likely to cause conflicts later.

radoering mentioned this pull request Feb 4, 2025

add has_upper_bound method to VersionConstraint python-poetry/poetry-core#833

Merged

2 tasks

sourcery-ai bot reviewed Feb 4, 2025

View reviewed changes

src/poetry/mixology/version_solver.py Show resolved Hide resolved

radoering force-pushed the perf-next-package-heuristics branch from 365263e to add03e5 Compare February 4, 2025 18:29

radoering force-pushed the perf-next-package-heuristics branch from add03e5 to f23ec65 Compare February 8, 2025 05:34

radoering requested a review from a team February 8, 2025 10:25

abn approved these changes Feb 8, 2025

View reviewed changes

abn merged commit 5dc292b into python-poetry:main Feb 8, 2025
53 checks passed

radoering mentioned this pull request Feb 9, 2025

release: bump version to 2.1.0 #10154

Draft

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: tune heuristics for choosing the next package to solve #10146

perf: tune heuristics for choosing the next package to solve #10146

radoering commented Feb 4, 2025 •

edited by sourcery-ai bot

Loading

sourcery-ai bot commented Feb 4, 2025 •

edited

Loading

Interacting with Sourcery

Customizing Your Experience

Getting Help

sourcery-ai bot left a comment

dimbleby commented Feb 5, 2025

radoering commented Feb 5, 2025

radoering commented Feb 6, 2025

dimbleby commented Feb 6, 2025 •

edited

Loading

radoering commented Feb 7, 2025

dimbleby commented Feb 7, 2025

perf: tune heuristics for choosing the next package to solve #10146

perf: tune heuristics for choosing the next package to solve #10146

Conversation

radoering commented Feb 4, 2025 • edited by sourcery-ai bot Loading

Pull Request Check List

Summary by Sourcery

sourcery-ai bot commented Feb 4, 2025 • edited Loading

Reviewer's Guide by Sourcery

Class diagram for updated Preference class

File-Level Changes

Interacting with Sourcery

Customizing Your Experience

Getting Help

sourcery-ai bot left a comment

Choose a reason for hiding this comment

dimbleby commented Feb 5, 2025

radoering commented Feb 5, 2025

radoering commented Feb 6, 2025

dimbleby commented Feb 6, 2025 • edited Loading

radoering commented Feb 7, 2025

dimbleby commented Feb 7, 2025

radoering commented Feb 4, 2025 •

edited by sourcery-ai bot

Loading

sourcery-ai bot commented Feb 4, 2025 •

edited

Loading

dimbleby commented Feb 6, 2025 •

edited

Loading