Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] feat: implement integer comparison #13571

Merged
merged 13 commits into from
Oct 4, 2024

Conversation

Slyces
Copy link
Contributor

@Slyces Slyces commented Sep 30, 2024

Summary

Implements the comparison operator for [Type::IntLiteral] and [Type::BoolLiteral] (as an artifact of special handling of True and False in python).
Sets the framework to implement more comparison for types known at static time (e.g. BoolLiteral, StringLiteral), allowing us to only implement cases of the triplet <left> Type, <right> Type, CmpOp.
Contributes to #12701 (without checking off an item yet).

Implementation Details

I couldn't avoid allocating a Vec<Type> for the first evaluation of initial expressions before iterating over pairs of expression types (.windows(2)). Currently, I couldn't use .windows on iterators. This seems to be possible using some crates or an experimental API. If anyone has an idea on how to achieve this without allocation, I'd love to hear it.

Test Plan

  • Added a test for the comparison of literals that should include most cases of note.
  • Added a test for the comparison of int instances

Please note that the cases do not cover 100% of the branches as there are many and the current testing strategy with variables make this fairly confusing once we have too many in one test.

I'll probably open a separate PR with a candidate util function for cases like this (checking the public type of many separate statements).

@carljm
Copy link
Contributor

carljm commented Sep 30, 2024

I'll probably open a separate PR with a candidate util function for cases like this (checking the public type of many separate statements).

You can do this, but note that I'm also working on a test framework (see discussion in #11664 ) which should also make this a lot less verbose. So it may not be worth investing a lot into test utils like this one in the meantime.

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

crates/red_knot_python_semantic/src/types/infer.rs Outdated Show resolved Hide resolved
crates/red_knot_python_semantic/src/types/infer.rs Outdated Show resolved Hide resolved
crates/red_knot_python_semantic/src/types/infer.rs Outdated Show resolved Hide resolved
crates/red_knot_python_semantic/src/types/infer.rs Outdated Show resolved Hide resolved
crates/red_knot_python_semantic/src/types/infer.rs Outdated Show resolved Hide resolved
crates/red_knot_python_semantic/src/types/infer.rs Outdated Show resolved Hide resolved
crates/red_knot_python_semantic/src/types/infer.rs Outdated Show resolved Hide resolved
@carljm carljm added the red-knot Multi-file analysis & type inference label Sep 30, 2024
@Slyces
Copy link
Contributor Author

Slyces commented Oct 2, 2024

Just dropping this here for future reference (@AlexWaygood sent a similar article for #13590):

@Slyces
Copy link
Contributor Author

Slyces commented Oct 3, 2024

I think I have covered all comments (to my knowledge) except the tuple_windows, I'll try to give more details on why I couldn't get that to work with code comments.

@Slyces Slyces requested a review from carljm October 3, 2024 15:46
Copy link
Contributor

github-actions bot commented Oct 3, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@brettcannon
Copy link
Contributor

  • Couldn't find one for in, not in

https://snarky.ca/unravelling-membership-testing/

All the blog posts can be found under https://snarky.ca/tag/syntactic-sugar/ . If you want to look the posts up by syntax, https://github.com/brettcannon/desugar/blob/main/README.md is a good way to do that.

// 3. True and bool and A - evaluate outcome types
// 4. True and bool and bool - evaluate truthiness
// 5. bool | A - union of "true" types
assert_public_ty(&db, "src/a.py", "b", "bool | A");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should that be Literal[False] | A? There is no runtime path where we can end up with Literal[True] here as if the second comparison 1 < A() evaluates to True we would return the value of the last comparison which is A.
Only path where we don't take the value of A here is if we had a Literal[False] in the earlier steps.

But that is more of a comment on how we handle chained and I guess?

Copy link
Contributor

@carljm carljm Oct 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point! I think this can be done as a separate follow-up so we can go ahead and get this PR merged; it's kind of separate from this PR as it's an improvement to the existing chained-boolean-expression logic.

I think the logic here is that when we have an actual bool type in a chained boolean expression, in non-last-position, rather than adding bool to the union we can instead add Literal[False] (for AND) or Literal[True] (for OR).

There's a more generalized version of this where we add Falsy and Truthy types, and then we'd always intersect any type in that position (not just bool) with Falsy or Truthy, and then for bools that would simplify out in the intersection (e.g. bool & Falsy is Literal[False]). I think it's likely we end up doing this for type narrowing anyway, but I don't have strong feelings about whether we go straight for the generalized solution or start with a bool-specific implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #13632 to track this as a follow-up.

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really excellent work, thank you so much!!

I have a few small nits; but since they were small (and I already implemented some locally to make sure they'd work) I'll just push them myself and then land this.

/// Computes the output of a chain of (one) boolean operation, consuming as input an iterator
/// of types. The iterator is consumed even if the boolean evaluation can be short-circuited,
/// in order to ensure the invariant that all expressions are evaluated when inferring types.
fn infer_chained_boolean_types<T: Into<Type<'db>>>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't need to be generic over Into<Type<'db>>, it can just take an IntoIterator over Type<'db>. This change compiles and passes all tests:

--- a/crates/red_knot_python_semantic/src/types/infer.rs
+++ b/crates/red_knot_python_semantic/src/types/infer.rs
@@ -2446,10 +2446,10 @@ impl<'db> TypeInferenceBuilder<'db> {
     /// Computes the output of a chain of (one) boolean operation, consuming as input an iterator
     /// of types. The iterator is consumed even if the boolean evaluation can be short-circuited,
     /// in order to ensure the invariant that all expressions are evaluated when inferring types.
-    fn infer_chained_boolean_types<T: Into<Type<'db>>>(
+    fn infer_chained_boolean_types(
         db: &'db dyn Db,
         op: ast::BoolOp,
-        values: impl IntoIterator<Item = T>,
+        values: impl IntoIterator<Item = Type<'db>>,
         n_values: usize,
     ) -> Type<'db> {
         let mut done = false;
@@ -2460,17 +2460,16 @@ impl<'db> TypeInferenceBuilder<'db> {
                     Type::Never
                 } else {
                     let is_last = i == n_values - 1;
-                    let value_ty: Type<'db> = ty.into();
-                    match (value_ty.bool(db), is_last, op) {
-                        (Truthiness::Ambiguous, _, _) => value_ty,
+                    match (ty.bool(db), is_last, op) {
+                        (Truthiness::Ambiguous, _, _) => ty,
                         (Truthiness::AlwaysTrue, false, ast::BoolOp::And) => Type::Never,
                         (Truthiness::AlwaysFalse, false, ast::BoolOp::Or) => Type::Never,
                         (Truthiness::AlwaysFalse, _, ast::BoolOp::And)
                         | (Truthiness::AlwaysTrue, _, ast::BoolOp::Or) => {
                             done = true;
-                            value_ty
+                            ty
                         }
-                        (_, true, _) => value_ty,
+                        (_, true, _) => ty,
                     }
                 }
             }),

Comment on lines 2508 to 2513
let left_ty = self
.types
.expression_ty(left.scoped_ast_id(self.db, self.scope));
let right_ty = self
.types
.expression_ty(right.scoped_ast_id(self.db, self.scope));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've got enough occurrences of this verbose pattern now, it's high time for a helper method:

+    /// Get the already-inferred type of an expression node.
+    ///
+    /// PANIC if no type has been inferred for this node.
+    fn expression_ty(&self, expr: &ast::Expr) -> Type<'db> {
+        self.types
+            .expression_ty(expr.scoped_ast_id(self.db, self.scope))
+    }
+

AnyNodeRef::ExprCompare(compare),
"operator-unsupported",
format_args!(
"Operator \"{}\" is not supported for types {} and {}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, to match our usual diagnostic message style:

Suggested change
"Operator \"{}\" is not supported for types {} and {}",
"Operator `{}` is not supported for types `{}` and `{}`",

Comment on lines 2557 to 2559
// Note: identity (is, is not) is unreliable in Python and not part of the language specs.
// - `[ast::CompOp::Is]`: return `false` if different, `bool` if the same
// - `[ast::CompOp::IsNot]`: return `true` if different, `bool` if the same
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor clarification

Suggested change
// Note: identity (is, is not) is unreliable in Python and not part of the language specs.
// - `[ast::CompOp::Is]`: return `false` if different, `bool` if the same
// - `[ast::CompOp::IsNot]`: return `true` if different, `bool` if the same
// Note: identity (is, is not) for equal builtin types is unreliable and not part of the language spec.
// - `[ast::CompOp::Is]`: return `false` if unequal, `bool` if equal
// - `[ast::CompOp::IsNot]`: return `true` if unequal, `bool` if equal

Comment on lines +3138 to +3141
/// Rich comparison in Python are the operators `==`, `!=`, `<`, `<=`, `>`, and `>=`. Their
/// behaviour can be edited for classes by implementing corresponding dunder methods.
/// This function performs rich comparison between two instances and returns the resulting type.
/// see `<https://docs.python.org/3/reference/datamodel.html#object.__lt__>`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (and all the TODO comments in this function) are a fantastic resource for someone to flesh this out later -- thank you!!

}

#[test]
fn comparison_non_bool_returns() -> anyhow::Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test, and its comments, are fantastic!

// 3. True and bool and A - evaluate outcome types
// 4. True and bool and bool - evaluate truthiness
// 5. bool | A - union of "true" types
assert_public_ty(&db, "src/a.py", "b", "bool | A");
Copy link
Contributor

@carljm carljm Oct 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point! I think this can be done as a separate follow-up so we can go ahead and get this PR merged; it's kind of separate from this PR as it's an improvement to the existing chained-boolean-expression logic.

I think the logic here is that when we have an actual bool type in a chained boolean expression, in non-last-position, rather than adding bool to the union we can instead add Literal[False] (for AND) or Literal[True] (for OR).

There's a more generalized version of this where we add Falsy and Truthy types, and then we'd always intersect any type in that position (not just bool) with Falsy or Truthy, and then for bools that would simplify out in the intersection (e.g. bool & Falsy is Literal[False]). I think it's likely we end up doing this for type narrowing anyway, but I don't have strong feelings about whether we go straight for the generalized solution or start with a bool-specific implementation.

carljm and others added 2 commits October 4, 2024 10:13
match (value_ty.bool(self.db), is_last, op) {
let is_last = i == n_values - 1;
let value_ty: Type<'db> = ty.into();
match (value_ty.bool(db), is_last, op) {
(Truthiness::Ambiguous, _, _) => value_ty,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment on the last test, but maybe we could fix this by having a special case for builtins.bool (which will be a very common type here)

Suggested change
(Truthiness::Ambiguous, _, _) => value_ty,
(Truthiness::Ambiguous, false, ast::BoolOp::And) => match value_ty {
// Ambiguous types that are not the last in the `and` chain can only be
// returned if they are falsy. In the special case of `builtins.bool`,
// being falsy is `Literal[False]`.
// TODO: we could do this optimisation for other literal that have a
// single falsy value (`""`, `0`, ...?)
Type::Instance(class) => class
.is_stdlib_symbol(db, "builtins", "bool")
.then(|| Type::BooleanLiteral(false))
.unwrap_or_else(|| value_ty),
_ => value_ty,
},
(Truthiness::Ambiguous, _, _) => value_ty,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do this as a follow-up, I created #13632 with some comments about it.

@carljm carljm merged commit 888930b into astral-sh:main Oct 4, 2024
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants