Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Add control flow for try/except blocks #13338

Closed
wants to merge 12 commits into from

Conversation

AlexWaygood
Copy link
Member

Summary

This PR adds control flow for try/except/else/finally blocks to red-knot.

The semantics of try/except blocks are as follows:

  1. If an except branch is taken, this indicates that an exception in the try block caused that block to terminate early. This means that potentially 0 of the definitions in the try block were executed; or potentially all of them were; or potentially some number in between.
  2. All except and else branches are mutually exclusive: if a single except or else branch is taken, it means none of the others were taken.
  3. If an else branch was taken, it means that the try block was fully executed without any exceptions having occurred; we can in this branch count on all definitions in the try block having been executed.
  4. If there is a finally branch, this branch is always executed last, before completion of the block, regardless of whether the block has any except or else branches and (if so) which one was taken.

In order to model (1), two new fields are added to the SemanticIndexBuilder:

  • A new boolean visiting_try_block flag is added, to keep track of whether we are visiting the try branch of a try/except/else/finally block.
  • A new try_block_definition_states field is added, which is a Vec<FlowSnapshot>. If the visiting_try_block flag is set to true, the add_definition() method pushes a snapshot to this vec after each new definition is added.

Test Plan

cargo test.

I believe I still may be missing some test coverage for nested try blocks. An early version of this PR looked like it had some subtle bugs there (where the visitor would no longer remember it was in a try block if it had finished visiting a try block inside a try block). But I couldn't find a test that would fail as a result of the apparent bug (which I've now fixed in this PR branch, anyway).

@AlexWaygood AlexWaygood added the red-knot Multi-file analysis & type inference label Sep 12, 2024
Copy link
Contributor

github-actions bot commented Sep 12, 2024

ruff-ecosystem results

Linter (stable)

ℹ️ ecosystem check encountered linter errors. (no lint changes; 1 project error)

langchain-ai/langchain (error)

Failed to clone langchain-ai/langchain: error: RPC failed; curl 56 GnuTLS recv error (-54): Error in the pull function.
error: 1179 bytes of body are still expected
fetch-pack: unexpected disconnect while reading sideband packet
fatal: early EOF
fatal: fetch-pack: invalid index-pack output

Linter (preview)

✅ ecosystem check detected no linter changes.

@AlexWaygood
Copy link
Member Author

AlexWaygood commented Sep 12, 2024

The tomllib benchmark failures indicate that this would lead to too many false positives unless we also understand all of these patterns:

def foo():
    try:
        x = 42
    except TypeError:
        return
    # x is guaranteed to be defined here

def bar():
    try:
        x = 42
    except TypeError:
        raise RuntimeError
    # x is guaranteed to be defined here

def baz():
    for i in range(5):
        try:
            x = 42
        except TypeError:
            continue
        # x is guaranteed to be defined here

def eggs():
    for i in range(5):
        try:
            x = 42
        except TypeError:
            break
        # x is guaranteed to be defined here

@AlexWaygood
Copy link
Member Author

The tomllib benchmark failures indicate that this would lead to too many false positives unless we also understand all of these patterns:

I discussed this in person with @carljm -- for now, we'll defer fixing this, since it's a general issue that also applies to other kinds of control flow such as if/else. It's important to fix and it comes up much more often with try/except than with if/else, but it's orthogonal to what this PR is trying to fix, so it's best kept out of this PR. I'll open a followup issue for this specifically.

@AlexWaygood
Copy link
Member Author

I believe I still may be missing some test coverage for nested try blocks. An early version of this PR looked like it had some subtle bugs there (where the visitor would no longer remember it was in a try block if it had finished visiting a try block inside a try block). But I couldn't find a test that would fail as a result of the apparent bug (which I've now fixed in this PR branch, anyway).

To be more specific here: the buggy thing I was doing in an early version of the PR was this:

--- a/crates/red_knot_python_semantic/src/semantic_index/builder.rs
+++ b/crates/red_knot_python_semantic/src/semantic_index/builder.rs
@@ -725,7 +725,6 @@ where
                 // states during the `try` block before visiting the `except` blocks.
                 let pre_try_block_state = self.flow_snapshot();
                 let saved_definition_states = std::mem::take(&mut self.try_block_definition_states);
-                let visiting_nested_try_block = self.visiting_try_block;
 
                 // Visit the `try` block!
                 //
@@ -733,7 +732,7 @@ where
                 // in `self.try_block_definition_states` after each new definition is recorded.
                 self.visiting_try_block = true;
                 self.visit_body(body);
-                self.visiting_try_block = visiting_nested_try_block;
+                self.visiting_try_block = false;

I'm pretty sure that this was incorrect (I don't think it handles nested try blocks properly at all?), but couldn't find a test that actually demonstrated the bugginess.

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uff, excepts are hard haha. Great work!

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fantastic!! Probably the most complex control flow we'll need to handle, due to the "control flow can depart anywhere in the middle of a block" wrinkle.

@@ -4488,8 +4488,8 @@ mod tests {
",
)?;

assert_public_ty(&db, "src/a.py", "e", "NameError");
assert_public_ty(&db, "src/a.py", "f", "error");
assert_public_ty(&db, "src/a.py", "e", "Unbound | NameError");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle I think these tests would be better if they asserted the inferred type of the name inside the except block, rather than at end of scope.

But that's quite a major pain to do in our current test framework, so I would just leave the tests as-is for now and wait for the new test framework with reveal_type :)

}

#[test]
fn except_handler_multiple_definitions_in_try_block() -> anyhow::Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a test that is missing here is something verifying that within the else block, names bound in the try block are definitely bound. (This is along the same lines as the above comment; we're generally weak on tests demonstrating inferred types inside a particular block, rather than at end of scope.)

But as mentioned above, this is difficult to do with current test infra. (You can assign the name to another name, but then that other name is still potentially unbound, so the test isn't super clear.) But it would be nice to have a TODO to add this test, once we have the new test framework.

@AlexWaygood
Copy link
Member Author

AlexWaygood commented Sep 13, 2024

Noting it here for myself because I'm context-switching to something else: even not considering nested try blocks, the logic in this PR is currently incorrect. The logic currently assumes that at least one of the except and/or else branches must have been executed prior to the finally block being executed. That's only actually true if one of the except branches is a catch-all branch (either a bare except or an except BaseException branch). Otherwise, we could skip straight from the try block to the finally block in something like this if e.g. RuntimeError is raised during the try block:

try:
    x = 1
except TypeError:
    x = 2
else:
    x = 3
finally:
    pass

This means that the inferred type of x needs to be Unbound | Literal[1, 2, 3] after this try/except block, rather than Literal[2, 3].

@AlexWaygood
Copy link
Member Author

Re-requesting review as I had to change the approach quite radically to fix issues with nested try/except blocks and finally clauses!

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work tracking down all these different test cases and sorting out how to make them all pass!

Comment on lines +763 to +766
// If `visiting_try_block == true`, this informs `.add_definition()` to record the state
// in `self.try_block_definition_states` after each new definition is recorded.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment looks a bit outdated relative to the implementation?

Comment on lines +860 to +865
// That's actually an irrelevant detail for us here, though, because
// a nonexistent `else` or `finally` block is just represented
// as an empty `Vec` in our AST, and visiting an empty `Vec` will not
// change the flow state at all.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be relevant to perf. Snapshotting and restoring flow states is sufficiently expensive (if there are lots of symbols) that it might be worth explicitly checking for empty else and finally, if that would allow us to skip any snapshots or restores/merges. And it looks like we could skip a fair amount of those, if there is no finally body and no parent try block?

self.flow_merge(state);
}
if !exhaustive_except_branches {
for state in visiting_try_block_states.snapshots {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also need to merge the pre-try-block state here? (We could jump straight from before completing any statement in the try block, to executing the finally block and/or into an outer try block.)

// Account for the fact that we could be visiting a nested `try` block,
// in which case the outer `try` block must be kept informed about all the possible
// between-definition states we could have encountered in the inner `try` block(!)
if let Some(current_try_block) =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it slightly confusing that within the same try block visit, above we refer to an outer try block as parent_try_block and here we refer to the same thing as current_try_block. Let's be consistent about what is "current" and what is "parent"?

&db,
"src/a.py",
"x",
"Literal[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think x can be Unbound here; it's never set in an exhaustive except, or in a finally.

I think this might be the issue I pointed out above, where we don't include the pre-try-block state as a pre-finally possibility.

)?;

assert_file_diagnostics(&db, "src/a.py", &[]);
assert_public_ty(&db, "src/a.py", "x", "Unbound | Literal[1, 2, 101]");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how x could actually be unbound in this scenario, but I'm also not sure if it's worth fixing. We assume that we could exit the outer try with an exception before executing the inner try at all, but that's not actually true in this example; nothing happens in the outer try before we enter the inner try, so in actual fact we are guaranteed to either complete the assignment x = 1 in the inner try, or have an exception caught by the exhaustive inner handler, which also binds x.

I think we would handle this correctly if, instead of saving a pre-try state, and then pushing a state post each definition, we instead pushed a state pre each definition, and then saved a post-try state.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment is wrong; given that we don't currently try to distinguish non-raising statements, we have to assume that x = 2 in the inner exhaustive except could also raise, so it is correct for x to be possibly unbound.

Comment on lines +4968 to +4969
// would not necessarily have been overridden by the otherwise-exhaustive
// `except`/`else` branches in the inner block.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"otherwise-exhausive except/else branches in the inner block" doesn't seem right -- those aren't exhaustive

// block due to another exception, in which case the assignments in the inner `try` block
// would not necessarily have been overridden by the otherwise-exhaustive
// `except`/`else` branches in the inner block.
assert_public_ty(&db, "src/a.py", "x", "Unbound | Literal[1, 2, 3, 4]");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this assertion different from if the outer try/except weren't there?

)?;

assert_file_diagnostics(&db, "src/a.py", &[]);
assert_public_ty(&db, "src/a.py", "x", "Unbound | Literal[4, 5]");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this assertion different from if the outer try/except weren't there?

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it's unclear why we need the extra except handler stack vs just using the function stack (all you need is the current context and the parent's). Keeping an extra Vec around isn't expensive but I think it's slightly more complicated than just making use of the stack.

"/src/tomllib/_parser.py:573:12: Name 'char' used when possibly not defined.",
"/src/tomllib/_parser.py:579:12: Name 'char' used when possibly not defined.",
"/src/tomllib/_parser.py:580:63: Name 'char' used when possibly not defined.",
"/src/tomllib/_parser.py:629:38: Name 'datetime_obj' used when possibly not defined.",
];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend moving this comment next to the relevant errors. It increases the likelihood that the comment gets deleted when the errors are resolved (I would miss the comment up there)

Suggested change
];
static EXPECTED_DIAGNOSTICS: &[&str] = &[
"/src/tomllib/_parser.py:5:24: Module '__future__' has no member 'annotations'",
// The failed import from 'collections.abc' is due to lack of support for 'import *'.
"/src/tomllib/_parser.py:7:29: Module 'collections.abc' has no member 'Iterable'",
"Line 69 is too long (89 characters)",
"Use double quotes for strings",
"Use double quotes for strings",
"Use double quotes for strings",
"Use double quotes for strings",
"Use double quotes for strings",
"Use double quotes for strings",
"Use double quotes for strings",
// The "possibly not defined" errors in `tomllib/_parser.py` are because
// we don't yet understand terminal statements (`return`/`raise`/etc.) in our control-flow analysis
"/src/tomllib/_parser.py:66:18: Name 's' used when possibly not defined.",
"/src/tomllib/_parser.py:98:12: Name 'char' used when possibly not defined.",
"/src/tomllib/_parser.py:101:12: Name 'char' used when possibly not defined.",
"/src/tomllib/_parser.py:104:14: Name 'char' used when possibly not defined.",
"/src/tomllib/_parser.py:104:14: Name 'char' used when possibly not defined.",
"/src/tomllib/_parser.py:115:14: Name 'char' used when possibly not defined.",
"/src/tomllib/_parser.py:115:14: Name 'char' used when possibly not defined.",
"/src/tomllib/_parser.py:126:12: Name 'char' used when possibly not defined.",
"/src/tomllib/_parser.py:348:20: Name 'nest' used when possibly not defined.",
"/src/tomllib/_parser.py:353:5: Name 'nest' used when possibly not defined.",
"/src/tomllib/_parser.py:453:24: Name 'nest' used when possibly not defined.",
"/src/tomllib/_parser.py:455:9: Name 'nest' used when possibly not defined.",
"/src/tomllib/_parser.py:482:16: Name 'char' used when possibly not defined.",
"/src/tomllib/_parser.py:566:12: Name 'char' used when possibly not defined.",
"/src/tomllib/_parser.py:573:12: Name 'char' used when possibly not defined.",
"/src/tomllib/_parser.py:579:12: Name 'char' used when possibly not defined.",
"/src/tomllib/_parser.py:580:63: Name 'char' used when possibly not defined.",
"/src/tomllib/_parser.py:629:38: Name 'datetime_obj' used when possibly not defined.",
];

if let Some(parent_try_block) =
self.try_block_contexts().borrow_mut().current_try_block()
{
parent_try_block.record_visiting_nested_try_stmt();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method name here seems misleading to me because it is called after we visit the try body. It seems it's about visiting the except handlers, finaly , and or else blocks of a try.

Related: What's the reason that we don't record definitions in a nested try? E.g. what if

try:
    try:
        x =  10:
        maybe_panic()
    finally:
        y = panic()
finally:
    y

shouldn't the use of y be an error?

but

try:
    try:
        x =  10:
        maybe_panic()
    finally:
        y = 10
finally:
    y

should be okay?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I commented elsewhere that it seems we currently don't have any tests that fail if we stop paying attention to visiting_nested_try_stmt entirely.

And it seems to me that Micha is right here -- outside the inner try block body itself, it does seem like intermediate states in the except/else/finally should all get registered with the outer try block, because we could have an exception happen in any of those places. (And such an exception would not cause us to still enter an inner finally block, which only applies to the inner try body; it would jump straight to being handled by the outer try block.) So it does seem to me like visiting_nested_try_stmt might be inherently semantically not right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Cases to consider for tests here are e.g. x = 1; x = 2 inside an except, else, or finally block in a nested try statement -- the x = 1 possibility should be visible to the outer try block.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(And such an exception would not cause us to still enter an inner finally block, which only applies to the inner try body; it would jump straight to being handled by the outer try block.)

What do you mean? The inner finally block is indeed always executed, even if the exception is caught by the outer try block:

>>> try:
...     try:
...         raise TypeError()
...     except NameError:
...         print(1)
...     finally:
...         print(2)
... except TypeError:
...     print(3)
... finally:
...     print(4)
... 
2
3
4

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is that if an exception is raised inside an except/else/finally block in the nested try (not inside the try body), in that case the inner finally is not executed, the exception unwinds directly to the outer try statement handling.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... except this is just wrong 😆 apparently finally does still apply, even if the except is raised in an except handler or else block! So never mind.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the one thing to consider still is the case where an exception is raised within the finally block itself? I.e. is it important that we register those intermediate states within the nested finally block with the outer try statement?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still can't find any cases where the inner finally is not executed, though it is of course true that if an exception is raised in an except handler then we might switch to the inner finally block before the except handler has finished, and indeed if an exception is raised in the inner finally block then that finally block might not run to completion. I might not handle that properly right now...

>>> try:
...     try:
...         raise TypeError
...     except TypeError:
...         raise RuntimeError
...         print(1)
...     finally:
...         print(2)
... except RuntimeError:
...     print(3)
... finally:
...     print(4)
... 
2
3
4

// in which case the outer `try` block must be kept informed about all the possible
// between-definition states we could have encountered in the inner `try` block(!)
if let Some(current_try_block) =
self.try_block_contexts().borrow_mut().current_try_block()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we call self.try_block_contexts_mut() here to avoid the RefCell in the contexts (and all the borrow_mut stuff)?

@MichaReiser MichaReiser self-requested a review September 16, 2024 13:27

impl TryBlockContext {
pub(super) fn record_definition(&mut self, builder: &SemanticIndexBuilder) {
if !self.visiting_nested_try_stmt {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remove this check in the current version of this PR, all tests pass. So either we don't need this, or we need more tests.

Copy link
Member Author

@AlexWaygood AlexWaygood Sep 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A test fails if you make this change:

diff --git a/crates/red_knot_python_semantic/src/semantic_index/builder.rs b/crates/red_knot_python_semantic/src/semantic_index/builder.rs
index c7e8a6428..74cbd6b2d 100644
--- a/crates/red_knot_python_semantic/src/semantic_index/builder.rs
+++ b/crates/red_knot_python_semantic/src/semantic_index/builder.rs
@@ -873,16 +873,6 @@ where
                 }
 
                 self.visit_body(finalbody);
-
-                // Account for the fact that we could be visiting a nested `try` block,
-                // in which case the outer `try` block must be kept informed about all the possible
-                // between-definition states we could have encountered in the inner `try` block(!)
-                if let Some(current_try_block) =
-                    self.try_block_contexts().borrow_mut().current_try_block()
-                {
-                    current_try_block.record_exiting_nested_try_stmt();
-                    current_try_block.record_definition(self);
-                }
             }
             _ => {
                 walk_stmt(self, stmt);
diff --git a/crates/red_knot_python_semantic/src/semantic_index/except_handlers.rs b/crates/red_knot_python_semantic/src/semantic_index/except_handlers.rs
index 2eded0089..ce4e76ca0 100644
--- a/crates/red_knot_python_semantic/src/semantic_index/except_handlers.rs
+++ b/crates/red_knot_python_semantic/src/semantic_index/except_handlers.rs
@@ -30,9 +30,7 @@ pub(super) struct TryBlockContext {
 
 impl TryBlockContext {
     pub(super) fn record_definition(&mut self, builder: &SemanticIndexBuilder) {
-        if !self.visiting_nested_try_stmt {
-            self.snapshots.push(builder.flow_snapshot());
-        }
+        self.snapshots.push(builder.flow_snapshot());
     }

The if !self.visiting_nested_try_stmt check isn't necessary for correctness. But it's unnecessary to push all those snapshots to be recorded by the outer TryBlockContexts, since it's more efficient -- and more correct -- to simply record the state once at the completion of the inner try/except/else/finally block. I.e., the if !self.visiting_nested_try_stmt check is simply an optimisation to avoid unnecessary snapshotting.

I see that at the moment this is quite unclear; I'll add some more comments.

@AlexWaygood AlexWaygood force-pushed the alex/except-flow branch 2 times, most recently from 81b9bd2 to 4b385c7 Compare September 16, 2024 15:43
@AlexWaygood
Copy link
Member Author

Closing in favour of #13633

@AlexWaygood AlexWaygood closed this Oct 4, 2024
@AlexWaygood AlexWaygood deleted the alex/except-flow branch October 4, 2024 18:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants