Split-init allowed past guaranteed-exit control flow #26968

DanilaFe · 2025-03-23T02:15:28Z

In production, this program compiles just fine:

proc test() {
  var x;
  if true then
    return;
  x = 1;
}

but the following program does not:

proc test() {
  var x;
  if true then
    return;
}

I find this surprising, because, as far as Chapel is concerned, the code below return doesn’t happen. This is particularly true when we use early-returns to avoid compilerError:

proc bla() {
  if true then return;
  compilerError("Unexpected condition!"); // does not error
}

combining all this, I think we end up with odd behavior in the following case:

proc test() {
  var x;
  if true then
    return;
  compilerWarning("no");
  x = 1;
}

I believe that we should emit an in both cases about missing split-initialization for x (whether x = 11 is present or not). We should also clarify the specification to say that split init is not allowed across returns.

The text was updated successfully, but these errors were encountered:

bradcray · 2025-03-23T13:55:38Z

Thinking about this a bit more over the weekend, I'm increasingly convinced that the first two programs should be equivalent to:

proc foo() {
  var x;
}

foo();

which generates the reasonable error in production ATO:

/ATO/code.chpl:1: In function 'foo':
/ATO/code.chpl:2: error: 'x' is not initialized and has no type
/ATO/code.chpl:2: note: cannot find initialization point to split-init this variable

mppf · 2025-03-24T13:46:31Z

Spec says

[split init applies through a conditional if] every branch without an applicable assignment statement always returns or throws. Note that an if statement written without an else is considered to have an empty else branch. Similarly, a select without an otherwise is considered to have an empty otherwise.

Here is an example:

proc foo() {
  var x;
  if isOK { // assuming isOK is not param here
    x = createSomeValue();
  } else {
    return false;
  }
  // do something with x
  return true;
}

It's similar if, instead of return false, it is a throw (or propagating a throw from a call to a thowing function).

This other example is similar and also intended to work AFAIK:

proc foo() {
  var x;
  if !isOK { // assuming isOK is not param here
    return false;
  }
  x = createSomeValue();
  // do something with x
  return true;
}

As a result, I would suggest solving the problem differently from this suggestion in the OP.

I believe that we should emit an in both cases about missing split-initialization for x (whether x = 11 is present or not). We should also clarify the specification to say that split init is not allowed across returns.

proc test() {
  var x;
  if true then
    return;
  x = 1;
}

IMO this should compile regardless of whether the condition is param. That's exercising an intentional feature of split-init that is documented in the spec.

Regarding this:

proc test() {
  var x;
  if true then
    return;
}

I think we should focus on making decisions about this even simpler program as a starting point:

proc test() {
  var x;
  return;
}

It behaves similarly (doesn't compile with the same error).

My preference would be to keep this an error, of one sort or another. Conceptually, we might consider this legal for purposes of split-init. (Why would we do that? because it would make the rules have a nicer separation of concerns between conditionals and other code). After deciding x is split inited, we could consider it an error because there is a variable that is not initialized on any path through generated function. Or, we could leave it as it is now: the current implementation makes it an error because compilation can't proceed if we have a variable of unknown type.

It would be OK with me to make that a warning rather than an error (since we can certainly compile it and just ignore that var x was there at all; similarly we probably want to have unused variable warnings in the future for variables that are not used after they are initialized). If we wanted to move it to a warning, we would issue the warning and then remove the variable of unknown type.

Note that we have discussed some odd behavior around split init + param conditionals before. See #14996.

DanilaFe · 2025-03-24T16:59:36Z

IMO this should compile regardless of whether the condition is param. That's exercising an intentional feature of split-init that is documented in the spec.

My understanding is that this conflicts fundamentally with other parts of the compilation process. E.g., we frequently use an early-return pattern + compilerError in our module code. My final example above is one such instance. But if we do some special-case processing in a param conditional, then return early to avoid resolving subsequent (non-working code), we can't also expect that code to be resolved for the purposes of split-init.

E.g.:

proc foo(type t) {
  var x;
  if requiresSpecialHandling(t) {
    return handleSpecially(t);
  }

  x = handleNormally(t); // will throw a compiler error if `requiresSpecialHandling` is true
}

Currently, we do not resolve code after early returns, because early returns often do type-sensitive processing and rule out cases in which code after them will not work. However, determining the type of x via "dead" code would require doing so. I'm not sure how to reconcile these in a principled manner.

mppf · 2025-03-24T17:45:59Z

IMO this should compile regardless of whether the condition is param. That's exercising an intentional feature of split-init that is documented in the spec.

Currently, we do not resolve code after early returns, because early returns often do type-sensitive processing and rule out cases in which code after them will not work. However, determining the type of x via "dead" code would require doing so. I'm not sure how to reconcile these in a principled manner.

To be clear, what I meant was, "it should count as split init regardless...". See more below.

We could just determine x is dead and forget about doing anything with it. Leave it with unknown type.

I would be OK with making it be an error to have a variable that's split-init when the initialization point can never be reached in the generated code. I think this needs to be an error about the variable being unused & not exactly one about split init.

proc test() {
  var x;
  return;
}

currently an error, I think it should stay an error, but a different error. I think the error should say something like "You have a variable x which is never initialized and cannot be used".

proc test() {
  var x;
  if true then
    return;
}

currently the same error. I think it should get the same treatment: "You have a variable x which is never initialized and cannot be used".

proc test() {
  var x;
  if true then
    return;
  x = 1;
}

currently compiles, but it would be reasonable to make it an error because x is never initialized in the generated function (IMO this shouldn't be a split-init error; as a result it would fail to compile for the same reason as var x; return;: "You have a variable x which is never initialized and cannot be used".)

Depending on what we learn if we try to implement that, we should consider ignoring var x; once we establish that it is never initialized / is dead. In particular, if we discover some important use cases of this pattern in testing, I think we should think about viewing it as a kind of dead-code-elimination to forget about x.

bradcray · 2025-03-24T17:51:30Z

Another simple program we should consider is:

proc test() {
  var x;
  return;
  x = 1;
}

which passes today, unfortunately for the stance I was taking above, as that suggests that perhaps

proc test() {
  var x;
  if true then return;
  x = 1;
}

should as well. Breaking code post-2.0 questions aside, my instinct would still be to make both of the above illegal, similar to what Michael's saying here:

I would be OK with making it be an error to have a variable that's split-init when the initialization point can never be reached in the generated code.

(which seems like a change from what you said previously, is that right Michael?)

IMO this should compile regardless of whether the condition is param.

mppf · 2025-03-24T18:16:27Z

(which seems like a change from what you said previously, is that right Michael?)

Yes,

what I meant was, "it should count as split init regardless...".

@jabraham17

…inference, and split init etc. (#26955) Makes progress on Cray/chapel-private#7229, but does not resolve it. This PR takes the first step towards handling early-return or early-error patterns in the standard library. At the same time, it improves various passes' handling of diverse control flow (enables `continue`, `return`, etc in resolution, `break` and `continue` in init/deinit etc.). It also reduces cases in which default init etc. was resolved for declarations that are not reachable. See future work below for cases this PR does not handle. I approached this by making the following observation: a large number of passes / visitors rely on an approximating control flow information. * `ReturnTypeInferrer` (used for computing function return types) keeps a stack of frames along with information about whether they continued, had a `break`, or returned. This is used for iterating over `param` loops and ignoring unreachable `return`s. * `VarScopeVisitor` (used for not handling init/deinit/etc. in dead code, but also for enabling more split-init cases) * `Resolver` (not used prior to this PR, but would need to reason about returns etc. to avoid resolving dead code) In each case, this is done by creating a stack of frames (which contain data like `returnsOrThrows`, `continues`, etc), and updating that stack during traversals. In each case, there is distinct logic to reconcile information from various frames, and to handle constructs like `if` and `select` where `param`-true branches cause other branches to be ignored. This PR unifies those traversals by defining a new abstract base class called `BranchSensitiveVisitor`. This class does not have any `visit` methods of its own (the subclasses are intended to provide that). It provides methods such as `enterScope` and `exitScope` which take care of tracking and combining control flow information (e.g., deciding if all branches return/throw). It also provides methods `branchSensitivelyTraverse`, which implement common `param`-folding logic for conditionals and if/elses. It then switches _all frame-based traversals_ to use this new `BranchSensitiveVisitor`. This has numerous advantages: 1. There is less duplicated code (resolving a TODO in `return-type-inference.cpp`, which started as a watered down duplication of `VarScopeVisitor`'s code, but evolved in an orthogonal direction). 2. Each pass is guaranteed to behave the same way w.r.t. control flow. If, e.g., `Resolver` skips code because it's dead, we know that `ReturnTypeInferrer` and `VarScopeVisitor` will skip that code too). 3. It's easy to enhance analyses (like `Resolver`) with control flow handling without much additional code. As a nice side effect, `VarScopeVisitor` became aware of `continue` and `break` (previously only handled by return type inference), and `Resolver` became better at handling `break` and `continue` in `param` loop). This PR takes an opinionated stance on #26968 and disallows split-init across returns. This is a consequence of not visiting dead code for computing split-init types. ## AST Traversal Changers In all traversals, we want to skip visiting code when it's dead. This is both a performance concern (why waste time resolving functions that are not reached?) and a correctness concern (e.g., `compilerError` is guaranteed not to do anything directly after a `return`). How does one add this logic to every single `visit` / `traverse` call? I considered several options: 1. Adding a "prelude" to each `visit` method that returns early if the `BranchSensitiveVisitor` has determined that it will not be reached. This prelude could be a macro to avoid duplicating a lot of code. However, this approach is undesirable because it leads to some duplication (invocations of the macro at least; duplicate code at worst) in every single `visit` method. Further, authors of `visit` methods must remember to include this prelude lest the dead-code sensitivity be affected. 2. Following the lead of `ResolvedVisitor` and having a custom `visit` method that dispatches to some other method (e.g., `visitImpl`). This way, the `visit` can be defined once, and the dead-code checking can happen before the dispatch to `visitImpl`. However, this introduces a second dispatch, introduces churn (changing all `visit` to `visitImpl` on resolver), and is anti-compositional (if `visit` has different signatures, like `visit(node)` in Resolver and `visit(node, RV)` in `VarScopeVisitor`, `visitImpl` would have to also have a different signature; this introduces an m-by-n implementation problem). 3. Configuring the `traverse` function to exit early before even invoking `enter` and `exit`. This can be done using a template specialization to provide a "default value", and to avoid runtime overhead. I settled for 3. Specifically, all invocations of `traverse` now reference a template class with a static method, `AstVisitorPrecondition<T>::skipSubtree(node, visitor)` to decide whether to invoke `enter` / `exit` on a node and its children. The default template for `AstVisitorPrecondition` always returns true, which means the default behavior of `traverse` remains the same. However, visitors that want to return early / skip visiting code altogether can specialize the template and define a condition. `Resolver`, `VarScopeVisitor`, and `ReturnTypeInferrer` all do this; they check the current frame for signs that `return` has already been encountered, and if so, do not descend into any more AST nodes. For `Resolver`, this does not happen when `scopeResolveOnly` is set, since we always want to enter every node in that mode. I find this appealing because user code saw very little churn (10 lines to specialize the template for `Resolver`, e,g.), and each new `enter` / `exit` method added to `Resolver` will automatically work with early returns (since the `traverse` call which invokes `enter`/ `exit` knows to check). The precondition could be easily defined for all visitors affected by this PR. ## Macro Tweaks @jabraham17 and I have settled on a pattern in `chapel-py` in which the X-macro files (e.g., `uast-methods.h`) auto-defined missing X-macros and undefine them automatically. This helps reduce a lot of boilerplate. As I was experimenting with dispatch (approach 2) above, I found it convenient to apply this pattern to the Dyno code proper. Rather than polluting `uast-classes-list.h` and `type-classes-list.h` (which IMO serve as good references), I added wrapper files that handle the aforementioned niceties. I then switched `AstNode.h` and `Type.h` to using these wrappers, saving a considerable amount of noise. I did not end up with new uses of the adapters in my code (since I went with template specialization instead), but I thought the cleanup is worth including here. ## Module Changes This PR reverts (some) changes to module code from b789f5b and ee57b16, which were previously required to work around the resolver's inability to reason about early returns. Some code, however (`ChapelDomain.chpl` from b789f5b), relies on `compilerError` being treated as early return, which is future work. Reviewed by @arezaii -- thanks! ## Future Work - [ ] `compilerError`, which is considered an "early return" in production - [ ] branches that both "exit" but differently (e.g., code after `if bla then continue else break` does not execute) - [ ] param loops in the `VarScopeVisitor` family of passes (init/deinit, split init, copy elision) (see Cray/chapel-private#4209) - [ ] labeled `break` / `continue` (Cray/chapel-private#7245) ## Testing - [x] dyno tests - [x] performance testing -- frontend tests did not get any slower in CI or locally - [x] paratest ## Tested cases - [x] `VarScopeVisitor` now knows about `break` and `continue` - [x] `VarScopeVisitor` now stops traversing after `return`, which should enable reasoning about early returns (already tested) - [x] `Resolver` now reasons about control flow and skips resolving dead code - [x] `Resolver` now properly handles `continue`, `break`, and `return` in `param` loops

DanilaFe added area: Compiler area: Language labels Mar 23, 2025

DanilaFe mentioned this issue Mar 23, 2025

Dyno: unify return, break, throw etc. handling across Resolver, type inference, and split init etc. #26955

Merged

11 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Split-init allowed past guaranteed-exit control flow #26968

Split-init allowed past guaranteed-exit control flow #26968

DanilaFe commented Mar 23, 2025

bradcray commented Mar 23, 2025

mppf commented Mar 24, 2025

DanilaFe commented Mar 24, 2025

mppf commented Mar 24, 2025

bradcray commented Mar 24, 2025

mppf commented Mar 24, 2025

Split-init allowed past guaranteed-exit control flow #26968

Split-init allowed past guaranteed-exit control flow #26968

Comments

DanilaFe commented Mar 23, 2025

bradcray commented Mar 23, 2025

mppf commented Mar 24, 2025

DanilaFe commented Mar 24, 2025

mppf commented Mar 24, 2025

bradcray commented Mar 24, 2025

mppf commented Mar 24, 2025