Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split-init allowed past guaranteed-exit control flow #26968

Open
DanilaFe opened this issue Mar 23, 2025 · 6 comments
Open

Split-init allowed past guaranteed-exit control flow #26968

DanilaFe opened this issue Mar 23, 2025 · 6 comments

Comments

@DanilaFe
Copy link
Contributor

In production, this program compiles just fine:

proc test() {
  var x;
  if true then
    return;
  x = 1;
}

but the following program does not:

proc test() {
  var x;
  if true then
    return;
}

I find this surprising, because, as far as Chapel is concerned, the code below return doesn’t happen. This is particularly true when we use early-returns to avoid compilerError:

proc bla() {
  if true then return;
  compilerError("Unexpected condition!"); // does not error
}

combining all this, I think we end up with odd behavior in the following case:

proc test() {
  var x;
  if true then
    return;
  compilerWarning("no");
  x = 1;
}

I believe that we should emit an in both cases about missing split-initialization for x (whether x = 11 is present or not). We should also clarify the specification to say that split init is not allowed across returns.

@bradcray
Copy link
Member

Thinking about this a bit more over the weekend, I'm increasingly convinced that the first two programs should be equivalent to:

proc foo() {
  var x;
}

foo();

which generates the reasonable error in production ATO:

/ATO/code.chpl:1: In function 'foo':
/ATO/code.chpl:2: error: 'x' is not initialized and has no type
/ATO/code.chpl:2: note: cannot find initialization point to split-init this variable

@mppf
Copy link
Member

mppf commented Mar 24, 2025

Spec says

[split init applies through a conditional if] every branch without an applicable assignment statement always returns or throws. Note that an if statement written without an else is considered to have an empty else branch. Similarly, a select without an otherwise is considered to have an empty otherwise.

Here is an example:

proc foo() {
  var x;
  if isOK { // assuming isOK is not param here
    x = createSomeValue();
  } else {
    return false;
  }
  // do something with x
  return true;
}

It's similar if, instead of return false, it is a throw (or propagating a throw from a call to a thowing function).

This other example is similar and also intended to work AFAIK:

proc foo() {
  var x;
  if !isOK { // assuming isOK is not param here
    return false;
  }
  x = createSomeValue();
  // do something with x
  return true;
}

As a result, I would suggest solving the problem differently from this suggestion in the OP.

I believe that we should emit an in both cases about missing split-initialization for x (whether x = 11 is present or not). We should also clarify the specification to say that split init is not allowed across returns.

proc test() {
  var x;
  if true then
    return;
  x = 1;
}

IMO this should compile regardless of whether the condition is param. That's exercising an intentional feature of split-init that is documented in the spec.

Regarding this:

proc test() {
  var x;
  if true then
    return;
}

I think we should focus on making decisions about this even simpler program as a starting point:

proc test() {
  var x;
  return;
}

It behaves similarly (doesn't compile with the same error).

My preference would be to keep this an error, of one sort or another. Conceptually, we might consider this legal for purposes of split-init. (Why would we do that? because it would make the rules have a nicer separation of concerns between conditionals and other code). After deciding x is split inited, we could consider it an error because there is a variable that is not initialized on any path through generated function. Or, we could leave it as it is now: the current implementation makes it an error because compilation can't proceed if we have a variable of unknown type.

It would be OK with me to make that a warning rather than an error (since we can certainly compile it and just ignore that var x was there at all; similarly we probably want to have unused variable warnings in the future for variables that are not used after they are initialized). If we wanted to move it to a warning, we would issue the warning and then remove the variable of unknown type.

Note that we have discussed some odd behavior around split init + param conditionals before. See #14996.

@DanilaFe
Copy link
Contributor Author

IMO this should compile regardless of whether the condition is param. That's exercising an intentional feature of split-init that is documented in the spec.

My understanding is that this conflicts fundamentally with other parts of the compilation process. E.g., we frequently use an early-return pattern + compilerError in our module code. My final example above is one such instance. But if we do some special-case processing in a param conditional, then return early to avoid resolving subsequent (non-working code), we can't also expect that code to be resolved for the purposes of split-init.

E.g.:

proc foo(type t) {
  var x;
  if requiresSpecialHandling(t) {
    return handleSpecially(t);
  }

  x = handleNormally(t); // will throw a compiler error if `requiresSpecialHandling` is true
}

Currently, we do not resolve code after early returns, because early returns often do type-sensitive processing and rule out cases in which code after them will not work. However, determining the type of x via "dead" code would require doing so. I'm not sure how to reconcile these in a principled manner.

@mppf
Copy link
Member

mppf commented Mar 24, 2025

IMO this should compile regardless of whether the condition is param. That's exercising an intentional feature of split-init that is documented in the spec.

Currently, we do not resolve code after early returns, because early returns often do type-sensitive processing and rule out cases in which code after them will not work. However, determining the type of x via "dead" code would require doing so. I'm not sure how to reconcile these in a principled manner.

To be clear, what I meant was, "it should count as split init regardless...". See more below.

We could just determine x is dead and forget about doing anything with it. Leave it with unknown type.

I would be OK with making it be an error to have a variable that's split-init when the initialization point can never be reached in the generated code. I think this needs to be an error about the variable being unused & not exactly one about split init.

proc test() {
  var x;
  return;
}

currently an error, I think it should stay an error, but a different error. I think the error should say something like "You have a variable x which is never initialized and cannot be used".

proc test() {
  var x;
  if true then
    return;
}

currently the same error. I think it should get the same treatment: "You have a variable x which is never initialized and cannot be used".

proc test() {
  var x;
  if true then
    return;
  x = 1;
}

currently compiles, but it would be reasonable to make it an error because x is never initialized in the generated function (IMO this shouldn't be a split-init error; as a result it would fail to compile for the same reason as var x; return;: "You have a variable x which is never initialized and cannot be used".)

Depending on what we learn if we try to implement that, we should consider ignoring var x; once we establish that it is never initialized / is dead. In particular, if we discover some important use cases of this pattern in testing, I think we should think about viewing it as a kind of dead-code-elimination to forget about x.

@bradcray
Copy link
Member

Another simple program we should consider is:

proc test() {
  var x;
  return;
  x = 1;
}

which passes today, unfortunately for the stance I was taking above, as that suggests that perhaps

proc test() {
  var x;
  if true then return;
  x = 1;
}

should as well. Breaking code post-2.0 questions aside, my instinct would still be to make both of the above illegal, similar to what Michael's saying here:

I would be OK with making it be an error to have a variable that's split-init when the initialization point can never be reached in the generated code.

(which seems like a change from what you said previously, is that right Michael?)

IMO this should compile regardless of whether the condition is param.

@mppf
Copy link
Member

mppf commented Mar 24, 2025

(which seems like a change from what you said previously, is that right Michael?)

Yes,

what I meant was, "it should count as split init regardless...".

DanilaFe added a commit that referenced this issue Mar 26, 2025
…inference, and split init etc. (#26955)

Makes progress on Cray/chapel-private#7229,
but does not resolve it.

This PR takes the first step towards handling early-return or
early-error patterns in the standard library. At the same time, it
improves various passes' handling of diverse control flow (enables
`continue`, `return`, etc in resolution, `break` and `continue` in
init/deinit etc.). It also reduces cases in which default init etc. was
resolved for declarations that are not reachable.

See future work below for cases this PR does not handle.

I approached this by making the following observation: a large number of
passes / visitors rely on an approximating control flow information.

* `ReturnTypeInferrer` (used for computing function return types) keeps
a stack of frames along with information about whether they continued,
had a `break`, or returned. This is used for iterating over `param`
loops and ignoring unreachable `return`s.
* `VarScopeVisitor` (used for not handling init/deinit/etc. in dead
code, but also for enabling more split-init cases)
* `Resolver` (not used prior to this PR, but would need to reason about
returns etc. to avoid resolving dead code)

In each case, this is done by creating a stack of frames (which contain
data like `returnsOrThrows`, `continues`, etc), and updating that stack
during traversals. In each case, there is distinct logic to reconcile
information from various frames, and to handle constructs like `if` and
`select` where `param`-true branches cause other branches to be ignored.

This PR unifies those traversals by defining a new abstract base class
called `BranchSensitiveVisitor`. This class does not have any `visit`
methods of its own (the subclasses are intended to provide that). It
provides methods such as `enterScope` and `exitScope` which take care of
tracking and combining control flow information (e.g., deciding if all
branches return/throw). It also provides methods
`branchSensitivelyTraverse`, which implement common `param`-folding
logic for conditionals and if/elses. It then switches _all frame-based
traversals_ to use this new `BranchSensitiveVisitor`. This has numerous
advantages:

1. There is less duplicated code (resolving a TODO in
`return-type-inference.cpp`, which started as a watered down duplication
of `VarScopeVisitor`'s code, but evolved in an orthogonal direction).
2. Each pass is guaranteed to behave the same way w.r.t. control flow.
If, e.g., `Resolver` skips code because it's dead, we know that
`ReturnTypeInferrer` and `VarScopeVisitor` will skip that code too).
3. It's easy to enhance analyses (like `Resolver`) with control flow
handling without much additional code.

As a nice side effect, `VarScopeVisitor` became aware of `continue` and
`break` (previously only handled by return type inference), and
`Resolver` became better at handling `break` and `continue` in `param`
loop).

This PR takes an opinionated stance on
#26968 and disallows
split-init across returns. This is a consequence of not visiting dead
code for computing split-init types.

## AST Traversal Changers

In all traversals, we want to skip visiting code when it's dead. This is
both a performance concern (why waste time resolving functions that are
not reached?) and a correctness concern (e.g., `compilerError` is
guaranteed not to do anything directly after a `return`). How does one
add this logic to every single `visit` / `traverse` call?

I considered several options:

1. Adding a "prelude" to each `visit` method that returns early if the
`BranchSensitiveVisitor` has determined that it will not be reached.
This prelude could be a macro to avoid duplicating a lot of code.
However, this approach is undesirable because it leads to some
duplication (invocations of the macro at least; duplicate code at worst)
in every single `visit` method. Further, authors of `visit` methods must
remember to include this prelude lest the dead-code sensitivity be
affected.
2. Following the lead of `ResolvedVisitor` and having a custom `visit`
method that dispatches to some other method (e.g., `visitImpl`). This
way, the `visit` can be defined once, and the dead-code checking can
happen before the dispatch to `visitImpl`. However, this introduces a
second dispatch, introduces churn (changing all `visit` to `visitImpl`
on resolver), and is anti-compositional (if `visit` has different
signatures, like `visit(node)` in Resolver and `visit(node, RV)` in
`VarScopeVisitor`, `visitImpl` would have to also have a different
signature; this introduces an m-by-n implementation problem).
3. Configuring the `traverse` function to exit early before even
invoking `enter` and `exit`. This can be done using a template
specialization to provide a "default value", and to avoid runtime
overhead.

I settled for 3. Specifically, all invocations of `traverse` now
reference a template class with a static method,
`AstVisitorPrecondition<T>::skipSubtree(node, visitor)` to decide
whether to invoke `enter` / `exit` on a node and its children. The
default template for `AstVisitorPrecondition` always returns true, which
means the default behavior of `traverse` remains the same. However,
visitors that want to return early / skip visiting code altogether can
specialize the template and define a condition. `Resolver`,
`VarScopeVisitor`, and `ReturnTypeInferrer` all do this; they check the
current frame for signs that `return` has already been encountered, and
if so, do not descend into any more AST nodes. For `Resolver`, this does
not happen when `scopeResolveOnly` is set, since we always want to enter
every node in that mode.

I find this appealing because user code saw very little churn (10 lines
to specialize the template for `Resolver`, e,g.), and each new `enter` /
`exit` method added to `Resolver` will automatically work with early
returns (since the `traverse` call which invokes `enter`/ `exit` knows
to check). The precondition could be easily defined for all visitors
affected by this PR.

## Macro Tweaks
@jabraham17 and I have settled on a pattern in `chapel-py` in which the
X-macro files (e.g., `uast-methods.h`) auto-defined missing X-macros and
undefine them automatically. This helps reduce a lot of boilerplate. As
I was experimenting with dispatch (approach 2) above, I found it
convenient to apply this pattern to the Dyno code proper. Rather than
polluting `uast-classes-list.h` and `type-classes-list.h` (which IMO
serve as good references), I added wrapper files that handle the
aforementioned niceties. I then switched `AstNode.h` and `Type.h` to
using these wrappers, saving a considerable amount of noise. I did not
end up with new uses of the adapters in my code (since I went with
template specialization instead), but I thought the cleanup is worth
including here.

## Module Changes
This PR reverts (some) changes to module code from
b789f5b and
ee57b16, which were previously required
to work around the resolver's inability to reason about early returns.
Some code, however (`ChapelDomain.chpl` from
b789f5b), relies on `compilerError`
being treated as early return, which is future work.

Reviewed by @arezaii -- thanks!

## Future Work
- [ ] `compilerError`, which is considered an "early return" in
production
- [ ] branches that both "exit" but differently (e.g., code after `if
bla then continue else break` does not execute)
- [ ] param loops in the `VarScopeVisitor` family of passes
(init/deinit, split init, copy elision) (see
Cray/chapel-private#4209)
- [ ] labeled `break` / `continue`
(Cray/chapel-private#7245)

## Testing
- [x] dyno tests
- [x] performance testing -- frontend tests did not get any slower in CI
or locally
- [x] paratest

## Tested cases

- [x] `VarScopeVisitor` now knows about `break` and `continue`
- [x] `VarScopeVisitor` now stops traversing after `return`, which
should enable reasoning about early returns (already tested)
- [x] `Resolver` now reasons about control flow and skips resolving dead
code
- [x] `Resolver` now properly handles `continue`, `break`, and `return`
in `param` loops
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants