Replies: 3 comments
-
Thanks @saem for providing feedback on the draft of this proposal!
-
A+, would review and agree with again.
-
So, many things the proposal talks about were implemented, and there were also some changes of plan.
The original proposal/plan had a strong focus on the C backend, neglecting the other backends. There was a high risk of the implementation becoming over-fitted to the needs of the C target and code generator, and thus the focus shifted towards unifying first rather than unifying at the end. Here, the introduction of code generation orchestrators (#712), lowering modules into procedures (#714), and moving discovery of alive procedures out of the code generators (#777) were big steps towards achieving a unified architecture.
Another important change that happened in the meantime was the removal of the legacy GCs -- a change that made large parts of the PR the proposal is based around (#424) obsolete. In general, the focus is now more on porting over the (good) ideas and concepts from #424, rather than on porting over the implementation.
To summarize: many parts of #424 are outdated or obsolete, and unifying the code generator pre-processing is well underway.
One of the next steps, at the time of writing, is decoupling the evolution of the code generators and the MIR from At first, said IR is going to be very similar to
Using the back-end IR was a hard-requirement for neither closure iterator nor
-
Introduction
First, a small summary of the referenced bits from the two relevant PRs:
The `injectdestructors` PR (#450) adds an IR for program code (i.e. the content of a procedure's body), currently named Mid-end Intermediate Representation (abbreviated as MIR). It's derived from `PNode` AST as it looks like past `transf` and is bi-directional, in the sense that it supports the translation from and to post-`transf` `PNode` AST.
The procedure-local control-flow primitives are: `case`, `while true`, `block`, single-branch `if`, and `try`/`except`/`finally` -- expressions are modeled as operations on values, of which the latter are either named by entities such as locals, globals, etc. or produced by other operations.
Data-structure-wise, the MIR is a flat `seq` of `MirNode`s, the latter being a variant object. The MIR might currently have a small bias towards the needs of the move-analyser, as both are being developed together.
The back-end PR (#424) adds:
Generally speaking, the IRs used in the new-back-end are referred to as "back-end IR", although for the rest of this write-up, I'll refer to each of the separate IRs as their own entity, and am also going to abbreviate back-end IR as BIR.
The code BIR is based around instructions, with some of them being declarative annotations. Procedure-local control-flow is encoded via `goto`, `branch`, `join`, and `goto-link`/`resume`, the latter two being used to "call" into a span of code inside the same procedure (`goto-link`) and later "return" to where the earlier call happened (`resume`). They're used to implement `finally` and allow for an arbitrary amount of nesting. Each non-control-flow, non-annotation instruction yields a value which is referenced/named via the instruction's position.
The current plan
After finishing and merging the `injectdestructors` rewrite, make `irgen` operate on `MirTree` instead of `PNode` AST as it does now. Then make all tests succeed, document everything, put the PR through review, and after refinement, merge it as a whole. Until the back-end PR is merged, the MIR is only used for code that requires the `injectdestructors` pass to run.
A proposal for a different plan
The summary: instead of developing the new back-end in isolation and merging everything at once, incrementally merge pieces of it over time.
A core part of the new back-end are the transformation passes. Most of them work by iterating over the input instructions searching for a specific magic and then expanding it into a new instruction sequence. This works okay, but I now consider the BIR to be a bit too low-level for many of the transformations -- a slightly higher level IR would work better for them. The MIR fits this description, and it also allows for the same search-and-expand approach.
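As a rough illustration of the search-and-expand approach described above, the following Python sketch walks a flat instruction list, looks for a specific magic, and replaces each match with a new instruction sequence. Everything in it is an assumption made for the example: the `Node` record, the magic and procedure names, and the expansion itself are placeholders and do not mirror the actual `MirNode` or BIR definitions (the compiler itself is written in Nim).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Node:
    """One entry of a flat IR sequence (hypothetical, not the real MirNode)."""
    kind: str          # e.g. "magic", "call", "arg", "local"
    value: str = ""    # magic name, procedure name, or operand name


# An expansion takes the matched node and returns its replacement sequence.
Expansion = Callable[[Node], List[Node]]


def expand_magics(code: List[Node], rules: Dict[str, Expansion]) -> List[Node]:
    """Search for magic nodes that have a rule and expand them.

    Nodes without a matching rule are copied through unchanged, so several
    such passes can be run one after another over the same sequence.
    """
    out: List[Node] = []
    for node in code:
        rule = rules.get(node.value) if node.kind == "magic" else None
        if rule is None:
            out.append(node)
        else:
            out.extend(rule(node))
    return out


def lower_new_seq(node: Node) -> List[Node]:
    # Hypothetical lowering: replace the magic with a runtime call.
    return [Node("call", "newSeqImpl"), Node("arg", "typeInfo")]


# Placeholder input: one local, one magic with a rule, one magic without.
code = [Node("local", "x"), Node("magic", "mNewSeq"), Node("magic", "mLen")]
lowered = expand_magics(code, {"mNewSeq": lower_new_seq})
```

The pass builds a new list instead of mutating its input, which keeps each pass independent of the others. Whether the real passes work this way is not stated in the proposal.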
The first step would be to get a working version of the MIR merged. There are still some parts of it that need adjustments in order for the move-analyser/`injectdestructors` to be able to efficiently make use of it, but once those things are figured out, everything MIR-related could be moved from the `injectdestructors` PR to a separate PR.
Instead of only activating the MIR translation for procedures that require the `injectdestructors` pass, all alive code (for all code-generators, i.e. `cgen`, `jsgen`, and `vmgen`) would be translated to MIR first and then, after all passes are applied, back to `PNode` AST. The resulting flow would look like: `... -> sem -> transf -> PNode-to-MIR -> transform/lowering passes -> MIR-to-PNode -> cgen/jsgen/vmgen`.
Using MIR as an intermediate step is important for the next steps to work. It also acts as a good test for the translation layers and makes sure that changes to `PNode` that would break them won't go unnoticed.
The next step is to transplant/port each BIR pass, for which it makes sense, to operate on the MIR. This would roughly work as follows:
`irpasses.nim`
`cgen` that previously implemented the respective transformation
The above steps are repeated until all relevant BIR passes are transplanted. Once this is done, the only logic remaining in `cgen` is that of the actual code-generator, the RTTI generation (the new back-end has its own implementation of it), and all the transformations that the new back-end doesn't yet implement. For the latter, we can then decide on what to do with them.
`irgen.nim` also contains some transformations/lowerings that should be moved into MIR passes. There are also two important passes not yet implemented as BIR passes: the `seqsv2` type and code lowering passes. Both are similar to their `seqsv1` counterpart and simple to implement -- they would be directly implemented to operate on the MIR.
One thing to note is that all BIR passes operate on `irtypes.Type` and not `PType`. `irtypes.Type` is designed to be efficient for the type-related tasks, such as the type lowering and querying, previously performed by the new back-end. However, translating from `irtypes.Type` back to `PType` is not possible because relevant information is lost, and due to the canonicalization (which the BIR passes depend on), there also doesn't exist a one-to-one mapping between them anymore.
To still make the transplantation work, the BIR passes would be (temporarily) adjusted to use `PType`. This works, as `PType` has all information also available with `irtypes.Type` - it's just not as time and memory efficient.
After all passes are transplanted, the back-end PR would be cleaned up and the remaining missing bits implemented. The new back-end is then in a state where it should support the same (remaining) features as the current `cgen`, and when merging the PR, should thus be able to directly replace it. As part of merging it, the previously-BIR-now-MIR passes which were adjusted to use `PType` are changed back to work with `irtypes.Type`.
Post-merge
The flow through the compiler would look like: `... sem -> transf -> PNode-to-MIR -> lowering/transformation passes -> cgen`.
With the new back-end (which will only have a C code-generator making use of it at first) merged, the other code-generators can be rewritten to make use of the BIR and its facilities. This will unblock further progress on the VM, as `vmgen` is one of the main blockers in that area.
Moving `vmgen` to use the back-end IR will not only significantly reduce its code in terms of complexity and size, we will also get `.closure` iterator and, to some degree, `method` support essentially "for free".
Footnotes
A test with the current unfinished version showed that the 2nd and 3rd compiler bootstrapping iterations take ~2 seconds longer each (when booting with `--gc:refc --d:danger --exceptions:goto`).