-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge Preparation: Precise Tagging + Enhanced Orthogonal Persistence (32-Bit) #4383
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…laudio/small-tags
Extending the random GC tests to also allocate regions. The same tests can also be used for long-running and larger-scaling tests. Therefore, the maximum number of randomly created regions is limited - until region garbage collection will be supported.
Co-authored-by: Gabor Greif <gabor@dfinity.org>
## Changelog for motoko-base: Branch: next-moc Commits: [dfinity/motoko-base@b772c9e4...520ccf5d](dfinity/motoko-base@b772c9e...520ccf5) * [`0f14b175`](dfinity/motoko-base@0f14b17) Unused Declaration Cleanup ([dfinity/motoko-base#614](https://togithub.com/dfinity/motoko-base/issues/614))
Replaces `sdk@dfinity.org` with the recently introduced `team-motoko@dfinity.org`.
## Changelog for motoko-base: Branch: next-moc Commits: [dfinity/motoko-base@520ccf5d...712d0587](dfinity/motoko-base@520ccf5...712d058) * [`cba05e81`](dfinity/motoko-base@cba05e8) Publish on Mops ([dfinity/motoko-base#618](https://togithub.com/dfinity/motoko-base/issues/618)) * [`d81f5527`](dfinity/motoko-base@d81f552) Add commit hash to `matchers` dependency ([dfinity/motoko-base#621](https://togithub.com/dfinity/motoko-base/issues/621)) * [`c86d76ff`](dfinity/motoko-base@c86d76f) doc: update `List.mo` ([dfinity/motoko-base#616](https://togithub.com/dfinity/motoko-base/issues/616)) * [`4c2a90e7`](dfinity/motoko-base@4c2a90e) Fix compiler warning in `Array.take()` method ([dfinity/motoko-base#611](https://togithub.com/dfinity/motoko-base/issues/611))
* add flag to enable rtti * fix bugs in can_tag_i32/i64 tests and sanity checks * adjust test assert on heap size * update perf numbers * revert change * revert test * optimized clearing of all-zero tags * update perf numbers
# Unused Declaration Detection Detection of unused program declarations with compiler warnings. Program example `example.mo`: ``` import Array "mo:base/Array"; import Debug "mo:base/Debug"; actor { var variable1 = 0; var variable2 = "TEST"; func testUnusedFunction(parameter1 : Bool, parameter2 : Int) { var variable2 = 2; var variable3 = 3; let variable4 = 4; if (variable1 == 0 and variable3 == 3) { let variable2 = parameter1; Debug.print(debug_show(variable2)); }; }; }; ``` Compiler messages: ``` example.mo:1.8-1.13: warning [M0194], Unused declaration Array example.mo:6.9-6.18: warning [M0194], Unused declaration variable2 example.mo:8.10-8.28: warning [M0194], Unused declaration testUnusedFunction example.mo:8.48-8.58: warning [M0194], Unused declaration parameter2 example.mo:9.13-9.22: warning [M0194], Unused declaration variable2 example.mo:11.13-11.22: warning [M0194], Unused declaration variable4 ``` ## Coverage The analysis detects the following unused declarations: * Variables * Parameters, including shared context * Functions * Classes * Objects * Modules * Imports * Private fields in objects and classes Special aspects: * System functions are considered implicitly used. * Non-accessed stable variables are considered unused, even if they could be accessed in a future upgraded program version. ## Warnings The warning of an unused declaration can be suppressed by prefixing the identifier by an underscore. Example: ``` object Silence { public func log(_message: Text) { // Suppress the warning for the unused `_message` parameter. } } ``` ## Tweaks from #4407 * don't warn about unused declarations in code from packages (assuming packaces are third party you can't silence them anyway): * annotate LibPath Ast nodes with source package, if any, as tracked and determined during import resolution. * predicate unused declaration warnings on package origin. * don't reject unused declarations in the repl treating top-level code as belonging to fake package "<top-level>" (a mild hack). The repl can't know the rest of the interaction so any warning is premature and a nuisance. * change terminology of declarations/variables to bindings/indentifiers (for consistency with rest of code) * add error-code description in M0194.md * add changelog entry. Future: we could suppress all warnings, not just unused declarations - from imported package code this way, should we want to. A --lint mode could re-enable them for further auditing. The rationale is that the warnings are of interest to and actionable on only by the author of the package, not the client. ## Future Work The following analyses are not yet implemented but would be beneficial to support: * Unused recursive function calls (direct or indirect recursion). * Unused type definitions, unused type parameters * Unused branch labels * Unused variant options * Unused public fields: Additional aspects to consider: - Accesses via paths outside the declaration scope. - Possible usage before declaration. - Polymorphism of structural typing. - A library module may expose more (directly or indirectly) public declarations than used. * Write-only mutable variables: Mutable variables that are never read but only written * Unnecessary mutability of read-only variables: Recommend `let` instead of `var`.
The Motoko runtime representation of values is largely untyped, distinguishing only between scalar and boxed values using a single bit of the 32-bit value representation. The tagging is only to support garbage collection, not precise runtime type information. In the existing value encoding, a Motoko value in vanilla form is a 32-bit value that is either: * false `(0b0)`, * true `(0b1)`, * a word-aligned (encoded) pointer to a heap allocated value. Encoded by subtracting 1 from the pointer value (ensuring the 2 LSBs are 0b11), pointing heap allocated value * null (some well-known skewed pointer). * a 31-bit scalar value, stored in the top bits of the value with LSB 0. Scalar values encode `Nat8/16` and `Int8/16` values and chars, and 31-bit subranges of `Nat32`, `Int32`, `Nat64`, `Int64`, `Nat` and `Int`. Large integer values that don't fit in a 31-bit scalar are boxed on the heap. Observe that, in Motoko, some types are always scalar (eg. `Nat8`), some types are always boxed (e.g. `Blob`), and some types have a mixed scalar/boxed representation (e.g. `Nat32` and `Nat`), depending on the size of the value. This PR adds exact runtime type information to all[*] scalar values, making the scalar values self describing. Making the _entire_ heap fully self-describing requires refining the heap tags use to identify heap objects, distinguishing boxed `Nat32` from boxed `Int32`, `Blob` from `Principal` and `Text`, tuples from (mutable and immutable) arrays etc. That work of refining heap tags will need to be completed in a follow on or sibling PR, but is hopefully less involved than the changes herein. To add precise scalar type info, we extend the scalar tagging scheme with a richer set of (inline) type descriptors, using some of the least significant bits of the 31-bit scalar representation. To avoid dedicating a fix-length suffix (say 1 byte) to the scalar tag, scalar tags are actually variable length, using shorter tags for larger payload types, and longer tags for shorter payload types. This gives us a reasonable tag space (set of possible tags, some still unused), without reducing the scalar range of mixed representation types too much. At one extreme, the tag of `Int` (and `Nat`) is just `0b10`, leaving a 30-bit payload for compact `Nat/Int`, losing just `1` bit from the current representation's 31-bit compact range. This is important because `Int`s are common, and `Nat`s are used to index arrays, so we should avoid boxing more than necessary. In the middle, the tag of `Nat16`, `Int16` is `0b10(0^12)00` and `0b11(0^12)00`, leaving a 16-bit payload in the MSB. At the other extreme, the tag of the unit value, `()`, is 32-bit `0x01(0^28)00`, occupying the entire value. The primary motivation of this work is to support value, not type driven, serialization of stable values to a precisely typed stable format, without loss of type information, so that upgrades can still accommodate type dependent changes of representation from one in-memory format to another. Secondary motivations are live and post-mortem heap inspection tools and light-weight debugging tools, that can parse values in locals, arguments and on the heap using tags. [*] There remain some raw, untagged 31-bit scalars whose type is only known to the compiler. These are used to encode the state of text and blob iterators, hidden in dedicated iterator closure environments. Note that these are not stable types, so need not be precisely tagged for stabilization. # Tagging Scheme | Value | Type | Payload bits | |-------| ------| --------------| | `((O,O,O,O,O,O,O,O), (O,O,O,O,O,O,O,O), (O,O,O,O,O,O,O,O), (O,O,O,O,O,O,O,O))` | TBool (* false *) | 0 | | `((O,O,O,O,O,O,O,O), (O,O,O,O,O,O,O,O), (O,O,O,O,O,O,O,O), (O,O,O,O,O,O,O,I))` | TBool (* true *) | 0 | | `((_,_,_,_,_,_,_,_), (_,_,_,_,_,_,_,_), (_,_,_,_,_,_,_,_), (_,_,_,_,_,_,I,I))` | TRef | 30 | | `((_,_,_,_,_,_,_,_), (_,_,_,_,_,_,_,_), (_,_,_,_,_,_,_,_), (_,_,_,_,_,_,I,O))` | TNum | 30 | | `((_,_,_,_,_,_,_,_), (_,_,_,_,_,_,_,_), (_,_,_,_,_,_,_,_), (_,_,_,_,O,I,O,O))` | TNat64 | 28 | | `((_,_,_,_,_,_,_,_), (_,_,_,_,_,_,_,_), (_,_,_,_,_,_,_,_), (_,_,_,_,I,I,O,O))` | TInt64 | 28 | | `((_,_,_,_,_,_,_,_), (_,_,_,_,_,_,_,_), (_,_,_,_,_,_,_,_), (_,_,_,O,I,O,O,O))` | TNat32 | 27 | | `((_,_,_,_,_,_,_,_), (_,_,_,_,_,_,_,_), (_,_,_,_,_,_,_,_), (_,_,_,I,I,O,O,O))` | TInt32 | 27 | | ... unused tags .... | ... | ... | | `((_,_,_,_,_,_,_,_), (_,_,_,_,_,_,_,_), (_,_,_,_,_,O,I,O), (O,O,O,O,O,O,O,O))` | TChar | 21 | | ... unused tags .... | ... | ... | | `((_,_,_,_,_,_,_,_), (_,_,_,_,_,_,_,_), (O,I,O,O,O,O,O,O), (O,O,O,O,O,O,O,O))`| TNat16 | 16 | | `((_,_,_,_,_,_,_,_), (_,_,_,_,_,_,_,_), (I,I,O,O,O,O,O,O), (O,O,O,O,O,O,O,O))` | TInt16 | 16 | | ... unused tags .... | ... | ... | | `((_,_,_,_,_,_,_,_), (O,I,O,O,O,O,O,O), (O,O,O,O,O,O,O,O), (O,O,O,O,O,O,O,O))` | TNat8 | 8 | | `((_,_,_,_,_,_,_,_), (I,I,O,O,O,O,O,O), (O,O,O,O,O,O,O,O), (O,O,O,O,O,O,O,O))` | TInt8 | 8 | | ... unused tags .... | ... | ... | | `((O,I,O,O,O,O,O,O), (O,O,O,O,O,O,O,O), (O,O,O,O,O,O,O,O), (O,O,O,O,O,O,O,O))` | TUnit | 0 | # Implementation The implementation was carried out in a number of precursor PRs: * #4098: Added 1-byte tags to small values, untagging an retagging on every operation, with many code changes. * #4278: Made the payload/tag size for scalar values configurable using a fixed compile time constant. * #4322: Added tags to compact `Nat32/Int32` and `Nat32/Nat64`, making the payload size type-dependent. The previously untyped _StackReps_ `UnboxedWord32` and `UnboxedWord64` were extended to carry a type argument. The argument is used to remember and re-introduce the precise tag on unboxing and boxing. It can also be used to verify the tag on unboxing, for sanity checking. * #4345: Tag compact Int and Nat (both as Int due to subtyping) * #4353: Extended the range of compact `Int/Nat` from 29 to 30-bit, by adjusting the tagging scheme. This is just 1 bit less than with the existing scheme (31-bit, untagged scalars). * #4354: Improved the tagging scheme to use the longest possible tags for the required payload size, upping the ranges of unused tags (for future use) * #4357: Merge with master, fixs bugs in sanity checking of tags. Fix bugs revealing by more stringent sanity checks. * #4363: Uses the `UnboxedWord32/Word64` stack reps also for untagged, 0-right-padded small tagged values, tagging/untagging only on exit to and from stack. This alone reduces the (large) 80% overhead in bench/nat16.mo to 55%. It also has the advantage of reverting almost all changes to the arithmetic code, which can now (again) assume values are right, 0-padded as it did previously, * #4369: (this PR) does a small tweak so that mutable locals containing small tagged values in untagged form, extending the existing optimization done for mutable locals containing unboxed `Nat32`/`Int32` and `Int64`/`Nat64`. This reduces the `bench/nat16.mo` overhead from 55% to just 6% (the benchmark use repeated in-place updates in a tight loop so benefits greatly). This PR also makes use of the previously unused bit in the the compact representation of `Nat32s` and `Nat64s` which previously had to concur with the representation of `Int32` and `Nat64` and could only represent half the unsigned range. With the typed StackRep, we now know whether the values are signed or not and can choose distinct compact representation for `Nat32` vs `Int32`, and `Nat64` vs `Int64` rather that shared ones. Note however, that the compact representation for `Nat` cannot recover the missing bit because of subtyping. A compact `Nat` **must** have the same representation as a compact `Int` to support non-coercive subtyping. * #4375 (incoming): rewrite array iter optimization to respect compact bignum representation invariants. * #4400 : gate feature behind `Mo_config.Flags.rtti (default off)`, avoiding overhead for now. * added (unadvertised) flag `--experimental-rtti` to enable feature for performance feedback from users. # Overheads These are the cycle count and code size differences measured using `test/bench` and `test/perf`, compared against master (see spreadsheet for perf of interim PRs). Summarized from: https://docs.google.com/spreadsheets/d/1zC2Hsl9gGUzJESQmSABPiu-XIsICEw1I3O-JKHNWVQs/edit?usp=sharing ## perf ## test/perf Master | | | Widening | | Widening vs Master | | Gated | | Gated vs Master -- | -- | -- | -- | -- | -- | -- | -- | -- | -- gas/assetstorage | 10013950 | | gas/assetstorage | 10013950 | 0.00% | | gas/assetstorage | 10013950 | 0.00% size/assetstorage | 186455 | | size/assetstorage | 186705 | 0.13% | | size/assetstorage | 186520 | 0.03% gas/dao | 4413634512 | | gas/dao | 4413744976 | 0.00% | | gas/dao | 4413743944 | 0.00% size/dao | 265797 | | size/dao | 266385 | 0.22% | | size/dao | 265922 | 0.05% gas/qr | 1302744688 | | gas/qr | 1305067118 | 0.18% | | gas/qr | 1302750018 | 0.00% size/qr | 256049 | | size/qr | 256925 | 0.34% | | size/qr | 256285 | 0.09% gas/reversi | 80920993 | | gas/reversi | 81019001 | 0.12% | | gas/reversi | 80927129 | 0.01% size/reversi | 175956 | | size/reversi | 176421 | 0.26% | | size/reversi | 176084 | 0.07% gas/sha224 | 460197621 | | gas/sha224 | 498978947 | 8.43% | | | | size/sha224 | 191929 | | size/sha224 | 192859 | 0.48% | | | | gas/sha256 | 14487063673 | | gas/sha256 | 15568532694 | 7.47% | | gas/sha256 | 14486916565 | 0.00% size/sha256 | 179075 | | size/sha256 | 180167 | 0.61% | | size/sha256 | 179223 | 0.08% ## test/bench Master | | | Widening | | Widening vs Master | | Gated | | Gated vs Master -- | -- | -- | -- | -- | -- | -- | -- | -- | -- gas/alloc | 9,243,068,120.00 | | gas/alloc | 10,350,366,461.00 | 11.98% | | gas/alloc | 9243068126 | 0.00% size/alloc | 181,066.00 | | size/alloc | 180,759.00 | -0.17% | | size/alloc | 180464 | -0.33% gas/bignum | 130,604,743.00 | | gas/bignum | 130,606,013.00 | 0.00% | | gas/bignum | 130604779 | 0.00% size/bignum | 184,420.00 | | size/bignum | 184,093.00 | -0.18% | | size/bignum | 183790 | -0.34% gas/heap-32 | 1,610,218,447.00 | | gas/heap-32 | 1,695,702,521.00 | 5.31% | | gas/heap-32 | 1609469958 | -0.05% size/heap-32 | 182,167.00 | | size/heap-32 | 181,856.00 | -0.17% | | size/heap-32 | 181556 | -0.34% gas/nat16 | 61,393,031.00 | | gas/nat16 | 65,587,813.00 | 6.83% | | gas/nat16 | 61393019 | 0.00% size/nat16 | 181,010.00 | | size/nat16 | 180,727.00 | -0.16% | | size/nat16 | 180408 | -0.33% gas/palindrome | 10,131,340.00 | | gas/palindrome | 10,133,866.00 | 0.02% | | gas/palindrome | 10131268 | 0.00% size/palindrome | 185,338.00 | | size/palindrome | 185,024.00 | -0.17% | | size/palindrome | 184695 | -0.35% gas/region0-mem | 6,402,149,937.00 | | gas/region0-mem | 6,452,495,054.00 | 0.79% | | gas/region0-mem | 6402149955 | 0.00% size/region0-mem | 181,898.00 | | size/region0-mem | 181,602.00 | -0.16% | | size/region0-mem | 181281 | -0.34% gas/region-mem | 5,974,331,587.00 | | gas/region-mem | 6,024,676,752.00 | 0.84% | | gas/region-mem | 5974331605 | 0.00% size/region-mem | 181,539.00 | | size/region-mem | 181,252.00 | -0.16% | | size/region-mem | 180931 | -0.33% gas/stable-mem | 3,885,566,188.00 | | gas/stable-mem | 3,935,898,195.00 | 1.30% | | gas/stable-mem | 3885566206 | 0.00% size/stable-mem | 181,896.00 | | size/stable-mem | 181,600.00 | -0.16% | | size/stable-mem | 181279 | -0.34% gas/xxx-nat32 | 57,198,791.00 | | gas/xxx-nat32 | 57,199,237.00 | 0.00% | | gas/xxx-nat32 | 57198779 | 0.00% size/xxx-nat32 | 181,001.00 | | size/xxx-nat32 | 180,694.00 | -0.17% | | size/xxx-nat32 | 180399 | -0.33%
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Preparing merging #4369 in #4193