Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v20.x] backport unflagging of require(esm), part 1 (of 4?) #56730

Open
wants to merge 19 commits into
base: v20.x-staging
Choose a base branch
from

Conversation

joyeecheung
Copy link
Member

@joyeecheung joyeecheung commented Jan 23, 2025

This is what I have so far by following the worklist in #52697 (comment) - not sure how we are going to manage the backport merges and whether this strategy is the best, but just opening a PR for now to leave some traces and get some discussion before I get too deep in the rabbit hole.

Notable things in this backport:

  1. The principle that I've been following is to follow the worklist I generated in Tracking issue: require(esm) #52697 (by filtering logs of lib/internal/modules/, src/node_contextify.cc and src/module_wrap.cc, I am fairly certain all the relevant changes must have touched at least one of these), and picking all the commits that are affecting code paths of require(esm), no matter they are directly done for require(esm) or not, to reduce conflicts and avoid introducing disconnected bits that might cause bugs. The commits that are not crossing paths with require(esm) under my assessment and the ones that obviously should not land (e.g. mass refactoring, semver-major) are omitted.
  2. In this batch (from roughly 1/4 of the worklist), there is a semver-minor change that supports NODE_COMPILE_CACHE. I feel that this is reasonably safe to backport since it's opt-in and has been on v22 since v22.1.0, and I don't think it would break anyone. This needs to be accompanied with a backport of some V8 bug fixes that are not on v20.x, which I opened a separate PR for in [v20.x] backport V8 changes related to compile cache #56711 because they are helpful for users hitting these bugs (e.g. Jest) even if we don't include Node.js's own on-disk compile cache.
  3. I did some modifications to lib,src: iterate module requests of a module wrap in JS #52058 to remove the freezing of module.dependencySpecifiers() (which is semver-major) and added a internal polyfill for FromV8Array in util.h since the new V8 API isn't on v20.x.
  4. I patched module: print amount of load time of a module #52213 a bit to wire the tracing into internalRequire() which is still a thing on v20.x since policy is still there.
  5. If we go down with this strategy, in future batches these two semver-minor changes would also be backported to help reducing conflicts:
    1. Unflagging of detect-module. There is a non-zero risk of breakage, but on the other hand, this has been unflagged since v22.7.0 and I think all the breakages reported were just bugs that are already fixed. We just need to make sure that the bug fixes are backported together with the unflagging. Unflagging require(esm) without also unflagging detect-module can be risky because the code are very intertwined, not just source-level but also logic-level.
    2. Type stripping. I think there were some reports about the breakage introduced by unflagging and it has not been unflagged on v22 yet, so it may be safer to just backport it for reducing the conflicts, but keep the flag on 20.

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/gyp
  • @nodejs/loaders
  • @nodejs/performance
  • @nodejs/security-wg
  • @nodejs/startup
  • @nodejs/v8-update

@nodejs-github-bot nodejs-github-bot added lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. v20.x v20.x Issues that can be reproduced on v20.x or PRs targeting the v20.x-staging branch. labels Jan 23, 2025
@joyeecheung joyeecheung added the request-ci Add this label to start a Jenkins CI on a PR. label Jan 23, 2025
@joyeecheung
Copy link
Member Author

cc @nodejs/releasers

@aduh95 aduh95 added semver-minor PRs that contain new features and should be released in the next minor version. and removed request-ci Add this label to start a Jenkins CI on a PR. labels Jan 23, 2025
@aduh95
Copy link
Contributor

aduh95 commented Jan 23, 2025

aduh95 removed the request-ci Add this label to start a Jenkins CI on a PR. label

Let's wait for #56727 to land as currently the CI is not passing on the staging branch

joyeecheung and others added 19 commits January 23, 2025 18:07
Original commit message:

    Reland "[cache] Don't compare host defined options if the script was deserialized"

    This is a reland of commit b9cfb8b7cfdbf195c3baf87735865948dfa5907e

    Original change's description:
    > [cache] Don't compare host defined options if the script was deserialized
    >
    > We don't serialize host defined options (see
    > CodeSerializer::SerializeObjectImpl()).
    >
    > Change-Id: Icab9fe910a5ec93b5eacc4869aba75034ad8b447
    > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/4805329
    > Reviewed-by: Camillo Bruni <cbruni@chromium.org>
    > Reviewed-by: Toon Verwaest <verwaest@chromium.org>
    > Commit-Queue: Tao Pan <tao.pan@intel.com>
    > Cr-Commit-Position: refs/heads/main@{#90698}

    Change-Id: I7ea5e9355056104ebd25b385aba63c1233d42260
    Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/4998581
    Reviewed-by: Camillo Bruni <cbruni@chromium.org>
    Reviewed-by: Toon Verwaest <verwaest@chromium.org>
    Commit-Queue: Tao Pan <tao.pan@intel.com>
    Cr-Commit-Position: refs/heads/main@{#90711}

Refs: v8/v8@7b677a5
Original commit message:

    [compiler] support isolate compilation cache in CompileFunction()

    Previously there was no isolate compilation cache support for
    scripts compiled Script::CompileFunction() with wrapped arguments.
    This patch adds support for that.

    Refs: nodejs#35375

    Change-Id: Id1849961ecd1282eb2dac95829157d167a3aa9a1
    Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/4962094
    Reviewed-by: Camillo Bruni <cbruni@chromium.org>
    Commit-Queue: Joyee Cheung <joyee@igalia.com>
    Cr-Commit-Position: refs/heads/main@{#91681}

Refs: v8/v8@f4b3f6e
Original commit message:

    [compiler] reset script details in functions deserialized from code cache

    During the serialization of the code cache, V8 would wipe out the
    host-defined options, so after a script id deserialized from the
    code cache, the host-defined options need to be reset on the script
    using what's provided by the embedder when doing the deserializing
    compilation, otherwise the HostImportModuleDynamically callbacks
    can't get the data it needs to implement dynamic import().

    Change-Id: I33cc6a5e43b6469d3527242e083f7ae6d8ed0c6a
    Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/5401780
    Reviewed-by: Leszek Swirski <leszeks@chromium.org>
    Commit-Queue: Joyee Cheung <joyee@igalia.com>
    Cr-Commit-Position: refs/heads/main@{#93323}

Refs: v8/v8@cd10ad7
Co-authored-by: Joyee Cheung <joyeec9h3@gmail.com>
Original commit message:

    [snapshot] Check if a cached data has wrapped arguments

    Fixes that ScriptCompiler::CreateCodeCacheForFunction aborts on
    a deserialized shared function info from a cached data accepted
    with ScriptCompiler::CompileFunction. If the wrapped argument list
    does not match, the cached data should be rejected.

    Refs: nodejs#56366
    Change-Id: I3f0376216791d866fac8eed1ce88dfa05e919b48
    Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/6140933
    Commit-Queue: Chengzhong Wu <cwu631@bloomberg.net>
    Reviewed-by: Leszek Swirski <leszeks@chromium.org>
    Cr-Commit-Position: refs/heads/main@{#97942}

Refs: v8/v8@96ee9bb
Co-authored-by: Joyee Cheung <joyeec9h3@gmail.com>
PR-URL: nodejs#50322
Backport-PR-URL: nodejs#56590
Reviewed-By: Jacob Smith <jacob@frende.me>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Geoffrey Booth <webadmin@geoffreybooth.com>
Co-authored-by: Daniel Lemire <daniel@lemire.me>
PR-URL: nodejs#50322
Backport-PR-URL: nodejs#56590
Reviewed-By: Jacob Smith <jacob@frende.me>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Geoffrey Booth <webadmin@geoffreybooth.com>
PR-URL: nodejs#50322
Backport-PR-URL: nodejs#56590
Reviewed-By: Jacob Smith <jacob@frende.me>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Geoffrey Booth <webadmin@geoffreybooth.com>
PR-URL: nodejs#52093
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Jacob Smith <jacob@frende.me>
Reviewed-By: Richard Lau <rlau@redhat.com>
Otherwise re-entering V8 doesn't work as expected after exceptions
were thrown.

Refs: https://chromium-review.googlesource.com/c/v8/v8/+/5050065

Co-Authored-By: Toon Verwaest <verwaest@chromium.org>
Co-Authored-By: deepak1556 <hop2deep@gmail.com>
PR-URL: nodejs#51362
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
This patch implements automatic on-disk code caching that can be enabled
via an environment variable NODE_COMPILE_CACHE.

When set, whenever Node.js compiles a CommonJS or a ECMAScript Module,
it will use on-disk [V8 code cache][] persisted in the specified
directory to speed up the compilation. This may slow down the first
load of a module graph, but subsequent loads of the same module graph
may get a significant speedup if the contents of the modules do not
change. Locally, this speeds up loading of
test/fixtures/snapshot/typescript.js from ~130ms to ~80ms.

To clean up the generated code cache, simply remove the directory.
It will be recreated the next time the same directory is used for
`NODE_COMPILE_CACHE`.

Compilation cache generated by one version of Node.js may not be used
by a different version of Node.js. Cache generated by different versions
of Node.js will be stored separately if the same directory is used
to persist the cache, so they can co-exist.

Caveat: currently when using this with V8 JavaScript code coverage, the
coverage being collected by V8 may be less precise in functions that are
deserialized from the code cache. It's recommended to turn this off when
running tests to generate precise coverage.

Implementation details:

There is one cache file per module on disk. The directory layout
is:

- Compile cache directory (from NODE_COMPILE_CACHE)
  - 8b23c8fe: CRC32 hash of CachedDataVersionTag + NODE_VERESION
  - 2ea3424d:
     - 10860e5a: CRC32 hash of filename + module type
     - 431e9adc: ...
     - ...

Inside the cache file, there is a header followed by the actual
cache content:

```
[uint32_t] code size
[uint32_t] code hash
[uint32_t] cache size
[uint32_t] cache hash
... compile cache content ...
```

When reading the cache file, we'll also check if the code size
and code hash match the code that the module loader is loading
and whether the cache size and cache hash match the file content
read. If they don't match, or if V8 rejects the cache passed,
we'll ignore the mismatch cache, and regenerate the cache after
compilation succeeds and rewrite it to disk.

PR-URL: nodejs#52535
Refs: nodejs#47472
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>
Instead of using an async function wrapper, just try compiling code with
unknown module format as SourceTextModule when it cannot be compiled
as CJS and the error message indicates that it's worth a retry. If
it can be parsed as SourceTextModule then it's considered ESM.

Also, move shouldRetryAsESM() to C++ completely so that
we can reuse it in the CJS module loader for require(esm).

Drive-by: move methods that don't belong to ContextifyContext
out as static methods and move GetHostDefinedOptions to
ModuleWrap.

PR-URL: nodejs#52413
Reviewed-By: Geoffrey Booth <webadmin@geoffreybooth.com>
Reviewed-By: Jacob Smith <jacob@frende.me>
It might be worth designing a policy for the compilation cache. For
now, just skip the cache when policy is enabled.

PR-URL: nodejs#52577
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Avoid repetitively calling into JS callback from C++ in
`ModuleWrap::Link`. This removes the convoluted callback style of the
internal `ModuleWrap` link step.

PR-URL: nodejs#52058
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
This patch:

1. Adds ESM syntax detection to compileFunctionForCJSLoader()
  for --experimental-detect-module and allow it to emit the
  warning for how to load ESM when it's used to parse ESM as
  CJS but detection is not enabled.
2. Moves the ESM detection of --experimental-detect-module for
  the entrypoint from executeUserEntryPoint() into
  Module.prototype._compile() and handle it directly in the
  CJS loader so that the errors thrown during compilation *and
  execution* during the loading of the entrypoint does not
  need to be bubbled all the way up. If the entrypoint doesn't
  parse as CJS, and detection is enabled, the CJS loader will
  re-load the entrypoint as ESM on the spot asynchronously using
  runEntryPointWithESMLoader() and cascadedLoader.import(). This
  is fine for the entrypoint because unlike require(ESM) we don't
  the namespace of the entrypoint synchronously, and can just
  ignore the returned value. In this case process.mainModule is
  reset to undefined as they are not available for ESM entrypoints.
3. Supports --experimental-detect-module for require(esm).

PR-URL: nodejs#52047
Reviewed-By: Geoffrey Booth <webadmin@geoffreybooth.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
So that if there are circular dependencies in the synchronous
module graph, they could be resolved using the cached jobs.
In case linking fails and the error gets caught, reset the
cache right after linking. If it succeeds, the caller will
cache it again. Otherwise the error bubbles up to users,
and since we unset the cache for the unlinkable module
the next attempt would still fail.

PR-URL: nodejs#52868
Fixes: nodejs#52864
Reviewed-By: Moshe Atlow <moshe@atlow.co.il>
Reviewed-By: Vinícius Lourenço Claro Cardoso <contact@viniciusl.com.br>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
PR-URL: nodejs#53050
Reviewed-By: Geoffrey Booth <webadmin@geoffreybooth.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Unifies the CJS and ESM source map cache map with SourceMapCacheMap
and allows the CJS cache entries to be queried more efficiently with
a source url without iteration on an IterableWeakMap.

Add a test to verify that the CJS source map cache entry can be
reclaimed.

PR-URL: nodejs#51711
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
PR-URL: nodejs#52213
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
PR-URL: nodejs#52658
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Daniel Lemire <daniel@lemire.me>
@joyeecheung joyeecheung force-pushed the backport-require-esm-20 branch from a51b482 to 5d03fce Compare January 23, 2025 17:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. semver-minor PRs that contain new features and should be released in the next minor version. v20.x v20.x Issues that can be reproduced on v20.x or PRs targeting the v20.x-staging branch.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants