src: fix generation of path objects in Windows #56696

Open: yamachu wants to merge 2 commits into main from fix-windows-path-string

@yamachu yamachu commented Jan 22, 2025

fix: #56650
ref: #56657

This PR fixes a segmentation fault in module resolution that occurs when a require object is created from a path containing multibyte characters in a non-English environment.

Because the crash occurred while constructing a std::filesystem::path object, this PR patches the code that was changed in the following commit:

a7dad43

Since the issue reproduces only under other locales, such as ja-JP on Windows, I changed the system locale in the coverage-windows CI workflow to run the tests.
https://github.com/yamachu/node/pull/2/files#diff-29094741d50149aa772b3e577ad509116bad722ad2de85689b6cb2c01e806a46

.github/workflows/coverage-windows.yml

+      - name: Change locale ja-JP for testing on SJIS environment
+        run: Set-WinSystemLocale -SystemLocale "ja-JP"
+      # to avoid configure, nobuild and noprojgen is needed
+      - name: Test on SJIS environment
+        run: ./vcbuild.bat nobuild noprojgen test-ci-js; node -e 'process.exit(0)'
+        env:
+          NODE_V8_COVERAGE: ./coverage/tmp

The previous PR did not solve the problem for Unicode paths, so this PR adds a separate Windows-specific code path for constructing the path object.

The places where std::filesystem::path is used are fixed by the changes in this PR.

The results of the test conducted in a Japanese environment can be seen below.
https://github.com/yamachu/node/actions/runs/12903928231/job/35980111577#step:12:16844

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. labels Jan 22, 2025

codecov bot commented Jan 22, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.22%. Comparing base (d978610) to head (24ac7b9).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #56696   +/-   ##
=======================================
  Coverage   89.21%   89.22%           
=======================================
  Files         662      662           
  Lines      191945   191950    +5     
  Branches    36948    36950    +2     
=======================================
+ Hits       171238   171258   +20     
+ Misses      13549    13542    -7     
+ Partials     7158     7150    -8     
Files with missing lines Coverage Δ
src/node_modules.cc 78.91% <100.00%> (+0.13%) ⬆️

... and 25 files with indirect coverage changes

Member

@anonrig anonrig left a comment

A similar change already landed last August: #54653

We should use ToU8StringView() rather than introducing a new API.

Author

yamachu commented Jan 23, 2025

@anonrig

Thanks for the review!
Does this mean, in short, that I should use std::u8string?

However, there is a problem with the u8string approach.
As shown in the description, the test-require-unicode.js test fails on Windows when the code uses u8string, in both SJIS and en-US environments (en-US being, for example, the default locale of the GHA Windows runner), so all Windows environments are affected.

Therefore, I added and used an API that handles wstrings instead of using ToU8StringView.

Member

anonrig commented Jan 23, 2025

I recommend improving the existing solution to fit both cases rather than implementing a new solution.

Author

yamachu commented Jan 23, 2025

What are "both cases" presented here?

I am sure that ToU8StringView is not used in the current codebase.
Since it is not being used, it does not currently solve the problem.
And it is not a solution, because I know that using it again will cause problems.

I don't think that trying hard to use u8string for paths is the better way to go...

Author

yamachu commented Jan 23, 2025

The base branch is still at bf59539, so the associated test (experimental flag) seems to fail.
Should I rebase the branch and force push?

Member

lpinca commented Jan 23, 2025

> Should I rebase the branch and force push?

Yes, thank you.

@yamachu yamachu force-pushed the fix-windows-path-string branch from 0697f02 to cba8830 Compare January 23, 2025 12:57
@lpinca lpinca added the request-ci Add this label to start a Jenkins CI on a PR. label Jan 23, 2025
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Jan 23, 2025
Author

yamachu commented Jan 23, 2025

I had to rush to write my comments before work.
I apologize if any of my comments offended you.

I have rebased and force-pushed the branch, so please check with CI.

I should have done this beforehand, but I have now looked over the entire codebase.
There I found the following helper function.

node/src/node_file.cc

Lines 3149 to 3160 in d978610

std::wstring ConvertToWideString(const std::string& str) {
  int size_needed = MultiByteToWideChar(
      CP_UTF8, 0, &str[0], static_cast<int>(str.size()), nullptr, 0);
  std::wstring wstrTo(size_needed, 0);
  MultiByteToWideChar(CP_UTF8,
                      0,
                      &str[0],
                      static_cast<int>(str.size()),
                      &wstrTo[0],
                      size_needed);
  return wstrTo;
}

This was exactly what I was looking for.
I rewrote my changes in this PR to use it, added the code page setting, and the test passed (in an SJIS Windows environment).

@@ -326,6 +330,22 @@ const BindingData::PackageConfig* BindingData::TraverseParent(
  return nullptr;
}

#ifdef _WIN32
std::wstring ConvertToWideString(const std::string& str) {
  auto cp = GetACP();
Author

This function was copied from node_file.cc, but the GetACP() call is the important difference.
As shown in the description, when the code runs under a different Windows system locale, keeping the original CP_UTF8 here means the strings are not converted correctly.
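
To illustrate why the code page passed to the conversion matters, here is a minimal, portable sketch (not code from this PR). It assumes, as the comment above suggests, that the path bytes may arrive in the active code page rather than in UTF-8. The character "あ" (U+3042) is E3 81 82 in UTF-8 but 82 A0 in Shift_JIS, so the Shift_JIS bytes do not form a valid UTF-8 sequence. The function name `LooksLikeUtf8` is hypothetical, and the check is deliberately simplified.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, for illustration only.
// Returns true if `bytes` is structurally valid UTF-8: every lead byte is
// followed by the expected number of continuation bytes (10xxxxxx).
// Simplified: it does not reject every overlong or surrogate encoding.
inline bool LooksLikeUtf8(const std::string& bytes) {
  std::size_t i = 0;
  while (i < bytes.size()) {
    const unsigned char lead = static_cast<unsigned char>(bytes[i]);
    std::size_t extra = 0;  // number of continuation bytes expected
    if (lead < 0x80) {
      extra = 0;  // ASCII
    } else if (lead >= 0xC2 && lead <= 0xDF) {
      extra = 1;
    } else if (lead >= 0xE0 && lead <= 0xEF) {
      extra = 2;
    } else if (lead >= 0xF0 && lead <= 0xF4) {
      extra = 3;
    } else {
      return false;  // stray continuation byte or invalid lead byte
    }
    if (i + extra >= bytes.size()) return false;  // truncated sequence
    for (std::size_t k = 1; k <= extra; ++k) {
      const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
      if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
    }
    i += extra + 1;
  }
  return true;
}
```

Here `LooksLikeUtf8("\xE3\x81\x82")` accepts the UTF-8 bytes of "あ", while `LooksLikeUtf8("\x82\xA0")` rejects the Shift_JIS bytes because 0x82 cannot start a UTF-8 sequence. A conversion told to expect CP_UTF8 therefore has no valid reading of the second input, which is consistent with the conversion working only when the code page matches the actual encoding of the bytes.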

Member

@lpinca lpinca Jan 23, 2025

I wonder if this utility function can be moved to util-inl.h, so that the same implementation (taking GetACP into account) is used both here and in node_file.cc without duplication. I think this is also what @anonrig is requesting.

Author

If we do the same for node_fs, test-fs-cp.mjs will fail...

It is very strange, but I have found that if I add the failing path (utf/~~~) to the test-require-~ test, it is handled correctly.
Some pre- or post-processing of path generation in node_fs (if there is any) may be having an effect.

Therefore, this time I confined the implementation to node_modules instead of applying it to the whole codebase.

The following is the error log when applied to the entire system.
https://github.com/yamachu/node/actions/runs/12941644792/job/36098035649?pr=3#step:7:5033

Code diff: yamachu@0041793

Member

@lpinca lpinca Jan 24, 2025

I was thinking about making cp a function parameter and using CP_UTF8 in node_file.cc and GetACP() here. Does that make sense?

Author

Ah, I see.
Do you mean to make these changes?

util-inl.h

#ifdef _WIN32
inline std::wstring ConvertToWideString(const std::string& str, UINT cp) {
  int size_needed = MultiByteToWideChar(
      cp, 0, &str[0], static_cast<int>(str.size()), nullptr, 0);
  std::wstring wstrTo(size_needed, 0);
  MultiByteToWideChar(
      cp, 0, &str[0], static_cast<int>(str.size()), &wstrTo[0], size_needed);
  return wstrTo;
}
#endif  // _WIN32

node_file.cc

#define BufferValueToPath(str)                                                 \
  std::filesystem::path(ConvertToWideString(str.ToString(), CP_UTF8))

I think this implementation would certainly have less impact and could be done with less code duplication.
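
For illustration, the parameterized helper could be wrapped at a call site roughly as follows. This is a sketch under stated assumptions, not the code in this PR: `ToWideSketch` and `PathFromNativeBytes` are hypothetical names, the empty-input and failure checks are additions that the snippets above do not have, and the non-Windows branch simply passes the narrow string through. Only `MultiByteToWideChar`, `GetACP`, and `CP_UTF8` are real Win32 names.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <string>

#ifdef _WIN32
#include <windows.h>

// Hypothetical variant of the helper discussed above: interprets `str` in
// code page `cp` and converts it to UTF-16. Returns an empty string when
// the input is empty or the conversion fails (an added assumption).
inline std::wstring ToWideSketch(const std::string& str, UINT cp) {
  if (str.empty()) return std::wstring();
  const int len = static_cast<int>(str.size());
  // First call: ask how many wide characters are needed.
  const int needed = MultiByteToWideChar(cp, 0, str.data(), len, nullptr, 0);
  if (needed <= 0) return std::wstring();
  std::wstring wide(static_cast<std::size_t>(needed), L'\0');
  // Second call: perform the conversion into the sized buffer.
  const int written =
      MultiByteToWideChar(cp, 0, str.data(), len, &wide[0], needed);
  if (written <= 0) return std::wstring();
  return wide;
}
#endif  // _WIN32

// Hypothetical call site in the style of the node_modules.cc change:
// on Windows the bytes are decoded with the active code page (GetACP());
// elsewhere the narrow string is handed to std::filesystem::path as-is.
inline std::filesystem::path PathFromNativeBytes(const std::string& str) {
#ifdef _WIN32
  return std::filesystem::path(ToWideSketch(str, GetACP()));
#else
  return std::filesystem::path(str);
#endif
}
```

Under this design, node_file.cc would pass CP_UTF8, as in the BufferValueToPath macro above, while node_modules.cc would pass GetACP(), so the choice of code page stays visible at each call site.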

Member

@lpinca lpinca Jan 24, 2025

Yes, exactly. FWIW, I'm also ok with this as is.

Author

@yamachu yamachu Jan 24, 2025

Applied 👍: ef0ade1

Author

yamachu commented Jan 23, 2025

Member

@lpinca lpinca left a comment

RSLGTM

Member

lpinca commented Jan 23, 2025

Can you please squash the last 3 commits into the second or everything into one?

@yamachu yamachu force-pushed the fix-windows-path-string branch from 24ac7b9 to ea943d8 Compare January 24, 2025 04:07
Author

yamachu commented Jan 24, 2025

I squashed the commits and added a test case for the newly found problem.

@yamachu yamachu force-pushed the fix-windows-path-string branch from ea943d8 to 4ddd7ae Compare January 24, 2025 09:33
take a similar approach to node_file and allow the creation of paths
code point must be specified to convert from wchar_t to utf8.
@yamachu yamachu force-pushed the fix-windows-path-string branch from 4ddd7ae to ef0ade1 Compare January 24, 2025 10:05
Successfully merging this pull request may close these issues.

Segmentation Fault When Passing Paths with Japanese Characters to createRequire in Node.js 22+
5 participants