src: simplify base64 decoding in the lenient case #56905

lemire · 2025-02-03T16:47:21Z

The simdutf library can be instructed to ignore invalid characters, which suggests that we may be able to simplify some of our C++ code.

nodejs-github-bot · 2025-02-03T16:50:33Z

CI: https://ci.nodejs.org/job/node-test-pull-request/64958/

anonrig · 2025-02-03T17:56:52Z

@lemire it seems the tests are failing

codecov · 2025-02-03T18:21:16Z

Codecov Report

Attention: Patch coverage is 25.00000% with 6 lines in your changes missing coverage. Please review.

Project coverage is 89.07%. Comparing base (6857dbc) to head (516aaee).
Report is 21 commits behind head on main.

Files with missing lines	Patch %	Lines
src/string_bytes.cc	25.00%	4 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #56905      +/-   ##
==========================================
+ Coverage   88.70%   89.07%   +0.37%     
==========================================
  Files         663      665       +2     
  Lines      192012   192586     +574     
  Branches    36613    37014     +401     
==========================================
+ Hits       170317   171543    +1226     
+ Misses      14520    13838     -682     
- Partials     7175     7205      +30

Files with missing lines	Coverage Δ
src/string_bytes.cc	`65.96% <25.00%> (-2.48%)`	⬇️

... and 116 files with indirect coverage changes

lemire · 2025-02-03T18:51:16Z

I am going to drop this PR because, as I feared, the expectation is that Buffer.from('put anything here', 'base64')==Buffer.from('put anything here', 'base64url'). That is, even though we have two distinct encoding superficially, there are treated the same.

So it is not simply that Node.js does not validate, but it is also that it does not distinguish (at all) between 'base64url' and 'base64'. This seems to contradict the documentation:

'base64': Base64 encoding. When creating a Buffer from a string, this encoding will also correctly accept "URL and Filename Safe Alphabet" as specified in RFC 4648, Section 5. Whitespace characters such as spaces, tabs, and new lines contained within the base64-encoded string are ignored.
'base64url': base64url encoding as specified in RFC 4648, Section 5. When creating a Buffer from a string, this encoding will also correctly accept regular base64-encoded strings. When encoding a Buffer to a string, this encoding will omit padding.

Somewhat ironically, here is what RFC 4648 section 5 says...

This encoding may be referred to as "base64url". This encoding should not be regarded as the same as the "base64" encoding and should not be referred to as only "base64". Unless clarified otherwise, "base64" refers to the base 64 in the previous section.

What Node.js does is to confuse the two encodings. Currently, the simdutf does not do that: it treats base64url and base64 as two different encodings (as per RFC 4648).

So we need the current fallback code and my PR is not going to work.

Note that Node.js can decode invalid sequences in interesting ways...

Buffer.from('__-+_--_--/_', 'base64url') == Buffer.from('__--_--_--__', 'base64url') == Buffer.from('__--_--_--__', 'base64') == Buffer.from('__-+_--_--/_', 'base64')

That's definitively not in RFC 4648.

src: simplify base64 decoding in the lenient case

516aaee

lemire requested review from targos and anonrig February 3, 2025 16:47

nodejs-github-bot added buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. labels Feb 3, 2025

lemire added request-ci Add this label to start a Jenkins CI on a PR. and removed buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. labels Feb 3, 2025

github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Feb 3, 2025

anonrig approved these changes Feb 3, 2025

View reviewed changes

lemire closed this Feb 3, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

src: simplify base64 decoding in the lenient case #56905

src: simplify base64 decoding in the lenient case #56905

lemire commented Feb 3, 2025

nodejs-github-bot commented Feb 3, 2025

anonrig commented Feb 3, 2025

codecov bot commented Feb 3, 2025 •

edited

Loading

lemire commented Feb 3, 2025

src: simplify base64 decoding in the lenient case #56905

src: simplify base64 decoding in the lenient case #56905

Conversation

lemire commented Feb 3, 2025

nodejs-github-bot commented Feb 3, 2025

anonrig commented Feb 3, 2025

codecov bot commented Feb 3, 2025 • edited Loading

Codecov Report

lemire commented Feb 3, 2025

codecov bot commented Feb 3, 2025 •

edited

Loading