Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor archiving to go through single determinstic function #310

Merged
merged 1 commit into from
Feb 13, 2025

Conversation

lencioni
Copy link
Contributor

We had three places that were interfacing with the archiver package, and they all need to do basically the same thing--create a deterministic zip file. If the zip file is nondeterministic, then downstream problems with caching will arise. Since creating a deterministic zip requires some special handling, I decided that it would be best to refactor this code to all run through the same function that guarantees a deterministic output.

While working on this I noticed that the remoteRunner was using archiver's directory function, which I believe can by nondeterministic, so this should help to stabilize those zip files.

Comment on lines 69 to +71
'1x1.jpg', // this is in the root because the css is always served from the root on happo workers
'one.jpg',
'inlineResources/1x1.png',
'one.jpg',
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The order changes here because we are now sorting alphabetically by path at the end, and o is after i.

if (isDir) {
const files = await fs.promises.readdir(dirOrFile, {
withFileTypes: true,
recursive: true,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Node 18 adds support for recursive: true here, so we can use that and simplify our directory scanning.

for (const file of contentToArchiveSorted) {
archive.append(file.content, {
name: file.name,
prefix: '',
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We didn't have prefix when using archive.append before, but since we are using it with archive.file I figured we might as well add it here for consistency.

});

return deterministicArchive([], files.filter(Boolean));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are generating an archive with a file structure that is different than what is on the filesystem, we need to use the second argument for everything so we can control the paths more specifically.

resolve(pathToZipFile);
});
archive.pipe(output);
archive.directory(dir, false, { date: FILE_CREATION_DATE });
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe archive.directory may be nondeterministic, so this should be a good improvement for folks using remote runner.

Comment on lines -339 to -340
'static/',
'static/media/',
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We no longer add directory entries to this zip file, just files.

@lencioni lencioni force-pushed the readdir-recursive branch 2 times, most recently from 32c7ae8 to 37e8689 Compare February 13, 2025 17:45
We had three places that were interfacing with the archiver package, and
they all need to do basically the same thing--create a deterministic zip
file. If the zip file is nondeterministic, then downstream problems with
caching will arise. Since creating a deterministic zip requires some
special handling, I decided that it would be best to refactor this code
to all run through the same function that guarantees a deterministic
output.

While working on this I noticed that the remoteRunner was using
archiver's directory function, which I believe can by nondeterministic,
so this should help to stabilize those zip files.
@lencioni lencioni merged commit f4bade3 into main Feb 13, 2025
3 checks passed
@lencioni lencioni deleted the readdir-recursive branch February 13, 2025 17:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant