Adding file IO to SHA256 WASM program
ChrisWhealy committed Jan 30, 2025
1 parent 9ff7c61 commit c02e063
Showing 20 changed files with 829 additions and 4 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -2,6 +2,8 @@ _site
.sass-cache
.jekyll-cache
.jekyll-metadata
.idea/
Gemfile.lock
vendor
.DS_Store
.obsidian
49 changes: 49 additions & 0 deletions _posts/2025-01-30-sha256-extended.md
@@ -0,0 +1,49 @@
---
layout: post
title: "Adding File I/O to the SHA256 Hash Algorithm Implemented in WebAssembly Text"
date: 2025-01-30 12:00:00 +0000
category: chriswhealy
author: Chris Whealy
excerpt: My original WebAssembly Text implementation of the SHA256 algorithm focused only on the core functionality. The WAT program has now been extended to include all file I/O.
---

# Prerequisites

Before diving into this blog, please check that the following prerequisites have been met.

1. Are you comfortable writing directly in WebAssembly Text? (Be honest)

If the answer is "No", then please read my [Introduction to WebAssembly Text](/chriswhealy/Introduction%20to%20WebAssembly%20Text).

2. Install [`wasmtime`](https://wasmtime.dev/).
This is an Open Source project by the Bytecode Alliance that provides both the WebAssembly development tools we will be using, and the WebAssembly System Interface (WASI) that will be the focus of our attention in this blog.

3. In order to understand how to code against the WASI interface, it is very helpful to look at the Rust source code that implements the WASI functions you will be calling from your WebAssembly Text program.

This code can be found in the `wasmtime` GitHub repo <https://github.com/bytecodealliance/wasmtime>.
The specific file to look in is `crates/wasi-preview1-component-adapter/src/lib.rs`.

# Explanation of Update

The original version of this program focused only on implementing the core of the SHA256 algorithm.
This was a good starting point, but it meant that the JavaScript wrapper used to invoke the WASM module had to read the file from disk, then write it to memory in the special format required by the SHA256 algorithm.

The purpose of this update therefore is to significantly reduce the degree of coupling between the JavaScript wrapper and the underlying WASM module by moving all the file IO into WebAssembly.

# Overview of Steps

1. [Getting Started](/chriswhealy/sha256-extended/00-getting-started/README.md)
2. [Start WASI](/chriswhealy/sha256-extended/10-start-wasi/README.md)
3. [Import WASI Functions into WebAssembly](/chriswhealy/sha256-extended/20-import-wasi/README.md)
4. [Count the Command Line Arguments](/chriswhealy/sha256-extended/30-count-cmd-line-args/README.md)
5. [Extract the filename from the command line arguments](/chriswhealy/sha256-extended/40-parse-cmd-line-args/README.md)
6. [Open the file](/chriswhealy/sha256-extended/50-open-file/README.md)
7. [Read the File Size](/chriswhealy/sha256-extended/60-read-file-size/README.md)
8. [Do We Have Enough Memory?](/chriswhealy/sha256-extended/70-grow-memory/README.md)
9. [Read the file into memory](/chriswhealy/sha256-extended/80-read-file/README.md)
10. [Close the file](/chriswhealy/sha256-extended/90-close-file/README.md)

# Extras

1. [WebAssembly Coding Tips and Tricks](/chriswhealy/sha256-extended/wat_tips_and_tricks/README.md)
2. [Debugging WASM](/chriswhealy/sha256-extended/debugging_wasm/README.md)
13 changes: 9 additions & 4 deletions chriswhealy/README.md
@@ -2,13 +2,18 @@

[@LogaRhythm](https://twitter.com/LogaRhythm)

As a backend developer and technical specialist, my favourite languages are Rust, WebAssembly and Kotlin
As a backend developer and technical specialist, my favourite languages are Rust, WebAssembly and Kotlin.
I have also used a lot of TypeScript on Node.js and more JavaScript than I can remember...

* Open Source Contributor and Job Seeker
* Jan 2024 -
* Backend Developer and Technical Specialist: [Lighthouse Consulting](https://lighthouse.no)
* March 2023 -
* Senior Engineer: Red Badger
* March 2023 - Dec 2023
* Senior Engineer: [Red Badger](https://red-badger.com)
* April 2020 - Dec 2022
* Technical Specialist: [SAP](https://sap.com)
* May 1995 - May 2019

Outside work I play the drums, do audio engineering/live streaming/post production/FoH and tinker with room acoustics (but not all at the same time...)
Outside work, I play the drums, do audio engineering/live streaming/post production/FoH and tinker with room acoustics (but not all at the same time...)

I also act as the UK technical facilitator for Neal Morse's [Morsefest London](https://www.facebook.com/watch/?v=582818961004815) concerts.
43 changes: 43 additions & 0 deletions chriswhealy/sha256-extended/00-getting-started/README.md
@@ -0,0 +1,43 @@
# Getting Started With WASI File IO

If a WebAssembly program needs to interact with the filesystem, it can only do so using the interface provided by WASI.
Furthermore, WASI's file system interface operates at a lower level than you might be used to when working in a high-level language such as Python or JavaScript.

## Understanding WebAssembly Sandboxing

All WebAssembly programs run within their own isolated sandbox.
This not only prevents the WebAssembly program from damaging areas of memory that belong to other programs, but it also prevents the WASM program from making inappropriate operating system calls such as accessing the filesystem or the network.

However, there are many situations in which it is perfectly appropriate for a WebAssembly program to interact with the operating system.
In this case, our WebAssembly program has a legitimate need to read a file from disk.

This is one of the areas in which WASI bridges the gap between the isolated world of WebAssembly and the "outside world", so to speak.

## File IO from WebAssembly

Broadly speaking, the following steps are needed for a WebAssembly program to read a file into memory:

1. Obtain a file descriptor to the target directory
2. Call `path_open` to open the file
3. Discover how large the file is by calling `fd_seek` to move to the end of the file (the resulting offset is the file size)
4. Call `fd_seek` a second time to reset the file offset back to the start of the file
5. Based on the size of the file, it may be necessary to allocate more WebAssembly memory by calling `memory.grow`
6. Now that we know we have sufficient space, call `fd_read` to read the file into memory
7. Finally, close the file by calling `fd_close`

The key point to understand here is that WebAssembly ***cannot*** create its own file descriptors.
This step ***must*** be done for it by the WASI interface running in the host environment.

This means that the host environment retains complete control over the files a WebAssembly program is permitted to access.
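Step 5 in the list above boils down to simple arithmetic, because WebAssembly memory only ever grows in whole 64 KiB pages.
The following JavaScript sketch uses the `WebAssembly.Memory` API to mimic what the WAT code does with `memory.grow`.
It assumes (hypothetically) that the file is copied into memory starting at offset `0x1000`, and that extra room is needed for the standard SHA-256 padding (a `0x80` byte, zero bytes, then a 64-bit length, rounding the message up to a multiple of 64 bytes).
The helper names `sha256PaddedLength` and `ensureCapacity` are illustrative only.

```javascript
// A WebAssembly memory page is always 64 KiB
const WASM_PAGE_SIZE = 65536

// SHA-256 appends 0x80, then zero bytes, then the 64-bit message length,
// so the padded message is the smallest multiple of 64 bytes >= byteLength + 9
const sha256PaddedLength = byteLength => Math.ceil((byteLength + 9) / 64) * 64

// Hypothetical helper: grow `memory` until a message of `byteLength` bytes
// (plus padding) fits starting at offset `msgStart`.
// Returns the number of pages that had to be added.
const ensureCapacity = (memory, msgStart, byteLength) => {
  const required = msgStart + sha256PaddedLength(byteLength)
  const available = memory.buffer.byteLength

  if (required <= available) return 0

  const extraPages = Math.ceil((required - available) / WASM_PAGE_SIZE)
  memory.grow(extraPages) // JavaScript counterpart of the WAT instruction memory.grow
  return extraPages
}

// Start with 2 pages (128 KiB), like (memory $memory (export "memory") 2)
const memory = new WebAssembly.Memory({ initial: 2 })

// A hypothetical 1,000,000 byte file stored at offset 0x1000
const added = ensureCapacity(memory, 0x1000, 1_000_000)
console.log(added, memory.buffer.byteLength / WASM_PAGE_SIZE) // 14 16
```

One difference to keep in mind: in WAT, `memory.grow` returns the previous size in pages, or `-1` if the memory cannot be grown, whereas the JavaScript `grow` method throws a `RangeError` instead. A WAT implementation therefore needs to check for `-1` explicitly.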

## Standard File Descriptors

A file descriptor is simply a small integer that acts as a handle to some object in the file system: typically a file or a directory.
When a WebAssembly program starts, the host environment automatically makes three file descriptors available to it.

* fd `0` = Standard in (`stdin`)
* fd `1` = Standard out (`stdout`)
* fd `2` = Standard error (`stderr`)

Any other files needed by the WebAssembly program will be identified by file descriptor `3` and higher.
It is usual (but not a requirement) for file descriptors to be created in sequential order.
122 changes: 122 additions & 0 deletions chriswhealy/sha256-extended/10-start-wasi/README.md
@@ -0,0 +1,122 @@
# Step 1: Start WASI in the Host Environment

***IMPORTANT***<br>
The steps described here need to happen in parallel with steps described in the next chapter.

## 1.1) Create a WASI Instance

In our case, the host environment is a JavaScript program running within Node.js.[^1]

Inside a JavaScript module (that is, an `.mjs` file), a WASI instance can be created using code similar to this:

```javascript
import { WASI } from "wasi"

const wasi = new WASI({
args: process.argv,
version: "unstable",
preopens: { ".": process.cwd() }, // This directory is available as fd 3 when calling WASI path_open
})
```

In addition to creating a WASI instance, this code does two other important things:

1. The line `args: process.argv` makes the entire Node.js command line available to the WebAssembly module
2. The value of the `preopens` property is an object containing one or more directories that WASI will preopen on behalf of WebAssembly

The property names within the object passed to `preopens` are the directory names as seen by WebAssembly.

The property values are the directories to which we are granting WebAssembly access.

In this case, we are granting WebAssembly access to read files in (or beneath) the directory in which we start Node.js (that is, the relative path `"."`).

## 1.2) Understanding WASI Prerequisites

The above coding is all well and good, but it will not work unless your WebAssembly module has fulfilled certain prerequisites imposed by WASI.

1. There are two ways memory can be shared between a WebAssembly module and the host environment that started it; either:
* The WASM module allocates some memory then shares it with the host environment using an `export` statement, or
* The host environment allocates some memory then allows the WebAssembly module to access it via an `import` statement.

WASI requires you to use the first option.
The WebAssembly module is required to allocate some memory, then export it using the specific name `memory`.

So in our WAT coding, we must have a statement much like this:

```wat
(memory $memory (export "memory") 2)
```

2. WASI also expects the WebAssembly module to export a function called `_start`.
In the host environment, you start the WebAssembly instance by calling `wasi.start(instance)`, which in turn automatically invokes the exported WebAssembly function `_start`.

***IMPORTANT***<br>
If such a function does not exist, then an exception will be thrown.

If you have no need for a `_start` function, then simply declare it as a no-op function like this:

```wat
(func (export "_start"))
```

However, in our case, the `_start` function is needed because this is where we will implement the functionality to parse the command line arguments.
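To check both prerequisites in isolation, here is a sketch of the smallest module that satisfies them.
Rather than reading a `.wasm` file from disk, it hand-assembles the binary equivalent of `(module (memory (export "memory") 2) (func (export "_start")))` and uses the synchronous `WebAssembly.Module` and `WebAssembly.Instance` constructors, which are fine for a module this small.
The empty `args` list is an assumption made purely for the demo.

```javascript
import { WASI } from "wasi"

// Hand-assembled binary equivalent to this WAT:
//   (module
//     (memory (export "memory") 2)
//     (func (export "_start")))
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic number and version 1
  0x01, 0x04, 0x01, 0x60, 0x00, 0x00,                   // type section: one type, () -> ()
  0x03, 0x02, 0x01, 0x00,                               // function section: one function of type 0
  0x05, 0x03, 0x01, 0x00, 0x02,                         // memory section: one memory, minimum 2 pages
  0x07, 0x13, 0x02,                                     // export section: two exports
  0x06, 0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79, 0x02, 0x00, //   "memory" -> memory 0
  0x06, 0x5f, 0x73, 0x74, 0x61, 0x72, 0x74, 0x00, 0x00, //   "_start" -> function 0
  0x0a, 0x04, 0x01, 0x02, 0x00, 0x0b,                   // code section: one empty function body
])

const wasi = new WASI({ version: "unstable", args: [] })
const instance = new WebAssembly.Instance(new WebAssembly.Module(bytes), { wasi: wasi.wasiImport })

// Both prerequisites are met, so start() invokes _start without throwing
wasi.start(instance)
console.log(instance.exports.memory.buffer.byteLength) // 131072 (2 x 64 KiB pages)
```

Removing either the `memory` export or the `_start` export from this module makes `wasi.start()` throw, which is the exception mentioned above.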

## 1.3) Instantiate the WebAssembly Module

We now create an instance of the WebAssembly module by calling `WebAssembly.instantiate`.

The first argument is the contents of the `.wasm` file stored as a `Uint8Array`.

The second argument is a host environment object (also known as the import object).
The property names in this object can be anything you like, provided they match the module names used in the WebAssembly module's `import` statements.

In this case, we are using the property name `wasi` and setting its value to be the entire set of operating system functions exposed via the `wasiImport` object.

```javascript
let { instance } = await WebAssembly.instantiate(
new Uint8Array(readFileSync(pathToWasmMod)),
{
wasi: wasi.wasiImport
},
)
```

This grants the WebAssembly module access to all the operating system calls listed in the `.wasiImport` object.

## 1.4) Use WASI to Start the WebAssembly Module Instance

After we have waited for the `instance` to be created, the last step is to use `wasi` to start the WASM module:

```javascript
wasi.start(instance)
```

Since this whole process is asynchronous, the finished JavaScript module to start the WebAssembly module will look much like this:

```javascript
import { readFileSync } from "fs"
import { WASI } from "wasi"

const startWasm =
async pathToWasmMod => {
// Define WASI environment
const wasi = new WASI({
args: process.argv,
version: "unstable",
preopens: { ".": process.cwd() }, // This directory is available as fd 3 when calling WASI path_open
})

let { instance } = await WebAssembly.instantiate(
new Uint8Array(readFileSync(pathToWasmMod)),
{
wasi: wasi.wasiImport,
},
)

wasi.start(instance)
}

await startWasm("./bin/sha256_opt.wasm")
```

[^1]: In Node.js versions 18 and higher, the WASI interface is available by default. In versions from 12 to 16, WASI will only be available if you start `node` with the flag `--experimental-wasi-unstable-preview1`
37 changes: 37 additions & 0 deletions chriswhealy/sha256-extended/20-import-wasi/README.md
@@ -0,0 +1,37 @@
# Step 2: Import WASI Functions Into WebAssembly

In parallel with the instructions in step 1, you should also add the following statements to your WebAssembly Text program.

## 2.1) Declare Function Signature Types

Although it is not a requirement, it is more idiomatic to declare WebAssembly function signature types.
This also improves code reusability.

Both of the following sets of declarations must occur right at the start of the WAT coding.

```wat
(module
;; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
;; Function types for WASI calls
(type $type_wasi_args (func (param i32 i32) (result i32)))
(type $type_wasi_path_open (func (param i32 i32 i32 i32 i32 i64 i64 i32 i32) (result i32)))
(type $type_wasi_fd_seek (func (param i32 i64 i32 i32) (result i32)))
(type $type_wasi_fd_io (func (param i32 i32 i32 i32) (result i32)))
(type $type_wasi_fd_close (func (param i32) (result i32)))
```

Once the function signature types have been declared, you can then declare which WASI functions need to be imported into your WAT program.

```wat
;; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
;; Import OS system calls via WASI
(import "wasi" "args_sizes_get" (func $wasi_args_sizes_get (type $type_wasi_args)))
(import "wasi" "args_get" (func $wasi_args_get (type $type_wasi_args)))
(import "wasi" "path_open" (func $wasi_path_open (type $type_wasi_path_open)))
(import "wasi" "fd_seek" (func $wasi_fd_seek (type $type_wasi_fd_seek)))
(import "wasi" "fd_read" (func $wasi_fd_read (type $type_wasi_fd_io)))
(import "wasi" "fd_write" (func $wasi_fd_write (type $type_wasi_fd_io)))
(import "wasi" "fd_close" (func $wasi_fd_close (type $type_wasi_fd_close)))
```

The WAT `(import)` statement creates a proxy function within WebAssembly (e.g. `$wasi_fd_seek`) that is mapped to an external WASI function (e.g. `wasi.fd_seek`).
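The first string in each `(import ...)` statement must match a property name in the host environment object passed when instantiating the module.
The sketch below demonstrates this with a hand-assembled module equivalent to the hypothetical WAT `(module (import "wasi" "fd_close" (func $wasi_fd_close (param i32) (result i32))) (export "close" (func $wasi_fd_close)))`.
It is linked against a stub `fd_close` written in JavaScript, not the real WASI function, so nothing is actually closed.

```javascript
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,                   // magic number and version 1
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f,                   // type section: (i32) -> i32
  0x02, 0x11, 0x01,                                                 // import section: one import
  0x04, 0x77, 0x61, 0x73, 0x69,                                     //   module name "wasi"
  0x08, 0x66, 0x64, 0x5f, 0x63, 0x6c, 0x6f, 0x73, 0x65,             //   field name "fd_close"
  0x00, 0x00,                                                       //   a function of type 0
  0x07, 0x09, 0x01, 0x05, 0x63, 0x6c, 0x6f, 0x73, 0x65, 0x00, 0x00, // export section: "close" -> function 0
])

const wasmModule = new WebAssembly.Module(bytes)

// Lists what the module expects the host to provide:
// one function called "fd_close" from a module called "wasi"
console.log(WebAssembly.Module.imports(wasmModule))

// The top-level key "wasi" matches the module name used in the import statement
const closedFds = []
const instance = new WebAssembly.Instance(wasmModule, {
  wasi: { fd_close: fd => { closedFds.push(fd); return 0 } }, // stub that always reports success
})
console.log(instance.exports.close(3), closedFds) // 0 [ 3 ]

// A different key (here, the standard name "wasi_snapshot_preview1") fails to link
let linkFailed = false
try {
  new WebAssembly.Instance(wasmModule, { wasi_snapshot_preview1: { fd_close: () => 0 } })
} catch {
  linkFailed = true
}
console.log(linkFailed) // true
```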
72 changes: 72 additions & 0 deletions chriswhealy/sha256-extended/30-count-cmd-line-args/README.md
@@ -0,0 +1,72 @@
# Step 3: Count the Command Line Arguments

The options object passed to the `WASI` constructor contains a property called `args` whose value is `process.argv`.
This makes the entire command line received by Node.js available to the WASM module.

In the WASM function `_start`, we must first determine how many arguments we have received by calling the WASI function `args_sizes_get`.

This function is imported into WebAssembly at the start of the module and is known internally as `$wasi_args_sizes_get`:

```wat
(type $type_wasi_args (func (param i32 i32) (result i32)))
(import "wasi" "args_sizes_get" (func $wasi_args_sizes_get (type $type_wasi_args)))
```

## 3.1) Take a look at the Rust `wasmtime` implementation

In order to understand how to interact with this function, it is helpful to look at the Rust coding that implements the WASI function [`args_sizes_get()`](https://github.com/bytecodealliance/wasmtime/blob/06377eb08a649619cc8ac9a934cb3f119017f3ef/crates/wasi-preview1-component-adapter/src/lib.rs#L506)

Here, you see the following Rust function signature:

```rust
pub unsafe extern "C" fn args_sizes_get(argc: *mut Size, argv_buf_size: *mut Size) -> Errno
```

If this function call is successful, you get back an error number of `0` that can be ignored by calling `drop`.

## 3.2) Call `args_sizes_get`

Whenever you call a WASI function, you will (almost always) need to pass one or more pointers; however, to avoid hardcoding memory addresses into function calls, the following global pointers have been declared:

```wat
(global $ARGS_COUNT_PTR i32 (i32.const 0x000004c0))
(global $ARGV_BUF_SIZE_PTR i32 (i32.const 0x000004c4))
```

Then, when we call WASI functions, we always reference these global values.
The WASI function performs its processing and returns information to the calling program by writing that data to the memory locations identified by the pointers:

```wat
;; How many command line args have we received?
(call $wasi_args_sizes_get (global.get $ARGS_COUNT_PTR) (global.get $ARGV_BUF_SIZE_PTR))
drop
```

The actual return value of the function call is used only for error handling.
Here, we have assumed that `args_sizes_get` always gives a return code of zero, so we simply `drop` the value left on the stack.

![Calling `args_sizes_get`](/chriswhealy/sha256-extended/img/args_sizes_get.png)

We store the values returned by WASI by loading the `i32` values found at the addresses stored in these pointers:

```wat
;; Remember the argument count and the total length of arguments
(local.set $argc (i32.load (global.get $ARGS_COUNT_PTR)))
(local.set $argv_buf_size (i32.load (global.get $ARGV_BUF_SIZE_PTR)))
```

For this command line:

```bash
node sha256sum.mjs ./tests/war_and_peace.txt
```

We get back the value `3` for `argc`; however, the value returned for `argv_buf_size` is much larger than the string shown above would lead us to believe.
This is because the program name `node` and file name `sha256sum.mjs` have both been expanded to their fully qualified names.

Hence, the value of `argv_buf_size` is actually `0x83` (131 bytes).

***IMPORTANT***<br>
The string length of 131 also includes a null terminator character (`0x00`) at the end of each argument.
This must be accounted for when calculating argument lengths.
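The effect of the null terminators is easy to reproduce.
The sketch below hand-assembles a module whose `_start` makes the same `args_sizes_get` call shown above, using the same pointer addresses (`0x4c0` and `0x4c4`), then reads the results back from JavaScript.
To keep the numbers predictable, it passes a fixed, hypothetical argument list to the `WASI` constructor instead of `process.argv`, so nothing is expanded to a fully qualified path.
The three arguments are 4 + 13 + 25 = 42 characters long; adding one null terminator per argument gives 45.

```javascript
import { WASI } from "wasi"

// The same pointer values as the WAT globals $ARGS_COUNT_PTR and $ARGV_BUF_SIZE_PTR
const ARGS_COUNT_PTR = 0x4c0
const ARGV_BUF_SIZE_PTR = 0x4c4

// Hand-assembled binary equivalent to this WAT:
//   (module
//     (import "wasi" "args_sizes_get" (func $wasi_args_sizes_get (param i32 i32) (result i32)))
//     (memory (export "memory") 1)
//     (func (export "_start")
//       (call $wasi_args_sizes_get (i32.const 0x4c0) (i32.const 0x4c4))
//       drop))
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic number and version 1
  0x01, 0x0a, 0x02,                                     // type section: two types
  0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,                   //   type 0: (i32, i32) -> i32
  0x60, 0x00, 0x00,                                     //   type 1: () -> ()
  0x02, 0x17, 0x01, 0x04, 0x77, 0x61, 0x73, 0x69,       // import section: module "wasi"
  0x0e, 0x61, 0x72, 0x67, 0x73, 0x5f, 0x73, 0x69, 0x7a, //   field "args_sizes_get" ...
  0x65, 0x73, 0x5f, 0x67, 0x65, 0x74, 0x00, 0x00,       //   ... a function of type 0
  0x03, 0x02, 0x01, 0x01,                               // function section: one function of type 1
  0x05, 0x03, 0x01, 0x00, 0x01,                         // memory section: minimum 1 page
  0x07, 0x13, 0x02,                                     // export section: two exports
  0x06, 0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79, 0x02, 0x00, //   "memory" -> memory 0
  0x06, 0x5f, 0x73, 0x74, 0x61, 0x72, 0x74, 0x00, 0x01, //   "_start" -> function 1
  0x0a, 0x0d, 0x01, 0x0b, 0x00,                         // code section: one body, no locals
  0x41, 0xc0, 0x09, 0x41, 0xc4, 0x09,                   //   i32.const 0x4c0, i32.const 0x4c4
  0x10, 0x00, 0x1a, 0x0b,                               //   call 0, drop, end
])

// Hypothetical command line, used instead of process.argv so the result is predictable
const args = ["node", "sha256sum.mjs", "./tests/war_and_peace.txt"]

const wasi = new WASI({ version: "unstable", args })
const instance = new WebAssembly.Instance(new WebAssembly.Module(bytes), { wasi: wasi.wasiImport })
wasi.start(instance) // runs _start, which calls args_sizes_get

// Read the two i32 values back; WebAssembly memory is little-endian, just like i32.load
const view = new DataView(instance.exports.memory.buffer)
const argc = view.getUint32(ARGS_COUNT_PTR, true)
const argvBufSize = view.getUint32(ARGV_BUF_SIZE_PTR, true)

// 42 characters of argument text, plus one null terminator per argument
console.log(argc, argvBufSize) // 3 45
```

Running the real `sha256sum.mjs` with `process.argv` yields a larger `argv_buf_size` for the reason described above: Node.js expands the first two arguments to fully qualified paths.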