-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HPCC-29914 Add embedded wasm support #17570
HPCC-29914 Add embedded wasm support #17570
Conversation
https://track.hpccsystems.com/browse/HPCC-29914 |
a633050
to
5c3ee22
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GordonSmith some initial comments. A few general themes
i) Need to be careful about types, and integer overflow.
ii) unicode is tricky - especially sizes of utf8 strings. I didn't check all the code, but some looks wrong.
iii) I didn't see any thread protection. E.g., hash table access/modification and other items need to be thread safe. I didn't check if there were any problems.
for (int i = 0; i < nbytes; i++) | ||
{ | ||
int b = data[ptr + i]; | ||
if (i == 3 && is_signed && b >= 0x80) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i == 3 or i == nbytes-1? Is it a 1/2/3 byte int?
{ | ||
b -= 0x100; | ||
} | ||
result += b << (i * 8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left shifting a -ve signed number is undefined behaviour. It would work if b was unsigned.
return result; | ||
} | ||
|
||
std::string global_encoding = "utf8"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a constant? There doesn't seem to be any way to change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is constant for now - but will be configurable in a future version of wasmtime
return result; | ||
} | ||
|
||
std::string global_encoding = "utf8"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Problematic having a variable defined in a header - you will get duplicate definitions.
if (global_encoding.compare("utf8") == 0) | ||
{ | ||
alignment = 1; | ||
byte_length = tagged_code_units; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not convinced about this. if tagged code units is the number of code points, then the byte length could be from tagged_code_units to 4*tagged_code_units depending on the characters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They do distinguish between tagged_code_points and tagged_code_units, so this should be good.
{ | ||
auto instanceItr = wasmInstances.find(wasmName); | ||
if (instanceItr == wasmInstances.end()) | ||
throw std::runtime_error("Wasm instance not found: " + wasmName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
careful about all throws...
virtual void bindUTF8Param(const char *name, size32_t chars, const char *val) | ||
{ | ||
TRACE("bindUTF8Param %s %d %s", name, chars, val); | ||
auto memIdxVar = wasmEngine->callRealloc(wasmName, {0, 0, 1, (int32_t)chars}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to calculate the size from the length. rtlUtf8Size(chars, val)
{ | ||
TRACE("getSignedResult"); | ||
if (results[0].kind() == wasmtime::ValKind::I64) | ||
return (int32_t)results[0].i64(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cast removes the top bits
{ | ||
TRACE("getUnsignedResult"); | ||
if (results[0].kind() == wasmtime::ValKind::I64) | ||
return (int32_t)results[0].i64(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
TRACE("getUnsignedResult"); | ||
if (results[0].kind() == wasmtime::ValKind::I64) | ||
return (int32_t)results[0].i64(); | ||
return results[0].i32(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably best to cast to unsigned first - otherwise it is not clear if it is signed extended
3a5ddb9
to
22ec93e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Various random comments. A few comments from before are still relevant.
|
||
int align_to(int ptr, int alignment) | ||
{ | ||
return std::ceil(ptr / alignment) * alignment; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what this is trying to do, but std::ceil isn't right. If trying to round down use
(ptr / alignment * alignment)
or more efficient assuming a power of 2:
ptr & ~(alignment-1)
Also, the parameters need to be more strictly typed. Are they uint32_t?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isAligned(x, y) would be return (x & (y-1)) == 0;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking in https://github.com/WebAssembly/component-model/blob/5a11e36e43244130a71919b7cdd82139e5fe65b3/design/mvp/canonical-abi/definitions.py outside of the tests, it is used to find the next aligned ptr given a current ptr - so am inclined to leave as is for now and look at optimisation once the abi is complete?
{ | ||
uint8_t b = data[ptr + i]; | ||
if (i == nbytes - 1 && std::is_signed<T>::value && b >= 0x80) | ||
b -= 0x100; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has no effect since b is a single byte. Much clearer to negate the value outside the loop.
uint8_t b = data[ptr + i]; | ||
if (i == nbytes - 1 && std::is_signed<T>::value && b >= 0x80) | ||
b -= 0x100; | ||
retVal += b << (i * 8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is an integer overflow (undefined behaviour) for 8 byte -ve numbers.
// More: Not currently available from the wasmtime::context object, see https://github.com/bytecodealliance/wasmtime/issues/6719 | ||
std::string global_encoding = "utf8"; | ||
|
||
std::pair<uint32_t /*ptr*/, uint32_t /*byte length*/> load_string_from_range(const wasmtime::Span<uint8_t> &data, uint32_t ptr, uint32_t tagged_code_units) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should have a comment to clarify what tagged_code_units is. I assume it is not the number of codepoints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add a comment, but a "code unit" (and a "code point") are apparently the "correct" terminology to distinguish between a char and a byte: https://unicode.org/glossary/
|
||
std::pair<uint32_t /*ptr*/, uint32_t /*byte length*/> load_string_from_range(const wasmtime::Span<uint8_t> &data, uint32_t ptr, uint32_t tagged_code_units) | ||
{ | ||
std::string encoding = "utf-8"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor. 4 byte indentation is used throughout the rest of the platform code.
TRACE("bindVStringParam %s %s", name, val); | ||
bindStringParam(name, strlen(val), val); | ||
} | ||
virtual void bindUTF8Param(const char *name, size32_t code_points, const char *val) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: mixture of multi-word casing conventions. code_points/code_units and elemType, totalBytes. camel case is generally used in most of the platform.
virtual void bindUTF8Param(const char *name, size32_t code_points, const char *val) | ||
{ | ||
TRACE("bindUTF8Param %s %d %s", name, code_points, val); | ||
auto code_units = rtlUtf8Size(code_points, val); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size (or bytes) is clearer than _units. e.g. val_size (or sizeVal)
uint32_t strPtr; | ||
uint32_t code_units; | ||
std::tie(strPtr, code_units) = load_string(data, ptr); | ||
__chars = rtlUtf8Length(code_units, &data[strPtr]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__chars is not a standard name for the result parameter
plugins/wasmembed/wasmembed.cpp
Outdated
} | ||
void manifestPaths(ICodeContext *codeCtx) | ||
{ | ||
if (codeCtx && !manifestAdded) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be thread safe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope - refactored away
plugins/wasmembed/wasmembed.cpp
Outdated
|
||
MODULE_EXIT() | ||
{ | ||
kill(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could do with a clearer name - hard to tell from the context what it does. (I didn't search to find it.)
ed298a8
to
d2e4d48
Compare
d2e4d48
to
3d56d73
Compare
3f5b428
to
f707a3e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GordonSmith a few comments relating to c++ details, but broadly looks good.
plugins/wasmembed/CMakeLists.txt
Outdated
@@ -0,0 +1,67 @@ | |||
project(wasmembed) | |||
|
|||
set(CMAKE_CXX_STANDARD 20) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this going to cause problems on any platforms? Should this be inside the if(WASEMBED) test instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should only be local to the wasmembed project, but I can relocate anyway
plugins/wasmembed/abi.cpp
Outdated
#include <sstream> | ||
#include <vector> | ||
|
||
auto UTF16_TAG = 1 << 31; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
technically undefined behaviour. Would need to be 1U << 31. i.e. an unsigned value.
plugins/wasmembed/abi.cpp
Outdated
|
||
uint32_t align_to(uint32_t ptr, uint32_t alignment) | ||
{ | ||
return std::ceil(ptr / alignment) * alignment; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would avoid using a floating point function in an integer context like this. Assuming alignment is a power of two use
(ptr + alignment -1) & ~(alignment - 1);
Also, the name of the function doesn't make it clear it is rounding up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree WRT to the naming, but want to keep the names in sync with the official ABI spec.
plugins/wasmembed/abi.cpp
Outdated
auto nbytes = sizeof(retVal); | ||
for (int i = 0; i < nbytes; ++i) | ||
{ | ||
uint8_t b = data[ptr + i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use an unsigned type of the correct type, otherwise b << (i * 8) will promote b to an integer, and then you will get an integer overflow error. You need to use unsigned types of the right size to do operations like this.
plugins/wasmembed/abi.cpp
Outdated
} | ||
if (std::is_signed<T>::value) | ||
{ | ||
retVal += (data[ptr + nbytes - 1] & 0x80) ? -1 << (8 * nbytes) : 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also going to have integer overflow problems.
plugins/wasmembed/util.cpp
Outdated
|
||
std::pair<std::string, std::string> splitQualifiedID(const std::string &qualifiedName) | ||
{ | ||
std::istringstream iss(qualifiedName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
efficiency: std::istringstream is a fairly heavy-weight way to solve this problem. Simpler to use find to search for a '.' and then create substrings (similar to the function above).
plugins/wasmembed/secure-enclave.cpp
Outdated
|
||
// #define ENABLE_TRACE | ||
#ifdef ENABLE_TRACE | ||
#define TRACE(format, ...) DBGLOG(format, ##__VA_ARGS__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is non standard (only gcc). See https://en.cppreference.com/w/cpp/preprocessor/replace
I think #define TRACE(...) DBGLOG(VA_ARGS)
may work, but I'm not 100% sure. Would need some testing (or to use the c++20 extension.)
plugins/wasmembed/secure-enclave.cpp
Outdated
return map.find(key) != map.end(); | ||
} | ||
|
||
void for_each(std::function<void(const K &key, const V &value)> func) const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit of a mix of naming conventions: insertIfMissing and for_each, but maybe makes sense since latter is like a c++ std function.
plugins/wasmembed/secure-enclave.cpp
Outdated
|
||
wasmtime::Linker linker(engine); | ||
linker.define_wasi().unwrap(); | ||
TRACE("WASM SE resolveModule4 %s", wasmName.c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect quite a lot of this tracing could be removed as unlikely to be useful in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree...
plugins/wasmembed/secure-enclave.cpp
Outdated
uint32_t strPtr; | ||
uint32_t bytes; | ||
std::tie(strPtr, bytes) = load_string(data, ptr); | ||
rtlStrToStrX(chars, result, bytes, reinterpret_cast<const char *>(&data[strPtr])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to convert from utf8 to a string.
2d1597a
to
b0c4755
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GordonSmith looks close. A few final comments
plugins/wasmembed/abi.cpp
Outdated
@@ -124,12 +129,13 @@ T load_int(const wasmtime::Span<uint8_t> &data, int32_t ptr) | |||
auto nbytes = sizeof(retVal); | |||
for (int i = 0; i < nbytes; ++i) | |||
{ | |||
uint8_t b = data[ptr + i]; | |||
std::make_unsigned_t<T> b = data[ptr + i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
retVal will also need to be unsigned, otherwise you can get integer overflow
plugins/wasmembed/abi.cpp
Outdated
retVal += b << (i * 8); | ||
} | ||
if (std::is_signed<T>::value) | ||
if (std::is_signed<T>::value && data[ptr + nbytes - 1] & 0x80) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There also needs to be a check that nbytes != sizeof(T), because value << 8 * sizeof(T) is undefined behaviour.
There should also be a comment
//if the top bit is set then sign extend the result by setting the top bits.
NOTE: Running in debug on linux will throw errors when undefined behaviour is spotted.
plugins/wasmembed/secure-enclave.cpp
Outdated
@@ -551,7 +542,7 @@ class SecureFunction : public CInterfaceOf<IEmbedFunctionContext> | |||
uint32_t strPtr; | |||
uint32_t bytes; | |||
std::tie(strPtr, bytes) = load_string(data, ptr); | |||
rtlStrToStrX(chars, result, bytes, reinterpret_cast<const char *>(&data[strPtr])); | |||
rtlUtf8ToStrX(chars, result, bytes, reinterpret_cast<const char *>(&data[strPtr])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
careful about difference between length/codepoints and size (I wish it wasn't this way!)
size32_t codepoints = rtlUtfLength(bytes, strPtr);
rtlUtf8ToStrX(chars, result, codePoints, reinterpret_cast<const char *>(&data[strPtr]));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GordonSmith I suspect this is close to ready to merge. I think it might be worth a walkthrough with me and someone else to look at and address any remaining issues.
plugins/wasmembed/secure-enclave.cpp
Outdated
TRACE("WasmEngine loadWasmFiles"); | ||
IEngineContext *engine = codeCtx->queryEngineContext(); | ||
if (!engine) | ||
throw makeStringException(100, "Faile to get engine context"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: failed
throw makeStringException(100, "Faile to get engine context"); | ||
|
||
StringArray manifestModules; | ||
engine->getManifestFiles("wasm", manifestModules); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs rethinking, although not for this PR. Currently it extracts to a file, but much better would be to return the contents as binary data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add ticket to support multiple manifests loading diff wasm files.
Especially for Roxie.
|
||
void registerInstance(const std::string &wasmName) | ||
{ | ||
TRACE("WASM SE registerInstance %s", wasmName.c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there thread safety issues here? Currently thread local, so no, but will that cause other issues?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe so, this PR reworked the engine / store threading specifically to match the threaded examples in the wasmtime repo (engine is a global and store / store context is per thread).
@@ -266,29 +267,22 @@ class WasmEngine | |||
return found->second.data(store.context()); | |||
} | |||
}; | |||
static std::unique_ptr<WasmEngine> wasmEngine; | |||
thread_local std::unique_ptr<WasmStore> wasmStore = std::make_unique<WasmStore>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential problems with code like this on osx and efficiency of thread_local variables with non-trivial constructors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure how the lifetime of threading works within thor jobs - but I suspect this must be more efficient than a lock on the call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure. If I remember correctly it means every access of the variable needs to check if it is initialised, and if not call the constructor. We deliberately removed some thread_local variables like this because they were generating terrible code. I'll see if I can remember where/why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Open new ticket for efficiency access to wasm store: Move std::make_unique<WasmStore>();
into secure function constructor (testing if it exists first)
std::tie(strPtr, bytes) = load_string(data, ptr); | ||
rtlUtf8ToStrX(chars, result, bytes, reinterpret_cast<const char *>(&data[strPtr])); | ||
std::tie(strPtr, encoding, bytes) = load_string(data, ptr); | ||
size32_t codepoints = rtlUtf8Length(bytes, &data[strPtr]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should encoding be being used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not currently as everything is always a utf8 - this is more to keep in sync with official ABI and in the future when other encodings are supported.
@@ -614,55 +597,27 @@ class SecureFunction : public CInterfaceOf<IEmbedFunctionContext> | |||
virtual void compileEmbeddedScript(size32_t lenChars, const char *_utf) override | |||
{ | |||
TRACE("WASM SE compileEmbeddedScript"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this fail?
throw makeStringException(100, "Faile to get engine context"); | ||
|
||
StringArray manifestModules; | ||
engine->getManifestFiles("wasm", manifestModules); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add ticket to support multiple manifests loading diff wasm files.
Especially for Roxie.
@@ -266,29 +267,22 @@ class WasmEngine | |||
return found->second.data(store.context()); | |||
} | |||
}; | |||
static std::unique_ptr<WasmEngine> wasmEngine; | |||
thread_local std::unique_ptr<WasmStore> wasmStore = std::make_unique<WasmStore>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Open new ticket for efficiency access to wasm store: Move std::make_unique<WasmStore>();
into secure function constructor (testing if it exists first)
@ghalliday last couple changes as per todays meeting. |
https://track.hpccsystems.com/browse/HPCC-29914 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GordonSmith please squash. I'll then quickly scan and merge if I don't spot anything.
f62abce
to
7a50f81
Compare
Squashed and rebased - will re-review sometime this week once the tests succeed |
7a50f81
to
7b47d4d
Compare
@ghalliday - removed some obsolete files and should be ready for pull |
Signed-off-by: Gordon Smith <GordonJSmith@gmail.com>
7b47d4d
to
0820f74
Compare
@ghalliday rebased |
7094fa3
into
hpcc-systems:candidate-9.4.x
Type of change:
Checklist:
Smoketest:
Testing: