vulkan support for typescript bindings, gguf support #1390

Merged Nov 1, 2023

Commits (138)
8585ff4
adding some native methods to cpp wrapper
jacoobes Sep 2, 2023
dc4b704
gpu seems to work
jacoobes Sep 2, 2023
94c217c
typings and add availibleGpus method
jacoobes Sep 2, 2023
0487394
fix spelling
jacoobes Sep 2, 2023
ab72ef2
fix syntax
jacoobes Sep 2, 2023
57cc5c3
more
jacoobes Sep 2, 2023
efde701
normalize methods to conform to py
jacoobes Sep 4, 2023
463d9cb
remove extra dynamic linker deps when building with vulkan
apage43 Sep 5, 2023
73ff1c4
bump python version (library linking fix)
apage43 Sep 11, 2023
71e2000
Don't link against libvulkan.
manyoso Sep 12, 2023
e5b0d2d
vulkan python bindings on windows fixes
apage43 Sep 12, 2023
be83035
Bring the vulkan backend to the GUI.
manyoso Sep 13, 2023
246ba22
When device is Auto (the default) then we will only consider discrete…
manyoso Sep 13, 2023
74b4800
Show the device we're currently using.
manyoso Sep 13, 2023
ea41e60
Fix up the name and formatting.
manyoso Sep 13, 2023
299dabe
init at most one vulkan device, submodule update
apage43 Sep 13, 2023
2a913ca
Update the submodule.
manyoso Sep 13, 2023
ce9f64e
Add version 2.4.15 and bump the version number.
manyoso Sep 13, 2023
0a19cef
Fix a bug where we're not properly falling back to CPU.
manyoso Sep 13, 2023
11e459e
Sync to a newer version of llama.cpp with bugfix for vulkan.
manyoso Sep 14, 2023
3c5b5f0
Report the actual device we're using.
manyoso Sep 14, 2023
6eb6f23
Only show GPU when we're actually using it.
manyoso Sep 14, 2023
780da62
Bump to new llama with new bugfix.
manyoso Sep 14, 2023
b63c162
Release notes for v2.4.16 and bump the version.
manyoso Sep 14, 2023
635b40d
Fallback to CPU more robustly.
manyoso Sep 14, 2023
81bdcc7
Release notes for v2.4.17 and bump the version.
manyoso Sep 14, 2023
4570660
Bump the Python version to python-v1.0.12 to restrict the quants that…
manyoso Sep 15, 2023
3c9acad
Link against ggml in bin so we can get the available devices without …
manyoso Sep 15, 2023
d713c4c
Send actual and requested device info for those who have opt-in.
manyoso Sep 16, 2023
ce51f82
Actually bump the version.
manyoso Sep 16, 2023
7f46228
Release notes for v2.4.18 and bump the version.
manyoso Sep 16, 2023
6ab97c4
Fix for crashes on systems where vulkan is not installed properly.
manyoso Sep 16, 2023
1e5d52f
Release notes for v2.4.19 and bump the version.
manyoso Sep 16, 2023
af28bd0
Merge branch 'main' into feat(ts)/gpu
jacoobes Sep 16, 2023
3dde1d9
fix typings and vulkan build works on win
jacoobes Sep 16, 2023
281c271
Merge branch 'main' into feat(ts)/gpu
jacoobes Sep 28, 2023
e5ad622
Add flatpak manifest
qnixsynapse Sep 15, 2023
c0740f3
Remove unnecessary stuffs from manifest
qnixsynapse Sep 15, 2023
0cd1aaa
Update to 2.4.19
qnixsynapse Sep 17, 2023
ae76c49
appdata: update software description
qnixsynapse Oct 4, 2023
cd77172
Latest rebase on llama.cpp with gguf support.
manyoso Sep 21, 2023
b31054d
macos build fixes
apage43 Sep 26, 2023
19e0789
llamamodel: metal supports all quantization types now
cebtenzzre Sep 25, 2023
bfbae71
gpt4all.py: GGUF
cebtenzzre Sep 26, 2023
274c296
pyllmodel: print specific error message
cebtenzzre Sep 26, 2023
e0b6eb6
backend: port BERT to GGUF
cebtenzzre Sep 25, 2023
be328d3
backend: port MPT to GGUF
cebtenzzre Sep 28, 2023
021166f
backend: port Replit to GGUF
cebtenzzre Sep 28, 2023
0a61fa7
backend: use gguf branch of llama.cpp-mainline
cebtenzzre Sep 28, 2023
a8b714b
backend: use llamamodel.cpp for StarCoder
cebtenzzre Sep 28, 2023
b50db3d
conversion scripts: cleanup
cebtenzzre Sep 28, 2023
d804031
convert scripts: load model as late as possible
cebtenzzre Sep 29, 2023
c6777ab
convert_mpt_hf_to_gguf.py: better tokenizer decoding
cebtenzzre Sep 29, 2023
ecf5945
backend: use llamamodel.cpp for Falcon
cebtenzzre Sep 29, 2023
8d508b0
convert scripts: make them directly executable
cebtenzzre Sep 29, 2023
b972ed0
fix references to removed model types
cebtenzzre Sep 29, 2023
83e350d
modellist: fix the system prompt
cebtenzzre Sep 25, 2023
8f711bb
backend: port GPT-J to GGUF
cebtenzzre Sep 28, 2023
cb4abc8
gpt-j: update inference to match latest llama.cpp insights
cebtenzzre Sep 29, 2023
ac42296
chatllm: grammar fix
cebtenzzre Sep 29, 2023
78e8ec7
convert scripts: use bytes_to_unicode from transformers
cebtenzzre Sep 29, 2023
0baf34c
convert scripts: make gptj script executable
cebtenzzre Sep 29, 2023
63c3a01
convert scripts: add feed-forward length for better compatiblilty
cebtenzzre Sep 30, 2023
882b140
gptj: remove unused variables
cebtenzzre Oct 2, 2023
3b25bbb
Refactor for subgroups on mat * vec kernel.
manyoso Sep 26, 2023
1728f63
Add q6_k kernels for vulkan.
manyoso Oct 2, 2023
1771aca
python binding: print debug message to stderr
cebtenzzre Oct 3, 2023
2c8d21c
Fix regenerate button to be deterministic and bump the llama version …
manyoso Oct 3, 2023
fb511ff
Bump to the latest fixes for vulkan in llama.
manyoso Oct 4, 2023
107bb5a
llamamodel: fix static vector in LLamaModel::endTokens
cebtenzzre Oct 4, 2023
4a58ff9
Switch to new models2.json for new gguf release and bump our version to
manyoso Oct 5, 2023
46bcb00
Bump to latest llama/gguf branch.
manyoso Oct 5, 2023
09a7a67
chat: report reason for fallback to CPU
cebtenzzre Sep 29, 2023
ba0f9ce
chat: make sure to clear fallback reason on success
cebtenzzre Oct 2, 2023
1e7b888
more accurate fallback descriptions
cebtenzzre Oct 4, 2023
6653f76
differentiate between init failure and unsupported models
cebtenzzre Oct 4, 2023
9ae6a14
backend: do not use Vulkan with non-LLaMA models
cebtenzzre Oct 4, 2023
7c35f2f
Add q8_0 kernels to kompute shaders and bump to latest llama/gguf.
manyoso Oct 5, 2023
6d8aa80
backend: fix build with Visual Studio generator
cebtenzzre Oct 5, 2023
eeb8a03
remove old llama.cpp submodules
cebtenzzre Oct 5, 2023
c9d581d
Reorder and refresh our models2.json.
manyoso Oct 5, 2023
b942c88
rebase on newer llama.cpp
cebtenzzre Oct 5, 2023
43ddd10
python/embed4all: use gguf model, allow passing kwargs/overriding model
apage43 Oct 5, 2023
bf845b3
Add starcoder, rift and sbert to our models2.json.
manyoso Oct 5, 2023
a7b2935
Push a new version number for llmodel backend now that it is based on…
manyoso Oct 5, 2023
7da7a08
fix stray comma in models2.json
apage43 Oct 5, 2023
5a39ed4
Speculative fix for build on mac.
manyoso Oct 5, 2023
f2be23a
chat: clearer CPU fallback messages
cebtenzzre Oct 6, 2023
6479d6f
Fix crasher with an empty string for prompt template.
manyoso Oct 6, 2023
0841dba
Update the language here to avoid misunderstanding.
manyoso Oct 6, 2023
310c44f
added EM German Mistral Model
Oct 9, 2023
961ba07
make codespell happy
apage43 Oct 10, 2023
12b4d79
issue template: remove "Related Components" section
cebtenzzre Oct 10, 2023
0f2e52d
cmake: install the GPT-J plugin (#1487)
cebtenzzre Oct 10, 2023
0753a84
Do not delete saved chats if we fail to serialize properly.
manyoso Oct 7, 2023
14074ee
Restore state from text if necessary.
manyoso Oct 10, 2023
3a035fd
Another codespell attempted fix.
manyoso Oct 11, 2023
91d6d6a
llmodel: do not call magic_match unless build variant is correct (#1488)
cebtenzzre Oct 11, 2023
3ae6569
chatllm: do not write uninitialized data to stream (#1486)
cebtenzzre Oct 11, 2023
e3bf811
mat*mat for q4_0, q8_0
apage43 Oct 11, 2023
d0bb7e1
do not process prompts on gpu yet
apage43 Oct 11, 2023
4d74fbe
python: support Path in GPT4All.__init__ (#1462)
cebtenzzre Oct 11, 2023
adbed54
llmodel: print an error if the CPU does not support AVX (#1499)
cebtenzzre Oct 11, 2023
6ca4d93
python bindings should be quiet by default
apage43 Oct 10, 2023
228802b
python: always check status code of HTTP responses (#1502)
cebtenzzre Oct 11, 2023
9e9842b
Always save chats to disk, but save them as text by default. This als…
manyoso Oct 11, 2023
4db5975
Update README.md
agi-dude Sep 21, 2023
10dd89f
fix embed4all filename
apage43 Oct 11, 2023
d5a5afe
Improves Java API signatures maintaining back compatibility
lordofthejars Aug 25, 2023
460f503
python: replace deprecated pkg_resources with importlib (#1505)
cebtenzzre Oct 12, 2023
59b0962
Updated chat wishlist (#1351)
niansa Oct 12, 2023
f2159c1
q6k, q4_1 mat*mat
apage43 Oct 11, 2023
b485f89
update mini-orca 3b to gguf2, license
apage43 Oct 12, 2023
238e8f4
convert scripts: fix AutoConfig typo (#1512)
cebtenzzre Oct 13, 2023
6e46f10
Merge branch 'main' into feat(ts)/gpu
jacoobes Oct 16, 2023
53b4a46
publish config https://docs.npmjs.com/cli/v9/configuring-npm/package-…
jacoobes Oct 19, 2023
044ba62
merge
jacoobes Oct 19, 2023
834403a
Merge branch 'main' into feat(ts)/gpu
jacoobes Oct 19, 2023
3b48de9
fix appendBin
jacoobes Oct 20, 2023
0b0cf65
fix gpu not initializing first
jacoobes Oct 20, 2023
90c6c27
sync up
jacoobes Oct 20, 2023
ce6fb7c
progress, still wip on destructor
jacoobes Oct 20, 2023
e129be4
some detection work
jacoobes Oct 21, 2023
177172f
merge
jacoobes Oct 25, 2023
1721cdb
untested dispose method
jacoobes Oct 25, 2023
2fda274
add js side of dispose
jacoobes Oct 25, 2023
c94fb9f
Merge branch 'main' into feat(ts)/gpu
jacoobes Oct 29, 2023
b878bea
Merge branch 'main' into feat(ts)/gpu
jacoobes Oct 31, 2023
150095b
Update gpt4all-bindings/typescript/index.cc
jacoobes Oct 31, 2023
b2679cb
Update gpt4all-bindings/typescript/index.cc
jacoobes Oct 31, 2023
3b8a17f
Update gpt4all-bindings/typescript/index.cc
jacoobes Oct 31, 2023
354455c
Update gpt4all-bindings/typescript/src/gpt4all.d.ts
jacoobes Oct 31, 2023
a6a29b4
Update gpt4all-bindings/typescript/src/gpt4all.js
jacoobes Oct 31, 2023
88d8e53
Update gpt4all-bindings/typescript/src/util.js
jacoobes Nov 1, 2023
68e83d2
fix tests
jacoobes Nov 1, 2023
a206269
Merge branch 'main' into feat(ts)/gpu
jacoobes Nov 1, 2023
74e242b
fix circleci for nodejs
jacoobes Nov 1, 2023
661b522
bump version
jacoobes Nov 1, 2023
2 changes: 1 addition & 1 deletion .circleci/continue_config.yml
@@ -994,7 +994,7 @@ jobs:
command: |
cd gpt4all-bindings/typescript
npm set //registry.npmjs.org/:_authToken=$NPM_TOKEN
npm publish --access public --tag alpha
npm publish

workflows:
version: 2
1 change: 1 addition & 0 deletions gpt4all-bindings/typescript/.yarnrc.yml
@@ -0,0 +1 @@
nodeLinker: node-modules
3 changes: 0 additions & 3 deletions gpt4all-bindings/typescript/README.md
@@ -75,15 +75,12 @@ cd gpt4all-bindings/typescript
```sh
yarn
```

* llama.cpp git submodule for gpt4all can be possibly absent. If this is the case, make sure to run in llama.cpp parent directory

```sh
git submodule update --init --depth 1 --recursive
```

**AS OF NEW BACKEND** to build the backend,

```sh
yarn build:backend
```
136 changes: 121 additions & 15 deletions gpt4all-bindings/typescript/index.cc
@@ -1,6 +1,5 @@
#include "index.h"

Napi::FunctionReference NodeModelWrapper::constructor;

Napi::Function NodeModelWrapper::GetClass(Napi::Env env) {
Napi::Function self = DefineClass(env, "LLModel", {
@@ -13,14 +12,64 @@ Napi::Function NodeModelWrapper::GetClass(Napi::Env env) {
InstanceMethod("embed", &NodeModelWrapper::GenerateEmbedding),
InstanceMethod("threadCount", &NodeModelWrapper::ThreadCount),
InstanceMethod("getLibraryPath", &NodeModelWrapper::GetLibraryPath),
InstanceMethod("initGpuByString", &NodeModelWrapper::InitGpuByString),
InstanceMethod("hasGpuDevice", &NodeModelWrapper::HasGpuDevice),
InstanceMethod("listGpu", &NodeModelWrapper::GetGpuDevices),
InstanceMethod("memoryNeeded", &NodeModelWrapper::GetRequiredMemory),
InstanceMethod("dispose", &NodeModelWrapper::Dispose)
});
// Keep a static reference to the constructor
//
constructor = Napi::Persistent(self);
constructor.SuppressDestruct();
Napi::FunctionReference* constructor = new Napi::FunctionReference();
*constructor = Napi::Persistent(self);
env.SetInstanceData(constructor);
return self;
}
Napi::Value NodeModelWrapper::GetRequiredMemory(const Napi::CallbackInfo& info)
{
auto env = info.Env();
return Napi::Number::New(env, static_cast<uint32_t>( llmodel_required_mem(GetInference(), full_model_path.c_str()) ));

}
Napi::Value NodeModelWrapper::GetGpuDevices(const Napi::CallbackInfo& info)
{
auto env = info.Env();
int num_devices = 0;
auto mem_size = llmodel_required_mem(GetInference(), full_model_path.c_str());
llmodel_gpu_device* all_devices = llmodel_available_gpu_devices(GetInference(), mem_size, &num_devices);
if(all_devices == nullptr) {
Napi::Error::New(
env,
"Unable to retrieve list of all GPU devices"
).ThrowAsJavaScriptException();
return env.Undefined();
}
auto js_array = Napi::Array::New(env, num_devices);
for(int i = 0; i < num_devices; ++i) {
auto gpu_device = all_devices[i];
/*
*
* struct llmodel_gpu_device {
int index = 0;
int type = 0; // same as VkPhysicalDeviceType
size_t heapSize = 0;
const char * name;
const char * vendor;
};
*
*/
Napi::Object js_gpu_device = Napi::Object::New(env);
js_gpu_device["index"] = uint32_t(gpu_device.index);
js_gpu_device["type"] = uint32_t(gpu_device.type);
js_gpu_device["heapSize"] = static_cast<uint32_t>( gpu_device.heapSize );
js_gpu_device["name"]= gpu_device.name;
js_gpu_device["vendor"] = gpu_device.vendor;

js_array[i] = js_gpu_device;
}
return js_array;
}

Napi::Value NodeModelWrapper::getType(const Napi::CallbackInfo& info)
{
if(type.empty()) {
@@ -29,15 +78,41 @@ Napi::Function NodeModelWrapper::GetClass(Napi::Env env) {
return Napi::String::New(info.Env(), type);
}

Napi::Value NodeModelWrapper::InitGpuByString(const Napi::CallbackInfo& info)
{
auto env = info.Env();
uint32_t memory_required = info[0].As<Napi::Number>();

std::string gpu_device_identifier = info[1].As<Napi::String>();

size_t converted_value;
if(memory_required <= std::numeric_limits<size_t>::max()) {
converted_value = static_cast<size_t>(memory_required);
} else {
Napi::Error::New(
env,
"invalid number for memory size. Exceeded bounds for memory."
).ThrowAsJavaScriptException();
return env.Undefined();
}

auto result = llmodel_gpu_init_gpu_device_by_string(GetInference(), converted_value, gpu_device_identifier.c_str());
return Napi::Boolean::New(env, result);
}
Napi::Value NodeModelWrapper::HasGpuDevice(const Napi::CallbackInfo& info)
{
return Napi::Boolean::New(info.Env(), llmodel_has_gpu_device(GetInference()));
}

NodeModelWrapper::NodeModelWrapper(const Napi::CallbackInfo& info) : Napi::ObjectWrap<NodeModelWrapper>(info)
{
auto env = info.Env();
fs::path model_path;

std::string full_weight_path;
//todo
std::string library_path = ".";
std::string model_name;
std::string full_weight_path,
library_path = ".",
model_name,
device;
if(info[0].IsString()) {
model_path = info[0].As<Napi::String>().Utf8Value();
full_weight_path = model_path.string();
@@ -56,13 +131,14 @@ Napi::Function NodeModelWrapper::GetClass(Napi::Env env) {
} else {
library_path = ".";
}
device = config_object.Get("device").As<Napi::String>();
}
llmodel_set_implementation_search_path(library_path.c_str());
llmodel_error e = {
.message="looks good to me",
.code=0,
};
inference_ = std::make_shared<llmodel_model>(llmodel_model_create2(full_weight_path.c_str(), "auto", &e));
inference_ = llmodel_model_create2(full_weight_path.c_str(), "auto", &e);
if(e.code != 0) {
Napi::Error::New(env, e.message).ThrowAsJavaScriptException();
return;
@@ -74,18 +150,45 @@ Napi::Function NodeModelWrapper::GetClass(Napi::Env env) {
Napi::Error::New(env, "Had an issue creating llmodel object, inference is null").ThrowAsJavaScriptException();
return;
}
if(device != "cpu") {
size_t mem = llmodel_required_mem(GetInference(), full_weight_path.c_str());
if(mem == 0) {
std::cout << "WARNING: no memory needed. does this model support gpu?\n";
}
std::cout << "Initiating GPU\n";
std::cout << "Memory required estimation: " << mem << "\n";

auto success = llmodel_gpu_init_gpu_device_by_string(GetInference(), mem, device.c_str());
if(success) {
std::cout << "GPU init successfully\n";
} else {
std::cout << "WARNING: Failed to init GPU\n";
}
}

auto success = llmodel_loadModel(GetInference(), full_weight_path.c_str());
if(!success) {
Napi::Error::New(env, "Failed to load model at given path").ThrowAsJavaScriptException();
return;
}

name = model_name.empty() ? model_path.filename().string() : model_name;
full_model_path = full_weight_path;
};
//NodeModelWrapper::~NodeModelWrapper() {
//GetInference().reset();
//}

// NodeModelWrapper::~NodeModelWrapper() {
// if(GetInference() != nullptr) {
// std::cout << "Debug: deleting model\n";
// llmodel_model_destroy(inference_);
// std::cout << (inference_ == nullptr);
// }
// }
// void NodeModelWrapper::Finalize(Napi::Env env) {
// if(inference_ != nullptr) {
// std::cout << "Debug: deleting model\n";
//
// }
// }
Napi::Value NodeModelWrapper::IsModelLoaded(const Napi::CallbackInfo& info) {
return Napi::Boolean::New(info.Env(), llmodel_isModelLoaded(GetInference()));
}
@@ -193,8 +296,9 @@ Napi::Function NodeModelWrapper::GetClass(Napi::Env env) {
std::string copiedQuestion = question;
PromptWorkContext pc = {
copiedQuestion,
std::ref(inference_),
inference_,
copiedPrompt,
""
};
auto threadSafeContext = new TsfnContext(env, pc);
threadSafeContext->tsfn = Napi::ThreadSafeFunction::New(
@@ -210,7 +314,9 @@ Napi::Function NodeModelWrapper::GetClass(Napi::Env env) {
threadSafeContext->nativeThread = std::thread(threadEntry, threadSafeContext);
return threadSafeContext->deferred_.Promise();
}

void NodeModelWrapper::Dispose(const Napi::CallbackInfo& info) {
llmodel_model_destroy(inference_);
}
void NodeModelWrapper::SetThreadCount(const Napi::CallbackInfo& info) {
if(info[0].IsNumber()) {
llmodel_setThreadCount(GetInference(), info[0].As<Napi::Number>().Int64Value());
@@ -233,7 +339,7 @@ Napi::Function NodeModelWrapper::GetClass(Napi::Env env) {
}

llmodel_model NodeModelWrapper::GetInference() {
return *inference_;
return inference_;
}

//Exports Bindings
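The index.cc changes above wire the new GPU entry points (`listGpu`, `memoryNeeded`, `initGpuByString`, `hasGpuDevice`, `dispose`) into the Node wrapper. A rough usage sketch from the JavaScript side follows — the option and method names are taken from this diff and from spec/chat.mjs, while the model filename is only an example:

```js
import { loadModel, createCompletion } from '../src/gpt4all.js'

// Passing device: 'gpu' makes the native constructor estimate the required
// memory and call llmodel_gpu_init_gpu_device_by_string; on failure it prints
// a warning and falls back to loading the model on the CPU.
const model = await loadModel('mistral-7b-openorca.Q4_0.gguf', { verbose: true, device: 'gpu' })
const ll = model.llm

console.log('Required memory (bytes):', ll.memoryNeeded())
console.log('GPU devices:', ll.listGpu()) // [{ index, type, heapSize, name, vendor }, ...]
console.log('Using GPU:', ll.hasGpuDevice())

const completion = await createCompletion(model, [
    { role: 'system', content: 'You are an advanced mathematician.' },
    { role: 'user', content: 'What is 1 + 1?' },
])
console.log(completion.choices[0].message)

// dispose() calls llmodel_model_destroy on the native handle; the wrapper
// must not be used after this point.
model.dispose()
```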
15 changes: 12 additions & 3 deletions gpt4all-bindings/typescript/index.h
@@ -6,24 +6,33 @@
#include <atomic>
#include <memory>
#include <filesystem>
#include <set>
namespace fs = std::filesystem;


class NodeModelWrapper: public Napi::ObjectWrap<NodeModelWrapper> {
public:
NodeModelWrapper(const Napi::CallbackInfo &);
//~NodeModelWrapper();
//virtual ~NodeModelWrapper();
Napi::Value getType(const Napi::CallbackInfo& info);
Napi::Value IsModelLoaded(const Napi::CallbackInfo& info);
Napi::Value StateSize(const Napi::CallbackInfo& info);
//void Finalize(Napi::Env env) override;
/**
* Prompting the model. This entails spawning a new thread and adding the response tokens
* into a thread local string variable.
*/
Napi::Value Prompt(const Napi::CallbackInfo& info);
void SetThreadCount(const Napi::CallbackInfo& info);
void Dispose(const Napi::CallbackInfo& info);
Napi::Value getName(const Napi::CallbackInfo& info);
Napi::Value ThreadCount(const Napi::CallbackInfo& info);
Napi::Value GenerateEmbedding(const Napi::CallbackInfo& info);
Napi::Value HasGpuDevice(const Napi::CallbackInfo& info);
Napi::Value ListGpus(const Napi::CallbackInfo& info);
Napi::Value InitGpuByString(const Napi::CallbackInfo& info);
Napi::Value GetRequiredMemory(const Napi::CallbackInfo& info);
Napi::Value GetGpuDevices(const Napi::CallbackInfo& info);
/*
* The path that is used to search for the dynamic libraries
*/
@@ -37,10 +46,10 @@ class NodeModelWrapper: public Napi::ObjectWrap<NodeModelWrapper> {
/**
* The underlying inference that interfaces with the C interface
*/
std::shared_ptr<llmodel_model> inference_;
llmodel_model inference_;

std::string type;
// corresponds to LLModel::name() in typescript
std::string name;
static Napi::FunctionReference constructor;
std::string full_model_path;
};
5 changes: 5 additions & 0 deletions gpt4all-bindings/typescript/package.json
@@ -47,5 +47,10 @@
},
"jest": {
"verbose": true
},
"publishConfig": {
"registry": "https://registry.npmjs.org/",
"access": "public",
"tag": "latest"
}
}
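The `publishConfig` block appears to take over what the CircleCI step used to pass on the command line: with registry, access, and dist-tag declared in package.json, the publish job above can run a plain `npm publish` (note the tag also moves from `alpha` to `latest`).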
2 changes: 1 addition & 1 deletion gpt4all-bindings/typescript/prompt.cc
@@ -30,7 +30,7 @@ void threadEntry(TsfnContext* context) {
context->tsfn.BlockingCall(&context->pc,
[](Napi::Env env, Napi::Function jsCallback, PromptWorkContext* pc) {
llmodel_prompt(
*pc->inference_,
pc->inference_,
pc->question.c_str(),
&prompt_callback,
&response_callback,
2 changes: 1 addition & 1 deletion gpt4all-bindings/typescript/prompt.h
@@ -10,7 +10,7 @@
#include <memory>
struct PromptWorkContext {
std::string question;
std::shared_ptr<llmodel_model>& inference_;
llmodel_model inference_;
llmodel_prompt_context prompt_params;
std::string res;

16 changes: 10 additions & 6 deletions gpt4all-bindings/typescript/spec/chat.mjs
@@ -1,8 +1,8 @@
import { LLModel, createCompletion, DEFAULT_DIRECTORY, DEFAULT_LIBRARIES_DIRECTORY, loadModel } from '../src/gpt4all.js'

const model = await loadModel(
'orca-mini-3b-gguf2-q4_0.gguf',
{ verbose: true }
'mistral-7b-openorca.Q4_0.gguf',
{ verbose: true, device: 'gpu' }
);
const ll = model.llm;

@@ -26,7 +26,9 @@ console.log("name " + ll.name());
console.log("type: " + ll.type());
console.log("Default directory for models", DEFAULT_DIRECTORY);
console.log("Default directory for libraries", DEFAULT_LIBRARIES_DIRECTORY);

console.log("Has GPU", ll.hasGpuDevice());
console.log("gpu devices", ll.listGpu())
console.log("Required Mem in bytes", ll.memoryNeeded())
const completion1 = await createCompletion(model, [
{ role : 'system', content: 'You are an advanced mathematician.' },
{ role : 'user', content: 'What is 1 + 1?' },
@@ -40,23 +42,25 @@ const completion2 = await createCompletion(model, [

console.log(completion2.choices[0].message)

//CALLING DISPOSE WILL INVALID THE NATIVE MODEL. USE THIS TO CLEANUP
model.dispose()
// At the moment, from testing this code, concurrent model prompting is not possible.
// Behavior: The last prompt gets answered, but the rest are cancelled
// my experience with threading is not the best, so if anyone who is good is willing to give this a shot,
// maybe this is possible
// INFO: threading with llama.cpp is not the best maybe not even possible, so this will be left here as reference

//const responses = await Promise.all([
// createCompletion(ll, [
// createCompletion(model, [
// { role : 'system', content: 'You are an advanced mathematician.' },
// { role : 'user', content: 'What is 1 + 1?' },
// ], { verbose: true }),
// createCompletion(ll, [
// createCompletion(model, [
// { role : 'system', content: 'You are an advanced mathematician.' },
// { role : 'user', content: 'What is 1 + 1?' },
// ], { verbose: true }),
//
//createCompletion(ll, [
//createCompletion(model, [
// { role : 'system', content: 'You are an advanced mathematician.' },
// { role : 'user', content: 'What is 1 + 1?' },
//], { verbose: true })
8 changes: 3 additions & 5 deletions gpt4all-bindings/typescript/spec/embed.mjs
@@ -1,8 +1,6 @@
import { loadModel, createEmbedding } from '../src/gpt4all.js'
import { loadModel, createEmbedding } from '../src/gpt4all.js'

const embedder = await loadModel("ggml-all-MiniLM-L6-v2-f16.bin", { verbose: true })
const embedder = await loadModel("ggml-all-MiniLM-L6-v2-f16.bin", { verbose: true, type: 'embedding'})

console.log(
createEmbedding(embedder, "Accept your current situation")
)
console.log(createEmbedding(embedder, "Accept your current situation"))
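The added `type: 'embedding'` option is presumably what tells `loadModel` to return an embedding-capable wrapper for use with `createEmbedding`, rather than a chat model.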
