Update docker-compose files and test endpoints

Update to llama.cpp

python: bump bindings version for AMD fixes

update llama.cpp-mainline

vulkan support for typescript bindings, gguf support (nomic-ai#1390)

* adding some native methods to cpp wrapper
* gpu seems to work
* typings and add availibleGpus method
* fix spelling
* fix syntax
* more
* normalize methods to conform to py
* remove extra dynamic linker deps when building with vulkan
* bump python version (library linking fix)
* Don't link against libvulkan.
* vulkan python bindings on windows fixes
* Bring the vulkan backend to the GUI.
* When device is Auto (the default) then we will only consider discrete GPU's otherwise fallback to CPU.
* Show the device we're currently using.
* Fix up the name and formatting.
* init at most one vulkan device, submodule update fixes issues w/ multiple of the same gpu
* Update the submodule.
* Add version 2.4.15 and bump the version number.
* Fix a bug where we're not properly falling back to CPU.
* Sync to a newer version of llama.cpp with bugfix for vulkan.
* Report the actual device we're using.
* Only show GPU when we're actually using it.
* Bump to new llama with new bugfix.
* Release notes for v2.4.16 and bump the version.
* Fallback to CPU more robustly.
* Release notes for v2.4.17 and bump the version.
* Bump the Python version to python-v1.0.12 to restrict the quants that vulkan recognizes.
* Link against ggml in bin so we can get the available devices without loading a model.
* Send actual and requested device info for those who have opt-in.
* Actually bump the version.
* Release notes for v2.4.18 and bump the version.
* Fix for crashes on systems where vulkan is not installed properly.
* Release notes for v2.4.19 and bump the version.
* fix typings and vulkan build works on win
* Add flatpak manifest
* Remove unnecessary stuffs from manifest
* Update to 2.4.19
* appdata: update software description
* Latest rebase on llama.cpp with gguf support.
* macos build fixes
* llamamodel: metal supports all quantization types now
* gpt4all.py: GGUF
* pyllmodel: print specific error message
* backend: port BERT to GGUF
* backend: port MPT to GGUF
* backend: port Replit to GGUF
* backend: use gguf branch of llama.cpp-mainline
* backend: use llamamodel.cpp for StarCoder
* conversion scripts: cleanup
* convert scripts: load model as late as possible
* convert_mpt_hf_to_gguf.py: better tokenizer decoding
* backend: use llamamodel.cpp for Falcon
* convert scripts: make them directly executable
* fix references to removed model types
* modellist: fix the system prompt
* backend: port GPT-J to GGUF
* gpt-j: update inference to match latest llama.cpp insights
  - Use F16 KV cache
  - Store transposed V in the cache
  - Avoid unnecessary Q copy

  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
  ggml upstream commit 0265f0813492602fec0e1159fe61de1bf0ccaf78
* chatllm: grammar fix
* convert scripts: use bytes_to_unicode from transformers
* convert scripts: make gptj script executable
* convert scripts: add feed-forward length for better compatibility
  This GGUF key is used by all llama.cpp models with upstream support.
* gptj: remove unused variables
* Refactor for subgroups on mat * vec kernel.
* Add q6_k kernels for vulkan.
* python binding: print debug message to stderr
* Fix regenerate button to be deterministic and bump the llama version to latest we have for gguf.
* Bump to the latest fixes for vulkan in llama.
* llamamodel: fix static vector in LLamaModel::endTokens
* Switch to new models2.json for new gguf release and bump our version to 2.5.0.
* Bump to latest llama/gguf branch.
* chat: report reason for fallback to CPU
* chat: make sure to clear fallback reason on success
* more accurate fallback descriptions
* differentiate between init failure and unsupported models
* backend: do not use Vulkan with non-LLaMA models
* Add q8_0 kernels to kompute shaders and bump to latest llama/gguf.
* backend: fix build with Visual Studio generator
  Use the $<CONFIG> generator expression instead of CMAKE_BUILD_TYPE. This is needed because Visual Studio is a multi-configuration generator, so we do not know what the build type will be until `cmake --build` is called.
  Fixes nomic-ai#1470
* remove old llama.cpp submodules
* Reorder and refresh our models2.json.
* rebase on newer llama.cpp
* python/embed4all: use gguf model, allow passing kwargs/overriding model
* Add starcoder, rift and sbert to our models2.json.
* Push a new version number for llmodel backend now that it is based on gguf.
* fix stray comma in models2.json
  Signed-off-by: Aaron Miller <apage43@ninjawhale.com>
* Speculative fix for build on mac.
* chat: clearer CPU fallback messages
* Fix crasher with an empty string for prompt template.
* Update the language here to avoid misunderstanding.
* added EM German Mistral Model
* make codespell happy
* issue template: remove "Related Components" section
* cmake: install the GPT-J plugin (nomic-ai#1487)
* Do not delete saved chats if we fail to serialize properly.
* Restore state from text if necessary.
* Another codespell attempted fix.
* llmodel: do not call magic_match unless build variant is correct (nomic-ai#1488)
* chatllm: do not write uninitialized data to stream (nomic-ai#1486)
* mat*mat for q4_0, q8_0
* do not process prompts on gpu yet
* python: support Path in GPT4All.__init__ (nomic-ai#1462)
* llmodel: print an error if the CPU does not support AVX (nomic-ai#1499)
* python bindings should be quiet by default
* disable llama.cpp logging unless GPT4ALL_VERBOSE_LLAMACPP envvar is nonempty
* make verbose flag for retrieve_model default false (but also be overridable via gpt4all constructor)

  should be able to run a basic test:

  ```python
  import gpt4all
  model = gpt4all.GPT4All('/Users/aaron/Downloads/rift-coder-v0-7b-q4_0.gguf')
  print(model.generate('def fib(n):'))
  ```

  and see no non-model output when successful
* python: always check status code of HTTP responses (nomic-ai#1502)
* Always save chats to disk, but save them as text by default. This also changes the UI behavior to always open a 'New Chat' and setting it as current instead of setting a restored chat as current. This improves usability by not requiring the user to wait if they want to immediately start chatting.
* Update README.md
  Signed-off-by: umarmnaq <102142660+umarmnaq@users.noreply.github.com>
* fix embed4all filename
  https://discordapp.com/channels/1076964370942267462/1093558720690143283/1161778216462192692
  Signed-off-by: Aaron Miller <apage43@ninjawhale.com>
* Improves Java API signatures maintaining back compatibility
* python: replace deprecated pkg_resources with importlib (nomic-ai#1505)
* Updated chat wishlist (nomic-ai#1351)
* q6k, q4_1 mat*mat
* update mini-orca 3b to gguf2, license
  Signed-off-by: Aaron Miller <apage43@ninjawhale.com>
* convert scripts: fix AutoConfig typo (nomic-ai#1512)
* publish config https://docs.npmjs.com/cli/v9/configuring-npm/package-json#publishconfig (nomic-ai#1375)
  merge into my branch
* fix appendBin
* fix gpu not initializing first
* sync up
* progress, still wip on destructor
* some detection work
* untested dispose method
* add js side of dispose
* Update gpt4all-bindings/typescript/index.cc
  Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
  Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>
* Update gpt4all-bindings/typescript/index.cc
  Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
  Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>
* Update gpt4all-bindings/typescript/index.cc
  Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
  Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>
* Update gpt4all-bindings/typescript/src/gpt4all.d.ts
  Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
  Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>
* Update gpt4all-bindings/typescript/src/gpt4all.js
  Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
  Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>
* Update gpt4all-bindings/typescript/src/util.js
  Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
  Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>
* fix tests
* fix circleci for nodejs
* bump version

---------
Signed-off-by: Aaron Miller <apage43@ninjawhale.com>
Signed-off-by: umarmnaq <102142660+umarmnaq@users.noreply.github.com>
Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>
Co-authored-by: Aaron Miller <apage43@ninjawhale.com>
Co-authored-by: Adam Treat <treat.adam@gmail.com>
Co-authored-by: Akarshan Biswas <akarshan.biswas@gmail.com>
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
Co-authored-by: Jan Philipp Harries <jpdus@users.noreply.github.com>
Co-authored-by: umarmnaq <102142660+umarmnaq@users.noreply.github.com>
Co-authored-by: Alex Soto <asotobu@gmail.com>
Co-authored-by: niansa/tuxifan <tuxifan@posteo.de>

ts/tooling (nomic-ai#1602)

Updated readme for correct install instructions (nomic-ai#1607)

Co-authored-by: aj-gameon <aj@gameontechnology.com>

llmodel_c: improve quality of error messages (nomic-ai#1625)

Add .gguf files to .gitignore and remove unused Dockerfile argument and app/__init__.py file

Delete gpt4all-api/gpt4all_api/app/api_v1/routes/__init__.py

Signed-off-by: Daniel Salvatierra <dsalvat1@gmail.com>

Delete gpt4all-api/test.py

Signed-off-by: Daniel Salvatierra <dsalvat1@gmail.com>

Delete gpt4all-api/completiontest.py

Signed-off-by: Daniel Salvatierra <dsalvat1@gmail.com>

Revert "Delete gpt4all-api/completiontest.py"

This reverts commit 08e8eea.

Revert "Delete gpt4all-api/test.py"

This reverts commit 7de26be.

Delete test files for local LLM development

Refactor code for improved readability and performance.

Delete gpt4all-api/completiontest.py

Signed-off-by: Daniel Salvatierra <dsalvat1@gmail.com>

Delete gpt4all-api/test.py

Signed-off-by: Daniel Salvatierra <dsalvat1@gmail.com>

Refactor code for improved readability and performance.

Resolve Delete test batched completion function with OpenAI API.
dpsalvatierra · Nov 12, 2023 · a5baa8b
1 parent bc88271
commit a5baa8b

Showing 42 changed files with 6,032 additions and 4,438 deletions.
diff --git a/.circleci/continue_config.yml b/.circleci/continue_config.yml
@@ -856,6 +856,7 @@ jobs:
       - node/install-packages:
           app-dir: gpt4all-bindings/typescript
           pkg-manager: yarn
+          override-ci-command: yarn install
       - run:
           command: | 
             cd gpt4all-bindings/typescript
@@ -885,6 +886,7 @@ jobs:
       - node/install-packages:
           app-dir: gpt4all-bindings/typescript
           pkg-manager: yarn
+          override-ci-command: yarn install
       - run:
           command: | 
             cd gpt4all-bindings/typescript
@@ -994,7 +996,7 @@ jobs:
           command: |
             cd gpt4all-bindings/typescript
             npm set //registry.npmjs.org/:_authToken=$NPM_TOKEN
-            npm publish --access public --tag alpha
+            npm publish
 
 workflows:
   version: 2

diff --git a/.gitignore b/.gitignore
@@ -183,4 +183,7 @@ build_*
 build-*
 
 # IntelliJ
-.idea/
+.idea/
+
+# gguf files
+*.gguf
diff --git a/gpt4all-api/docker-compose.gpu.yaml b/gpt4all-api/docker-compose.gpu.yaml
@@ -8,8 +8,10 @@ services:
     environment:
       - HUGGING_FACE_HUB_TOKEN=token
       - USE_FLASH_ATTENTION=false
-      - MODEL_ID=''
+      - MODEL_ID=${EMBEDDING}
       - NUM_SHARD=1
+    env_file:
+      - ./gpt4all_api/.env
     command: --model-id $MODEL_ID --num-shard $NUM_SHARD
     volumes:
       - ./:/data

diff --git a/gpt4all-api/docker-compose.yaml b/gpt4all-api/docker-compose.yaml
@@ -7,13 +7,16 @@ services:
     restart: always #restart on error (usually code compilation from save during bad state)
     ports:
       - "4891:4891"
+    env_file:
+      - './gpt4all_api/.env'
     environment:
       - APP_ENVIRONMENT=dev
       - WEB_CONCURRENCY=2
       - LOGLEVEL=debug
       - PORT=4891
-      - model=ggml-mpt-7b-chat.bin
+      - model=${MODEL_BIN}
       - inference_mode=cpu
     volumes:
       - './gpt4all_api/app:/app'
+      - './gpt4all_api/models:/models'
     command: ["/start-reload.sh"]
diff --git a/gpt4all-api/gpt4all.code-workspace b/gpt4all-api/gpt4all.code-workspace
@@ -0,0 +1,7 @@
+{
+	"folders": [
+		{
+			"path": ".."
+		}
+	]
+}
diff --git a/gpt4all-api/gpt4all_api/Dockerfile.buildkit b/gpt4all-api/gpt4all_api/Dockerfile.buildkit
@@ -1,8 +1,6 @@
 # syntax=docker/dockerfile:1.0.0-experimental
 FROM tiangolo/uvicorn-gunicorn:python3.11
 
-ARG MODEL_BIN=ggml-mpt-7b-chat.bin
-
 # Put first so anytime this file changes other cached layers are invalidated.
 COPY gpt4all_api/requirements.txt /requirements.txt
 

diff --git a/gpt4all-api/gpt4all_api/app/__init__.py b/gpt4all-api/gpt4all_api/app/__init__.py
diff --git a/gpt4all-api/gpt4all_api/app/api_v1/__init__.py b/gpt4all-api/gpt4all_api/app/api_v1/__init__.py
diff --git a/gpt4all-api/gpt4all_api/app/api_v1/routes/__init__.py b/gpt4all-api/gpt4all_api/app/api_v1/routes/__init__.py
diff --git a/gpt4all-api/gpt4all_api/app/tests/test_endpoints.py b/gpt4all-api/gpt4all_api/app/tests/test_endpoints.py
@@ -2,24 +2,34 @@
 Use the OpenAI python API to test gpt4all models.
 """
 from typing import List, get_args
+import os
+from dotenv import load_dotenv
 
 import openai
 
 openai.api_base = "http://localhost:4891/v1"
-
 openai.api_key = "not needed for a local LLM"
 
+# Load the .env file
+env_path = 'gpt4all-api/gpt4all_api/.env'
+load_dotenv(dotenv_path=env_path)
+
+# Fetch MODEL_ID from .env file
+model_id = os.getenv('MODEL_BIN', 'default_model_id')
+embedding = os.getenv('EMBEDDING', 'default_embedding_model_id')
+print (model_id)
+print (embedding)
 
 def test_completion():
-    model = "ggml-mpt-7b-chat.bin"
+    model = model_id
     prompt = "Who is Michael Jordan?"
     response = openai.Completion.create(
         model=model, prompt=prompt, max_tokens=50, temperature=0.28, top_p=0.95, n=1, echo=True, stream=False
     )
     assert len(response['choices'][0]['text']) > len(prompt)
 
 def test_streaming_completion():
-    model = "ggml-mpt-7b-chat.bin"
+    model = model_id
     prompt = "Who is Michael Jordan?"
     tokens = []
     for resp in openai.Completion.create(
@@ -38,7 +48,7 @@ def test_streaming_completion():
 
 
 def test_batched_completion():
-    model = "ggml-mpt-7b-chat.bin"
+    model = model_id
     prompt = "Who is Michael Jordan?"
     response = openai.Completion.create(
         model=model, prompt=[prompt] * 3, max_tokens=50, temperature=0.28, top_p=0.95, n=1, echo=True, stream=False
@@ -48,12 +58,12 @@ def test_batched_completion():
 
 
 def test_embedding():
-    model = "ggml-all-MiniLM-L6-v2-f16.bin"
+    model = embedding
     prompt = "Who is Michael Jordan?"
     response = openai.Embedding.create(model=model, input=prompt)
     output = response["data"][0]["embedding"]
     args = get_args(List[float])
 
     assert response["model"] == model
     assert isinstance(output, list)
-    assert all(isinstance(x, args) for x in output)
+    assert all(isinstance(x, args) for x in output)
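
Note on the hunk above: `load_dotenv` is given a path relative to the current working directory, so the tests fall back to the placeholder names whenever they are run from somewhere other than the repository root. A minimal sketch of a lookup that does not depend on the working directory, using only the standard library. `read_env_file` and `resolve_model_names` are hypothetical helpers (not part of gpt4all or python-dotenv), and the parser assumes plain `KEY=value` lines only:

```python
import os
from pathlib import Path
from typing import Dict, Optional, Tuple


def read_env_file(path: Path) -> Dict[str, str]:
    """Parse plain KEY=value lines; blank lines and '#' comments are skipped."""
    values: Dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and simple quoting around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_model_names(
    env_file: Path, environ: Optional[Dict[str, str]] = None
) -> Tuple[str, str]:
    """Return (completion model, embedding model).

    Variables already set in the environment win over the file, and the
    same placeholder defaults as in the test module apply last.
    """
    environ = dict(os.environ) if environ is None else environ
    file_values = read_env_file(env_file)
    model = environ.get("MODEL_BIN") or file_values.get("MODEL_BIN", "default_model_id")
    embedding = environ.get("EMBEDDING") or file_values.get(
        "EMBEDDING", "default_embedding_model_id"
    )
    return model, embedding
```

Assuming the layout shown in this diff, the test module could locate the file with `Path(__file__).resolve().parents[2] / ".env"`, which resolves to `gpt4all-api/gpt4all_api/.env` regardless of where pytest is started.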
diff --git a/gpt4all-api/gpt4all_api/models/README.md b/gpt4all-api/gpt4all_api/models/README.md
@@ -0,0 +1 @@
+## Place your gguf models and embeddings on this folder
diff --git a/gpt4all-api/makefile b/gpt4all-api/makefile
@@ -28,19 +28,19 @@ clean_testenv:
 fresh_testenv: clean_testenv testenv
 
 venv:
-	if [ ! -d $(ROOT_DIR)/env ]; then $(PYTHON) -m venv $(ROOT_DIR)/env; fi
+	if [ ! -d $(ROOT_DIR)/venv ]; then $(PYTHON) -m venv $(ROOT_DIR)/venv; fi
 
 dependencies: venv
-	source $(ROOT_DIR)/env/bin/activate; $(PYTHON) -m pip install -r $(ROOT_DIR)/$(APP_NAME)/requirements.txt
+	source $(ROOT_DIR)/venv/bin/activate; $(PYTHON) -m pip install -r $(ROOT_DIR)/$(APP_NAME)/requirements.txt
 
 clean: clean_testenv
 	# Remove existing environment
-	rm -rf $(ROOT_DIR)/env;
+	rm -rf $(ROOT_DIR)/venv;
 	rm -rf $(ROOT_DIR)/$(APP_NAME)/*.pyc;
 
 
 black:
-	source $(ROOT_DIR)/env/bin/activate; black -l 120 -S --target-version py38 $(APP_NAME)
+	source $(ROOT_DIR)/venv/bin/activate; black -l 120 -S --target-version py38 $(APP_NAME)
 
 isort:
-	source $(ROOT_DIR)/env/bin/activate; isort  --ignore-whitespace --atomic -w 120 $(APP_NAME)
+	source $(ROOT_DIR)/venv/bin/activate; isort  --ignore-whitespace --atomic -w 120 $(APP_NAME)
diff --git a/gpt4all-backend/llama.cpp-mainline b/gpt4all-backend/llama.cpp-mainline
diff --git a/gpt4all-backend/llamamodel.cpp b/gpt4all-backend/llamamodel.cpp
@@ -385,22 +385,35 @@ DLL_EXPORT const char *get_build_variant() {
 }
 
 DLL_EXPORT bool magic_match(const char * fname) {
-
     struct ggml_context * ctx_meta = NULL;
     struct gguf_init_params params = {
         /*.no_alloc = */ true,
         /*.ctx      = */ &ctx_meta,
     };
     gguf_context *ctx_gguf = gguf_init_from_file(fname, params);
-    if (!ctx_gguf)
+    if (!ctx_gguf) {
+        std::cerr << __func__ << ": gguf_init_from_file failed\n";
         return false;
+    }
+
+    bool valid = true;
+
+    int gguf_ver = gguf_get_version(ctx_gguf);
+    if (valid && gguf_ver > 3) {
+        std::cerr << __func__ << ": unsupported gguf version: " << gguf_ver << "\n";
+        valid = false;
+    }
 
-    bool isValid = gguf_get_version(ctx_gguf) <= 3;
     auto arch = get_arch_name(ctx_gguf);
-    isValid = isValid && (arch == "llama" || arch == "starcoder" || arch == "falcon" || arch == "mpt");
+    if (valid && !(arch == "llama" || arch == "starcoder" || arch == "falcon" || arch == "mpt")) {
+        if (!(arch == "gptj" || arch == "bert")) { // we support these via other modules
+            std::cerr << __func__ << ": unsupported model architecture: " << arch << "\n";
+        }
+        valid = false;
+    }
 
     gguf_free(ctx_gguf);
-    return isValid;
+    return valid;
 }
 
 DLL_EXPORT LLModel *construct() {
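
Note on the hunk above: `magic_match` now reports separately when the file cannot be opened as GGUF and when its version is newer than 3. The version half of that check can be reproduced outside the backend as a rough pre-flight test. This is a sketch only: it assumes a little-endian file, reads just the 4-byte `GGUF` magic and the 32-bit version that follows it, and does not look at the architecture key that `get_arch_name` reads from the metadata. `read_gguf_version` and `is_supported_gguf` are hypothetical names.

```python
import struct
from pathlib import Path

GGUF_MAGIC = b"GGUF"
MAX_SUPPORTED_GGUF_VERSION = 3  # mirrors the `gguf_ver > 3` check in magic_match


def read_gguf_version(path: Path) -> int:
    """Return the GGUF version from the file header, or raise ValueError."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path}: not a GGUF file")
    # The version is a 32-bit unsigned integer stored right after the magic
    # (little-endian is assumed here).
    (version,) = struct.unpack("<I", header[4:8])
    return version


def is_supported_gguf(path: Path) -> bool:
    """True if the header parses and the version is at most 3."""
    try:
        return read_gguf_version(path) <= MAX_SUPPORTED_GGUF_VERSION
    except (OSError, ValueError):
        return False
```

A `True` result here does not guarantee the backend will accept the model, since the architecture check still applies.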

diff --git a/gpt4all-backend/llmodel.cpp b/gpt4all-backend/llmodel.cpp
@@ -123,11 +123,18 @@ const std::vector<LLModel::Implementation> &LLModel::Implementation::implementat
 }
 
 const LLModel::Implementation* LLModel::Implementation::implementation(const char *fname, const std::string& buildVariant) {
+    bool buildVariantMatched = false;
     for (const auto& i : implementationList()) {
         if (buildVariant != i.m_buildVariant) continue;
+        buildVariantMatched = true;
+
         if (!i.m_magicMatch(fname)) continue;
         return &i;
     }
+
+    if (!buildVariantMatched) {
+        std::cerr << "LLModel ERROR: Could not find any implementations for build variant: " << buildVariant << "\n";
+    }
     return nullptr;
 }
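
Note on the hunk above: the lookup now distinguishes "no implementation was built for this variant" from "implementations exist, but none recognised the file". The control flow can be sketched in Python as follows; `Implementation` and `find_implementation` are hypothetical stand-ins for the C++ types, and the second message is borrowed from `llmodel_model_create2` rather than printed by this function in the real code:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Implementation:
    build_variant: str
    magic_match: Callable[[str], bool]


def find_implementation(
    impls: List[Implementation], fname: str, build_variant: str
) -> Tuple[Optional[Implementation], Optional[str]]:
    """Return (implementation, None) on success or (None, reason) on failure."""
    variant_matched = False
    for impl in impls:
        if impl.build_variant != build_variant:
            continue  # magic_match is only called for the requested variant
        variant_matched = True
        if impl.magic_match(fname):
            return impl, None
    if not variant_matched:
        return None, f"no implementations for build variant: {build_variant}"
    return None, "model format not supported (no matching implementation found)"
```

Tracking `variant_matched` separately is what lets the caller tell a packaging problem (missing backend library) apart from an unsupported model file.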
 

diff --git a/gpt4all-backend/llmodel_c.cpp b/gpt4all-backend/llmodel_c.cpp
@@ -11,45 +11,33 @@ struct LLModelWrapper {
     ~LLModelWrapper() { delete llModel; }
 };
 
-
 thread_local static std::string last_error_message;
 
-
 llmodel_model llmodel_model_create(const char *model_path) {
-    auto fres = llmodel_model_create2(model_path, "auto", nullptr);
+    const char *error;
+    auto fres = llmodel_model_create2(model_path, "auto", &error);
     if (!fres) {
-        fprintf(stderr, "Invalid model file\n");
+        fprintf(stderr, "Unable to instantiate model: %s\n", error);
     }
     return fres;
 }
 
-llmodel_model llmodel_model_create2(const char *model_path, const char *build_variant, llmodel_error *error) {
+llmodel_model llmodel_model_create2(const char *model_path, const char *build_variant, const char **error) {
     auto wrapper = new LLModelWrapper;
-    int error_code = 0;
 
     try {
         wrapper->llModel = LLModel::Implementation::construct(model_path, build_variant);
+        if (!wrapper->llModel) {
+            last_error_message = "Model format not supported (no matching implementation found)";
+        }
     } catch (const std::exception& e) {
-        error_code = EINVAL;
         last_error_message = e.what();
     }
 
     if (!wrapper->llModel) {
         delete std::exchange(wrapper, nullptr);
-        // Get errno and error message if none
-        if (error_code == 0) {
-            if (errno != 0) {
-                error_code = errno;
-                last_error_message = std::strerror(error_code);
-            } else {
-                error_code = ENOTSUP;
-                last_error_message = "Model format not supported (no matching implementation found)";
-            }
-        }
-        // Set error argument
         if (error) {
-            error->message = last_error_message.c_str();
-            error->code = error_code;
+            *error = last_error_message.c_str();
         }
     }
     return reinterpret_cast<llmodel_model*>(wrapper);

diff --git a/gpt4all-backend/llmodel_c.h b/gpt4all-backend/llmodel_c.h
@@ -23,17 +23,6 @@ extern "C" {
  */
 typedef void *llmodel_model;
 
-/**
- * Structure containing any errors that may eventually occur
- */
-struct llmodel_error {
-    const char *message;  // Human readable error description; Thread-local; guaranteed to survive until next llmodel C API call
-    int code;             // errno; 0 if none
-};
-#ifndef __cplusplus
-typedef struct llmodel_error llmodel_error;
-#endif
-
 /**
  * llmodel_prompt_context structure for holding the prompt context.
  * NOTE: The implementation takes care of all the memory handling of the raw logits pointer and the
@@ -105,10 +94,10 @@ DEPRECATED llmodel_model llmodel_model_create(const char *model_path);
  * Recognises correct model type from file at model_path
  * @param model_path A string representing the path to the model file; will only be used to detect model type.
  * @param build_variant A string representing the implementation to use (auto, default, avxonly, ...),
- * @param error A pointer to a llmodel_error; will only be set on error.
+ * @param error A pointer to a string; will only be set on error.
  * @return A pointer to the llmodel_model instance; NULL on error.
  */
-llmodel_model llmodel_model_create2(const char *model_path, const char *build_variant, llmodel_error *error);
+llmodel_model llmodel_model_create2(const char *model_path, const char *build_variant, const char **error);
 
 /**
  * Destroy a llmodel instance.

diff --git a/gpt4all-bindings/golang/Makefile b/gpt4all-bindings/golang/Makefile
@@ -139,7 +139,7 @@ $(info I CXX:      $(CXXV))
 $(info )
 
 llmodel.o:
-	mkdir buildllm
+	[ -e buildllm ] || mkdir buildllm
 	cd buildllm && cmake ../../../gpt4all-backend/ $(CMAKEFLAGS) && make
 	cd buildllm && cp -rf CMakeFiles/llmodel.dir/llmodel_c.cpp.o ../llmodel_c.o
 	cd buildllm && cp -rf CMakeFiles/llmodel.dir/llmodel.cpp.o ../llmodel.o
@@ -150,7 +150,7 @@ clean:
 	rm -rf buildllm
 	rm -rf example/main
 
-binding.o: 
+binding.o: binding.cpp binding.h
 	$(CXX) $(CXXFLAGS) binding.cpp -o binding.o -c $(LDFLAGS)
 
 libgpt4all.a: binding.o llmodel.o

diff --git a/gpt4all-bindings/golang/binding.cpp b/gpt4all-bindings/golang/binding.cpp
@@ -17,11 +17,10 @@
 
 void* load_model(const char *fname, int n_threads) {
     // load the model
-    llmodel_error new_error{};
+    const char *new_error;
     auto model = llmodel_model_create2(fname, "auto", &new_error);
-    if (model == nullptr ){
-        fprintf(stderr, "%s: error '%s'\n",
-                __func__, new_error.message);
+    if (model == nullptr) {
+        fprintf(stderr, "%s: error '%s'\n", __func__, new_error);
         return nullptr;
     }
     if (!llmodel_loadModel(model, fname)) {

diff --git a/gpt4all-bindings/golang/placeholder b/gpt4all-bindings/golang/placeholder
diff --git a/gpt4all-bindings/java/src/main/java/com/hexadevlabs/gpt4all/LLModel.java b/gpt4all-bindings/java/src/main/java/com/hexadevlabs/gpt4all/LLModel.java
@@ -1,6 +1,7 @@
 package com.hexadevlabs.gpt4all;
 
 import jnr.ffi.Pointer;
+import jnr.ffi.byref.PointerByReference;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
@@ -176,7 +177,7 @@ public LLModel(Path modelPath) {
         modelName = modelPath.getFileName().toString();
         String modelPathAbs = modelPath.toAbsolutePath().toString();
 
-        LLModelLibrary.LLModelError error = new LLModelLibrary.LLModelError(jnr.ffi.Runtime.getSystemRuntime());
+        PointerByReference error = new PointerByReference();
 
         // Check if model file exists
         if(!Files.exists(modelPath)){
@@ -192,7 +193,7 @@ public LLModel(Path modelPath) {
         model = library.llmodel_model_create2(modelPathAbs, "auto", error);
 
         if(model == null) {
-            throw new IllegalStateException("Could not load, gpt4all backend returned error: " + error.message);
+            throw new IllegalStateException("Could not load, gpt4all backend returned error: " + error.getValue().getString(0));
         }
         library.llmodel_loadModel(model, modelPathAbs);
 
@@ -631,4 +632,4 @@ public void close() throws Exception {
         library.llmodel_model_destroy(model);
     }
 
-}
+}