diff --git a/README.md b/README.md
index 3150ea4..43807e2 100644
--- a/README.md
+++ b/README.md
@@ -1,23 +1,32 @@
-# Ollama Grid Search and A/B Testing Desktop App.
+# Ollama Grid Search: Instantly Evaluate Multiple LLMs and Prompts.

-A Rust based tool to evaluate LLM models, prompts and model params.
-
-(Issues with Llama3? Please read [this](https://github.com/dezoito/ollama-grid-search/issues/8)).
-
-## Purpose

 This project automates the process of selecting the best models, prompts, or inference parameters for a given use-case, allowing you to iterate over their combinations and to visually inspect the results.

 It assumes [Ollama](https://www.ollama.ai) is installed and serving endpoints, either in `localhost` or in a remote server.

-## Quick Example
-
-Here's a test for a simple prompt, tested on 2 models, using `0.7` and `1.0` as values for `temperature`:
+Here's what an experiment for a simple prompt, tested on 3 different models, looks like:

 [Main Screenshot](./screenshots/main.png?raw=true)

 (For a more in-depth look at an evaluation process assisted by this tool, please check https://dezoito.github.io/2023/12/27/rust-ollama-grid-search.html).
+
+## Table of Contents
+
+- [Installation](#installation)
+- [Features](#features)
+- [Grid Search Concept](#grid-search-or-something-similar)
+- [A/B Testing](#ab-testing)
+- [Prompt Archive](#prompt-archive)
+- [Experiment Logs](#experiment-logs)
+- [Future Features](#future-features)
+- [Contributing](#contributing)
+- [Development](#development)
+- [Citations](#citations)
+- [Acknowledgements](#thank-you)
+
+
 ## Installation

 Check the [releases page](https://github.com/dezoito/ollama-grid-search/releases) for the project, or on the sidebar.

@@ -25,7 +34,7 @@ Check the [releases page](https://github.com/dezoito/ollama-grid-search/releases
 ## Features

 - Automatically fetches models from local or remote Ollama servers;
-- Iterates over different models, prompts and parameters to generate inferences;
+- Iterates over multiple models, prompts and parameters to generate inferences;
 - A/B test different prompts on several models simultaneously;
 - Allows multiple iterations for each combination of parameters;
 - Allows [limited concurrency](https://dezoito.github.io/2024/03/21/react-limited-concurrency.html) **or** synchronous inference calls (to prevent spamming servers);
@@ -36,9 +45,11 @@ Check the [releases page](https://github.com/dezoito/ollama-grid-search/releases
 - Experiments can be inspected in readable views;
 - Re-run past experiments, cloning or modifying the parameters used in the past;
 - Configurable inference timeout;
-- Custom default parameters and system prompts can be defined in settings:
+- Custom default parameters and system prompts can be defined in settings;
+- Fully functional prompt database with examples;
+- Prompts can be selected and "autocompleted" by typing "/" in the inputs.
+

-[Settings](./screenshots/settings.png?raw=true)

 ## Grid Search (or something similar...)

@@ -52,7 +63,6 @@ Lets define a selection of models, a prompt and some parameter combinations:

 [gridparams](./screenshots/gridparams-animation.gif?raw=true)

 The prompt will be submitted once for each parameter **value**, for each one of the selected models, generating a set of responses.
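+
+For example, selecting 2 models and two values for `temperature` (say `0.7` and `1.0`), with a single iteration per combination, yields 2 × 2 = 4 responses. Here is a minimal sketch of how the grid expands (illustrative only; the model names are placeholders, not output from this tool):
+
+```rust
+// Illustrative sketch: expand a small parameter grid the same way an
+// experiment iterates over every model/parameter combination.
+fn main() {
+    let models = ["model-a", "model-b"]; // placeholder model names
+    let temperatures = [0.7, 1.0];
+    let iterations_per_combination = 1;
+
+    let mut total = 0;
+    for model in &models {
+        for temperature in &temperatures {
+            for _ in 0..iterations_per_combination {
+                println!("inference: model={model}, temperature={temperature}");
+                total += 1;
+            }
+        }
+    }
+    println!("total inferences: {total}"); // 2 models × 2 values × 1 iteration = 4
+}
+```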

-
 ## A/B Testing

 Similarly, you can perform A/B tests by selecting different models and compare results for the same prompt/parameter combination, or test different prompts under similar configurations:

@@ -62,6 +72,7 @@ Similarly, you can perform A/B tests by selecting different models and compare r
 Comparing the results of different prompts for the same model

 ## Prompt Archive
+
 You can save and manage your prompts (we want to make prompts compatible with [Open WebUI](https://github.com/open-webui/open-webui))

 [Settings](./screenshots/prompt-archive.png?raw=true)

 You can **autocomplete** prompts by typing "/" (inspired by Open WebUI, as well)

 [A/B testing](./screenshots/autocomplete.gif?raw=true)
-
-
 ## Experiment Logs

 You can list, inspect, or download your experiments:

@@ -81,7 +90,7 @@ You can list, inspect, or download your experiments:
 ## Future Features

 - Grading results and filtering by grade
-- Importing, exporting and sharing prompt lists and experiment parameters.
+- Importing, exporting and sharing prompt lists and experiment files.

 ## Contributing

@@ -113,7 +122,7 @@ cd ollama-grid-search
   If you are running VS Code, add this to your `settings.json` file

-   ```
+   ```json
   {
   ...
   "rust-analyzer.check.command": "clippy",
diff --git a/README.md.old b/README.md.old
new file mode 100644
index 0000000..69af402
--- /dev/null
+++ b/README.md.old
@@ -0,0 +1,135 @@
+# Ollama Grid Search: Instantly Evaluate Multiple LLMs and Prompts.
+
+
+This project automates the process of selecting the best models, prompts, or inference parameters for a given use-case, allowing you to iterate over their combinations and to visually inspect the results.
+
+It assumes [Ollama](https://www.ollama.ai) is installed and serving endpoints, either in `localhost` or in a remote server.
+
+Here's what an experiment for a simple prompt, tested on 3 different models, looks like:
+
+[Main Screenshot](./screenshots/main.png?raw=true)
+
+(For a more in-depth look at an evaluation process assisted by this tool, please check https://dezoito.github.io/2023/12/27/rust-ollama-grid-search.html).
+
+## Installation
+
+Check the [releases page](https://github.com/dezoito/ollama-grid-search/releases) for the project, or on the sidebar.
+
+## Features
+
+- Automatically fetches models from local or remote Ollama servers;
+- Iterates over different models, prompts and parameters to generate inferences;
+- A/B test different prompts on several models simultaneously;
+- Allows multiple iterations for each combination of parameters;
+- Allows [limited concurrency](https://dezoito.github.io/2024/03/21/react-limited-concurrency.html) **or** synchronous inference calls (to prevent spamming servers);
+- Optionally outputs inference parameters and response metadata (inference time, tokens and tokens/s);
+- Refetching of individual inference calls;
+- Model selection can be filtered by name;
+- List experiments which can be downloaded in JSON format;
+- Experiments can be inspected in readable views;
+- Re-run past experiments, cloning or modifying the parameters used in the past;
+- Configurable inference timeout;
+- Custom default parameters and system prompts can be defined in settings:
+
+[Settings](./screenshots/settings.png?raw=true)
+
+## Grid Search (or something similar...)
+
+Technically, the term "grid search" refers to iterating over a series of different model hyperparams to optimize model performance, but that usually means parameters like `batch_size`, `learning_rate`, or `number_of_epochs`, more commonly used in training.
+
+But the concept here is similar:
+
+Lets define a selection of models, a prompt and some parameter combinations:
+
+[gridparams](./screenshots/gridparams-animation.gif?raw=true)
+
+The prompt will be submitted once for each parameter **value**, for each one of the selected models, generating a set of responses.
+
+
+## A/B Testing
+
+Similarly, you can perform A/B tests by selecting different models and compare results for the same prompt/parameter combination, or test different prompts under similar configurations:
+
+[A/B testing](./screenshots/ab-animation.gif?raw=true)
+
+Comparing the results of different prompts for the same model
+
+## Prompt Archive
+You can save and manage your prompts (we want to make prompts compatible with [Open WebUI](https://github.com/open-webui/open-webui))
+
+[Settings](./screenshots/prompt-archive.png?raw=true)
+
+You can **autocomplete** prompts by typing "/" (inspired by Open WebUI, as well):
+
+[A/B testing](./screenshots/autocomplete.gif?raw=true)
+
+
+
+## Experiment Logs
+
+You can list, inspect, or download your experiments:
+
+[Settings](./screenshots/experiments.png?raw=true)
+
+## Future Features
+
+- Grading results and filtering by grade
+- Importing, exporting and sharing prompt lists and experiment parameters.
+
+## Contributing
+
+- For obvious bugs and spelling mistakes, please go ahead and submit a PR.
+
+- If you want to propose a new feature, change existing functionality, or propose anything more complex, please open an issue for discussion, **before** getting work done on a PR.
+
+## Development
+
+1. Make sure you have Rust installed.
+
+2. Clone the repository (or a fork)
+
+```sh
+git clone https://github.com/dezoito/ollama-grid-search.git
+cd ollama-grid-search
+```
+
+3. Install the frontend dependencies.
+
+   ```sh
+   cd
+   # I'm using bun to manage dependencies,
+   # but feel free to use yarn or npm
+   bun install
+   ```
+
+4. Make sure `rust-analyzer` is configured to run `Clippy` when checking code.
+
+   If you are running VS Code, add this to your `settings.json` file
+
+   ```
+   {
+   ...
+   "rust-analyzer.check.command": "clippy",
+   }
+   ```
+
+   (or, better yet, just use the settings file provided with the code)
+
+5. Run the app in development mode
+   ```sh
+   cd /
+   bun tauri dev
+   ```
+6. Go grab a cup of coffee because this may take a while.
+
+## Citations
+
+The following works and theses have cited this repository:
+
+Inouye, D & Lindo, L, & Lee, R & Allen, E; Computer Science and Engineering Senior Theses: **Applied Auto-tuning on LoRA Hyperparameters**
+Santa Clara University, 2024
+
+
+## Thank you!
+
+Huge thanks to [@FabianLars](https://github.com/FabianLars), [@peperroni21](https://github.com/pepperoni21) and [@TomReidNZ](https://github.com/TomReidNZ).