Inspired by optillm, Light Prompter is a Python framework for efficiently batching the LLM calls made by multi-step prompting strategies. This means that if you use an inference engine such as vLLM, you can get speedups of several orders of magnitude.
It works through a set of Responders, each of which acts as a state machine containing nested responders. Responders request chat completions, which are aggregated with all other pending chat completions so that they can be solved in one batch. Afterwards, the results are sent back down to the responders.
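To make the batching idea concrete, here is a minimal, self-contained sketch of the pattern in plain Python. It is not Light Prompter's actual internal API, and every name in it is made up: each responder behaves like a generator that yields chat-completion requests, a driver collects the requests from all responders into one batch, and the batched results are sent back in.

```python
# Illustrative sketch only: not Light Prompter's real internals, all names invented.
from typing import Generator, List

Request = List[dict]  # a chat message list, e.g. [{"role": "user", "content": "..."}]


def toy_self_consistency(question: str, n: int = 3) -> Generator[List[Request], List[str], str]:
    """A responder-style state machine: yield requests, receive replies, return an answer."""
    # Step 1: ask for n independent completions in one shot.
    requests = [[{"role": "user", "content": question}] for _ in range(n)]
    replies = yield requests
    # Step 2: vote; the most common reply wins (stand-in for answer extraction + voting).
    return max(set(replies), key=replies.count)


def run_batched(responders, model_fn):
    """Collect every responder's requests, answer them in one batched call, feed results back."""
    pending = [(gen, gen.send(None)) for gen in responders]      # first round of requests
    flat = [req for _, reqs in pending for req in reqs]          # one big batch
    replies = model_fn(flat)                                     # single batched model call
    results, offset = [], 0
    for gen, reqs in pending:
        chunk = replies[offset:offset + len(reqs)]
        offset += len(reqs)
        try:
            gen.send(chunk)                                      # hand replies back to the responder
        except StopIteration as done:
            results.append(done.value)                           # the responder's final answer
    return results


if __name__ == "__main__":
    echo_model = lambda batch: ["Paris"] * len(batch)            # fake model for the demo
    gens = [toy_self_consistency("What is the capital of France?") for _ in range(2)]
    print(run_batched(gens, echo_model))                         # ['Paris', 'Paris']
```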
Here's a diagram illustrating this process for something like self-consistency:
Use one of the following Kaggle notebooks:
This is an implementation of Mixture-Of-Agents from TogetherAI. Prompts were taken from https://github.com/togethercomputer/MoA.
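As a rough illustration of how MoA maps onto responders, here is a hypothetical sketch. The Aggregate_LLM and Prompt_Basic constructor calls below are assumptions (the real arguments may differ), so treat the examples folder as the source of truth for the actual wiring.

```python
# Hypothetical MoA wiring; constructor arguments below are assumptions.
from light_prompter import Model
from light_prompter.responders import Prompt_Basic, Aggregate_LLM, execute

model = Model(url="https://api.example.com", api_key="your-api-key", model="model-name-here")

# Several independent "proposer" responders whose outputs get combined.
proposers = [Prompt_Basic() for _ in range(3)]

# An aggregator responder that merges the proposers' outputs into one answer.
moa = Aggregate_LLM(proposers)  # assumed signature

response = execute(model, moa, "Explain the birthday paradox in two sentences.")
print(response.output)
```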
This is an implementation of self-consistency from https://arxiv.org/abs/2203.11171.
- Feel free to edit the similarity function as you wish, but be sure to handle the case where parsing the answer fails, since the answer is not guaranteed to be in the correct format (a sketch of such a function is shown below).
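For instance, a similarity function might compare numeric answers and treat anything that fails to parse as dissimilar, so malformed outputs can never win the vote. This is only a sketch; how the function is passed to PickCommon_Custom (keyword name, position) is an assumption, so check the example for the real call.

```python
# Sketch of a similarity function with an explicit failure case.
def numeric_similarity(answer_a, answer_b) -> float:
    """Return 1.0 if both answers parse to the same number, 0.0 otherwise."""
    try:
        return 1.0 if float(answer_a) == float(answer_b) else 0.0
    except (TypeError, ValueError):
        # Parsing failed (None, empty, or non-numeric text): treat as dissimilar.
        return 0.0

# e.g. PickCommon_Custom(..., similarity=numeric_similarity)  # assumed keyword
```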
Read the examples in the examples folder.
Here are all the responders you can use:
- Prompt_Basic: Single-step response generation
- Prompt_TwoTurn: Two-step response generation and answer extraction
- Aggregate_LLM: Combines multiple responses using another responder
- Critique_LLM: Generates an initial response with a responder, critiques it with a second responder, and rewrites it using a third responder
- PickCommon_Custom: Self-consistency; picks the answer most similar to the other answers based on the supplied similarity function
All responders return a response object containing a verbose_output, an output, and an answer. The answer is the shortest specific answer, the output is the standard LLM output, and the verbose_output contains all the steps taken.
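As a further illustration of composition, here is a hypothetical sketch of a generate, critique, rewrite pipeline built with Critique_LLM; the constructor signature is an assumption, but it shows how nested responders fit together and how the three response fields are accessed.

```python
# Hypothetical composition sketch; the Critique_LLM signature is an assumption.
from light_prompter import Model
from light_prompter.responders import Prompt_Basic, Critique_LLM, execute

model = Model(url="https://api.example.com", api_key="your-api-key", model="model-name-here")

pipeline = Critique_LLM(
    Prompt_Basic(),  # drafts the initial response
    Prompt_Basic(),  # critiques the draft
    Prompt_Basic(),  # rewrites the draft using the critique
)  # assumed signature

response = execute(model, pipeline, "Summarize the CAP theorem.")
print(response.answer)          # shortest specific answer
print(response.output)          # standard LLM output
print(response.verbose_output)  # every intermediate step
```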
Within sane_defaults.py you will find starter prompts (a basic CoT prompt, the aggregator prompt used for MoA, a prompt that makes the aggregator choose the best response, a critique prompt, and a rewriting prompt) and some answer extractors (which take in a string and try to "extract" the LLM's answer). Note that you can pass an answer extractor to most responders to try to parse the answer out. This is important! Some strategies, like self-consistency, rely on having a properly parsed answer.
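An answer extractor is just a function from the raw LLM output string to the parsed answer (or a failure value). Here is a small sketch; the keyword used to pass it to a responder is an assumption, so see sane_defaults.py for the extractors that ship with the library.

```python
import re

def extract_final_number(output: str):
    """Pull the last number out of the LLM output, or return None on failure."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", output)
    return matches[-1] if matches else None

# e.g. Prompt_TwoTurn(answer_extractor=extract_final_number)  # assumed keyword
```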
After you have built a tree of responders, call the execute function from responders.py on the root. Make sure you have a model set up: the default in model.py is OpenAI-compatible and thus works with any local inference engine, but it is slow. The vLLM variant is much preferred. You can also implement your own (see the sketch after the basic example below).
Basic example:
```python
from light_prompter import Model
from light_prompter.responders import Prompt_TwoTurn, execute

# Initialize model
model = Model(url="https://api.example.com", api_key="your-api-key", model="model-name-here")

# Create responder
responder = Prompt_TwoTurn()

# Execute prompt
response = execute(model, responder, "What is the capital of France?")
print(response.output)
```
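If you want your own backend, the shape of a custom model is sketched below. This is speculative: the base class and the method it expects are assumptions here, so mirror whatever interface model.py actually defines rather than copying this verbatim.

```python
# Speculative sketch; the method name and signature are assumptions about model.py.
from light_prompter import Model

class MyBatchedModel(Model):
    """Example backend that answers a whole batch of chat requests at once."""

    def complete_batch(self, batch):  # assumed method name and signature
        # `batch` is assumed to be a list of chat-message lists; return one
        # completion string per request, e.g. by calling your own engine here.
        return ["(stub completion)"] * len(batch)
```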
Contributions are welcome!
TODO:
- Implement plansearch and code execution, possibly some kind of tree search
  - It probably will not be possible to break plansearch down into a set of smaller responders, as is done for MoA, since it is quite involved
- Create a WebUI for testing with a list of preset responder configurations
- Create an OpenAI compatible API
- Batching across many inputs
Planned:
- Support more inference engines with batching out of the box, such as ExLlama, SGLang, and TensorRT-LLM
- A model router for routing requests to several APIs at once when batching.