If you use public AI services such as OpenAI, Anthropic or Mistral, Sarus Arena is an agent you can easily deploy in your own infrastructure for:
- LLM evaluation: A/B testing, user-feedback evaluation, formula-based evaluation and LLM-as-a-judge
- LLM compliance: request and response filtering and redaction (PII removal, guardrails), evaluation-based routing
- LLM distillation: Train your own model based on the best evaluated responses
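
To make "request filtering and redacting" concrete, here is a minimal sketch of what redacting PII from a prompt before it reaches a public provider can look like. It is an illustration only and not Sarus Arena's implementation or API: the `redact` function, the `PII_PATTERNS` table and the two regular expressions are hypothetical names chosen for this example.

```python
import re

# Hypothetical patterns for illustration only; real PII detection needs far
# broader coverage than two regular expressions.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str) -> str:
    """Replace each matched span with a placeholder such as <EMAIL>."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


# Example request body, redacted before being forwarded to an LLM provider.
prompt = "Contact jane.doe@example.com or +33 6 12 34 56 78 about the invoice."
print(redact(prompt))
```

The same idea applies symmetrically to responses: the text returned by the provider can be passed through the same kind of filter before it is shown to the user or stored for evaluation.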
A test instance is hosted by Sarus: arena.sarus.app.
You can deploy your own instance using the provided helm arena chart and following the deployment instructions.
A document describing the installation process is available there.
To start the test environment, run:
docker compose --profile "*" up
Docker Compose uses compose.yml and overrides it with compose.override.yml.