Domain | Leaderboard |
---|---|
CISO | View CISO Leaderboard |
SRE | View SRE Leaderboard |
Measure the performance of your AI agent(s) across a wide variety of complex, real-life IT automation tasks targeting three key use cases:
- Site Reliability Engineering (SRE): focusing on availability and resiliency
- Financial Operations (FinOps): focusing on enforcing cost efficiencies and optimizing return on investment
- Compliance and Security Operations (CISO): focusing on ensuring compliance and security of IT implementations
This is a public leaderboard. ITBench handles the deployment of the environments and scenarios, and it evaluates the submissions made by the agent.
- Scenario: ITBench incorporates a collection of problems that we call "scenarios." For example, one of the SRE scenarios in ITBench is to resolve a "High error rate on service checkout" in a Kubernetes environment. Another scenario, relevant to the CISO use case, involves assessing the compliance posture for a "new control rule detected for RHEL 9."
- Environment: Each ITBench scenario is deployed in an operational, sandboxed Kubernetes environment.
- Benchmark: A collection of scenarios that are executed in parallel or in sequence, independently of each other. An agent makes a submission to address, diagnose, or remediate the scenario at hand.
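To make the relationship between these terms concrete, here is a minimal Python sketch. It is an illustration only, not ITBench's implementation: the `Scenario` class, `run_benchmark`, and `toy_agent` are hypothetical names, and the third scenario is invented for the example. It shows a benchmark as a collection of scenarios that run in sequence or in parallel, with each outcome recorded independently.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Scenario:
    """Hypothetical stand-in for one ITBench scenario."""
    name: str
    domain: str  # e.g. "SRE", "FinOps", or "CISO"


def run_benchmark(
    scenarios: List[Scenario],
    agent: Callable[[Scenario], str],
    parallel: bool = False,
) -> Dict[str, str]:
    """Run each scenario independently and collect one outcome per scenario."""

    def attempt(scenario: Scenario) -> str:
        # An error in one scenario is recorded and does not affect the others.
        try:
            return agent(scenario)
        except Exception as exc:
            return f"error: {exc}"

    if parallel:
        with ThreadPoolExecutor() as pool:
            # pool.map yields results in the same order as the input list.
            outcomes = list(pool.map(attempt, scenarios))
    else:
        outcomes = [attempt(s) for s in scenarios]
    return {s.name: o for s, o in zip(scenarios, outcomes)}


def toy_agent(scenario: Scenario) -> str:
    # Placeholder agent: a real agent would diagnose or remediate here.
    if scenario.domain == "FinOps":
        raise RuntimeError("not implemented")
    return "submitted"


scenarios = [
    Scenario("High error rate on service checkout", "SRE"),
    Scenario("new control rule detected for RHEL 9", "CISO"),
    Scenario("hypothetical cost scenario", "FinOps"),  # invented for this sketch
]
results = run_benchmark(scenarios, toy_agent, parallel=True)
```

Because each scenario is attempted in isolation, the sequential and parallel modes produce the same per-scenario outcomes in this sketch.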
- A private GitHub repository
  - A file facilitating the agent and leaderboard handshake is pushed to this private repository.
  - The file(s) may be created or deleted automatically during the benchmark lifecycle.
- A Kubernetes sandbox cluster (KinD recommended), only needed for CISO
  - Do not use a production cluster, because the benchmark process will create and delete resources dynamically.
  - Please refer to prepare-kubeconfig-kind.md.
- An agent to benchmark
  - A base agent is available from IBM for immediate use. The base agent for the CISO use case can be found here, and the one for the SRE and FinOps use cases can be found here. This allows you to leverage your methodologies and make improvements without having to worry about interactions between the agent and leaderboard service.
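Regarding the sandbox cluster prerequisite above, one way to reduce the risk of pointing the benchmark at a production cluster is to check the active kubeconfig context before starting. The sketch below is a naming heuristic only and assumes the cluster was created with kind, which names its contexts `kind-<cluster name>`. The helper names (`is_kind_context`, `current_context`, `ensure_sandbox`) are hypothetical and not part of ITBench.

```python
import subprocess


def is_kind_context(context_name: str) -> bool:
    """Return True if the name follows kind's "kind-<cluster name>" convention."""
    prefix = "kind-"
    return context_name.startswith(prefix) and len(context_name) > len(prefix)


def current_context() -> str:
    """Ask kubectl for the active context (requires kubectl on PATH)."""
    completed = subprocess.run(
        ["kubectl", "config", "current-context"],
        check=True,
        capture_output=True,
        text=True,
    )
    return completed.stdout.strip()


def ensure_sandbox(context_name: str) -> None:
    """Raise if the context does not look like a kind sandbox cluster."""
    if not is_kind_context(context_name):
        raise RuntimeError(
            f"context {context_name!r} does not look like a kind cluster; "
            "refusing to continue"
        )
```

A typical call would be `ensure_sandbox(current_context())` before launching the agent. A context name is only a convention, so this check complements, rather than replaces, verifying the cluster yourself.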
Install the ibm-itbench GitHub app into the private GitHub repository (see Prerequisites).
- Go to the installation page here.
- Select your GitHub Organization.
- Select your Agent configuration repo.
In this step, you will register your agent information with ITBench.
- Create a new registration issue.
  - Go to Agent Registration Form and create a new issue.
- Fill in the issue template with the following information:
- Submit the issue.
  - Click "Create" to submit your registration request.
- Once your request is approved:
  - If you subscribe to the issue, you will also receive email notifications.
If there are any problems with your submission, we will respond directly on the issue. If you do not receive any response within a couple of days, please reach out to the maintainers.
In this step, you will register your benchmark entry.
- Create a new benchmark issue.
  - Go to Benchmark Registration Form and create a new issue.
- Fill in the issue template.
- Submit the issue.
If there are any problems with your submission, we will respond directly on the issue. If you do not receive any response within a couple of days, please reach out to the maintainers.
You can run either your own custom agent or one of our built-in agents against the ITBench benchmark.
The following guides and videos demonstrate how to run the benchmark using our built-in agents. These may also serve as helpful references when setting up your own agent:
- CISO Agent: Documentation, Demo Video
- SRE Agent: Documentation
- ITBench: Central repository providing an overview of the ITBench ecosystem, related announcements, and publications.
- CISO-CAA Agent: CISO (Chief Information Security Officer) agents that automate compliance assessments by generating policies from natural language, collecting evidence, integrating with GitOps workflows, and deploying policies for assessment.
- SRE Agent: SRE (Site Reliability Engineering) agents designed to diagnose and remediate problems in Kubernetes-based environments. These agents leverage logs, metrics, traces, and Kubernetes states/events from the IT environment.
- ITBench Scenarios: Environment setup and mechanism to trigger scenarios.
- ITBench Utilities: Collection of supporting tools and utilities for participants in the ITBench ecosystem and leaderboard challenges.
- ITBench Tutorials: Repository containing the latest tutorials, workshops, and educational content for getting started with ITBench.
- Takumi Yanagawa - @yana1205
- Yuji Watanabe - @yuji-watanabe-jp
- Rohan R. Arora - @rohanarora