IntentGuard is a Python library for verifying code properties using natural language assertions. It integrates with testing frameworks like pytest and unittest, allowing you to express complex code expectations in plain English within your existing test suites.
Important
IntentGuard has been updated with a new model, IntentGuard-1-qwen2.5-coder-1.5b, which delivers higher precision (92.3% vs 91.0%) while keeping overall accuracy comparable (92.5% vs 92.4%). Upgrade to the latest version to benefit from these improvements!
Traditional code testing often requires writing extensive code to verify intricate properties. IntentGuard simplifies this by enabling you to express sophisticated test cases in natural language. This is particularly useful when writing conventional test code becomes impractical or overly complex.
Key Use Cases:
- Complex Property Verification: Test intricate code behaviors that are hard to assert with standard methods.
- Reduced Boilerplate: Avoid writing lengthy test code for advanced checks.
- Improved Readability: Natural language assertions make tests easier to understand, especially for complex logic.
- Natural Language Assertions: Write test assertions in plain English.
- Testing Framework Integration: Works seamlessly with pytest and unittest.
- Deterministic Results: Employs a voting mechanism and controlled sampling for consistent test outcomes.
- Flexible Verification: Test properties difficult to verify using traditional techniques.
- Detailed Failure Explanations: Provides clear, natural language explanations when assertions fail.
- Efficient Result Caching: Caches results to speed up test execution and avoid redundant evaluations.
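As a rough illustration of the caching idea in the list above, the sketch below keys results on a hash of the assertion text, the code under test, and the evaluation settings. This is an assumption about how such a cache could work, not IntentGuard's actual implementation; `cache_key` and `ResultCache` are hypothetical names.

```python
import hashlib
import json


def cache_key(expectation: str, code: dict[str, str], num_evaluations: int) -> str:
    """Hash everything that could change the verdict into one stable key."""
    payload = json.dumps(
        {"expectation": expectation, "code": code, "num_evaluations": num_evaluations},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ResultCache:
    """In-memory cache (hypothetical); a real one would likely persist to disk."""

    def __init__(self) -> None:
        self._store: dict[str, bool] = {}
        self.misses = 0

    def get_or_compute(self, key: str, compute) -> bool:
        if key not in self._store:
            self.misses += 1  # only evaluate on a cache miss
            self._store[key] = compute()
        return self._store[key]


cache = ResultCache()
key = cache_key("Functions in {module} have docstrings", {"module": "def f(): ..."}, 5)
first = cache.get_or_compute(key, lambda: True)   # evaluated
second = cache.get_or_compute(key, lambda: True)  # served from cache
print(first, second, cache.misses)
```

Because the code under test is part of the key, editing the code produces a new key and forces a fresh evaluation.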
IntentGuard is ideal when implementing traditional tests for certain code properties is challenging or requires excessive code. Consider these scenarios:
# Example 1: Error Handling Verification
def test_error_handling():
    ig.assert_code(
        "All methods in {module} should use the custom ErrorHandler class for exception management, and log errors before re-raising them",
        {"module": my_critical_module}
    )

# Example 2: Documentation Consistency Check
def test_docstring_completeness():
    ig.assert_code(
        "All public methods in {module} should have docstrings that include Parameters, Returns, and Examples sections",
        {"module": my_api_module}
    )
In these examples, manually writing tests to iterate through methods, parse AST, and check for specific patterns would be significantly more complex than using IntentGuard's natural language assertions.
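For contrast, a conventional version of the docstring check in Example 2 might look like the sketch below. It uses only the standard library. The helper name `find_incomplete_docstrings` and the demo functions are hypothetical, and a plain substring match is a crude stand-in for real docstring parsing. Even this simplified version covers only top-level functions, not methods on classes.

```python
import inspect
import types

REQUIRED_SECTIONS = ("Parameters", "Returns", "Examples")


def find_incomplete_docstrings(namespace):
    """Map each public function name to the docstring sections it lacks."""
    problems = {}
    for name, obj in vars(namespace).items():
        if name.startswith("_") or not inspect.isfunction(obj):
            continue
        doc = inspect.getdoc(obj) or ""
        missing = [s for s in REQUIRED_SECTIONS if s not in doc]
        if missing:
            problems[name] = missing
    return problems


def documented(x):
    """Double a number.

    Parameters: x (int)
    Returns: int
    Examples: documented(2) == 4
    """
    return 2 * x


def undocumented(x):
    return x


# A stand-in for a real module, built from the two demo functions above.
demo = types.SimpleNamespace(documented=documented, undocumented=undocumented)
report = find_incomplete_docstrings(demo)
print(report)
```

Passing a real module instead of `demo` works the same way, since `vars()` accepts modules too.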
IntentGuard ensures reliable results through these mechanisms:
- Voting Mechanism: Each assertion is evaluated multiple times (configurable via num_evaluations), and the majority result determines the outcome.
- Temperature Control: Low temperature sampling in the LLM minimizes randomness.
- Structured Prompts: Natural language assertions are converted into structured prompts for consistent LLM interpretation.
You can configure determinism settings:
import intentguard as ig

options = ig.IntentGuardOptions(
    num_evaluations=5,  # Number of evaluations per assertion
)
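The voting mechanism described above can be sketched in a few lines. This is a minimal illustration, not IntentGuard's internal code: `majority_vote` is a hypothetical helper, the evaluator is a canned stand-in for an LLM call, and requiring an odd number of evaluations is an assumption made here to avoid ties.

```python
from collections import Counter
from typing import Callable


def majority_vote(evaluate: Callable[[], bool], num_evaluations: int = 5) -> bool:
    """Run `evaluate` num_evaluations times and return the majority verdict."""
    if num_evaluations < 1 or num_evaluations % 2 == 0:
        raise ValueError("num_evaluations should be a positive odd number to avoid ties")
    votes = Counter(evaluate() for _ in range(num_evaluations))
    return votes[True] > votes[False]


# Hypothetical stand-in for an LLM call that is right most of the time.
canned_verdicts = iter([True, False, True, True, False])
result = majority_vote(lambda: next(canned_verdicts), num_evaluations=5)
print(result)
```

With three of five votes in favor, the assertion is treated as passing even though two individual evaluations disagreed.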
IntentGuard is compatible with:
- Python: 3.10+
- Operating Systems:
  - Linux 2.6.18+ (most distributions since ~2007)
  - Darwin (macOS) 23.1.0+ (GPU support only on ARM64)
  - Windows 10+ (AMD64 only)
  - FreeBSD 13+
  - NetBSD 9.2+ (AMD64 only)
  - OpenBSD 7+ (AMD64 only)
These OS and architecture compatibilities are inherited from llamafile, which IntentGuard uses to run the model locally.
pip install intentguard
import intentguard as ig

def test_code_properties():
    guard = ig.IntentGuard()

    # Test code organization
    guard.assert_code(
        "Classes in {module} should follow the Single Responsibility Principle",
        {"module": my_module}
    )

    # Test security practices
    guard.assert_code(
        "All database queries in {module} should be parameterized to prevent SQL injection",
        {"module": db_module}
    )
import unittest
import intentguard as ig

class TestCodeQuality(unittest.TestCase):
    def setUp(self):
        self.guard = ig.IntentGuard()

    def test_error_handling(self):
        self.guard.assert_code(
            "All API endpoints in {module} should have proper input validation",
            {"module": api_module}
        )
import intentguard as ig

options = ig.IntentGuardOptions(
    num_evaluations=7,  # Increase number of evaluations
    temperature=0.1,    # Lower temperature for more deterministic results
)
guard = ig.IntentGuard(options)
IntentGuard utilizes a custom 1.5B parameter model, fine-tuned from qwen2.5-coder-1.5b. This model is optimized for code analysis and verification and runs locally using llamafile for privacy and efficient inference.
IntentGuard performs well on code property verification tasks, as measured by the validation framework described below.
| Model | Accuracy | Precision | Recall |
|---|---|---|---|
| (current model) IntentGuard-1-qwen2.5-coder-1.5b | 92.5% | 92.3% | 89.4% |
| (previous model) IntentGuard-1-llama3.2-1b | 92.4% | 91.0% | 91.0% |
| (reference model) gpt-4o-mini | 89.3% | 85.3% | 90.2% |
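For readers less familiar with the three columns, the sketch below computes them from a confusion matrix using the standard definitions. The counts are made up for illustration and are not the project's benchmark data. Which outcome the benchmark treats as the positive class is not stated here, so that is left as an assumption.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # share of all verdicts that were correct
        "precision": tp / (tp + fp),     # share of positive verdicts that were correct
        "recall": tp / (tp + fn),        # share of true positives that were found
    }


# Illustrative counts only (hypothetical, not measured results).
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(metrics)
```

The split matters when comparing models: a model can gain precision while losing recall, as the current and previous IntentGuard rows in the table show.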
Our validation framework employs a systematic approach:
- Each test example undergoes 15 total evaluations (5 trials × 3 evaluations per trial)
- A voting mechanism is applied within each group (jury size = 3)
- A test passes only if ALL 5 trials succeed with majority agreement (≥2 out of 3)
This strict validation ensures high confidence in the model's consistency and reliability. For more details, see our validation documentation.
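The protocol above can be sketched as follows. This is an illustration under stated assumptions rather than the project's validation code: the function names are hypothetical, the evaluator is a deterministic stand-in for a model call, and a trial is taken to "succeed" when the jury majority matches the expected label.

```python
from typing import Callable

NUM_TRIALS = 5  # trials per test example
JURY_SIZE = 3   # evaluations per trial


def trial_succeeds(evaluate: Callable[[], bool], expected: bool) -> bool:
    """One trial: a jury of 3 evaluations, decided by majority (>= 2 of 3)."""
    correct = sum(evaluate() == expected for _ in range(JURY_SIZE))
    return correct >= 2


def example_passes(evaluate: Callable[[], bool], expected: bool) -> bool:
    """An example passes only if all 5 trials succeed."""
    return all(trial_succeeds(evaluate, expected) for _ in range(NUM_TRIALS))


calls = []


def flaky_evaluator() -> bool:
    """Hypothetical evaluator: wrong on every third call, right otherwise."""
    calls.append(1)
    return len(calls) % 3 != 0


passed = example_passes(flaky_evaluator, expected=True)
print(passed, len(calls))
```

Here each jury sees two correct verdicts and one wrong one, so every trial succeeds and all 15 evaluations are used. A single trial with two wrong verdicts would fail the whole example.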
To contribute to IntentGuard, set up your local environment:
- Prerequisites: Python 3.10+, Poetry.
- Clone: git clone <repository_url> && cd intentguard
- Install dev dependencies: make install
- Run tests & checks: make test
Refer to the Makefile for more development commands.
- make install: Installs development dependencies.
- make install-prod: Installs production dependencies only.
- make check: Runs linting checks (ruff check).
- make format-check: Checks code formatting (ruff format --check).
- make mypy: Runs static type checking (mypy).
- make unittest: Runs unit tests.
- make test: Runs all checks and tests.
- make clean: Removes the virtual environment.
- make help: Lists available make commands.
IntentGuard is a complementary tool for specific testing needs, not a replacement for traditional testing. It is most effective for verifying complex code properties that are difficult to test conventionally.