The LLM Defense Framework enhances large language model security through post-processing defenses and statistical guarantees based on one-class SVM. It combines advanced sampling methods with adaptive policy updates and comprehensive evaluation metrics, providing researchers and practitioners with tools to build more secure AI systems.
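As a hedged illustration of the one-class-SVM idea behind the post-processing defense, the sketch below fits scikit-learn's `OneClassSVM` on embeddings of known-benign responses and flags outputs that fall outside the learned region. The random vectors stand in for real response embeddings, and all names (`is_suspicious`, `benign_embeddings`) are assumptions for this example, not the framework's actual API.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Hypothetical stand-in for response embeddings; in practice these would
# come from an embedding model applied to benign LLM outputs.
benign_embeddings = rng.normal(loc=0.0, scale=1.0, size=(200, 16))

# Fit a one-class SVM on the benign set; `nu` upper-bounds the fraction of
# training points treated as outliers, giving a tunable false-positive rate.
detector = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
detector.fit(benign_embeddings)

def is_suspicious(embedding: np.ndarray) -> bool:
    """Flag a response whose embedding the detector labels an outlier (-1)."""
    return bool(detector.predict(embedding.reshape(1, -1))[0] == -1)

# A point far from the benign distribution should be flagged.
extreme = np.full(16, 8.0)
print(is_suspicious(extreme))
```

The `nu` parameter is what links this setup to a statistical guarantee: it bounds the share of benign traffic the detector may reject during training.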
Updated Feb 6, 2025 - Python