- EPFL
- Lausanne
- https://go.romaingrx.com/website
- https://go.romaingrx.com/x
Pinned
- Second-Order-Jailbreak (Public)
NeurIPS workshop: We examine the risk of powerful malignant intelligent actors spreading their influence over networks of agents with varying intelligence and motivations.
Python 5
- red-teamer-mistral-nemo (Public)
Fine-tuning of Mistral Nemo 13B on the WildJailbreak dataset to produce a red-teaming model.
Python 2
- llm-as-a-jailbreak-judge (Public)
Explore techniques to use small models as jailbreaking judges.
Python 1
- token-noise-sandbagging (Public)
Detection of sandbagging through Best-of-N sampling with input-level noise perturbations.
Python 1
- relax (Public)
Implementation of ML papers in JAX+Haiku+Optax while being relaxed.
Jupyter Notebook