diff --git a/README.md b/README.md
index 021b0427e..3093fc960 100644
--- a/README.md
+++ b/README.md
@@ -21,6 +21,9 @@
 [![Downloads](https://pepy.tech/badge/garak/month)](https://pepy.tech/project/garak)
 
 
+"🚧 garak is moving. This repository is moving to the `NVIDIA` github organization in the near future. This is planned to be a non-disruptive transition with automatic redirection. 🚧"
+
+
 ## Get started
 ### > See our user guide! [docs.garak.ai](https://docs.garak.ai/)
 ### > Join our [Discord](https://discord.gg/uVch4puUCs)!
@@ -75,6 +78,13 @@ python -m pip install -e .
 
 OK, if that went fine, you're probably good to go!
 
+**Note**: if you cloned before the move to the `NVIDIA` GitHub organisation, but you're reading this at the `github.com/NVIDIA` URI, please update your remotes as follows:
+
+```
+git remote set-url origin https://github.com/NVIDIA/garak.git
+```
+
+
 ## Getting started
 
 The general syntax is:
@@ -223,7 +233,7 @@ For testing. This generator repeats back the prompt it received.
 | Probe | Description |
 | --- | --- |
 | blank | A simple probe that always sends an empty prompt. |
-| atkgen | Automated Attack Generation. A red-teaming LLM probes the target and reacts to it in an attempt to get toxic output. Prototype, mostly stateless, for now uses a simple GPT-2 [fine-tuned](https://huggingface.co/leondz/artgpt2tox) on the subset of hhrlhf attempts that yielded detectable toxicity (the only target currently supported for now). |
+| atkgen | Automated Attack Generation. A red-teaming LLM probes the target and reacts to it in an attempt to get toxic output. Prototype, mostly stateless, for now uses a simple GPT-2 [fine-tuned](https://huggingface.co/garak-llm/artgpt2tox) on the subset of hhrlhf attempts that yielded detectable toxicity (the only target currently supported for now). |
 | av_spam_scanning | Probes that attempt to make the model output malicious content signatures |
 | continuation | Probes that test if the model will continue a probably undesirable word |
 | dan | Various [DAN](https://adguard.com/en/blog/chatgpt-dan-prompt-abuse.html) and DAN-like attacks |