Skip to content

Commit

Permalink
docs: update faq (#957)
Browse files Browse the repository at this point in the history
touch on a few items that could do with more attention
  • Loading branch information
leondz authored Oct 25, 2024
2 parents 549c72a + 689ba7e commit 7d333b7
Show file tree
Hide file tree
Showing 7 changed files with 148 additions and 41 deletions.
46 changes: 45 additions & 1 deletion FAQ.md
Original file line number Diff line number Diff line change
Expand Up @@ -102,10 +102,54 @@ Adding a custom generator is fairly straight forward. One can either add a new c
* The Python implementation of XDG that garak uses allows overriding the data directory using the `XDG_DATA_HOME` environment variable
* An alternative is to symlink the paths to where you want them to be

## How do I configure my run in more detail?

There is a lot you can do here. In order of increasing complexity:

1. Be specific about the list of probes you request, using the `-p` command line option
1. Have a look at `garak`'s config options: run `garak --help` to see what there is
1. Garak offers rich and detailed configuration for runs and its plugins, via YAML. You can find an intro guide here, [Configuring garak](https://reference.garak.ai/en/latest/configurable.html).

## There are many static prompts in garak. How can I make these more dynamic?

This is exactly what [`buffs`](https://reference.garak.ai/en/latest/buffs.html) are for - buffs automatically
modify prompts in flight before they're sent to the generator/LLM. For example, `garak.buffs.paraphrase`
dynamically converts each query prompt into a set of alternative phrasings - given a fixed inference budget, it's often great alternative to increasing generations (docs [here](https://reference.garak.ai/en/latest/garak.buffs.paraphrase.html)).

## Is garak just static probes?

No, very much not. Garak has:

* static probes, which are a set of fixed prompts; this can be from e.g. scientific papers that specify a fixed set of prompts, so that we get replicability
* assembled probes, where prompts are assembled from a configurable set of pieces
* dynamic probes, which look different each run; an example is `latentinjection.LatentWhoisSnippet`, where the list of snippet permutations is so large that it's best to shuffe and sample
* reactive probes, that respond to LLM behavior and adapt as we go along; examples include `atkgen`, `topic`, as well as the compute-intense `tap` and `suffix` modules (excluding their cached versions)

## How do I get a report according to OWASP LLM Top 10 categories?

You can invoke report analysis directly on the report.jsonl file in question,
and give a taxonomy as a second parameter. For example:

```
python -m garak.analyze.report_digest garak.1234.report.jsonl owasp > report.html
```

This groups the top-leve figures and findings according to the OWASP Top 10 for LLM v1.


## How do I interpret my scores?

It's difficult to know if a 0.55 pass rate is good or terrible. That's why we calibrate
garak scores against a bag of state-of-the-art models regularly, and report how well the
target model is performing relative to that. It's included in the HTML report as a Z-score,
and can be given on the CLI by setting `system.show_z=True` in the config.

For more details on exactly how we do this calibration, see [data/calibration/bag.md].



<!-- ## Why the name?
Congrats, if you're reading this, you found a flag!
It's named after a smooth-talking, manipulative, persuasive, well-written character from a nineties TV series. Because we need tools like that to dissect LLM behavior. -->
It's also a smooth-talking, manipulative, persuasive, well-written character from a nineties TV series. Because we need tools like that in order to dissect LLM behavior. -->
23 changes: 22 additions & 1 deletion docs/source/_config.rst
Original file line number Diff line number Diff line change
@@ -1,7 +1,28 @@
garak._config
=============

``_config`` is an internal-only class, with no guarantee of a stable API.

This module holds config values.

These are broken into the following major categories:

* system: options that don't affect the security assessment
* run: options that describe how a garak run will be conducted
* plugins: config for plugins (generators, probes, detectors, buffs)
* transient: internal values local to a single garak execution

Config values are loaded in the following priority (lowest-first):

* Plugin defaults in the code
* Core config: from ``garak/resources/garak.core.yaml``; not to be overridden
* Site config: from ``$HOME/.config/garak/garak.site.yaml``
* Runtime config: from an optional config file specified manually, via e.g. CLI parameter
* Command-line options


Code
^^^^


.. automodule:: garak._config
:members:
Expand Down
15 changes: 14 additions & 1 deletion docs/source/_plugins.rst
Original file line number Diff line number Diff line change
@@ -1,7 +1,20 @@
garak._plugins
==============

``_plugins`` is an internal-only class, with no guarantee of a stable API.

garak._plugins
--------------

This module manages plugin enumeration and loading.
There is one class per plugin in garak.
Enumerating the classes, with e.g. ``--list_probes`` on the command line, means importing each module.
Therefore, modules should do as little as possible on load, and delay
intensive activities (like loading classifiers) until a plugin's class is instantiated.


Code
^^^^


.. automodule:: garak._plugins
:members:
Expand Down
85 changes: 54 additions & 31 deletions docs/source/basic.rst
Original file line number Diff line number Diff line change
@@ -1,46 +1,69 @@
garak
=====
Top-level concepts in garak
===========================

garak._config
-------------
What are we doing here, and how does it all fit together?
Our goal is to test the security of something that takes prompts
and returns text. garak has a few constructs used to simplify and
organise this process.

This module holds config values.

These are broken into the following major categories:
generators
----------

* system: options that don't affect the security assessment
* run: options that describe how a garak run will be conducted
* plugins: config for plugins (generators, probes, detectors, buffs)
* transient: internal values local to a single garak execution
Generators wrap a target LLM or dialogue system. They take a prompt
and return the output. The rest is abstracted away. Generator classes
deal with things like authentication, loading, connection management,
backoff, and all the behind-the-scenes things that need to happen
to get that prompt/response interaction working.

Config values are loaded in the following priority (lowest-first):
probes
------
Each probe tries to exploit a weakness and elicit a failure. The probe
manages all the interaction with the generator. It determines how
often to prompt, and what the content of the prompts is. Interaction
between probes and generators is mediated in an object called an attempt.

* Plugin defaults in the code
* Core config: from ``garak/resources/garak.core.yaml``; not to be overridden
* Site config: from ``$HOME/.config/garak/garak.site.yaml``
* Runtime config: from an optional config file specified manually, via e.g. CLI parameter
* Command-line options
attempt
-------
Attempts represent one unique try at breaking the target. A probe wraps
up each of its adversarial interactions in an attempt object, and passes this
to the generator. The generator adds responses into the attempt and sends
the attempt back. This is logged in garak reporting which contains (among other
things) JSON dumps of attempts.

Code
^^^^
Once the probe is done with the attempt and the generator has added its
outputs, the outputs are examined for signs of failures. This is done in a
detector.

.. automodule:: garak._config
:members:
:undoc-members:
:show-inheritance:
detectors
---------
Each detector attempts to identify a single failure mode. This could be
for example some unsafe contact, or failure to refuse a request. Detectors
do this by examining outputs that are stored in a prompt, looking for a
certain phenomenon. This could be a lack of refusal, or continuation of a
string in a certain way, or decoding an encoded prompt, for example.


buffs
-----
Buffs adjust prompts before they're sent to a generator. This could involve
translating them to another language, or adding paraphrases for probes that
have only a few, static prompts.


evaluators
----------
When detectors have added judgments to attempts, an evaluator converts the results
to an object containing pass/fail data for a specific probe and detector pair.

harnesses
---------
The harnesses manage orchestration of a garak run. They select probes, then
detectors, and co-ordinate running probes, passing results to detectors, and
doing the final evaluation

garak._plugins
--------------

This module manages plugin enumeration and loading.
There is one class per plugin in garak.
Enumerating the classes, with e.g. ``--list_probes`` on the command line, means importing each module.
Therefore, modules should do as little as possible on load, and delay
intensive activities (like loading classifiers) until a plugin's class is instantiated.

Code
^^^^

.. automodule:: garak._plugins
:members:
Expand Down
11 changes: 10 additions & 1 deletion docs/source/buffs.rst
Original file line number Diff line number Diff line change
@@ -1,5 +1,14 @@
garak.buffs
===============
===========

Buff plugins augment, constrain, or otherwise perturb the interaction
between probes and a generator. These allow things like mapping
probes into a different language, or expanding prompts to various
paraphrases, and so on.

Buffs must inherit this base class.
`Buff` serves as a template showing what expectations there are for
implemented buffs.

.. toctree::
:maxdepth: 2
Expand Down
2 changes: 2 additions & 0 deletions docs/source/garak.buffs.rst
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
garak.buffs
===========



.. automodule:: garak.buffs
:members:
:undoc-members:
Expand Down
7 changes: 1 addition & 6 deletions garak/buffs/base.py
Original file line number Diff line number Diff line change
@@ -1,12 +1,7 @@
# SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

"""Base classes for buffs.
Buff plugins augment, constrain, or otherwise perturb the interaction
between probes and a generator. Buffs must inherit this base class.
`Buff` serves as a template showing what expectations there are for
implemented buffs. """
"""Base classes for buffs. """

from collections.abc import Iterable
import logging
Expand Down

0 comments on commit 7d333b7

Please sign in to comment.