Merged commit includes the following changes:
326478470  by iftenney:

    Fix unbatching issue and tidy up code for GPT-2 demo

--
326478303  by iftenney:

    Fix frontend error if no token input is available

--
326465488  by jwexler:

    Allow setting of hostname used by werkzeug

--
326434919  by jwexler:

    LM prediction module: reset masked token on example switch

--
326347597  by jwexler:

    Clean up README

--
326347174  by jwexler:

    Update README for LIT paper

--
326346943  by iftenney:

    Fix tokenization bug in click-to-mask mode for MLM

--
326345968  by iftenney:

    Safer max_length handling for GLUE classifier

--
326339730  by iftenney:

    Set default layout for LM demo

--
326338272  by iftenney:

    Internal change

--
326284480  by iftenney:

    Internal change

--
326278999  by iftenney:

    Internal change

--
326230051  by jwexler:

    Internal change

PiperOrigin-RevId: 326478470
Googler authored and jameswex committed Aug 13, 2020
1 parent 4f079df commit 5de804d
Showing 16 changed files with 324 additions and 333 deletions.
89 changes: 69 additions & 20 deletions README.md
@@ -1,4 +1,4 @@
# Language Interpretability Tool (LIT) :fire:
# 🔥 Language Interpretability Tool (LIT)

The Language Interpretability Tool (LIT) is a visual, interactive
model-understanding tool for NLP models.
@@ -29,62 +29,111 @@ Features include:
multi-head models and multiple input features out of the box.
* **Framework-agnostic** and compatible with TensorFlow, PyTorch, and more.

For a broader overview, check out [our paper](TBD) and the
For a broader overview, check out [our paper](https://arxiv.org/abs/2008.05122) and the
[user guide](docs/user_guide.md).

## Getting Started
## Documentation

* [User Guide](docs/user_guide.md)
* [Developer Guide](docs/development.md)
* [FAQ](docs/faq.md)

## Download and Installation

Download the repo and set up a Python environment:

```sh
git clone https://github.com/PAIR-code/lit.git ~/lit

# Set up Python environment
cd ~/lit
conda env create -f environment.yml
conda activate lit-nlp
conda install cudnn cupti # optional, for GPU support
conda install -c pytorch pytorch # optional, for PyTorch

# Build the frontend
cd ~/lit/lit_nlp/client
yarn && yarn build
```

Build the frontend (output will be in `~/lit/client/build`). You only need to do
this once, unless you change the TypeScript or CSS files.
## Running LIT

### Quick-start: sentiment classifier

```sh
cd ~/lit/lit_nlp/client
yarn # install deps
yarn build --watch
cd ~/lit
python -m lit_nlp.examples.quickstart_sst_demo --port=5432
```

And run a LIT server, such as those included in
../lit_nlp/examples:
This will fine-tune a [BERT-tiny](https://arxiv.org/abs/1908.08962) model on the
[Stanford Sentiment Treebank](https://nlp.stanford.edu/sentiment/treebank.html),
which should take less than 5 minutes on a GPU. After training completes, it'll
start a LIT server on the development set; navigate to http://localhost:5432 for
the UI.

### Quick start: language modeling

To explore predictions from a pretrained language model (BERT or GPT-2), run:

```sh
cd ~/lit
python -m lit_nlp.examples.pretrained_lm_demo --models=bert-base-uncased \
--port=5432
```

You can then access the LIT UI at http://localhost:5432.
And navigate to http://localhost:5432 for the UI.

## Full Documentation
### More Examples

[Click here for the full documentation site.](docs/index.md)
See ../lit_nlp/examples. Run similarly to the above:

To learn about the features of the tool as an end-user, check out the
[user guide](docs/user_guide.md).
```sh
cd ~/lit
python -m lit_nlp.examples.<example_name> --port=5432 [optional --args]
```

## User Guide

To learn about LIT's features, check out the [user guide](user_guide.md), or
watch this [short video](https://www.youtube.com/watch?v=j0OfBWFUqIE).

## Adding your own models or data

You can easily run LIT with your own model by creating a custom `demo.py`
launcher, similar to those in ../lit_nlp/examples. For a full
walkthrough, see
[adding models and data](docs/python_api.md#adding-models-and-data).
launcher, similar to those in ../lit_nlp/examples. The basic
steps are:

* Write a data loader which follows the
[`Dataset` API](python_api.md#datasets)
* Write a model wrapper which follows the [`Model` API](python_api.md#models)
* Pass models, datasets, and any additional
[components](python_api.md#interpretation-components) to the LIT server
class
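
The three steps above can be sketched in miniature as follows. Everything here is illustrative, not the real LIT API: `ToyDataset` and `ToyModel` are placeholder names, and in actual code they would implement the `Dataset` and `Model` APIs and be passed to the LIT server class as shown in the developer guide.

```python
class ToyDataset:
    """Step 1: a data loader exposing a list of spec-shaped examples."""

    def __init__(self):
        self._examples = [{"sentence": "a great movie", "label": "1"},
                          {"sentence": "a terrible movie", "label": "0"}]

    @property
    def examples(self):
        return self._examples


class ToyModel:
    """Step 2: a model wrapper mapping input examples to predictions."""

    def predict(self, inputs):
        # A trivial rule-based "model", just to show the calling convention.
        return [{"probas": [0.9, 0.1] if "terrible" in ex["sentence"]
                 else [0.1, 0.9]}
                for ex in inputs]


def build_components():
    """Step 3: collect named models and datasets to hand to the server."""
    models = {"toy": ToyModel()}
    datasets = {"toy_data": ToyDataset()}
    return models, datasets
```

A real launcher would then construct the server from these dicts and call `serve()`.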

For a full walkthrough, see
[adding models and data](python_api.md#adding-models-and-data).

## Extending LIT with new components

LIT is easy to extend with new interpretability components, generators, and
more, both on the frontend or the backend. See the
[developer guide](docs/development.md) to get started.
[developer guide](development.md) to get started.

## Citing LIT

If you use LIT as part of your work, please cite:

TODO: add BibTeX here once we're on arXiv
```
@misc{tenney2020language,
title={The Language Interpretability Tool: Extensible, Interactive Visualizations and Analysis for NLP Models},
author={Ian Tenney and James Wexler and Jasmijn Bastings and Tolga Bolukbasi and Andy Coenen and Sebastian Gehrmann and Ellen Jiang and Mahima Pushkarna and Carey Radebaugh and Emily Reif and Ann Yuan},
year={2020},
eprint={2008.05122},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```

## Disclaimer

27 changes: 14 additions & 13 deletions docs/development.md
@@ -39,12 +39,12 @@ browser:
models = {'foo': FooModel(...),
'bar': BarModel(...)}
datasets = {'baz': BazDataset(...)}
server = lit.Server(models, datasets, port=4321)
server = lit_nlp.dev_server.Server(models, datasets, port=4321)
server.serve()
```

For more, see [adding models and data](python_api.md#adding-models-and-data) or the
examples in ../lit_nlp/examples.
For more, see [adding models and data](python_api.md#adding-models-and-data) or
the examples in ../lit_nlp/examples.

[^1]: Naming is just a happy coincidence; the Language Interpretability Tool is
not related to the lit-html or lit-element projects.
@@ -63,10 +63,10 @@ might define the following spec:
```python
# dataset.spec()
{
"premise": lit.TextSegment(),
"hypothesis": lit.TextSegment(),
"label": lit.CategoryLabel(vocab=["entailment", "neutral", "contradiction"]),
"genre": lit.CategoryLabel(),
"premise": lit_types.TextSegment(),
"hypothesis": lit_types.TextSegment(),
"label": lit_types.CategoryLabel(vocab=["entailment", "neutral", "contradiction"]),
"genre": lit_types.CategoryLabel(),
}
```
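
A dataset wrapper producing examples that match this spec could look roughly like the sketch below. The class name `MultiNLIData` and the loading logic are illustrative only; real code would subclass the LIT `Dataset` API and return actual `lit_types` objects from `spec()` (plain strings stand in for them here so the sketch is self-contained).

```python
LABELS = ["entailment", "neutral", "contradiction"]


class MultiNLIData:
    """Illustrative loader: each example is a flat dict keyed like spec()."""

    def __init__(self, rows):
        # rows: iterable of (premise, hypothesis, label, genre) tuples.
        self._examples = [
            {"premise": p, "hypothesis": h, "label": l, "genre": g}
            for p, h, l, g in rows
        ]

    @property
    def examples(self):
        return self._examples

    def spec(self):
        # Real code returns lit_types.TextSegment() / CategoryLabel() objects.
        return {"premise": "TextSegment", "hypothesis": "TextSegment",
                "label": "CategoryLabel", "genre": "CategoryLabel"}
```

The key property is that every example's keys mirror the keys of `spec()`.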

@@ -88,8 +88,8 @@ subset of the dataset fields:
```python
# model.input_spec()
{
"premise": lit.TextSegment(),
"hypothesis": lit.TextSegment(),
"premise": lit_types.TextSegment(),
"hypothesis": lit_types.TextSegment(),
}
```

@@ -98,8 +98,9 @@ And the output spec:
```python
# model.output_spec()
{
"probas": lit.MulticlassPreds(parent="label",
vocab=["entailment", "neutral", "contradiction"]),
"probas": lit_types.MulticlassPreds(
parent="label",
vocab=["entailment", "neutral", "contradiction"]),
}
```
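
Putting the two specs together, a model wrapper might be sketched as below. `NLIModel` is a hypothetical name, the spec values are stand-in strings, and `predict` returns a placeholder uniform distribution; a real wrapper would implement the LIT `Model` API and run an actual classifier.

```python
LABELS = ["entailment", "neutral", "contradiction"]


class NLIModel:
    """Consumes the input-spec fields, emits the output-spec fields."""

    def input_spec(self):
        return {"premise": "TextSegment", "hypothesis": "TextSegment"}

    def output_spec(self):
        return {"probas": "MulticlassPreds(parent='label', vocab=LABELS)"}

    def predict(self, inputs):
        for ex in inputs:
            # Each input must cover the fields declared in input_spec().
            assert set(self.input_spec()) <= set(ex)
            # Placeholder inference: uniform over the label vocabulary.
            yield {"probas": [1.0 / len(LABELS)] * len(LABELS)}
```

Each yielded dict is keyed exactly like `output_spec()`, which is how the frontend knows how to render the predictions.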

@@ -126,15 +127,15 @@ defining multiple `TextSegment` fields as in the above example, while
multi-headed models can simply define multiple output fields. Furthermore, new
types can easily be added to support custom input modalities, output types, or
to provide access to model internals. For a more detailed example, see the
[`lit.Model` documentation](python_api#models).
[`Model` documentation](python_api#models).

The actual spec types, such as `MulticlassLabel`, are simple dataclasses (built
using [`attr.s`](https://www.attrs.org/en/stable/)). They are defined in Python,
but are available in the [TypeScript client](client.md) as well.

[`utils.find_spec_keys()`](../lit_nlp/lib/utils.py)
(Python) and
[findSpecKeys()](../lit_nlp/client/lib/utils.ts)
[`findSpecKeys()`](../lit_nlp/client/lib/utils.ts)
(TypeScript) are commonly used to interact with a full spec and identify fields
of interest. These recognize subclasses: for example,
`utils.find_spec_keys(spec, Scalar)` will also match any `RegressionScore`
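
The subclass-matching behavior can be sketched with a toy re-implementation (for illustration only; the real helpers live in the files linked above, and the type classes here are bare stand-ins):

```python
class Scalar:
    pass


class RegressionScore(Scalar):
    pass


def find_spec_keys(spec, typ):
    # isinstance() matching is what makes subclasses match too: filtering a
    # spec for Scalar also returns any RegressionScore fields.
    return [key for key, value in spec.items() if isinstance(value, typ)]
```

For example, `find_spec_keys({"score": RegressionScore(), "text": object()}, Scalar)` returns `["score"]`.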
35 changes: 31 additions & 4 deletions docs/faq.md
@@ -2,7 +2,34 @@

<!--* freshness: { owner: 'lit-dev' reviewed: '2020-08-04' } *-->

### Can LIT work with `<insert giant transformer model here>`?
### Your implementation of `<technique>` is really cool - can I use it in `<other tool>`?

For backend components: yes! Models, datasets, and interpretation components
don't depend on the LIT serving code at all, and they're designed for standalone
use. You can treat them as any other Python class and use them from Colab,
regular scripts, bulk inference pipelines, etc. For example, to compute LIME:

```python
from lit_nlp.examples.datasets import glue
from lit_nlp.examples.models import glue_models
from lit_nlp.components import lime_explainer

dataset = glue.SST2Data('validation')
model = glue_models.SST2Model("/path/to/saved/model")
lime = lime_explainer.LIME()
lime.run([dataset.examples[0]], model, dataset)
# will return {"tokens": ..., "salience": ...} for each example given
```

For the frontend, it's a little more difficult. In order to respond to and
interact with the shared UI state, there's a lot more "framework" code involved.
We're working on refactoring the LIT modules
(../lit_nlp/client/modules) to separate framework and API
code from the visualizations (e.g.
../lit_nlp/client/elements), which can then be re-used in
other environments.

### Can LIT work with `<giant transformer model>`?

Generally, yes! But you'll probably want to use `warm_start=1.0` (or pass
`--warm_start=1.0` as a flag) to pre-compute predictions when the server loads,
@@ -12,14 +39,14 @@ Also, beware of memory usage: since LIT keeps the models in memory to support
new queries, only so many can fit on a single GPU. If you want to load more
models than can fit in local memory, LIT has experimental support for
remotely-hosted models on another LIT server (see
[`remote_model.py`](../language/lit/components/remote_model.py)
for more details), and you can also write a [`lit.Model`](python_api.md#models)
[`remote_model.py`](../lit_nlp/components/remote_model.py)
for more details), and you can also write a [`Model`](python_api.md#models)
class to interface with your favorite serving framework.

### How many datapoints / examples can LIT handle?

It depends on your model, and on your hardware. We've successfully tested with
10k examples (the entire MultiNLI `validation_matched` split), including
10k examples (the full MultiNLI `validation_matched` split), including
embeddings from the model. But, a couple caveats:

* LIT expects predictions to be available on the whole dataset when the UI
58 changes: 3 additions & 55 deletions docs/index.md
@@ -4,58 +4,6 @@



## Getting Started

Download the repo and set up a Python environment:

```sh
git clone https://github.com/PAIR-code/lit.git ~/lit
cd ~/lit
conda env create -f environment.yml
conda activate lit-nlp
```

Build the frontend (output will be in `~/lit/client/build`). You only need to do
this once, unless you change the TypeScript or CSS files.

```sh
cd ~/lit/client
yarn # install deps
yarn build --watch
```

And run a LIT server, such as those included in
../lit_nlp/examples:

```sh
cd ~/lit
python -m lit_nlp.examples.pretrained_lm_demo --models=bert-base-uncased \
--port=5432
```

You can then access the LIT UI at http://localhost:4321.

## User Guide

To learn about LIT's features, check out the [user guide](user_guide.md).

## Adding your own models or data

You can easily run LIT with your own model by creating a custom `demo.py`
launcher, similar to those in ../lit_nlp/examples. The basic steps
are:

* Write a data loader which follows the
[`lit.Dataset` API](python_api.md#datasets)
* Write a model wrapper which follows the
[`lit.Model` API](python_api.md#models)
* Pass models, datasets, and any additional
[components](python_api.md#interpretation-components) to the LIT server class

For a full walkthrough, see [adding models and data](python_api.md#adding-models-and-data).

## Extending LIT with new components

LIT is easy to extend with new interpretability components, generators, and
more, both on the frontend or the backend. See the
[developer guide](development.md) to get started.
* [User Guide](user_guide.md)
* [Developer Guide](development.md)
* [FAQ](faq.md)