Merged commit includes the following changes:
326478470  by iftenney:

    Fix unbatching issue and tidy up code for GPT-2 demo

--
326478303  by iftenney:

    Fix frontend error if no token input is available

--
326465488  by jwexler:

    Allow setting of hostname used by werkzeug

--
326434919  by jwexler:

    LM prediction module: reset masked token on example switch

--
326347597  by jwexler:

    Clean up README

--
326347174  by jwexler:

    Update README for LIT paper

--
326346943  by iftenney:

    Fix tokenization bug in click-to-mask mode for MLM

--
326345968  by iftenney:

    Safer max_length handling for GLUE classifier

--
326339730  by iftenney:

    Set default layout for LM demo

--
326338272  by iftenney:

    Internal change

--
326284480  by iftenney:

    Internal change

--
326278999  by iftenney:

    Internal change

--
326230051  by jwexler:

    Internal change

PiperOrigin-RevId: 326478470
Googler authored and jameswex committed Aug 13, 2020
1 parent 4f079df commit 5de804d
Showing 16 changed files with 324 additions and 333 deletions.
89 changes: 69 additions & 20 deletions README.md
@@ -1,4 +1,4 @@
# Language Interpretability Tool (LIT) :fire:
# 🔥 Language Interpretability Tool (LIT)

The Language Interpretability Tool (LIT) is a visual, interactive
model-understanding tool for NLP models.
@@ -29,62 +29,111 @@ Features include:
multi-head models and multiple input features out of the box.
* **Framework-agnostic** and compatible with TensorFlow, PyTorch, and more.

For a broader overview, check out [our paper](TBD) and the
For a broader overview, check out [our paper](https://arxiv.org/abs/2008.05122) and the
[user guide](docs/user_guide.md).

## Getting Started
## Documentation

* [User Guide](docs/user_guide.md)
* [Developer Guide](docs/development.md)
* [FAQ](docs/faq.md)

## Download and Installation

Download the repo and set up a Python environment:

```sh
git clone https://github.com/PAIR-code/lit.git ~/lit

# Set up Python environment
cd ~/lit
conda env create -f environment.yml
conda activate lit-nlp
conda install cudnn cupti # optional, for GPU support
conda install -c pytorch pytorch # optional, for PyTorch

# Build the frontend
cd ~/lit/lit_nlp/client
yarn && yarn build
```

Build the frontend (output will be in `~/lit/client/build`). You only need to do
this once, unless you change the TypeScript or CSS files.
## Running LIT

### Quick-start: sentiment classifier

```sh
cd ~/lit/lit_nlp/client
yarn # install deps
yarn build --watch
cd ~/lit
python -m lit_nlp.examples.quickstart_sst_demo --port=5432
```

And run a LIT server, such as those included in
../lit_nlp/examples:
This will fine-tune a [BERT-tiny](https://arxiv.org/abs/1908.08962) model on the
[Stanford Sentiment Treebank](https://nlp.stanford.edu/sentiment/treebank.html),
which should take less than 5 minutes on a GPU. After training completes, it'll
start a LIT server on the development set; navigate to http://localhost:5432 for
the UI.

### Quick start: language modeling

To explore predictions from a pretrained language model (BERT or GPT-2), run:

```sh
cd ~/lit
python -m lit_nlp.examples.pretrained_lm_demo --models=bert-base-uncased \
--port=5432
```

You can then access the LIT UI at http://localhost:5432.
And navigate to http://localhost:5432 for the UI.

## Full Documentation
### More Examples

[Click here for the full documentation site.](docs/index.md)
See ../lit_nlp/examples. Run similarly to the above:

To learn about the features of the tool as an end-user, check out the
[user guide](docs/user_guide.md).
```sh
cd ~/lit
python -m lit_nlp.examples.<example_name> --port=5432 [optional --args]
```

## User Guide

To learn about LIT's features, check out the [user guide](user_guide.md), or
watch this [short video](https://www.youtube.com/watch?v=j0OfBWFUqIE).

## Adding your own models or data

You can easily run LIT with your own model by creating a custom `demo.py`
launcher, similar to those in ../lit_nlp/examples. For a full
walkthrough, see
[adding models and data](docs/python_api.md#adding-models-and-data).
launcher, similar to those in ../lit_nlp/examples. The basic
steps are:

* Write a data loader which follows the
[`Dataset` API](python_api.md#datasets)
* Write a model wrapper which follows the [`Model` API](python_api.md#models)
* Pass models, datasets, and any additional
[components](python_api.md#interpretation-components) to the LIT server
class
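
The three steps above can be sketched in miniature as follows. Everything here is illustrative, not the real LIT API: `ToyDataset` and `ToyModel` are placeholder names, and in actual code they would implement the `Dataset` and `Model` APIs and be passed to the LIT server class as shown in the developer guide.

```python
class ToyDataset:
    """Step 1: a data loader exposing a list of spec-shaped examples."""

    def __init__(self):
        self._examples = [{"sentence": "a great movie", "label": "1"},
                          {"sentence": "a terrible movie", "label": "0"}]

    @property
    def examples(self):
        return self._examples


class ToyModel:
    """Step 2: a model wrapper mapping input examples to predictions."""

    def predict(self, inputs):
        # A trivial rule-based "model", just to show the calling convention.
        return [{"probas": [0.9, 0.1] if "terrible" in ex["sentence"]
                 else [0.1, 0.9]}
                for ex in inputs]


def build_components():
    """Step 3: collect named models and datasets to hand to the server."""
    models = {"toy": ToyModel()}
    datasets = {"toy_data": ToyDataset()}
    return models, datasets
```

A real launcher would then construct the server from these dicts and call `serve()`.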

For a full walkthrough, see
[adding models and data](python_api.md#adding-models-and-data).

## Extending LIT with new components

LIT is easy to extend with new interpretability components, generators, and
more, both on the frontend or the backend. See the
[developer guide](docs/development.md) to get started.
[developer guide](development.md) to get started.

## Citing LIT

If you use LIT as part of your work, please cite:

TODO: add BibTeX here once we're on arXiv
```
@misc{tenney2020language,
title={The Language Interpretability Tool: Extensible, Interactive Visualizations and Analysis for NLP Models},
author={Ian Tenney and James Wexler and Jasmijn Bastings and Tolga Bolukbasi and Andy Coenen and Sebastian Gehrmann and Ellen Jiang and Mahima Pushkarna and Carey Radebaugh and Emily Reif and Ann Yuan},
year={2020},
eprint={2008.05122},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```

## Disclaimer

27 changes: 14 additions & 13 deletions docs/development.md
@@ -39,12 +39,12 @@ browser:
models = {'foo': FooModel(...),
'bar': BarModel(...)}
datasets = {'baz': BazDataset(...)}
server = lit.Server(models, datasets, port=4321)
server = lit_nlp.dev_server.Server(models, datasets, port=4321)
server.serve()
```

For more, see [adding models and data](python_api.md#adding-models-and-data) or the
examples in ../lit_nlp/examples.
For more, see [adding models and data](python_api.md#adding-models-and-data) or
the examples in ../lit_nlp/examples.

[^1]: Naming is just a happy coincidence; the Language Interpretability Tool is
not related to the lit-html or lit-element projects.
@@ -63,10 +63,10 @@ might define the following spec:
```python
# dataset.spec()
{
"premise": lit.TextSegment(),
"hypothesis": lit.TextSegment(),
"label": lit.CategoryLabel(vocab=["entailment", "neutral", "contradiction"]),
"genre": lit.CategoryLabel(),
"premise": lit_types.TextSegment(),
"hypothesis": lit_types.TextSegment(),
"label": lit_types.CategoryLabel(vocab=["entailment", "neutral", "contradiction"]),
"genre": lit_types.CategoryLabel(),
}
```
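
A dataset wrapper producing examples that match this spec could look roughly like the sketch below. The class name `MultiNLIData` and the loading logic are illustrative only; real code would subclass the LIT `Dataset` API and return actual `lit_types` objects from `spec()` (plain strings stand in for them here so the sketch is self-contained).

```python
LABELS = ["entailment", "neutral", "contradiction"]


class MultiNLIData:
    """Illustrative loader: each example is a flat dict keyed like spec()."""

    def __init__(self, rows):
        # rows: iterable of (premise, hypothesis, label, genre) tuples.
        self._examples = [
            {"premise": p, "hypothesis": h, "label": l, "genre": g}
            for p, h, l, g in rows
        ]

    @property
    def examples(self):
        return self._examples

    def spec(self):
        # Real code returns lit_types.TextSegment() / CategoryLabel() objects.
        return {"premise": "TextSegment", "hypothesis": "TextSegment",
                "label": "CategoryLabel", "genre": "CategoryLabel"}
```

The key property is that every example's keys mirror the keys of `spec()`.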

@@ -88,8 +88,8 @@ subset of the dataset fields:
```python
# model.input_spec()
{
"premise": lit.TextSegment(),
"hypothesis": lit.TextSegment(),
"premise": lit_types.TextSegment(),
"hypothesis": lit_types.TextSegment(),
}
```

@@ -98,8 +98,9 @@ And the output spec:
```python
# model.output_spec()
{
"probas": lit.MulticlassPreds(parent="label",
vocab=["entailment", "neutral", "contradiction"]),
"probas": lit_types.MulticlassPreds(
parent="label",
vocab=["entailment", "neutral", "contradiction"]),
}
```
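
Putting the two specs together, a model wrapper might be sketched as below. `NLIModel` is a hypothetical name, the spec values are stand-in strings, and `predict` returns a placeholder uniform distribution; a real wrapper would implement the LIT `Model` API and run an actual classifier.

```python
LABELS = ["entailment", "neutral", "contradiction"]


class NLIModel:
    """Consumes the input-spec fields, emits the output-spec fields."""

    def input_spec(self):
        return {"premise": "TextSegment", "hypothesis": "TextSegment"}

    def output_spec(self):
        return {"probas": "MulticlassPreds(parent='label', vocab=LABELS)"}

    def predict(self, inputs):
        for ex in inputs:
            # Each input must cover the fields declared in input_spec().
            assert set(self.input_spec()) <= set(ex)
            # Placeholder inference: uniform over the label vocabulary.
            yield {"probas": [1.0 / len(LABELS)] * len(LABELS)}
```

Each yielded dict is keyed exactly like `output_spec()`, which is how the frontend knows how to render the predictions.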

@@ -126,15 +127,15 @@ defining multiple `TextSegment` fields as in the above example, while
multi-headed models can simply define multiple output fields. Furthermore, new
types can easily be added to support custom input modalities, output types, or
to provide access to model internals. For a more detailed example, see the
[`lit.Model` documentation](python_api#models).
[`Model` documentation](python_api#models).

The actual spec types, such as `MulticlassLabel`, are simple dataclasses (built
using [`attr.s`](https://www.attrs.org/en/stable/)). They are defined in Python,
but are available in the [TypeScript client](client.md) as well.

[`utils.find_spec_keys()`](../lit_nlp/lib/utils.py)
(Python) and
[findSpecKeys()](../lit_nlp/client/lib/utils.ts)
[`findSpecKeys()`](../lit_nlp/client/lib/utils.ts)
(TypeScript) are commonly used to interact with a full spec and identify fields
of interest. These recognize subclasses: for example,
`utils.find_spec_keys(spec, Scalar)` will also match any `RegressionScore`
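
The subclass-matching behavior can be sketched with a toy re-implementation (for illustration only; the real helpers live in the files linked above, and the type classes here are bare stand-ins):

```python
class Scalar:
    pass


class RegressionScore(Scalar):
    pass


def find_spec_keys(spec, typ):
    # isinstance() matching is what makes subclasses match too: filtering a
    # spec for Scalar also returns any RegressionScore fields.
    return [key for key, value in spec.items() if isinstance(value, typ)]
```

For example, `find_spec_keys({"score": RegressionScore(), "text": object()}, Scalar)` returns `["score"]`.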
35 changes: 31 additions & 4 deletions docs/faq.md
@@ -2,7 +2,34 @@

<!--* freshness: { owner: 'lit-dev' reviewed: '2020-08-04' } *-->

### Can LIT work with `<insert giant transformer model here>`?
### Your implementation of `<technique>` is really cool - can I use it in `<other tool>`?

For backend components: yes! Models, datasets, and interpretation components
don't depend on the LIT serving code at all, and they're designed for standalone
use. You can treat them as any other Python class and use them from Colab,
regular scripts, bulk inference pipelines, etc. For example, to compute LIME:

```python
from lit_nlp.examples.datasets import glue
from lit_nlp.examples.models import glue_models
from lit_nlp.components import lime_explainer

dataset = glue.SST2Data('validation')
model = glue_models.SST2Model("/path/to/saved/model")
lime = lime_explainer.LIME()
lime.run([dataset.examples[0]], model, dataset)
# will return {"tokens": ..., "salience": ...} for each example given
```

For the frontend, it's a little more difficult. In order to respond to and
interact with the shared UI state, there's a lot more "framework" code involved.
We're working on refactoring the LIT modules
(../lit_nlp/client/modules) to separate framework and API
code from the visualizations (e.g.
../lit_nlp/client/elements), which can then be re-used in
other environments.

### Can LIT work with `<giant transformer model>`?

Generally, yes! But you'll probably want to use `warm_start=1.0` (or pass
`--warm_start=1.0` as a flag) to pre-compute predictions when the server loads,
@@ -12,14 +39,14 @@ Also, beware of memory usage: since LIT keeps the models in memory to support
new queries, only so many can fit on a single GPU. If you want to load more
models than can fit in local memory, LIT has experimental support for
remotely-hosted models on another LIT server (see
[`remote_model.py`](../language/lit/components/remote_model.py)
for more details), and you can also write a [`lit.Model`](python_api.md#models)
[`remote_model.py`](../lit_nlp/components/remote_model.py)
for more details), and you can also write a [`Model`](python_api.md#models)
class to interface with your favorite serving framework.

### How many datapoints / examples can LIT handle?

It depends on your model, and on your hardware. We've successfully tested with
10k examples (the entire MultiNLI `validation_matched` split), including
10k examples (the full MultiNLI `validation_matched` split), including
embeddings from the model. But, a couple caveats:

* LIT expects predictions to be available on the whole dataset when the UI
58 changes: 3 additions & 55 deletions docs/index.md
@@ -4,58 +4,6 @@



## Getting Started

Download the repo and set up a Python environment:

```sh
git clone https://github.com/PAIR-code/lit.git ~/lit
cd ~/lit
conda env create -f environment.yml
conda activate lit-nlp
```

Build the frontend (output will be in `~/lit/client/build`). You only need to do
this once, unless you change the TypeScript or CSS files.

```sh
cd ~/lit/client
yarn # install deps
yarn build --watch
```

And run a LIT server, such as those included in
../lit_nlp/examples:

```sh
cd ~/lit
python -m lit_nlp.examples.pretrained_lm_demo --models=bert-base-uncased \
--port=5432
```

You can then access the LIT UI at http://localhost:4321.

## User Guide

To learn about LIT's features, check out the [user guide](user_guide.md).

## Adding your own models or data

You can easily run LIT with your own model by creating a custom `demo.py`
launcher, similar to those in ../lit_nlp/examples. The basic steps
are:

* Write a data loader which follows the
[`lit.Dataset` API](python_api.md#datasets)
* Write a model wrapper which follows the
[`lit.Model` API](python_api.md#models)
* Pass models, datasets, and any additional
[components](python_api.md#interpretation-components) to the LIT server class

For a full walkthrough, see [adding models and data](python_api.md#adding-models-and-data).

## Extending LIT with new components

LIT is easy to extend with new interpretability components, generators, and
more, both on the frontend or the backend. See the
[developer guide](development.md) to get started.
* [User Guide](user_guide.md)
* [Developer Guide](development.md)
* [FAQ](faq.md)