
Demos

The LIT team maintains a number of hosted demos, as well as pre-built launchers for some common tasks and model types.

For publicly-visible demos hosted on Google Cloud, see https://pair-code.github.io/lit/demos/.


Classification

Sentiment and NLI

Hosted instance: https://pair-code.github.io/lit/demos/glue.html
Code: https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/glue_demo.py

  • Multi-task demo:
    • Sentiment analysis as a binary classification task (SST-2) on single sentences.
    • Natural Language Inference (NLI) using MultiNLI, as a three-way classification task with two-segment input (premise, hypothesis).
    • STS-B textual similarity task (see Regression / Scoring below).
    • Switch tasks using the Settings (⚙️) menu.
  • BERT models of different sizes, built on HuggingFace TF2 (Keras).
  • Supports the widest range of LIT interpretability features:
    • Model output probabilities, custom thresholds, and multiclass metrics.
    • Jitter plot of output scores, to find confident examples or ones near the margin.
    • Embedding projector to find clusters in representation space.
    • Integrated Gradients, LIME, and other salience methods.
    • Attention visualization.
    • Counterfactual generators, including HotFlip for targeted adversarial perturbations.
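
All of the Python launchers on this page follow the same pattern: load one or more models and datasets, then hand them to a LIT server. A minimal sketch of that pattern for the SST-2 task, following the quick-start in the LIT repository (the model path is a placeholder):

```python
# Minimal LIT server for SST-2 sentiment, sketched after glue_demo.py.
# Module and class names assume the lit-nlp package layout at the time
# of writing; the model path is a placeholder.
from lit_nlp import dev_server
from lit_nlp import server_flags
from lit_nlp.examples.datasets import glue
from lit_nlp.examples.models import glue_models

models = {"sst2": glue_models.SST2Model("/path/to/sst2/model")}
datasets = {"sst2_dev": glue.SST2Data("validation")}

# Starts the LIT server; by default the UI is at http://localhost:5432.
lit_demo = dev_server.Server(models, datasets, **server_flags.get_flags())
lit_demo.serve()
```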

Tip: check out a case study for this demo on the public LIT website: https://pair-code.github.io/lit/tutorials/sentiment

Multilingual (XNLI)

Code: https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/xnli_demo.py

  • The XNLI dataset translates a subset of MultiNLI into 14 languages.
  • Use the --languages=en,es,hi,... flag to select which languages to load.
  • NLI as a three-way classification task with two-segment input (premise, hypothesis).
  • Fine-tuned multilingual BERT model.
  • Salience methods work with non-whitespace-delimited text, by using the model's wordpiece tokenization.
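
On the last point: salience scores attach to the model's wordpieces rather than to whitespace tokens, which is what makes languages like Chinese or Thai workable. A quick illustration with a multilingual BERT tokenizer (the checkpoint name is an assumption, not necessarily the demo's):

```python
# Wordpiece tokenization of non-whitespace-delimited text; salience
# scores are assigned to these pieces. Checkpoint name is an assumption.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
print(tokenizer.tokenize("这是一个中文句子"))       # Chinese: no spaces between words
print(tokenizer.tokenize("นี่คือประโยคภาษาไทย"))  # Thai: same situation
```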

Regression / Scoring

Textual Similarity (STS-B)

Hosted instance: https://pair-code.github.io/lit/demos/glue.html?models=stsb&dataset=stsb_dev
Code: https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/glue_demo.py

  • STS-B textual similarity task, predicting scores on a range from 0 (unrelated) to 5 (very similar).
  • BERT models built on HuggingFace TF2 (Keras).
  • Supports a wide range of LIT interpretability features:
    • Model output scores and metrics.
    • Scatter plot of scores and error, and jitter plot of true labels for quick filtering.
    • Embedding projector to find clusters in representation space.
    • Integrated Gradients, LIME, and other salience methods.
    • Attention visualization.
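
Wiring up the STS-B variant follows the same server pattern as the sentiment sketch above; a sketch, with class names assumed from the example code and a placeholder model path:

```python
# STS-B regression variant of the server sketch above. Class names are
# assumed from lit_nlp/examples at the time of writing.
from lit_nlp import dev_server
from lit_nlp import server_flags
from lit_nlp.examples.datasets import glue
from lit_nlp.examples.models import glue_models

models = {"stsb": glue_models.STSBModel("/path/to/stsb/model")}
datasets = {"stsb_dev": glue.STSBData("validation")}
dev_server.Server(models, datasets, **server_flags.get_flags()).serve()
```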

Sequence-to-Sequence

Open-source T5

Code: https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/t5_demo.py

  • Supports HuggingFace TF2 (Keras) models as well as TensorFlow SavedModel formats.
  • Visualize beam candidates and highlight diffs against references.
  • Visualize per-token decoder hypotheses to see where the model veers away from desired output.
  • Filter examples by ROUGE score against reference.
  • Embeddings from last layer of model, visualized with UMAP or PCA.
  • Task wrappers to handle pre- and post-processing for summarization and machine translation tasks.
  • Pre-loaded eval sets for CNNDM and WMT.
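
Those task wrappers implement LIT's model API around a HuggingFace model. A condensed sketch of what a summarization wrapper can look like (the checkpoint and field names here are illustrative assumptions, not the demo's exact code):

```python
# Sketch of a LIT summarization wrapper around HuggingFace T5.
# Checkpoint and field names are illustrative assumptions.
from lit_nlp.api import model as lit_model
from lit_nlp.api import types as lit_types
from transformers import T5Tokenizer, TFT5ForConditionalGeneration

class T5Summarizer(lit_model.Model):
  def __init__(self, name="t5-small"):
    self.tokenizer = T5Tokenizer.from_pretrained(name)
    self.model = TFT5ForConditionalGeneration.from_pretrained(name)

  def input_spec(self):
    return {"document": lit_types.TextSegment(),
            "reference": lit_types.TextSegment()}

  def output_spec(self):
    # 'parent' ties the generation to the reference, enabling diffs/ROUGE.
    return {"summary": lit_types.GeneratedText(parent="reference")}

  def predict_minibatch(self, inputs):
    # Pre-processing: T5 expects a task prefix on each input.
    texts = ["summarize: " + ex["document"] for ex in inputs]
    batch = self.tokenizer(texts, return_tensors="tf", padding=True)
    ids = self.model.generate(batch["input_ids"], max_length=60)
    # Post-processing: decode generated ids back to text.
    return [{"summary": self.tokenizer.decode(i, skip_special_tokens=True)}
            for i in ids]
```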

Tip: check out a case study for this demo on the public LIT website: https://pair-code.github.io/lit/tutorials/generation


Language Modeling

BERT and GPT-2

Code: https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/lm_demo.py

  • Compare multiple BERT and GPT-2 models side-by-side on a variety of plain-text corpora.
  • LM visualization supports different modes:
    • BERT masked language model: click a token to mask it, and query the model at that position.
    • GPT-2 shows left-to-right hypotheses for each target token.
  • Embedding projector to show latent space of the model.
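
The masked-LM mode is essentially a fill-in-the-blank query at the clicked position. The same computation, approximated with the HuggingFace fill-mask pipeline (the model name is an assumption):

```python
# Approximation of the click-to-mask query, using HuggingFace's
# fill-mask pipeline. Model name is an assumption.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for cand in fill_mask("The capital of France is [MASK]."):
    print(f"{cand['token_str']:>10}  p={cand['score']:.3f}")
```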

Structured Prediction

Gender Bias in Coreference

Code: https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/coref/coref_demo.py

  • Gold-mention coreference model, trained on OntoNotes.
  • Evaluated on the Winogender schemas (Rudinger et al., 2018), which test for gendered associations with profession names.
  • Visualizations of coreference edges, as well as binary classification between two candidate referents.
  • Stratified metrics for quantifying model bias as a function of pronoun gender or Bureau of Labor Statistics profession data.
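
A stratified metric just means slicing the evaluation set by a metadata field before aggregating. A toy sketch of the idea (column names and values are hypothetical, not the demo's schema):

```python
# Toy sketch of a stratified metric: accuracy grouped by pronoun gender.
# Column names and values are hypothetical, not the demo's schema.
import pandas as pd

preds = pd.DataFrame({
    "pronoun_gender": ["female", "male", "female", "male", "neutral"],
    "correct":        [1,        0,      1,        1,      0],
})
print(preds.groupby("pronoun_gender")["correct"].mean())
```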

Tip: check out a case study for this demo on the public LIT website: https://pair-code.github.io/lit/tutorials/coref


Multimodal

Tabular Data: Penguin Classification

Code: https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/penguin_demo.py

  • Binary classification on the penguin dataset.
  • Shows the use of LIT on non-text data (numeric and categorical features).
  • Use partial-dependence plots to understand feature importance on individual examples, selections, or the entire evaluation dataset.
  • Use binary classifier threshold setters to find the best thresholds for slices of examples that achieve specific fairness constraints, such as demographic parity.
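
The partial-dependence idea can be reproduced outside LIT as well; a small sketch using scikit-learn and the palmerpenguins package as stand-ins for the demo's own data and model:

```python
# Partial dependence of a binary penguin classifier on flipper length.
# scikit-learn and palmerpenguins are stand-ins, not the demo's code.
from palmerpenguins import load_penguins
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import partial_dependence

df = load_penguins().dropna()
X = df[["bill_length_mm", "flipper_length_mm", "body_mass_g"]]
y = df["species"] == "Adelie"  # binarized to match the demo's framing

clf = RandomForestClassifier(random_state=0).fit(X, y)
pd_result = partial_dependence(clf, X, features=["flipper_length_mm"])
print(pd_result["average"])  # mean predicted probability across the grid
```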

Image Classification with MobileNet

Code: https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/image_demo.py

  • Classification on ImageNet labels using a MobileNet model.
  • Shows the use of LIT on image data.
  • Explore results of multiple gradient-based image saliency techniques in the Salience Maps module.
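
Gradient-based image salience boils down to differentiating the class score with respect to the input pixels. A minimal vanilla-gradients sketch with a Keras MobileNetV2 (a stand-in, not the demo's exact model or salience implementation):

```python
# Vanilla-gradient saliency for a Keras MobileNetV2 classifier.
# A stand-in sketch; the demo's model and salience code may differ.
import tensorflow as tf

model = tf.keras.applications.MobileNetV2(weights="imagenet")
image = tf.random.uniform((1, 224, 224, 3))  # placeholder for a real image

with tf.GradientTape() as tape:
  tape.watch(image)
  probs = model(image)
  score = tf.reduce_max(probs[0])  # probability of the top class

grads = tape.gradient(score, image)               # d(score)/d(pixel)
saliency = tf.reduce_max(tf.abs(grads), axis=-1)  # collapse color channels
print(saliency[0].shape)  # (224, 224) heatmap
```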