The Remarkable Robustness of LLMs: Stages of Inference?

Overview

This repository contains the codebase for the paper "The Remarkable Robustness of LLMs: Stages of Inference?". The project investigates the robustness of large language models (LLMs) through layer-swapping and layer-ablation experiments. All supporting experiments from the paper, such as counting prediction and suppression neurons, calculating entropy, and visualizing attention, are included in this repository. The codebase is written in Python and uses Jupyter notebooks for data analysis and visualization.
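For intuition, here is a minimal sketch of a layer-swap intervention in the spirit of model_intervention.py. This is an illustration, not the script itself; it assumes GPT-2 small loaded through TransformerLens, and the actual experiments cover more models and metrics.

```python
# Minimal sketch of a layer-swap intervention (illustrative only; the full
# experiments live in model_intervention.py). Assumes GPT-2 small.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens("The Eiffel Tower is located in the city of")

# Baseline next-token loss on the prompt.
baseline_loss = model(tokens, return_type="loss").item()

# Swap two adjacent transformer blocks in place, then re-run the model.
i = 5
model.blocks[i], model.blocks[i + 1] = model.blocks[i + 1], model.blocks[i]
swapped_loss = model(tokens, return_type="loss").item()

print(f"baseline loss: {baseline_loss:.3f}")
print(f"swapped loss:  {swapped_loss:.3f}")
```

The paper's central observation is that interventions like this often degrade the loss surprisingly little, which motivates the stages-of-inference analysis.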

Repository Structure

└── /
    ├── LICENSE
    ├── README.md
    ├── model_intervention.py
    ├── notebooks
    │   ├── attention_prev5.ipynb
    │   ├── casestudies
    │   ├── entropy_calculation.ipynb
    │   └── neuron_counter.ipynb
    └── requirements.txt

Modules

| File | Summary |
| --- | --- |
| model_intervention.py | Carries out layer-swapping and ablation experiments on any model supported by TransformerLens. Computes metrics, performs the interventions to study model behavior and performance, and saves the results to a dataframe. |
| requirements.txt | Package requirements for the repository. |
| neuron_counter.ipynb | Counts the prediction and suppression neurons in any model supported by TransformerLens. |
| entropy_calculation.ipynb | Applies the LogitLens technique and computes the entropy of each layer's implied next-token distribution to track how the model's uncertainty changes through the layers (see the sketch after this table). |
| attention_prev5.ipynb | Uses TransformerLens to compute the mean attention paid to the previous 5 tokens of any input. |
| subjoiner_heads.ipynb | Discovers subjoiner heads in language models. A subjoiner head is an attention head responsible for predicting the next token within multi-token words. |
| probe_neurons.ipynb | Probes individual neurons (identified with find_neurons.ipynb) by training probes on the MLP output activations. Compares individual probes against an ensemble of probes to show that neurons work together, with the right ensemble even outperforming the mean model accuracy. |
| find_neurons.ipynb | Finds the relevant neurons to probe by examining the product of the unembedding matrix and the MLP output weights. |
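As a rough illustration of the entropy_calculation.ipynb idea, the sketch below applies the logit lens at each layer and measures the entropy of the implied next-token distribution. It assumes GPT-2 small via TransformerLens; the notebook itself may differ in details.

```python
# Sketch of a logit-lens entropy sweep (illustrative; see
# entropy_calculation.ipynb for the actual analysis). Assumes GPT-2 small.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens("The quick brown fox jumps over the lazy")
_, cache = model.run_with_cache(tokens)

for layer in range(model.cfg.n_layers):
    # Logit lens: decode the residual stream after this layer as if it
    # were the final layer's output.
    resid = cache["resid_post", layer]             # [batch, pos, d_model]
    logits = model.unembed(model.ln_final(resid))  # [batch, pos, d_vocab]
    probs = torch.softmax(logits[0, -1], dim=-1)   # last position
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
    print(f"layer {layer:2d}: next-token entropy = {entropy.item():.3f} nats")
```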

Getting Started

Installation

From source

  1. Clone the repository:

     $ git clone https://github.com/vdlad/Remarkable-Robustness-of-LLMs/

  2. Change to the project directory:

     $ cd Remarkable-Robustness-of-LLMs

  3. Install the dependencies:

     $ pip install -r requirements.txt
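To verify the environment, a quick smoke test might look like the following (this assumes requirements.txt installs transformer_lens, which the codebase depends on):

```python
# Post-install smoke test (assumes requirements.txt installs
# transformer_lens): load a small model and run one forward pass.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
loss = model("Hello, world!", return_type="loss")
print(f"gpt2 loss on test prompt: {loss.item():.3f}")
```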

Cite Us

@article{lad2024remarkable,
  title={The Remarkable Robustness of LLMs: Stages of Inference?},
  author={Lad, Vedang and Gurnee, Wes and Tegmark, Max},
  journal={arXiv preprint arXiv:2406.19384},
  year={2024}
}
