A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and fully reproducible.
A library for making representation engineering (RepE) control vectors
This repository collects all relevant resources about interpretability in LLMs
Implementation of a stacked denoising autoencoder in TensorFlow
PyTorch implementations of various types of autoencoders
SANSA - sparse EASE for millions of items
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
TensorFlow examples
Sparse autoencoder and regular MNIST classification with mini-batches
Experiments with Adversarial Autoencoders using Keras
Multi-Layer Sparse Autoencoders
Repository of the Deep Propensity Network - Sparse Autoencoder (DPN-SA), which calculates propensity scores using a sparse autoencoder
Fun experiments comparing sparse autoencoders (SAEs) with CLIP fine-tuning.
Official implementation of the paper "Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small", answering the question "How to do patching on all available SAEs on GPT-2?"
Explore visualization tools for understanding Transformer-based large language models (LLMs)
Collection of autoencoder models in TensorFlow
A tiny, easily hackable implementation of a feature dashboard.
Semi-supervised learning for digit recognition using a sparse autoencoder
A resource repository of sparse autoencoders for large language models
exploration WYSIWYG editor
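Most of the repositories above build on the same core technique. For orientation, here is a minimal sketch of it in PyTorch (the dominant framework in this list): an overcomplete autoencoder trained with an L1 sparsity penalty on its latent code. All names (`SparseAutoencoder`, `d_model`, `d_hidden`, `l1_coeff`) are illustrative assumptions, not taken from any repository listed above.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal sparse autoencoder: overcomplete dictionary + L1 penalty.

    d_model and d_hidden are illustrative, not taken from any repo above.
    """

    def __init__(self, d_model: int = 768, d_hidden: int = 4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor):
        # ReLU keeps latent activations non-negative, so the L1 penalty
        # can push most of them to exactly zero.
        z = torch.relu(self.encoder(x))
        x_hat = self.decoder(z)
        return x_hat, z

def sae_loss(x, x_hat, z, l1_coeff: float = 1e-3):
    # Reconstruction error plus an L1 sparsity penalty on the latents.
    recon = (x - x_hat).pow(2).mean()
    sparsity = z.abs().mean()
    return recon + l1_coeff * sparsity

# Usage: in the interpretability setting, x would be residual-stream
# activations cached from an LLM; random data stands in here.
sae = SparseAutoencoder()
acts = torch.randn(32, 768)
x_hat, z = sae(acts)
loss = sae_loss(acts, x_hat, z)
loss.backward()
```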