A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and fully reproducible.
A library for making representation engineering (RepE) control vectors
This repository collects all relevant resources about interpretability in LLMs
Implementation of a stacked denoising autoencoder in TensorFlow
PyTorch implementations of various types of autoencoders
SANSA - sparse EASE for millions of items
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
TensorFlow examples
Sparse autoencoder and regular MNIST classification with mini-batches
Experiments with Adversarial Autoencoders using Keras
Multi-Layer Sparse Autoencoders
Repository of the Deep Propensity Network - Sparse Autoencoder (DPN-SA), which calculates propensity scores using a sparse autoencoder
Fun experiments comparing sparse autoencoders (SAEs) with CLIP fine-tuning.
Official implementation of the paper "Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small", answering the question "How to do patching on all available SAEs on GPT-2?"
Explore visualization tools for understanding Transformer-based large language models (LLMs)
Collection of autoencoder models in TensorFlow
A tiny, easily hackable implementation of a feature dashboard.
Semi-supervised learning for digit recognition using a sparse autoencoder
A resource repository of sparse autoencoders for large language models
exploration WYSIWYG editor
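Most of the repositories above build on the same core technique. For orientation, here is a minimal sketch of it in PyTorch (the dominant framework in this list): an overcomplete autoencoder trained with an L1 sparsity penalty on its latent code. All names (`SparseAutoencoder`, `d_model`, `d_hidden`, `l1_coeff`) are illustrative assumptions, not taken from any repository listed above.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal sparse autoencoder: overcomplete dictionary + L1 penalty.

    d_model and d_hidden are illustrative, not taken from any repo above.
    """

    def __init__(self, d_model: int = 768, d_hidden: int = 4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor):
        # ReLU keeps latent activations non-negative, so the L1 penalty
        # can push most of them to exactly zero.
        z = torch.relu(self.encoder(x))
        x_hat = self.decoder(z)
        return x_hat, z

def sae_loss(x, x_hat, z, l1_coeff: float = 1e-3):
    # Reconstruction error plus an L1 sparsity penalty on the latents.
    recon = (x - x_hat).pow(2).mean()
    sparsity = z.abs().mean()
    return recon + l1_coeff * sparsity

# Usage: in the interpretability setting, x would be residual-stream
# activations cached from an LLM; random data stands in here.
sae = SparseAutoencoder()
acts = torch.randn(32, 768)
x_hat, z = sae(acts)
loss = sae_loss(acts, x_hat, z)
loss.backward()
```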