An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.
Super-Efficient RLHF Training of LLMs with Parameter Reallocation
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
A repo for RLHF training and best-of-n (BoN) sampling over LLMs, with support for reward model ensembles.
Official code for the ICML 2024 Spotlight paper "RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences".
An annotated tutorial of the Hugging Face TRL repo for reinforcement learning from human feedback, connecting the PPO and GAE equations to the corresponding lines of code in the PyTorch implementation (see the GAE sketch at the end of this list).
Shaping Language Models with Cognitive Insights
RLHF-Blender: A Configurable Interactive Interface for Learning from Diverse Human Feedback
[TSMC] Ask-AC: An Initiative Advisor-in-the-Loop Actor-Critic Framework
Acceleration framework for Human Alignment Learning
This repository contains the implementation of a Reinforcement Learning from Human Feedback (RLHF) system using custom datasets. The project uses the trlX library to train a preference model that integrates human feedback directly into the optimization of language models.
LMRax is a JAX-based framework for training transformer language models with reinforcement learning, including reward model training.
[AAMAS 2025] Privacy-preserving and personalized RLHF with convergence guarantees. The code contains experiments that train multiple instances of GPT-2 for personalized, sentiment-aligned text generation.
Comparing various RLHF methods
Code for the Bachelor's thesis "The Human Factor: Addressing Diversity in Reinforcement Learning from Human Feedback".
Summaries of papers related to the alignment problem in NLP
Unlocking the Power of Generative AI: In-Context Learning, Instruction Fine-Tuning and Reinforcement Learning Fine-Tuning.
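The annotated TRL tutorial listed above traces the PPO and GAE equations down to PyTorch code. As a rough illustration of the kind of computation that mapping covers, here is a minimal sketch of Generalized Advantage Estimation; the function name, signature, and default coefficients are illustrative assumptions, not TRL's actual API.

```python
# Minimal GAE sketch (illustrative; not TRL's API).
import torch

def compute_gae(rewards: torch.Tensor,
                values: torch.Tensor,
                gamma: float = 0.99,
                lam: float = 0.95) -> torch.Tensor:
    """Compute GAE advantages for a single trajectory.

    rewards: shape (T,)   per-step rewards
    values:  shape (T+1,) value estimates, including the bootstrap value
    """
    T = rewards.shape[0]
    advantages = torch.zeros(T)
    gae = 0.0
    # Work backwards through the trajectory:
    #   delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    #   A_t     = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

The recursion is the standard exponentially weighted sum of TD residuals; PPO implementations typically normalize these advantages before computing the clipped surrogate loss.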