Model / Methods | Title | Paper Link | Code Link | Published | Keywords | Venue |
---|---|---|---|---|---|---|
ALBERT | ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | | google-research | 2019 | | |
BART | BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension | | facebookresearch | 2019 | | |
BARThez | BARThez: a Skilled Pretrained French Sequence-to-Sequence Model | | moussaKam | 2020 | | |
BARTpho | BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese | | VinAIResearch | 2022 | | INTERSPEECH 2022 |
BERT | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | | | 2018 | | |
BertGeneration | Leveraging Pre-trained Checkpoints for Sequence Generation Tasks | | | 2019 | | |
BERTweet | BERTweet: A pre-trained language model for English Tweets | | | 2020 | | EMNLP 2020 |
BigBird | Big Bird: Transformers for Longer Sequences | | | 2020 | | NeurIPS 2020 |
BioGPT | BioGPT: generative pre-trained transformer for biomedical text generation and mining | | | 2022 | | |
Blenderbot | Recipes for building an open-domain chatbot | | | 2020 | | |
BLOOM | Introducing The World’s Largest Open Multilingual Language Model: BLOOM | | | 2022 | | |
BORT | Optimal Subarchitecture Extraction for BERT | | | 2020 | | |
ByT5 | ByT5: Towards a token-free future with pre-trained byte-to-byte models | | | 2021 | | |
CamemBERT | CamemBERT: a Tasty French Language Model | | | 2019 | | |
CANINE | CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation | | google-research | 2021 | | |
CodeGen | A Conversational Paradigm for Program Synthesis | | salesforce | 2022 | | |
CodeLlama | Code Llama: Open Foundation Models for Code | | | 2023 | | |
Cohere | Command-R: Retrieval Augmented Generation at Production Scale | | | 2024 | | |
ConvBERT | ConvBERT: Improving BERT with Span-based Dynamic Convolution | | | 2020 | | |
CPM | CPM: A Large-scale Generative Chinese Pre-trained Language Model | | | 2020 | | |
CPMAnt | CPM-Ant is an open-source Chinese pre-trained language model (PLM) with 10B parameters. | | | 2022 | | |
CTRL | CTRL: A Conditional Transformer Language Model for Controllable Generation | | | 2019 | | |
DBRX | DBRX is a transformer-based decoder-only large language model (LLM) that was trained using next-token prediction. | | | 2024 | | |
DeBERTa | DeBERTa: Decoding-enhanced BERT with Disentangled Attention | | | 2020 | | |
DeBERTa-v2 | DeBERTa: Decoding-enhanced BERT with Disentangled Attention | | | 2021 | | |
DialoGPT | DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation | | | 2019 | | |
DistilBERT | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | | | 2019 | | |
DPR | Dense Passage Retrieval for Open-Domain Question Answering | | | 2020 | | |
ELECTRA | ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators | | | 2020 | | |
ERNIE 1.0 | ERNIE: Enhanced Representation through Knowledge Integration | | | 2019 | | |
ERNIE 2.0 | ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding | | | 2020 | | AAAI 2020 |
ERNIE 3.0 | ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation | | | 2021 | | |
ERNIE-Gram | ERNIE-Gram: Pre-Training with Explicitly N-Gram Masked Language Modeling for Natural Language Understanding | | | 2020 | | |
ERNIE-health | Building Chinese Biomedical Language Models via Multi-Level Text Discrimination | | | 2022 | | |
ErnieM | ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora | | | 2020 | | |
ESM | Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences | | | 2022 | | |
Falcon | The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only | | | 2023 | | |
FastSpeech2Conformer | Recent Developments on ESPnet Toolkit Boosted by Conformer | | | 2020 | | |
FLAN-T5 | Scaling Instruction-Finetuned Language Models | | | 2022 | | |
FLAN-UL2 | UL2: Unifying Language Learning Paradigms | | | 2022 | | |
FlauBERT | FlauBERT: Unsupervised Language Model Pre-training for French | | | 2019 | | |
FNet | FNet: Mixing Tokens with Fourier Transforms | | | 2021 | | |
FSMT | Facebook FAIR’s WMT19 News Translation Task Submission | | | 2019 | | |
Funnel Transformer | Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing | | | 2020 | | |
Fuyu | Fuyu-8B: A Multimodal Architecture for AI Agents | | | 2023 | | |
Gemma | Gemma: Open Models Based on Gemini Research and Technology | | | 2024 | | |
Gemma2 | Gemma 2: Improving Open Language Models at a Practical Size | | | 2024 | | |
OpenAI GPT | Improving Language Understanding by Generative Pre-Training | | OpenAI | 2018 | | |
GPT Neo | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | | | 2020 | | |
GPTBigCode | SantaCoder: don't reach for the stars! | | | 2023 | | |
OpenAI GPT2 | Language Models are Unsupervised Multitask Learners | | OpenAI | 2019 | 1.5B | |
GPT-Sw3 | Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish | | | 2022 | | |
HerBERT | KLEJ: Comprehensive Benchmark for Polish Language Understanding | | | 2020 | | |
I-BERT | I-BERT: Integer-only BERT Quantization | | | 2021 | | |
Jamba | Introducing Jamba: AI21's Groundbreaking SSM-Transformer Model | | | 2024 | | |
Jukebox | Jukebox: A generative model for music | | | 2020 | | |
LED | Longformer: The Long-Document Transformer | | | 2020 | | |
LLaMA | LLaMA: Open and Efficient Foundation Language Models | | | 2023 | | |
Llama2 | Llama 2: Open Foundation and Fine-Tuned Chat Models | | | 2023 | | |
Llama3 | Introducing Meta Llama 3: The most capable openly available LLM to date | | | 2024 | | |
Longformer | Longformer: The Long-Document Transformer | | | 2020 | | |
LongT5 | LongT5: Efficient Text-To-Text Transformer for Long Sequences | | | 2021 | | |
LUKE | LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention | | | 2020 | | |
M2M100 | Beyond English-Centric Multilingual Machine Translation | | | 2020 | | |
MADLAD-400 | MADLAD-400: A Multilingual And Document-Level Large Audited Dataset | | | 2023 | | |
Mamba | Mamba: Linear-Time Sequence Modeling with Selective State Spaces | | | 2024 | | |
MarianMT | A framework for translation models, using the same models as BART. | | | 2024 | | |
MarkupLM | MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding | | | 2021 | | |
MBart and MBart-50 | Multilingual Denoising Pre-training for Neural Machine Translation | | | 2020 | | |
Mega | Mega: Moving Average Equipped Gated Attention | | | 2022 | | |
MegatronBERT | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | | | 2019 | | |
MegatronGPT2 | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | | | 2019 | | |
Mistral | Mistral-7B is a decoder-only Transformer | | | 2023 | | |
Mixtral | A high-quality sparse mixture-of-experts (SMoE) model with open weights. | | | 2023 | | |
mLUKE | mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models | | | 2021 | | |
MobileBERT | MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices | | | 2020 | | |
MPNet | MPNet: Masked and Permuted Pre-training for Language Understanding | | | 2020 | | |
MPT | MPT models are GPT-style decoder-only transformers with several improvements | | | 2023 | | |
MRA | Multi Resolution Analysis (MRA) for Approximate Self-Attention | | | 2022 | | |
MT5 | mT5: A massively multilingual pre-trained text-to-text transformer | | | 2020 | | |
MVP | MVP: Multi-task Supervised Pre-training for Natural Language Generation | | | 2022 | | |
Nezha | NEZHA: Neural Contextualized Representation for Chinese Language Understanding | | | 2019 | | |
NLLB | No Language Left Behind: Scaling Human-Centered Machine Translation | | | 2022 | | |
NLLB-MOE | No Language Left Behind: Scaling Human-Centered Machine Translation | | | 2022 | | |
Nyströmformer | Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention | | | 2021 | | |
OLMo | OLMo: Accelerating the Science of Language Models | | | 2024 | | |
Open-Llama | The Open-Llama model was proposed in the open source Open-Llama project by community developer s-JoL. | | | 2023 | | |
OPT | Open Pre-trained Transformer Language Models | | | 2022 | | |
Pegasus | PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization | | | 2019 | | |
PEGASUS-X | Investigating Efficiently Extending Transformers for Long Input Summarization | | | 2022 | | |
Persimmon | Persimmon-8B is a fully permissively-licensed model with approximately 8 billion parameters | | | 2023 | | |
Phi | Textbooks Are All You Need | | | 2023 | | |
Phi-3 | Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone | | | 2024 | | |
PhoBERT | PhoBERT: Pre-trained language models for Vietnamese | | | 2022 | | |
PLBart | Unified Pre-training for Program Understanding and Generation | | | 2021 | | |
ProphetNet | ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training | | | 2020 | | |
QDQBERT | Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation | | | 2020 | | |
Qwen | Qwen is a comprehensive language model series that encompasses distinct models with varying parameter counts. | | | 2023 | | |
Qwen2 | Qwen2 is the new model series of large language models from the Qwen team. | | | 2024 | | |
Qwen-VL | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | | | 2023 | | |
Qwen2MoE | Qwen1.5-MoE: Matching 7B Model Performance with 1/3 Activated Parameters | | | 2024 | | |
REALM | REALM: Retrieval-Augmented Language Model Pre-Training | | | 2020 | | |
RecurrentGemma | RecurrentGemma: Moving Past Transformers for Efficient Open Language Models | | | 2024 | | |
Reformer | Reformer: The Efficient Transformer | | | 2020 | | |
RemBERT | Rethinking Embedding Coupling in Pre-trained Language Models | | | 2020 | | |
RetriBERT | Explain Anything Like I’m Five: A Model for Open Domain Long Form Question Answering | | | 2020 | | |
RoBERTa | RoBERTa: A Robustly Optimized BERT Pretraining Approach | | | 2019 | | |
RoBERTa-PreLayerNorm | fairseq: A Fast, Extensible Toolkit for Sequence Modeling | | | 2022 | | |
RoCBert | RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining | | | 2022 | | |
RoFormer | RoFormer: Enhanced Transformer with Rotary Position Embedding | | | 2021 | | |
RWKV-LM | RWKV is an RNN with transformer-level LLM performance | - | | 2022 | | |
RWKV-4.0 | RWKV: Reinventing RNNs for the Transformer Era | | | 2023 | | |
RWKV-5/6 Eagle/Finch | Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence | | | 2024 | | |
Splinter | Few-Shot Question Answering by Pretraining Span Selection | | | 2021 | | |
SqueezeBERT | SqueezeBERT: What can computer vision teach NLP about efficient neural networks? | | | 2020 | | |
StableLM | StableLM-3B-4E1T | | | 2024 | | |
Starcoder2 | StarCoder 2 and The Stack v2: The Next Generation | | | 2024 | | |
SwitchTransformers | Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity | | | 2021 | | |
T5 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | | | 2019 | | |
T5v1.1 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | | | 2019 | | |
TAPEX | TAPEX: Table Pre-training via Learning a Neural SQL Executor | | | 2021 | | |
Transformer XL | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | | | 2019 | | |
UL2 | Unifying Language Learning Paradigms | | | 2022 | | |
UMT5 | UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining | | | 2024 | | |
X-MOD | Lifting the Curse of Multilinguality by Pre-training Modular Transformers | | | 2022 | | |
XGLM | Few-shot Learning with Multilingual Language Models | | | 2021 | | |
XLM | Cross-lingual Language Model Pretraining | | | 2019 | | |
XLM-ProphetNet | ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training | | | 2020 | | |
XLM-RoBERTa | Unsupervised Cross-lingual Representation Learning at Scale | | | 2019 | | |
XLM-RoBERTa-XL | Larger-Scale Transformers for Multilingual Masked Language Modeling | | | 2021 | | |
XLM-V | XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models | | | 2023 | | |
XLNet | XLNet: Generalized Autoregressive Pretraining for Language Understanding | | | 2019 | | |
YOSO | You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling | | | 2021 | | |
Encoder Decoder Models | | | | | | |
RAG | Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | | | 2020 | | NeurIPS 2020 |
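
Most of the models in the table above have implementations in the Hugging Face `transformers` library, so they can typically be loaded through the generic `Auto*` classes once you know the checkpoint's Hub identifier. The snippet below is a minimal sketch rather than a definitive recipe: the checkpoint ID `gpt2` is only a stand-in assumption, and encoder-only entries (BERT, RoBERTa, etc.) would use `AutoModel` or `AutoModelForMaskedLM` instead of `AutoModelForCausalLM`.

```python
# Minimal sketch: loading one model from the table via Hugging Face transformers.
# Assumption: "gpt2" is a placeholder checkpoint ID; swap in the Hub identifier
# of whichever model from the table you want to try.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder checkpoint ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tokenize a prompt and generate a short continuation.
inputs = tokenizer("Large language models are", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```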