Model / Methods | Title | Paper Link | Code Link | Published | Keywords | Venue |
---|---|---|---|---|---|---|
ALBERT | ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | | google-research | 2019 | | |
BART | BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension | | facebookresearch | 2019 | | |
BARThez | BARThez: a Skilled Pretrained French Sequence-to-Sequence Model | | moussaKam | 2020 | | |
BARTpho | BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese | | VinAIResearch | 2022 | | INTERSPEECH 2022 |
BERT | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | | | 2018 | | |
BertGeneration | Leveraging Pre-trained Checkpoints for Sequence Generation Tasks | | | 2019 | | |
BERTweet | BERTweet: A pre-trained language model for English Tweets | | | 2020 | | EMNLP 2020 |
BigBird | Big Bird: Transformers for Longer Sequences | | | 2020 | | NeurIPS 2020 |
BioGPT | BioGPT: generative pre-trained transformer for biomedical text generation and mining | | | 2022 | | |
Blenderbot | Recipes for building an open-domain chatbot | | | 2020 | | |
BLOOM | Introducing The World’s Largest Open Multilingual Language Model: BLOOM | | | 2022 | | |
BORT | Optimal Subarchitecture Extraction for BERT | | | 2020 | | |
ByT5 | ByT5: Towards a token-free future with pre-trained byte-to-byte models | | | 2021 | | |
CamemBERT | CamemBERT: a Tasty French Language Model | | | 2019 | | |
CANINE | CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation | | google-research | 2021 | | |
CodeGen | A Conversational Paradigm for Program Synthesis | | salesforce | 2022 | | |
CodeLlama | Code Llama: Open Foundation Models for Code | | | 2023 | | |
Cohere | Command-R: Retrieval Augmented Generation at Production Scale | | | 2024 | | |
ConvBERT | ConvBERT: Improving BERT with Span-based Dynamic Convolution | | | 2020 | | |
CPM | CPM: A Large-scale Generative Chinese Pre-trained Language Model | | | 2020 | | |
CPMAnt | CPM-Ant is an open-source Chinese pre-trained language model (PLM) with 10B parameters. | | | 2022 | | |
CTRL | CTRL: A Conditional Transformer Language Model for Controllable Generation | | | 2019 | | |
DBRX | DBRX is a transformer-based decoder-only large language model (LLM) that was trained using next-token prediction. | | | 2024 | | |
DeBERTa | DeBERTa: Decoding-enhanced BERT with Disentangled Attention | | | 2020 | | |
DeBERTa-v2 | DeBERTa: Decoding-enhanced BERT with Disentangled Attention | | | 2021 | | |
DialoGPT | DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation | | | 2019 | | |
DistilBERT | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | | | 2019 | | |
DPR | Dense Passage Retrieval for Open-Domain Question Answering | | | 2020 | | |
ELECTRA | ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators | | | 2020 | | |
ERNIE 1.0 | ERNIE: Enhanced Representation through Knowledge Integration | | | 2019 | | |
ERNIE 2.0 | ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding | | | 2020 | | AAAI 2020 |
ERNIE 3.0 | ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation | | | 2021 | | |
ERNIE-Gram | ERNIE-Gram: Pre-Training with Explicitly N-Gram Masked Language Modeling for Natural Language Understanding | | | 2020 | | |
ERNIE-health | Building Chinese Biomedical Language Models via Multi-Level Text Discrimination | | | 2022 | | |
ErnieM | ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora | | | 2020 | | |
ESM | Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences | | | 2022 | | |
Falcon | The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only | | | 2023 | | |
FastSpeech2Conformer | Recent Developments on ESPnet Toolkit Boosted by Conformer | | | 2020 | | |
FLAN-T5 | Scaling Instruction-Finetuned Language Models | | | 2022 | | |
FLAN-UL2 | UL2: Unifying Language Learning Paradigms | | | 2022 | | |
FlauBERT | FlauBERT: Unsupervised Language Model Pre-training for French | | | 2019 | | |
FNet | FNet: Mixing Tokens with Fourier Transforms | | | 2021 | | |
FSMT | Facebook FAIR’s WMT19 News Translation Task Submission | | | 2019 | | |
Funnel Transformer | Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing | | | 2020 | | |
Fuyu | Fuyu-8B: A Multimodal Architecture for AI Agents | | | 2023 | | |
Gemma | Gemma: Open Models Based on Gemini Research and Technology | | | 2024 | | |
Gemma2 | Gemma 2: Improving Open Language Models at a Practical Size | | | 2024 | | |
OpenAI GPT | Improving Language Understanding by Generative Pre-Training | | OpenAI | 2018 | | |
GPT Neo | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | | | 2020 | | |
GPTBigCode | SantaCoder: don't reach for the stars! | | | 2023 | | |
OpenAI GPT2 | Language Models are Unsupervised Multitask Learners | | OpenAI | 2019 | 1.5B | |
GPT-Sw3 | Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish | | | 2022 | | |
HerBERT | KLEJ: Comprehensive Benchmark for Polish Language Understanding | | | 2020 | | |
I-BERT | I-BERT: Integer-only BERT Quantization | | | 2021 | | |
Jamba | Introducing Jamba: AI21's Groundbreaking SSM-Transformer Model | | | 2024 | | |
Jukebox | Jukebox: A generative model for music | | | 2020 | | |
LED | Longformer: The Long-Document Transformer | | | 2020 | | |
LLaMA | LLaMA: Open and Efficient Foundation Language Models | | | 2023 | | |
Llama2 | Llama 2: Open Foundation and Fine-Tuned Chat Models | | | 2023 | | |
Llama3 | Introducing Meta Llama 3: The most capable openly available LLM to date | | | 2024 | | |
Longformer | Longformer: The Long-Document Transformer | | | 2020 | | |
LongT5 | LongT5: Efficient Text-To-Text Transformer for Long Sequences | | | 2021 | | |
LUKE | LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention | | | 2020 | | |
M2M100 | Beyond English-Centric Multilingual Machine Translation | | | 2020 | | |
MADLAD-400 | MADLAD-400: A Multilingual And Document-Level Large Audited Dataset | | | 2023 | | |
Mamba | Mamba: Linear-Time Sequence Modeling with Selective State Spaces | | | 2024 | | |
MarianMT | A framework for translation models, using the same models as BART. | | | 2024 | | |
MarkupLM | MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding | | | 2021 | | |
MBart and MBart-50 | Multilingual Denoising Pre-training for Neural Machine Translation | | | 2020 | | |
Mega | Mega: Moving Average Equipped Gated Attention | | | 2022 | | |
MegatronBERT | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | | | 2019 | | |
MegatronGPT2 | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | | | 2019 | | |
Mistral | Mistral-7B is a decoder-only Transformer | | | 2023 | | |
Mixtral | A high-quality sparse mixture-of-experts (SMoE) model with open weights. | | | 2023 | | |
mLUKE | mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models | | | 2021 | | |
MobileBERT | MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices | | | 2020 | | |
MPNet | MPNet: Masked and Permuted Pre-training for Language Understanding | | | 2020 | | |
MPT | MPT models are GPT-style decoder-only transformers with several improvements | | | 2023 | | |
MRA | Multi Resolution Analysis (MRA) for Approximate Self-Attention | | | 2022 | | |
MT5 | mT5: A massively multilingual pre-trained text-to-text transformer | | | 2020 | | |
MVP | MVP: Multi-task Supervised Pre-training for Natural Language Generation | | | 2022 | | |
Nezha | NEZHA: Neural Contextualized Representation for Chinese Language Understanding | | | 2019 | | |
NLLB | No Language Left Behind: Scaling Human-Centered Machine Translation | | | 2022 | | |
NLLB-MOE | No Language Left Behind: Scaling Human-Centered Machine Translation | | | 2022 | | |
Nyströmformer | Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention | | | 2021 | | |
OLMo | OLMo: Accelerating the Science of Language Models | | | 2024 | | |
Open-Llama | The Open-Llama model was proposed in the open source Open-Llama project by community developer s-JoL. | | | 2023 | | |
OPT | Open Pre-trained Transformer Language Models | | | 2022 | | |
Pegasus | PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization | | | 2019 | | |
PEGASUS-X | Investigating Efficiently Extending Transformers for Long Input Summarization | | | 2022 | | |
Persimmon | Persimmon-8B is a fully permissively-licensed model with approximately 8 billion parameters | | | 2023 | | |
Phi | Textbooks Are All You Need | | | 2023 | | |
Phi-3 | Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone | | | 2024 | | |
PhoBERT | PhoBERT: Pre-trained language models for Vietnamese | | | 2022 | | |
PLBart | Unified Pre-training for Program Understanding and Generation | | | 2021 | | |
ProphetNet | ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training | | | 2020 | | |
QDQBERT | Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation | | | 2020 | | |
Qwen | Qwen is a comprehensive language model series that encompasses distinct models with varying parameter counts. | | | 2023 | | |
Qwen2 | Qwen2 is the new model series of large language models from the Qwen team. | | | 2024 | | |
Qwen-VL | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | | | 2023 | | |
Qwen2MoE | Qwen1.5-MoE: Matching 7B Model Performance with 1/3 Activated Parameters | | | 2024 | | |
REALM | REALM: Retrieval-Augmented Language Model Pre-Training | | | 2020 | | |
RecurrentGemma | RecurrentGemma: Moving Past Transformers for Efficient Open Language Models | | | 2024 | | |
Reformer | Reformer: The Efficient Transformer | | | 2020 | | |
RemBERT | Rethinking Embedding Coupling in Pre-trained Language Models | | | 2020 | | |
RetriBERT | Explain Anything Like I’m Five: A Model for Open Domain Long Form Question Answering | | | 2020 | | |
RoBERTa | RoBERTa: A Robustly Optimized BERT Pretraining Approach | | | 2019 | | |
RoBERTa-PreLayerNorm | fairseq: A Fast, Extensible Toolkit for Sequence Modeling | | | 2022 | | |
RoCBert | RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining | | | 2022 | | |
RoFormer | RoFormer: Enhanced Transformer with Rotary Position Embedding | | | 2021 | | |
RWKV-LM | RWKV is an RNN with transformer-level LLM performance | - | | 2022 | | |
RWKV-4.0 | RWKV: Reinventing RNNs for the Transformer Era | | | 2023 | | |
RWKV-5/6 Eagle/Finch | Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence | | | 2024 | | |
Splinter | Few-Shot Question Answering by Pretraining Span Selection | | | 2021 | | |
SqueezeBERT | SqueezeBERT: What can computer vision teach NLP about efficient neural networks? | | | 2020 | | |
StableLM | StableLM-3B-4E1T | | | 2024 | | |
Starcoder2 | StarCoder 2 and The Stack v2: The Next Generation | | | 2024 | | |
SwitchTransformers | Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity | | | 2021 | | |
T5 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | | | 2019 | | |
T5v1.1 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | | | 2019 | | |
TAPEX | TAPEX: Table Pre-training via Learning a Neural SQL Executor | | | 2021 | | |
Transformer XL | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | | | 2019 | | |
UL2 | Unifying Language Learning Paradigms | | | 2022 | | |
UMT5 | UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining | | | 2024 | | |
X-MOD | Lifting the Curse of Multilinguality by Pre-training Modular Transformers | | | 2022 | | |
XGLM | Few-shot Learning with Multilingual Language Models | | | 2021 | | |
XLM | Cross-lingual Language Model Pretraining | | | 2019 | | |
XLM-ProphetNet | ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training | | | 2020 | | |
XLM-RoBERTa | Unsupervised Cross-lingual Representation Learning at Scale | | | 2019 | | |
XLM-RoBERTa-XL | Larger-Scale Transformers for Multilingual Masked Language Modeling | | | 2021 | | |
XLM-V | XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models | | | 2023 | | |
XLNet | XLNet: Generalized Autoregressive Pretraining for Language Understanding | | | 2019 | | |
YOSO | You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling | | | 2021 | | |
Encoder Decoder Models | | | | | | |
RAG | Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | | | 2020 | | NeurIPS 2020 |
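
Most of the models in the table above have implementations in the Hugging Face `transformers` library, so they can typically be loaded through the generic `Auto*` classes once you know the checkpoint's Hub identifier. The snippet below is a minimal sketch rather than a definitive recipe: the checkpoint ID `gpt2` is only a stand-in assumption, and encoder-only entries (BERT, RoBERTa, etc.) would use `AutoModel` or `AutoModelForMaskedLM` instead of `AutoModelForCausalLM`.

```python
# Minimal sketch: loading one model from the table via Hugging Face transformers.
# Assumption: "gpt2" is a placeholder checkpoint ID; swap in the Hub identifier
# of whichever model from the table you want to try.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder checkpoint ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tokenize a prompt and generate a short continuation.
inputs = tokenizer("Large language models are", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```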