## Language-Models

| Model / Methods | Title | Paper | Code | Published | Keywords | Venue |
| --- | --- | --- | --- | --- | --- | --- |
| ALBERT | ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | Paper | Code | 2019 | google-research | |
| BART | BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension | Paper | Code | 2019 | facebookresearch | |
| BARThez | BARThez: a Skilled Pretrained French Sequence-to-Sequence Model | Paper | Code | 2020 | moussaKam | |
| BARTpho | BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese | Paper | Code | 2022 | VinAIResearch | INTERSPEECH 2022 |
| BERT | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Paper | Code | 2018 | | |
| BertGeneration | Leveraging Pre-trained Checkpoints for Sequence Generation Tasks | Paper | Code | 2019 | | |
| BERTweet | BERTweet: A pre-trained language model for English Tweets | Paper | Code | 2020 | | EMNLP 2020 |
| BigBird | Big Bird: Transformers for Longer Sequences | Paper | Code | 2020 | | NeurIPS 2020 |
| BioGPT | BioGPT: generative pre-trained transformer for biomedical text generation and mining | Paper | Code | 2022 | | |
| Blenderbot | Recipes for building an open-domain chatbot | Paper | Code | 2020 | | |
| BLOOM | Introducing The World’s Largest Open Multilingual Language Model: BLOOM | Paper | Code | 2022 | | |
| BORT | Optimal Subarchitecture Extraction for BERT | Paper | Code | 2020 | | |
| ByT5 | ByT5: Towards a token-free future with pre-trained byte-to-byte models | Paper | Code | 2021 | | |
| CamemBERT | CamemBERT: a Tasty French Language Model | Paper | Code | 2019 | | |
| CANINE | CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation | Paper | Code | 2021 | Google-research | |
| CodeGen | A Conversational Paradigm for Program Synthesis | Paper | Code | 2022 | salesforce | |
| CodeLlama | Code Llama: Open Foundation Models for Code | Paper | Code | 2023 | | |
| Cohere | Command-R: Retrieval Augmented Generation at Production Scale | Paper | Code | 2024 | | |
| ConvBERT | ConvBERT: Improving BERT with Span-based Dynamic Convolution | Paper | Code | 2020 | | |
| CPM | CPM: A Large-scale Generative Chinese Pre-trained Language Model | Paper | Code | 2020 | | |
| CPMAnt | CPM-Ant is an open-source Chinese pre-trained language model (PLM) with 10B parameters. | Paper | Code | 2022 | | |
| CTRL | CTRL: A Conditional Transformer Language Model for Controllable Generation | Paper | Code | 2019 | | |
| DBRX | DBRX is a transformer-based decoder-only large language model (LLM) trained using next-token prediction. | Paper | Code | 2024 | | |
| DeBERTa | DeBERTa: Decoding-enhanced BERT with Disentangled Attention | Paper | Code | 2020 | | |
| DeBERTa-v2 | DeBERTa: Decoding-enhanced BERT with Disentangled Attention | Paper | Code | 2021 | | |
| DialoGPT | DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation | Paper | Code | 2019 | | |
| DistilBERT | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | Paper | Code | 2019 | | |
| DPR | Dense Passage Retrieval for Open-Domain Question Answering | Paper | Code | 2020 | | |
| ELECTRA | ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators | Paper | Code | 2020 | | |
| ERNIE 1.0 | ERNIE: Enhanced Representation through Knowledge Integration | Paper | Code | 2019 | | |
| ERNIE 2.0 | ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding | Paper | Code | 2020 | | AAAI 2020 |
| ERNIE 3.0 | ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation | Paper | Code | 2021 | | |
| ERNIE-Gram | ERNIE-Gram: Pre-Training with Explicitly N-Gram Masked Language Modeling for Natural Language Understanding | Paper | Code | 2020 | | |
| ERNIE-health | Building Chinese Biomedical Language Models via Multi-Level Text Discrimination | Paper | Code | 2022 | | |
| ErnieM | ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora | Paper | Code | 2020 | | |
| ESM | Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences | Paper | Code | 2022 | | |
| Falcon | The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only | Paper | Code | 2023 | | |
| FastSpeech2Conformer | Recent Developments on ESPnet Toolkit Boosted by Conformer | Paper | Code | 2020 | | |
| FLAN-T5 | Scaling Instruction-Finetuned Language Models | Paper | Code | 2022 | | |
| FLAN-UL2 | UL2: Unifying Language Learning Paradigms | Paper | Code | 2022 | | |
| FlauBERT | FlauBERT: Unsupervised Language Model Pre-training for French | Paper | Code | 2019 | | |
| FNet | FNet: Mixing Tokens with Fourier Transforms | Paper | Code | 2021 | | |
| FSMT | Facebook FAIR’s WMT19 News Translation Task Submission | Paper | Code | 2019 | | |
| Funnel Transformer | Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing | Paper | Code | 2020 | | |
| Fuyu | Fuyu-8B: A Multimodal Architecture for AI Agents | Paper | Code | 2023 | | |
| Gemma | Gemma: Open Models Based on Gemini Research and Technology | Paper | Code | 2024 | | |
| Gemma2 | Gemma 2: Improving Open Language Models at a Practical Size | Paper | Code | 2024 | | |
| OpenAI GPT | Improving Language Understanding by Generative Pre-Training | Paper | Code | 2018 | OpenAI | |
| GPT Neo | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | Paper | Code | 2020 | | |
| GPTBigCode | SantaCoder: don't reach for the stars! | Paper | Code | 2023 | | |
| OpenAI GPT2 | Language Models are Unsupervised Multitask Learners | Paper | Code | 2019 | 1.5B, OpenAI | |
| GPT-Sw3 | Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish | Paper | Code | 2022 | | |
| HerBERT | KLEJ: Comprehensive Benchmark for Polish Language Understanding | Paper | Code | 2020 | | |
| I-BERT | I-BERT: Integer-only BERT Quantization | Paper | Code | 2021 | | |
| Jamba | Introducing Jamba: AI21's Groundbreaking SSM-Transformer Model | Paper | Code | 2024 | | |
| Jukebox | Jukebox: A generative model for music | Paper | Code | 2020 | | |
| LED | Longformer: The Long-Document Transformer | Paper | Code | 2020 | | |
| LLaMA | LLaMA: Open and Efficient Foundation Language Models | Paper | Code | 2023 | | |
| Llama2 | Llama 2: Open Foundation and Fine-Tuned Chat Models | Paper | Code | 2023 | | |
| Llama3 | Introducing Meta Llama 3: The most capable openly available LLM to date | Paper | Code | 2024 | | |
| Longformer | Longformer: The Long-Document Transformer | Paper | Code | 2020 | | |
| LongT5 | LongT5: Efficient Text-To-Text Transformer for Long Sequences | Paper | Code | 2021 | | |
| LUKE | LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention | Paper | Code | 2020 | | |
| M2M100 | Beyond English-Centric Multilingual Machine Translation | Paper | Code | 2020 | | |
| MADLAD-400 | MADLAD-400: A Multilingual And Document-Level Large Audited Dataset | Paper | Code | 2023 | | |
| Mamba | Mamba: Linear-Time Sequence Modeling with Selective State Spaces | Paper | Code | 2024 | | |
| MarianMT | A framework for translation models, using the same models as BART. | Paper | Code | 2024 | | |
| MarkupLM | MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding | Paper | Code | 2021 | | |
| MBart and MBart-50 | Multilingual Denoising Pre-training for Neural Machine Translation | Paper | Code | 2020 | | |
| Mega | Mega: Moving Average Equipped Gated Attention | Paper | Code | 2022 | | |
| MegatronBERT | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | Paper | Code | 2019 | | |
| MegatronGPT2 | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | Paper | Code | 2019 | | |
| Mistral | Mistral-7B is a decoder-only Transformer | Paper | Code | 2023 | | |
| Mixtral | A high-quality sparse mixture-of-experts (SMoE) model with open weights. | Paper | Code | 2023 | | |
| mLUKE | mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models | Paper | Code | 2021 | | |
| MobileBERT | MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices | Paper | Code | 2020 | | |
| MPNet | MPNet: Masked and Permuted Pre-training for Language Understanding | Paper | Code | 2020 | | |
| MPT | MPT models are GPT-style decoder-only transformers with several improvements | Paper | Code | 2023 | | |
| MRA | Multi Resolution Analysis (MRA) for Approximate Self-Attention | Paper | Code | 2022 | | |
| MT5 | mT5: A massively multilingual pre-trained text-to-text transformer | Paper | Code | 2020 | | |
| MVP | MVP: Multi-task Supervised Pre-training for Natural Language Generation | Paper | Code | 2022 | | |
| Nezha | NEZHA: Neural Contextualized Representation for Chinese Language Understanding | Paper | Code | 2019 | | |
| NLLB | No Language Left Behind: Scaling Human-Centered Machine Translation | Paper | Code | 2022 | | |
| NLLB-MOE | No Language Left Behind: Scaling Human-Centered Machine Translation | Paper | Code | 2022 | | |
| Nyströmformer | Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention | Paper | Code | 2021 | | |
| OLMo | OLMo: Accelerating the Science of Language Models | Paper | Code | 2024 | | |
| Open-Llama | The Open-Llama model was proposed in the open-source Open-Llama project by community developer s-JoL. | Paper | Code | 2023 | | |
| OPT | Open Pre-trained Transformer Language Models | Paper | Code | 2022 | | |
| Pegasus | PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization | Paper | Code | 2019 | | |
| PEGASUS-X | Investigating Efficiently Extending Transformers for Long Input Summarization | Paper | Code | 2022 | | |
| Persimmon | Persimmon-8B is a fully permissively-licensed model with approximately 8 billion parameters | Paper | Code | 2023 | | |
| Phi | Textbooks Are All You Need | Paper | Code | 2023 | | |
| Phi-3 | Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone | Paper | Code | 2024 | | |
| PhoBERT | PhoBERT: Pre-trained language models for Vietnamese | Paper | Code | 2020 | | |
| PLBart | Unified Pre-training for Program Understanding and Generation | Paper | Code | 2021 | | |
| ProphetNet | ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training | Paper | Code | 2020 | | |
| QDQBERT | Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation | Paper | Code | 2020 | | |
| Qwen | Qwen is a comprehensive language model series that encompasses distinct models with varying parameter counts. | Paper | Code | 2023 | | |
| Qwen2 | Qwen2 is the new model series of large language models from the Qwen team. | Paper | Code | 2024 | | |
| Qwen-VL | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | Paper | Code | 2023 | | |
| Qwen2MoE | Qwen1.5-MoE: Matching 7B Model Performance with 1/3 Activated Parameters | Paper | Code | 2024 | | |
| REALM | REALM: Retrieval-Augmented Language Model Pre-Training | Paper | Code | 2020 | | |
| RecurrentGemma | RecurrentGemma: Moving Past Transformers for Efficient Open Language Models | Paper | Code | 2024 | | |
| Reformer | Reformer: The Efficient Transformer | Paper | Code | 2020 | | |
| RemBERT | Rethinking Embedding Coupling in Pre-trained Language Models | Paper | Code | 2020 | | |
| RetriBERT | Explain Anything Like I’m Five: A Model for Open Domain Long Form Question Answering | Paper | Code | 2020 | | |
| RoBERTa | RoBERTa: A Robustly Optimized BERT Pretraining Approach | Paper | Code | 2019 | | |
| RoBERTa-PreLayerNorm | fairseq: A Fast, Extensible Toolkit for Sequence Modeling | Paper | Code | 2022 | | |
| RoCBert | RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining | Paper | Code | 2022 | | |
| RoFormer | RoFormer: Enhanced Transformer with Rotary Position Embedding | Paper | Code | 2021 | | |
| RWKV-LM | RWKV is an RNN with transformer-level LLM performance | - | Code | 2022 | | |
| RWKV-4.0 | RWKV: Reinventing RNNs for the Transformer Era | Paper | Code | 2023 | | |
| RWKV-5/6 Eagle/Finch | Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence | Paper | Code | 2024 | | |
| Splinter | Few-Shot Question Answering by Pretraining Span Selection | Paper | Code | 2021 | | |
| SqueezeBERT | SqueezeBERT: What can computer vision teach NLP about efficient neural networks? | Paper | Code | 2020 | | |
| StableLM | StableLM-3B-4E1T | Paper | Code | 2024 | | |
| Starcoder2 | StarCoder 2 and The Stack v2: The Next Generation | Paper | Code | 2024 | | |
| SwitchTransformers | Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity | Paper | Code | 2021 | | |
| T5 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | Paper | Code | 2019 | | |
| T5v1.1 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | Paper | Code | 2019 | | |
| TAPEX | TAPEX: Table Pre-training via Learning a Neural SQL Executor | Paper | Code | 2021 | | |
| Transformer XL | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | Paper | Code | 2019 | | |
| UL2 | Unifying Language Learning Paradigms | Paper | Code | 2022 | | |
| UMT5 | UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining | Paper | Code | 2023 | | |
| X-MOD | Lifting the Curse of Multilinguality by Pre-training Modular Transformers | Paper | Code | 2022 | | |
| XGLM | Few-shot Learning with Multilingual Language Models | Paper | Code | 2021 | | |
| XLM | Cross-lingual Language Model Pretraining | Paper | Code | 2019 | | |
| XLM-ProphetNet | ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training | Paper | Code | 2020 | | |
| XLM-RoBERTa | Unsupervised Cross-lingual Representation Learning at Scale | Paper | Code | 2019 | | |
| XLM-RoBERTa-XL | Larger-Scale Transformers for Multilingual Masked Language Modeling | Paper | Code | 2021 | | |
| XLM-V | XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models | Paper | Code | 2023 | | |
| XLNet | XLNet: Generalized Autoregressive Pretraining for Language Understanding | Paper | Code | 2019 | | |
| YOSO | You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling | Paper | Code | 2021 | | |
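Most of the entries above share the same loading pattern, so any of them can be tried in a few lines. A minimal sketch, assuming the `transformers` library is installed and the checkpoint is available on the Hugging Face Hub; `bert-base-uncased` is only an illustrative checkpoint name, not part of the table:

```python
# Minimal sketch: querying one of the listed masked language models.
# Assumes the `transformers` library and Hub access; "bert-base-uncased"
# is an illustrative checkpoint, not a recommendation from the table.
from transformers import AutoTokenizer, AutoModelForMaskedLM

checkpoint = "bert-base-uncased"  # assumption: swap in any listed checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

inputs = tokenizer("BERT is a [MASK] language model.", return_tensors="pt")
logits = model(**inputs).logits

# Find the masked position and decode the highest-scoring token for it.
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
predicted_id = logits[0, mask_pos].argmax(-1)
print(tokenizer.decode(predicted_id))
```

Between models, usually all that changes is the checkpoint name and the Auto class that matches the task head (e.g. `AutoModelForCausalLM` for the decoder-only entries).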
- Encoder Decoder Models (see the seq2seq sketch below)
- RAG
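For the encoder-decoder entries (BART, T5, Pegasus, and the Encoder Decoder Models pointer above), generation goes through a seq2seq head instead. A minimal sketch under the same assumptions; `t5-small` is an illustrative checkpoint:

```python
# Minimal sketch of the encoder-decoder (seq2seq) pattern.
# Same assumptions as above; "t5-small" is an illustrative checkpoint.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "t5-small"  # assumption: any listed seq2seq model works similarly
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# T5 is text-to-text, so the task is named inside the prompt itself.
prompt = "translate English to German: The table above lists language models."
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```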