A library & tools to evaluate predictive language models.
-
Updated
Aug 9, 2023 - Python
A library & tools to evaluate predictive language models.
A data construction and evaluation framework to quantify privacy norm awareness of language models (LMs) and emerging privacy risk of LM agents. (NeurIPS 2024 D&B)
Code and data for "KoDialogBench: Evaluating Conversational Understanding of Language Models with Korean Dialogue Benchmark" (LREC-COLING 2024)
Curriculum is a new format of NLI benchmark for evaluation of broad-coverage linguistic phenomena. This linguistic-phenomena-driven benchmark can serve as an effective tool for diagnosing model behavior and verifying model learning quality.
Language Modeling
BA thesis PoC code about using Rust and language models to contextualize IoT data
A thesis investigating the use of large language models for summarizing application logs.
A from scratch LM finetuning project to understand neural nets, text generation and evals
Add a description, image, and links to the language-model-evaluation topic page so that developers can more easily learn about it.
To associate your repository with the language-model-evaluation topic, visit your repo's landing page and select "manage topics."