docs(weave): Update Models page with example of pairwise eval #3739

J2-D2-3PO · 2025-02-21T22:29:52Z

Description

Fixes WB-NNNNN

Adds a new subsection to https://weave-docs.wandb.ai/guides/core-types/models describing how to do pairwise evaluation of two models. Based on #3688 (https://github.com/wandb/weave/pull/3688/files).

Testing

- Ran `yarn start` locally
- SME review

Summary by CodeRabbit

Documentation
- Introduced a new guide section detailing how to compare model outputs using relative evaluation metrics.
- Provided practical examples illustrating comparative performance assessments for tasks such as text generation and summarization.
- Enhanced the overall clarity and breadth of model evaluation strategies presented to users.

circle-job-mirror · 2025-02-21T22:36:35Z

Preview this PR with FeatureBee: https://beta.wandb.ai/?betaVersion=d0f04fe4621973759c4d24fa4d14950462af1528

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

docs/docs/guides/core-types/models.md (3)

83-83: Spelling Correction Needed.
There is a typographical error in the sentence: “createing” should be corrected to “creating.”

120-128: Include Import for Dataset.
The code sample uses Dataset when constructing the evaluation dataset but does not include an import statement for it. To ensure the sample is fully self-contained and executable, consider adding an import (for example: from weave import Dataset).

134-134: Fenced Code Block Language Specification.
A markdown linter hint indicates that fenced code blocks should specify a language for better syntax highlighting and compliance with style guidelines. Please verify that all fenced code blocks in this document (or in the closing fence at this location) have an appropriate language specifier.

🧰 Tools

🪛 markdownlint-cli2 (0.17.2)

134-134: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

134-134: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e10d91 and 11a9b2a.

📒 Files selected for processing (1)

docs/docs/guides/core-types/models.md (2 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.{md,mdx}: Focus on technical accuracy.
Check for broken links.
Verify code examples are up-to-date.
Look for clarity and completeness.
Don't focus on grammar/spelling unless significant.

⏰ Context from checks skipped due to timeout of 90000ms (41)

GitHub Check: Trace nox tests (3, 13, trace)
GitHub Check: Trace nox tests (3, 12, scorers)
GitHub Check: Trace nox tests (3, 12, trace)
GitHub Check: Trace nox tests (3, 11, trace)
GitHub Check: Trace nox tests (3, 10, trace)

🔇 Additional comments (2)

docs/docs/guides/core-types/models.md (2)

6-7: Informative Model Description.
The added paragraph clearly explains what a Model is in Weave and highlights the benefits of using this API. No changes are needed here.

79-80: Clear Introduction to Pairwise Evaluation.
The new section on pairwise evaluation is well introduced. It concisely explains the rationale behind using relative metrics over absolute ones for subjective tasks such as text generation and summarization.
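For readers of this thread who have not seen the docs change, the idea of a relative (pairwise) metric can be sketched in plain Python. This is only an illustration under stated assumptions: it does not use the Weave API, and `model_a`, `model_b`, `prefer_shorter`, and `pairwise_eval` are hypothetical names invented here, not code from the PR.

```python
from typing import Callable, Dict, Iterable

# Hypothetical stand-ins for two models; each maps a prompt to a text output.
def model_a(prompt: str) -> str:
    return prompt.upper()

def model_b(prompt: str) -> str:
    return prompt + "!"

def prefer_shorter(output_a: str, output_b: str) -> str:
    """Toy preference rule (an assumption): the shorter output wins."""
    if len(output_a) == len(output_b):
        return "tie"
    return "a" if len(output_a) < len(output_b) else "b"

def pairwise_eval(
    prompts: Iterable[str],
    first: Callable[[str], str],
    second: Callable[[str], str],
    judge: Callable[[str, str], str],
) -> Dict[str, int]:
    """Run both models on every prompt and tally which output the judge prefers."""
    tally = {"a": 0, "b": 0, "tie": 0}
    for prompt in prompts:
        tally[judge(first(prompt), second(prompt))] += 1
    return tally

results = pairwise_eval(["hello", "summarize this"], model_a, model_b, prefer_shorter)
print(results)
```

The score here is a win count for one model relative to the other, not an absolute quality number, which is the distinction the new section draws for subjective tasks.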

docs/docs/guides/core-types/models.md

docs(weave): Update Models page with example of pairwise eval

bb8152f

J2-D2-3PO self-assigned this Feb 21, 2025

coderabbitai bot reviewed Feb 21, 2025

docs/docs/guides/core-types/models.md (outdated, resolved)

nits

966cab7

J2-D2-3PO force-pushed the DOCS-1290 branch from 11a9b2a to 966cab7 Compare February 21, 2025 22:49

Merge branch 'master' into DOCS-1290

f876322

J2-D2-3PO marked this pull request as ready for review February 21, 2025 22:50

J2-D2-3PO requested review from a team as code owners February 21, 2025 22:50

J2-D2-3PO requested a review from andrewtruong February 21, 2025 22:50

andrewtruong reviewed Feb 21, 2025

docs/docs/guides/core-types/models.md (outdated, resolved)

andrewtruong reviewed Feb 21, 2025

docs/docs/guides/core-types/models.md (resolved)

J2-D2-3PO commented Feb 21, 2025

docs/docs/guides/core-types/models.md (outdated, resolved)

J2-D2-3PO added 2 commits February 21, 2025 15:58

SME review: improve description

afef479

document _get_other_model_output

292bb1c

J2-D2-3PO commented Feb 21, 2025

docs/docs/guides/core-types/models.md (outdated, resolved)

Nit: code spacing

31620b3

J2-D2-3PO requested a review from andrewtruong February 21, 2025 23:28

andrewtruong approved these changes Feb 21, 2025

wandb deleted a comment from coderabbitai bot Feb 21, 2025

J2-D2-3PO merged commit 88a1e8b into master Feb 21, 2025
133 checks passed

J2-D2-3PO deleted the DOCS-1290 branch February 21, 2025 23:50

github-actions bot locked and limited conversation to collaborators Feb 21, 2025
