docs(weave): Update Models page with example of pairwise eval #3739

Merged
merged 6 commits into from
Feb 21, 2025

J2-D2-3PO
Contributor

@J2-D2-3PO J2-D2-3PO commented Feb 21, 2025

Description

Adds new subsection to https://weave-docs.wandb.ai/guides/core-types/models describing how to do pairwise evaluation of two models. Based on #3688 and https://github.com/wandb/weave/pull/3688/files

Testing

  • yarn start on local
  • SME review

Summary by CodeRabbit

  • Documentation
    • Introduced a new guide section detailing how to compare model outputs using relative evaluation metrics.
    • Provided practical examples illustrating comparative performance assessments for tasks such as text generation and summarization.
    • Enhanced the overall clarity and breadth of model evaluation strategies presented to users.

@J2-D2-3PO J2-D2-3PO self-assigned this Feb 21, 2025
circle-job-mirror bot commented Feb 21, 2025

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
docs/docs/guides/core-types/models.md (3)

83-83: Spelling Correction Needed.
There is a typographical error in the sentence: “createing” should be corrected to “creating.”


120-128: Include Import for Dataset.
The code sample uses Dataset when constructing the evaluation dataset but does not include an import statement for it. To ensure the sample is fully self-contained and executable, consider adding an import (for example: from weave import Dataset).


134-134: Fenced Code Block Language Specification.
markdownlint reports that fenced code blocks should specify a language, which improves syntax highlighting and satisfies the style guidelines. Please verify that every fenced code block in this document has an appropriate language specifier, and check whether the fence flagged at this location is a closing fence.

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

134-134: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


134-134: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e10d91 and 11a9b2a.

📒 Files selected for processing (1)
  • docs/docs/guides/core-types/models.md (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{md,mdx}: Focus on technical accuracy.
Check for broken links.
Verify code examples are up-to-date.
Look for clarity and completeness.
Don't focus on grammar/spelling unless significant.

  • docs/docs/guides/core-types/models.md
⏰ Context from checks skipped due to timeout of 90000ms (41)
  • GitHub Check: Trace nox tests (3, 13, trace) (×10)
  • GitHub Check: Trace nox tests (3, 12, scorers)
  • GitHub Check: Trace nox tests (3, 12, trace) (×10)
  • GitHub Check: Trace nox tests (3, 11, trace) (×10)
  • GitHub Check: Trace nox tests (3, 10, trace) (×10)
🔇 Additional comments (2)
docs/docs/guides/core-types/models.md (2)

6-7: Informative Model Description.
The added paragraph clearly explains what a Model is in Weave and highlights the benefits of using this API. No changes are needed here.


79-80: Clear Introduction to Pairwise Evaluation.
The new section on pairwise evaluation is well introduced. It concisely explains the rationale behind using relative metrics over absolute ones for subjective tasks such as text generation and summarization.
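For readers of this thread, the idea of a relative metric can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the example added to the docs page and not Weave's API: `toy_judge` and `pairwise_win_rates` are hypothetical names, and the judge is a stand-in that simply prefers the shorter text, where a real setup would use a human rater or an LLM judge.

```python
from collections import Counter


def toy_judge(first, second):
    """Hypothetical stand-in for a human or LLM judge: prefers the shorter text."""
    if len(first) == len(second):
        return "tie"
    return "first" if len(first) < len(second) else "second"


def pairwise_win_rates(outputs_a, outputs_b, judge):
    """Compare two models' outputs example by example and return win shares."""
    labels = {"first": "model_a", "second": "model_b"}
    tally = Counter()
    for a, b in zip(outputs_a, outputs_b):
        # The judge never assigns an absolute score; it only ranks the pair.
        tally[labels.get(judge(a, b), "tie")] += 1
    total = sum(tally.values())
    if total == 0:
        return {"model_a": 0.0, "model_b": 0.0, "tie": 0.0}
    return {name: tally[name] / total for name in ("model_a", "model_b", "tie")}


summaries_a = ["Cats sleep a lot.", "Rain is expected tomorrow.", "Stocks fell."]
summaries_b = ["Cats are animals that sleep a great deal.", "Rain tomorrow.", "Stocks rose."]
win_rates = pairwise_win_rates(summaries_a, summaries_b, toy_judge)
print(win_rates)
```

With this toy data each outcome (a win for model A, a win for model B, a tie) occurs once, so each share is one third. The result is meaningful only relative to the other model, which is the point of pairwise evaluation for subjective tasks.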

@J2-D2-3PO J2-D2-3PO marked this pull request as ready for review February 21, 2025 22:50
@J2-D2-3PO J2-D2-3PO requested review from a team as code owners February 21, 2025 22:50
@wandb wandb deleted a comment from coderabbitai bot Feb 21, 2025
@J2-D2-3PO J2-D2-3PO merged commit 88a1e8b into master Feb 21, 2025
133 checks passed
@J2-D2-3PO J2-D2-3PO deleted the DOCS-1290 branch February 21, 2025 23:50
@github-actions github-actions bot locked and limited conversation to collaborators Feb 21, 2025
2 participants