Merge branch 'master' into feat/smolagents-integrations-2
soumik12345 authored Feb 21, 2025
2 parents 137f661 + 5d997c4 commit 8e56fb8
Showing 114 changed files with 7,776 additions and 4,126 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -17,4 +17,5 @@ gha-creds-*.json
.coverage
.nox
*.log
*/file::memory:?cache=shared
*/file::memory:?cache=shared
tests/weave_models/
1 change: 1 addition & 0 deletions docs/docs/guides/core-types/env-vars.md
@@ -20,6 +20,7 @@ os.environ["WEAVE_PRINT_CALL_LINK"] = "false"

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| `WANDB_API_KEY` | `string` | `None` | If set, automatically log into W&B Weave without being prompted for your API key. To generate an API key, log in to your W&B account and go to [https://wandb.ai/authorize](https://wandb.ai/authorize). |
| `WEAVE_DISABLED` | `bool` | `false` | When set to `true`, disables all Weave tracing. Weave ops will behave like regular functions. |
| `WEAVE_PRINT_CALL_LINK` | `bool` | `true` | Controls whether to print a link to the Weave UI when calling a Weave op. |
| `WEAVE_CAPTURE_CODE` | `bool` | `true` | Controls whether to save code for ops so they can be reloaded for later use. |
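
A minimal sketch of setting these variables from Python before initializing Weave; the values and project name below are illustrative assumptions:

```python
import os

import weave

# Illustrative values: keep tracing enabled, but suppress UI call links.
os.environ["WEAVE_DISABLED"] = "false"
os.environ["WEAVE_PRINT_CALL_LINK"] = "false"

weave.init("<YOUR-WANDB-PROJECT-NAME>")
```
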
2 changes: 1 addition & 1 deletion docs/docs/guides/core-types/evaluations.md
@@ -1,4 +1,4 @@
# Evaluations
# Offline Batch Evaluation

Evaluation-driven development helps you reliably iterate on an application. The `Evaluation` class is designed to assess the performance of a `Model` on a given `Dataset` or set of examples using scoring functions.
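
As a minimal sketch of this pattern (the toy dataset, scorer, and model below are illustrative assumptions, not part of this guide):

```python
import asyncio

import weave

weave.init("<YOUR-WANDB-PROJECT-NAME>")

# Hypothetical toy dataset for illustration only.
examples = [
    {"question": "What is 2 + 2?", "expected": "4"},
    {"question": "What is the capital of France?", "expected": "Paris"},
]

@weave.op()
def exact_match(expected: str, output: str) -> dict:
    # Scorer arguments are matched to dataset columns; `output` is the model output.
    return {"correct": expected == output}

@weave.op()
def model(question: str) -> str:
    # Stand-in for a real LLM call.
    return "4" if "2 + 2" in question else "Paris"

evaluation = weave.Evaluation(dataset=examples, scorers=[exact_match])
print(asyncio.run(evaluation.evaluate(model)))
```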

6 changes: 3 additions & 3 deletions docs/docs/guides/evaluation/guardrails_and_monitors.md
@@ -1,7 +1,7 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Guardrails and Monitors
# Online Evaluation: Guardrails and Monitors

![Feedback](./../../../static/img/guardrails_scorers.png)

@@ -100,7 +100,7 @@ result, call = generate_text.call("Say hello")
await call.apply_scorer(LengthScorer())
```

## Using Scorers as Guardrails
## Using Scorers as Guardrails {#using-scorers-as-guardrails}

Guardrails act as safety checks that run before allowing LLM output to reach users. Here's a practical example:

@@ -146,7 +146,7 @@ When applying scorers:
- You can view scorer results in the UI or query them via the API
:::

## Using Scorers as Monitors
## Using Scorers as Monitors {#using-scorers-as-monitors}

Monitors help track quality metrics over time without blocking operations. This is useful for:
- Identifying quality trends
9 changes: 4 additions & 5 deletions docs/docs/guides/evaluation/scorers.md
@@ -1,9 +1,7 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Evaluation Metrics

## Evaluations in Weave
# Scoring Overview

In Weave, Scorers are used to evaluate AI outputs and return evaluation metrics. They take the AI's output, analyze it, and return a dictionary of results. Scorers can use your input data as reference if needed and can also output extra information, such as explanations or reasoning from the evaluation.
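
For example, here is a minimal function-style scorer sketch (the names and logic are illustrative assumptions); it returns a metric plus an extra explanation field:

```python
import weave

@weave.op()
def contains_answer(expected: str, output: str) -> dict:
    # Illustrative scorer: arguments other than `output` are matched to
    # dataset columns by name. Return a dictionary of metrics plus any
    # extra information, such as an explanation.
    hit = expected.lower() in output.lower()
    return {
        "contains_answer": hit,
        "explanation": f"Expected substring {expected!r} was {'found' if hit else 'not found'}.",
    }
```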

@@ -25,11 +23,12 @@ In Weave, Scorers are used to evaluate AI outputs and return evaluation metrics.
## Create your own Scorers

:::tip[Ready-to-Use Scorers]
While this guide shows you how to create custom scorers, Weave comes with a variety of [predefined scorers](./builtin_scorers.mdx) that you can use right away, including:
While this guide shows you how to create custom scorers, Weave comes with a variety of [predefined scorers](./builtin_scorers.mdx) and [local SLM scorers](./weave_local_scorers.md) that you can use right away, including:
- [Hallucination detection](./builtin_scorers.mdx#hallucinationfreescorer)
- [Summarization quality](./builtin_scorers.mdx#summarizationscorer)
- [Embedding similarity](./builtin_scorers.mdx#embeddingsimilarityscorer)
- [Relevancy evaluation](./builtin_scorers.mdx#ragas---contextrelevancyscorer)
- [Toxicity detection (local)](./weave_local_scorers.md#weavetoxicityscorerv1)
- [Context Relevance scoring (local)](./weave_local_scorers.md#weavecontextrelevancescorerv1)
- And more!
:::

332 changes: 332 additions & 0 deletions docs/docs/guides/evaluation/weave_local_scorers.md

Large diffs are not rendered by default.

40 changes: 23 additions & 17 deletions docs/docs/guides/integrations/azure.md
@@ -1,30 +1,36 @@
# Microsoft Azure

Weights & Biases integrates with Microsoft Azure OpenAI services, helping teams to manage, debug, and optimize their Azure AI workflows at scale. This guide introduces the W&B integration, what it means for Weave users, its key features, and how to get started.
Weights & Biases (W&B) Weave integrates with Microsoft Azure OpenAI services, helping teams to optimize their Azure AI applications. Using W&B Weave, you can trace, debug, and evaluate your Azure OpenAI calls.

:::tip
For the latest tutorials, visit [Weights & Biases on Microsoft Azure](https://wandb.ai/site/partners/azure).
:::

## Key features

- **LLM evaluations**: Evaluate and monitor LLM-powered applications using Weave, optimized for Azure infrastructure.
- **Seamless integration**: Deploy W&B Models on a dedicated Azure tenant with built-in integrations for Azure AI Studio, Azure ML, Azure OpenAI Service, and other Azure AI services.
- **Enhanced performance**: Use Azure’s infrastructure to train and deploy models faster, with auto-scaling clusters and optimized resources.
- **Scalable experiment tracking**: Automatically log hyperparameters, metrics, and artifacts for Azure AI Studio and Azure ML runs.
- **LLM fine-tuning**: Fine-tune models with W&B Models.
- **Central repository for models and datasets**: Manage and version models and datasets with W&B Registry and Azure AI Studio.
- **Collaborative workspaces**: Support teamwork with shared workspaces, experiment commenting, and Microsoft Teams integration.
- **Governance framework**: Ensure security with fine-grained access controls, audit trails, and Microsoft Entra ID integration.

## Getting started

To use W&B with Azure, add the W&B integration via the [Azure Marketplace](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/weightsandbiasesinc1641502883483.weights_biases_for_azure?tab=Overview).
To get started using Azure with Weave, simply decorate the function(s) you want to track with `weave.op`.

For a detailed guide describing how to integrate Azure OpenAI fine-tuning with W&B, see [Integrating Weights & Biases with Azure AI Services](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/weights-and-biases-integration).
```python
import os

import weave
from openai import AzureOpenAI

# Assumed setup (not shown in the original snippet): initialize Weave and an
# Azure OpenAI client for your deployment.
weave.init("<YOUR-WANDB-PROJECT-NAME>")
client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
)

@weave.op()
def call_azure_chat(model_id: str, messages: list, max_tokens: int = 1000, temperature: float = 0.5):
    # Each call is traced by Weave because of the @weave.op() decorator.
    response = client.chat.completions.create(
        model=model_id,
        messages=messages,
        max_tokens=max_tokens,
        temperature=temperature,
    )
    return {"status": "success", "response": response.choices[0].message.content}
```
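
For example, a call might look like this; the deployment name is an illustrative assumption:

```python
# "gpt-4o-deployment" is a placeholder for your Azure OpenAI deployment name.
print(call_azure_chat("gpt-4o-deployment", [{"role": "user", "content": "Say hello"}]))
```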

## Learn more

- [Weights & Biases + Microsoft Azure Overview](https://wandb.ai/site/partners/azure)
- [How W&B and Microsoft Azure Are Empowering Enterprises](https://techcommunity.microsoft.com/blog/azure-ai-services-blog/how-weights--biases-and-microsoft-azure-are-empowering-enterprises-to-fine-tune-/4303716)
- [Microsoft Azure OpenAI Service Documentation](https://learn.microsoft.com/en-us/azure/ai-services/openai/)
Learn more about advanced topics for using Azure with Weave in the resources below.

### Use the Azure AI Model Inference API with Weave

Learn how to use the Azure AI Model Inference API with Weave to gain insights into Azure models in [this guide](https://wandb.ai/byyoung3/ML-NEWS2/reports/A-guide-to-using-the-Azure-AI-model-inference-API--Vmlldzo4OTY1MjEy#tutorial:-implementing-azure-ai-model-inference-api-with-w&b-weave-).

### Trace Azure OpenAI models with Weave

Learn how to trace Azure OpenAI models using Weave in [this guide](https://wandb.ai/a-sh0ts/azure-weave-cookbook/reports/How-to-use-Azure-OpenAI-and-Azure-AI-Studio-with-W-B-Weave--Vmlldzo4MTI0NDgy).
18 changes: 14 additions & 4 deletions docs/docs/guides/integrations/bedrock.md
@@ -2,14 +2,12 @@

Weave automatically tracks and logs LLM calls made via Amazon Bedrock, AWS's fully managed service that offers foundation models from leading AI companies through a unified API.

There are multiple ways to log LLM calls to Weave from Amazon Bedrock. You can use `weave.op` to create reusable operations for tracking any calls to a Bedrock model. Optionally, if you're using Anthropic models, you can use Weave’s built-in integration with Anthropic.
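
For example, a minimal sketch of the `weave.op` route, assuming `boto3` is installed, AWS credentials are configured, and the chosen model ID is enabled in your region:

```python
import boto3
import weave

weave.init("<YOUR-WANDB-PROJECT-NAME>")
client = boto3.client("bedrock-runtime")

@weave.op()
def bedrock_chat(model_id: str, prompt: str) -> str:
    # Each call is traced by Weave because of the @weave.op() decorator.
    response = client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]

# Illustrative model ID; substitute one enabled in your AWS account.
print(bedrock_chat("anthropic.claude-3-haiku-20240307-v1:0", "Say hello"))
```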

:::tip
For the latest tutorials, visit [Weights & Biases on Amazon Web Services](https://wandb.ai/site/partners/aws/).
:::

:::note
Do you want to experiment with Amazon Bedrock models on Weave without any set up? Try the [LLM Playground](../tools/playground.md).
:::

## Traces

Weave will automatically capture traces for Bedrock API calls. You can use the Bedrock client as usual after initializing Weave and patching the client:
@@ -143,3 +141,15 @@ print(result)
```

This approach allows you to version your experiments and easily track different configurations of your Bedrock-based application.

## Learn more

Learn more about using Amazon Bedrock with Weave in the resources below.

### Try Bedrock in the Weave Playground

Do you want to experiment with Amazon Bedrock models in the Weave UI without any set up? Try the [LLM Playground](../tools/playground.md).

### Report: Compare LLMs on Bedrock for text summarization with Weave

The [Compare LLMs on Bedrock for text summarization with Weave](https://wandb.ai/byyoung3/ML_NEWS3/reports/Compare-LLMs-on-Amazon-Bedrock-for-text-summarization-with-W-B-Weave--VmlldzoxMDI1MTIzNw) report explains how to use Bedrock in combination with Weave to evaluate and compare LLMs for summarization tasks, code samples included.
@@ -1,21 +1,21 @@
# Google Gemini
# Google

:::tip
For the latest tutorials, visit [Weights & Biases on Google Cloud](https://wandb.ai/site/partners/googlecloud/).
:::

:::note
Do you want to experiment with Google Gemini models on Weave without any set up? Try the [LLM Playground](../tools/playground.md).
Do you want to experiment with Google AI models on Weave without any set up? Try the [LLM Playground](../tools/playground.md).
:::

Google offers two ways of calling Gemini via API:
This page describes how to use W&B Weave with the Google Vertex AI API and the Google Gemini API.

1. Via the [Vertex APIs](https://cloud.google.com/vertex-ai/docs).
2. Via the [Gemini API SDK](https://ai.google.dev/gemini-api/docs/quickstart?lang=python).
You can use Weave to evaluate, monitor, and iterate on your Google GenAI applications. Weave automatically captures traces for the following:

## Tracing
1. [Google Vertex AI API](https://cloud.google.com/vertex-ai/docs), which provides access to Google’s Gemini models and [various partner models](https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/use-partner-models).
2. [Google Gemini API](https://ai.google.dev/gemini-api/docs/quickstart?lang=python), which is accessible via Python SDK, Node.js SDK, Go SDK, and REST.

It’s important to store traces of language model applications in a central location, both during development and in production. These traces can be useful for debugging, and as a dataset that will help you improve your application.
## Get started

Weave will automatically capture traces for the [Gemini API SDK](https://ai.google.dev/gemini-api/docs/quickstart?lang=python). To start tracking, call `weave.init(project_name="<YOUR-WANDB-PROJECT-NAME>")` and use the library as normal.
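
A minimal sketch, assuming the `google-generativeai` Python SDK is installed and `GOOGLE_API_KEY` is set (the model name is illustrative):

```python
import os

import google.generativeai as genai
import weave

weave.init("<YOUR-WANDB-PROJECT-NAME>")

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

# The call below is traced automatically once weave.init() has run.
response = model.generate_content("Write a haiku about tracing LLM calls.")
print(response.text)
```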

@@ -120,3 +120,4 @@ Given a weave reference to any `weave.Model` object, you can spin up a fastapi server
```shell
weave serve weave:///your_entity/project-name/YourModel:<hash>
```
