feat: [anthropic] Claude 3.7 Sonnet with extended thinking #1370

Open
wants to merge 8 commits into
base: main

salman1993
Collaborator

@salman1993 salman1993 commented Feb 25, 2025

Claude 3.7 Sonnet works out of the box without extended thinking (the changes in this PR are not needed for that). To enable extended thinking, we have to set some env vars:

cargo build

# configure Anthropic provider with model 'claude-3-7-sonnet-latest'

# start a session
GOOSE_CLI_SHOW_THINKING=true ANTHROPIC_THINKING_ENABLED=true ./target/debug/goose session

# test on a string
GOOSE_CLI_SHOW_THINKING=true ANTHROPIC_THINKING_ENABLED=true ./target/debug/goose run --text "can you explain the code in crates/goose-cli/src/session/output.rs?"
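
As a rough illustration of how an opt-in flag such as `ANTHROPIC_THINKING_ENABLED` or `GOOSE_CLI_SHOW_THINKING` might be read, here is a minimal sketch. The helper names `flag_enabled` and `env_flag`, and the choice to accept "true" or "1", are assumptions for illustration; the PR's actual parsing is not shown in this thread.

```rust
use std::env;

/// Hypothetical helper: interpret an optional env-var value as a boolean flag.
/// Accepting "true" (case-insensitive) or "1" is an assumption, not the PR's rule.
fn flag_enabled(value: Option<&str>) -> bool {
    match value {
        Some(v) => {
            let v = v.trim();
            v.eq_ignore_ascii_case("true") || v == "1"
        }
        // An unset variable leaves the feature off.
        None => false,
    }
}

/// Hypothetical helper: read a flag from the process environment.
fn env_flag(name: &str) -> bool {
    flag_enabled(env::var(name).ok().as_deref())
}
```

With this sketch, `env_flag("ANTHROPIC_THINKING_ENABLED")` would be true for the commands above and false when the variable is unset, so sessions that do not opt in keep the default behaviour.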

Screenshot 2025-02-24 at 7 08 26 PM

See the longer discussion post on reasoning: #1300

@salman1993
Collaborator Author

salman1993 commented Feb 25, 2025

tested in goose-server with this script:

  1. Build & start goosed: cargo build; ANTHROPIC_THINKING_ENABLED=true ./target/debug/goosed agent

  2. Run script in new terminal

#!/bin/bash
set -euxo pipefail
IFS=$'\n\t'

# Send request to create an agent
curl --request POST \
  --url http://localhost:3000/agent \
  --header 'Content-Type: application/json' \
  --header 'X-Secret-Key: test' \
  --data '{
    "version": "truncate",
    "provider": "anthropic"
  }'

sleep 5

# Add the builtin developer extension
curl --request POST \
  --url http://localhost:3000/extensions/add \
  --header 'Content-Type: application/json' \
  --header 'X-Secret-Key: test' \
  --data '{
    "type": "builtin",
    "name": "developer"
  }'

sleep 5

# Send a user message 
curl --request POST \
  --url http://localhost:3000/reply \
  --header 'Accept: text/event-stream' \
  --header 'Content-Type: application/json' \
  --header 'X-Secret-Key: test' \
  --header 'x-protocol: data' \
  --data '{
  "messages": [
    {
      "role": "user",
      "created": 1740670518,
      "content": [
        {
          "type": "text",
          "text": "what tools do you have? be concise"
        }
      ]
    }
  ]
}'

Output

data: {"type":"Message","message":{"role":"assistant","created":1740672176,"content":[ 
{"type":"thinking","thinking":"I need to identify what tools are available to me based on the function specifications provided. Let me list them concisely:\n\nFrom the function specifications, I have:\n\n1. `developer__shell` - Execute shell commands\n2. `developer__text_editor` - Edit, view, or create files\n3. `developer__list_windows` - List available window titles for screenshots\n4. `developer__screen_capture` - Capture screenshots of displays or windows\n\nThese tools appear to be part of the \"developer\" extension that allows me to edit code files, run shell commands, and capture screenshots.","signature":"EugBCkYIARgCIkDUanI6laxqz0y/0tsfGT20+sdfdsfdf+sadasd+JABiEcbXHYESBV0qhAXGfpDCjBCzjdfEL/2/c3OxYD/aELuB5WF4CEm0bdSCL9I54GsFmdQ=="}, 
{"type":"text","text":"## Available Tools\n\nI currently have access to these tools from the developer extension:\n\n1. **Shell** - Run shell commands\n2. **Text Editor** - View and edit files\n3. **List Windows** - List available window titles \n4. **Screen Capture** - Take screenshots\n\nThese tools allow me to help with code development, file management, and visual debugging."}]}}

data: {"type":"Finish","reason":"stop"}
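
The reply arrives as a `text/event-stream` body in which each event is carried on a `data:` line. A minimal sketch of pulling those payloads out is shown here; it assumes each event's JSON sits on a single `data:` line (the output above may have been wrapped for readability), and the function names are hypothetical. A real client would parse the JSON rather than match on substrings.

```rust
/// Extract the payload of every `data:` line from a server-sent-events body.
fn sse_data_payloads(body: &str) -> Vec<&str> {
    body.lines()
        // Keep only lines that carry event data.
        .filter_map(|line| line.strip_prefix("data:"))
        .map(str::trim)
        .filter(|payload| !payload.is_empty())
        .collect()
}

/// Crude check for the final event seen in the output above.
/// Substring matching is an illustrative shortcut, not robust JSON handling.
fn is_finish_event(payload: &str) -> bool {
    payload.contains(r#""type":"Finish""#)
}
```

A consumer could loop over `sse_data_payloads(body)` and stop once `is_finish_event` returns true.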

@salman1993 salman1993 changed the title feat: Claude 3.7 Sonnet feat: Claude 3.7 Sonnet with extended thinking Feb 26, 2025
@salman1993 salman1993 marked this pull request as ready for review February 27, 2025 14:55
@salman1993 salman1993 changed the title feat: Claude 3.7 Sonnet with extended thinking feat: [anthropic] Claude 3.7 Sonnet with extended thinking Feb 27, 2025
@@ -243,10 +274,13 @@ pub fn create_request(
return Err(anyhow!("No valid messages to send to Anthropic API"));
}

// https://docs.anthropic.com/en/docs/about-claude/models/all-models#model-comparison-table
// Claude 3.7 supports max output tokens up to 8192


The output tokens can now be up to 128k if you add a beta header (this is new with Sonnet 3.7, but it's a little hidden).

I'm surprised there isn't an error, since the default thinking budget in this PR is 16000, which is more than 8192?

This feature can be enabled by passing an anthropic-beta header of output-128k-2025-02-19.

https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking#extended-output-capabilities-beta
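
A minimal sketch of attaching that beta header conditionally follows. The function name `anthropic_headers` and the `extended_output` switch are hypothetical; only the `anthropic-beta` value comes from the comment above. The `x-api-key` and `anthropic-version` headers are the standard ones for the Anthropic Messages API, and how this PR actually builds its headers is not shown here.

```rust
/// Build request headers as (name, value) pairs.
/// `extended_output` is a hypothetical switch for the 128k output beta.
fn anthropic_headers(api_key: &str, extended_output: bool) -> Vec<(&'static str, String)> {
    let mut headers = vec![
        ("x-api-key", api_key.to_string()),
        ("anthropic-version", "2023-06-01".to_string()),
        ("content-type", "application/json".to_string()),
    ];
    if extended_output {
        // Beta header value quoted in the review comment.
        headers.push(("anthropic-beta", "output-128k-2025-02-19".to_string()));
    }
    headers
}
```

Returning plain pairs keeps the sketch independent of any HTTP client; the pairs can be copied into whatever request builder is in use.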

Collaborator Author


we set the request's max_tokens to the sum of the output limit and the thinking budget (max_tokens + budget_tokens)
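
The arithmetic described here can be sketched as follows. The struct and function names are hypothetical, and the figures 8192 and 16000 are the ones mentioned in this thread; the PR's real code (around the line the reviewer refers to) may be structured differently.

```rust
/// Hypothetical thinking configuration; `budget_tokens` mirrors the API parameter name.
struct ThinkingConfig {
    budget_tokens: u32,
}

/// Compute the `max_tokens` value to send with the request.
/// With thinking enabled, the budget is added on top of the output limit,
/// so the budget always stays below the value that is sent.
fn request_max_tokens(max_output_tokens: u32, thinking: Option<&ThinkingConfig>) -> u32 {
    match thinking {
        Some(cfg) => max_output_tokens + cfg.budget_tokens,
        None => max_output_tokens,
    }
}
```

With an output limit of 8192 and a thinking budget of 16000, the request would carry `max_tokens = 24192`, which is why a budget larger than 8192 does not trigger an API error.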


Okay, that makes sense then. I see it on line 325.

Collaborator Author

@salman1993 salman1993 Feb 27, 2025


@Penagwin i added the beta headers. gonna keep the default max_tokens for now because it's pretty high, especially if the model doesn't use up all the budget tokens. we will allow users to configure these params in the future

Collaborator

@baxen baxen left a comment


LGTM!

@@ -247,6 +270,18 @@ async fn stream_message(
.await?;
}
}
MessageContent::Thinking(content) => {
Collaborator


might skip this until we implement? in part because i'm refactoring this
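
For context on the match arm under discussion, here is a simplified sketch of a content enum with a thinking variant and a renderer that shows thinking only when asked. The types are stand-ins, not the crate's real definitions: the field names `thinking` and `signature` are taken from the JSON output earlier in this thread, while `render` and its `show_thinking` parameter are hypothetical.

```rust
/// Stand-in for a thinking block; field names follow the JSON output in this thread.
#[allow(dead_code)]
struct ThinkingContent {
    thinking: String,
    signature: String,
}

/// Simplified stand-in for the message content enum touched by the diff.
enum MessageContent {
    Text(String),
    Thinking(ThinkingContent),
}

/// Render message content, including thinking blocks only when `show_thinking` is set.
fn render(contents: &[MessageContent], show_thinking: bool) -> String {
    let mut out = String::new();
    for content in contents {
        match content {
            MessageContent::Text(text) => {
                out.push_str(text);
                out.push('\n');
            }
            MessageContent::Thinking(content) if show_thinking => {
                out.push_str("Thinking:\n");
                out.push_str(&content.thinking);
                out.push('\n');
            }
            // Thinking is skipped silently when the flag is off.
            MessageContent::Thinking(_) => {}
        }
    }
    out
}
```

Keeping the explicit no-op arm makes the match exhaustive, so a handler that is not implemented yet can skip thinking blocks without hiding future variants behind a wildcard.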

.json(&payload)
.send()
.await?;

let status = response.status();
let payload: Option<Value> = response.json().await.ok();

if std::env::var("GOOSE_DEBUG").is_ok() {
Collaborator


nit: do we use this standard elsewhere? definitely seems useful but i'm not sure it makes more sense necessarily than tracing
