fix(providers/anthropic): pair provider-executed tools #36

Closed
ethanndickson wants to merge 5 commits into coder_2_33 from chat-sanitize-provider-tool-pairing


@ethanndickson (Member)

Problem

Anthropic's web-search provider-executed tool is emitted by the API as two adjacent content blocks: a server_tool_use block carrying the search query, and a matching web_search_tool_result block carrying the citations.

Previously this adapter exposed only the server_tool_use half to consumers (as a ToolCallContent with ProviderExecuted: true), with no corresponding ToolResultContent. Sources from the result were emitted as StreamPartTypeSource parts, but the tool call itself was never paired with a tool result.

The visible failure mode: persisted assistant messages contained an unmatched provider-executed tool call. When that message was sent back to Anthropic as part of a multi-turn conversation, the API rejected it because a signed assistant turn containing a server_tool_use block must also contain its web_search_tool_result.
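As a rough illustration of the invariant described above, the sketch below checks one persisted assistant message for server_tool_use blocks that lack a web_search_tool_result with the same ID. The Block type and the unmatchedServerToolUses function are hypothetical simplifications, not the Anthropic SDK's types, and the check encodes only the pairing rule stated in this PR, not the API's full validation.

```go
package main

import "fmt"

// Block is a hypothetical, simplified view of an Anthropic content block;
// only the fields needed for the pairing check are modelled.
type Block struct {
	Type      string // "text", "server_tool_use", "web_search_tool_result", ...
	ID        string // set on server_tool_use blocks
	ToolUseID string // set on web_search_tool_result blocks
}

// unmatchedServerToolUses returns the IDs of server_tool_use blocks in one
// assistant message that have no web_search_tool_result with the same ID.
func unmatchedServerToolUses(blocks []Block) []string {
	results := make(map[string]bool)
	for _, b := range blocks {
		if b.Type == "web_search_tool_result" {
			results[b.ToolUseID] = true
		}
	}
	var missing []string
	for _, b := range blocks {
		if b.Type == "server_tool_use" && !results[b.ID] {
			missing = append(missing, b.ID)
		}
	}
	return missing
}

func main() {
	// The pre-fix persisted shape: the call survived, the result did not.
	persisted := []Block{
		{Type: "server_tool_use", ID: "srvtoolu_1"},
		{Type: "text"},
	}
	fmt.Println(unmatchedServerToolUses(persisted)) // [srvtoolu_1]
}
```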

Change

Reshape the Anthropic adapter so the pair is exposed atomically to consumers, only after both blocks arrive.

  • Buffer the server_tool_use input until its matching web_search_tool_result arrives. On result, emit the existing ToolInputStart / ToolInputEnd / ToolCall parts with ProviderExecuted: true, immediately followed by ToolResult.
  • Web-search citations are still exposed via StreamPartTypeSource and via WebSearchResultMetadata on the result, but only after the pair is verified complete. Sources no longer leak ahead of an orphan result.
  • Reject orphan results (result with no matching server_tool_use), duplicate operation IDs, and operations the stream ends on without a matching result. Each is a typed error.
  • Hoist the incomplete-buffer check above the EOF / non-EOF stream split, so a transport error that occurs mid-operation surfaces both the incomplete-operation error and the transport error wrapped together, rather than the transport error alone.
  • Skip input_json_delta for server_tool_use blocks so the Anthropic SDK's own accumulation drives the final JSON, rather than leaking partial deltas to consumers.

Both Generate and Stream paths are covered.

Encoding choice

This re-encodes provider-executed tools through the existing ToolCall / ToolResult API with ProviderExecuted: true, rather than introducing a new public stream-part type. It mirrors the shape that:

  • LiteLLM landed on for the same Anthropic multi-turn failure mode (LiteLLM #17798): re-encode through the existing tool-call / tool-result API and carry web-search results as provider_specific_fields.
  • Vercel AI SDK v5 settled on for hosted/server tools: providerExecuted: true with granular lifecycle hooks.

This is more conservative than introducing a StreamPartTypeProviderExecutedOperation: no public surface change, no behavior change for non-Anthropic providers, and existing tool-call consumers ignore the ProviderExecuted: true flag as before.
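For a consumer, the re-encoding means a provider-executed call arrives as an ordinary tool call carrying the flag. A hypothetical sketch of how a client that runs tools locally might use it; ToolCall, pendingLocalCalls, and the read_file tool are illustrative stand-ins, not fantasy's actual API.

```go
package main

import "fmt"

// ToolCall is a hypothetical stand-in for a tool-call content part.
type ToolCall struct {
	ID               string
	Name             string
	Input            string
	ProviderExecuted bool
}

// pendingLocalCalls returns the calls the client still has to execute.
// Provider-executed calls already ran server-side and arrive paired with
// their result, so they stay in history but are not dispatched locally.
func pendingLocalCalls(calls []ToolCall) []ToolCall {
	var local []ToolCall
	for _, c := range calls {
		if c.ProviderExecuted {
			continue
		}
		local = append(local, c)
	}
	return local
}

func main() {
	calls := []ToolCall{
		{ID: "srvtoolu_1", Name: "web_search", Input: `{"query":"x"}`, ProviderExecuted: true},
		{ID: "toolu_1", Name: "read_file", Input: `{"path":"go.mod"}`},
	}
	for _, c := range pendingLocalCalls(calls) {
		fmt.Println("run locally:", c.Name) // run locally: read_file
	}
}
```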

Tests

New providers/anthropic/anthropic_provider_operation_test.go covers:

  • TestGenerate_WebSearchResponseRejectsDuplicateProviderOperationIDs
  • TestStream_WebSearchResponseRejectsOrphanResultBeforeSources (asserts no sources, no provider tool events, exactly one error)
  • TestStream_WebSearchResponseSkipsProviderInputDeltas (chunked input_json_delta accumulation, asserts no ToolInputDelta parts)
  • TestStream_WebSearchResponseHandlesMultipleProviderOperations (two independent pairs in one stream)
  • TestStream_WebSearchResponseRejectsDuplicateProviderOperationIDs
  • TestStream_WebSearchResponseSurfacesIncompleteOperationOnStreamError (mid-operation transport error)
  • TestStream_WebSearchResponsePreservesProviderMetadata (encrypted content, page age round-trip)

Existing anthropic_test.go tests for unpaired operations and adjacency continue to pass.

Upstream intent

This is a fork-only patch in github.com/coder/fantasy pending upstream landing in charm.land/fantasy. The companion patch against charm has been prepared and the test file is byte-identical between the two; only the surrounding hunks differ due to existing fork drift in anthropic.go.

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 95f0e0be85

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread on providers/anthropic/anthropic.go (outdated)

Anthropic's web-search provider-executed tool is emitted as two
adjacent content blocks: a `server_tool_use` block carrying the
search query, and a matching `web_search_tool_result` block carrying
the citations. Previously this adapter exposed only the
`server_tool_use` half to consumers (as a `ToolCallContent` with
`ProviderExecuted: true`), which produced persisted message history
containing unmatched provider-executed tool calls. Subsequent replay
turns then failed the Anthropic API's validation, since a signed
assistant turn with a server_tool_use must also contain its result.

Reshape the adapter so the pair is exposed atomically, only after
both blocks arrive:

  - Buffer the server_tool_use input until its matching
    web_search_tool_result arrives. On result, emit the existing
    ToolInputStart / ToolInputEnd / ToolCall parts with
    ProviderExecuted: true, immediately followed by ToolResult.
    Web-search citations are still exposed via StreamPartTypeSource
    and via WebSearchResultMetadata on the result, but only after
    the pair is verified complete (sources no longer leak ahead of
    an orphan result).
  - Reject orphan results (result with no matching server_tool_use),
    duplicate operation IDs, and operations the stream ends on
    without a matching result. Each is a typed error.
  - Hoist the incomplete-buffer check above the EOF / non-EOF
    stream split so transport errors that occur mid-operation
    surface both the incomplete-operation error and the transport
    error wrapped together, rather than the transport error alone.
  - Skip input_json_delta for server_tool_use blocks so the
    Anthropic SDK's own accumulation drives the final JSON,
    rather than leaking partial deltas to consumers.

Both Generate and Stream paths are covered.

The encoding choice (re-encode through the existing
ToolCall/ToolResult API with ProviderExecuted: true rather than
add a new public stream-part type) mirrors LiteLLM's fix for the
same Anthropic multi-turn failure mode (LiteLLM #17798) and the
shape that Vercel AI SDK v5 settled on with providerExecuted: true.

Tests cover: orphan-result rejection before any sources are emitted,
chunked input_json_delta accumulation, multiple independent pairs
in one stream, duplicate-operation-ID rejection in both Generate
and Stream, stream-error mid-operation surfacing the wrapped error,
and provider-metadata preservation (encrypted content, page age).

@ethanndickson force-pushed the chat-sanitize-provider-tool-pairing branch from 95f0e0b to 6ab1650 on May 19, 2026 06:46

Anthropic pause_turn responses can legitimately contain server_tool_use
blocks without a matching web_search_tool_result. That means the
provider-executed operation is incomplete, but not invalid: callers must
round-trip the assistant response so Anthropic can continue the paused
server-side tool turn.

Keep the atomic provider-operation invariant for completed turns, but make
it stop-reason-aware:

- Generate appends buffered provider-executed calls instead of returning an
  incomplete-operation error when the raw stop reason is pause_turn.
- Stream yields buffered provider-executed calls before the finish part when
  the accumulated stop reason is pause_turn.
- Non-pause turns and stream transport errors without pause_turn still report
  incomplete provider-executed operations.
- Anthropic finish metadata now carries the raw stop_reason so consumers can
  distinguish pause_turn from generic stop after mapFinishReason collapses it.

Add generate and stream regressions for pause_turn, including the transport
error case where pause_turn should not be reported as an incomplete provider
operation.
# Conflicts:
#	providers/anthropic/anthropic.go
The provider-tool pairing branch now targets the coder_2_33 fork branch so
Coder can consume it without losing the previously merged replay-fidelity
commits. That exposes a few existing coder_2_33 issues under the current lint
and test commands.

Fix those branch-local issues so the PR branch passes the full fantasy check
set:

- Update exported computer-use type comments for revive.
- Avoid Go 1.26-only new(T(value)) expressions while the module language is
  still Go 1.25.
- Allow function_call_output-only follow-up payloads when previous_response_id
  is set, while keeping strict function call/output validation for explicit
  replay prompts.
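For the new(T(value)) item, a common Go 1.25-compatible replacement is a small generic pointer helper. The ptr name and the Mode type below are hypothetical; the branch's actual replacement may differ.

```go
package main

import "fmt"

// ptr returns a pointer to a copy of v. It works on any Go version with
// generics (1.18+), unlike new(T(value)), which needs Go 1.26.
func ptr[T any](v T) *T {
	return &v
}

// Mode is a hypothetical named string type used for illustration.
type Mode string

func main() {
	m := ptr(Mode("auto")) // Go 1.26 alternative: new(Mode("auto"))
	n := ptr(int64(42))
	fmt.Println(*m, *n) // auto 42
}
```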

@ethanndickson changed the base branch from main to coder_2_33 on May 19, 2026 07:13

@ethanndickson (Member, Author)

@codex review

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d308c385fe

Comment thread on providers/openai/responses_language_model.go (outdated)
@ethanndickson (Member, Author)

@codex review

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0172f5af1d

```go
// Excerpt from the reviewed hunk (end of the preceding block).
}
content = operation.appendContent(content)
content = append(content, toolResult)
delete(providerExecutedOperations, webSearchResult.ToolUseID)
```

P2: Remember completed provider operation IDs

Because this is the only state used for duplicate detection, deleting the entry after the first web_search_tool_result lets a later server_tool_use with the same ID pass as a fresh operation. In a response containing two completed pairs with the same srvtoolu_* ID, Generate will persist duplicate provider-executed tool calls/results instead of returning the duplicate-ID error promised by this change, and replaying that assistant message can be rejected by Anthropic. Keep a separate seen-ID set (and apply the same pattern in the streaming path) so duplicates are rejected even after the first pair is completed.

Useful? React with 👍 / 👎.
