fix(providers/anthropic): pair provider-executed tools by ethanndickson · Pull Request #36 · coder/fantasy

ethanndickson · 2026-05-19T06:40:53Z

Problem

Anthropic's web-search provider-executed tool is emitted by the API as two adjacent content blocks: a server_tool_use block carrying the search query, and a matching web_search_tool_result block carrying the citations.

Previously this adapter exposed only the server_tool_use half to consumers (as a ToolCallContent with ProviderExecuted: true), with no corresponding ToolResultContent. Sources from the result were emitted as StreamPartTypeSource parts, but the tool call itself was never paired with a tool result.

The visible failure mode: persisted assistant messages contained an unmatched provider-executed tool call. When that message was sent back to Anthropic as part of a multi-turn conversation, the API rejected it because a signed assistant turn containing a server_tool_use block must also contain its web_search_tool_result.
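A minimal Go sketch of the invariant the replayed turn has to satisfy: every `server_tool_use` ID needs a `web_search_tool_result` whose `tool_use_id` matches. The `block` type and its fields are hypothetical simplifications, not the anthropic-sdk-go types. Anthropic's real validation may check more than this, such as signatures and ordering.

```go
package main

import "fmt"

// block is a hypothetical, simplified view of an Anthropic content block.
// Only the fields needed for pairing are modeled.
type block struct {
	Type      string // "server_tool_use", "web_search_tool_result", "text", ...
	ID        string // set on server_tool_use blocks
	ToolUseID string // set on web_search_tool_result blocks
}

// unmatchedServerToolUses returns the IDs of server_tool_use blocks that have
// no web_search_tool_result with the same tool_use_id in the same turn.
func unmatchedServerToolUses(blocks []block) []string {
	results := make(map[string]bool)
	for _, b := range blocks {
		if b.Type == "web_search_tool_result" {
			results[b.ToolUseID] = true
		}
	}
	var unmatched []string
	for _, b := range blocks {
		if b.Type == "server_tool_use" && !results[b.ID] {
			unmatched = append(unmatched, b.ID)
		}
	}
	return unmatched
}

func main() {
	// The pre-fix persisted shape: a provider-executed call with no result.
	turn := []block{
		{Type: "text"},
		{Type: "server_tool_use", ID: "srvtoolu_1"},
	}
	fmt.Println(unmatchedServerToolUses(turn)) // [srvtoolu_1]
}
```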

Change

Reshape the Anthropic adapter so the pair is exposed atomically to consumers, only after both blocks arrive.

- Buffer the server_tool_use input until its matching web_search_tool_result arrives. On result, emit the existing ToolInputStart / ToolInputEnd / ToolCall sequence with ProviderExecuted: true, immediately followed by ToolResult.
- Web-search citations are still exposed via StreamPartTypeSource and via WebSearchResultMetadata on the result, but only after the pair is verified complete. Sources no longer leak ahead of an orphan result.
- Reject orphan results (a result with no matching server_tool_use), duplicate operation IDs, and operations still waiting for a result when the stream ends. Each is a typed error.
- Hoist the incomplete-buffer check above the EOF / non-EOF stream split, so a transport error that occurs mid-operation surfaces both the incomplete-operation error and the transport error wrapped together, rather than the transport error alone.
- Skip input_json_delta for server_tool_use blocks so the Anthropic SDK's own accumulation drives the final JSON, rather than leaking partial deltas to consumers.

Both Generate and Stream paths are covered.

Encoding choice

This re-encodes provider-executed tools through the existing ToolCall / ToolResult API with ProviderExecuted: true, rather than introducing a new public stream-part type. That mirrors the shape that:

- LiteLLM landed on for the same Anthropic multi-turn failure mode (LiteLLM #17798): re-encode through the existing tool-call / tool-result API and carry web-search results as provider_specific_fields.
- Vercel AI SDK v5 settled on for hosted/server tools, with providerExecuted: true and granular lifecycle hooks.

This is more conservative than introducing a StreamPartTypeProviderExecutedOperation: no public surface change, no behavior change for non-Anthropic providers, and existing tool-call consumers can keep ignoring the ProviderExecuted: true flag as before.
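A consumer that does want to act on the flag might route calls as in this sketch: provider-executed calls stay in history but are never dispatched to local tools, because they already ran server-side and arrive paired with their result. `ToolCall` here is a hypothetical stand-in with only the fields the example needs, not fantasy's actual type, and `callsToRunLocally` is an invented helper.

```go
package main

import "fmt"

// ToolCall is a hypothetical stand-in for a tool-call content part.
type ToolCall struct {
	ID               string
	Name             string
	ProviderExecuted bool
}

// callsToRunLocally returns only the calls the client must execute itself.
func callsToRunLocally(calls []ToolCall) []ToolCall {
	var local []ToolCall
	for _, c := range calls {
		if !c.ProviderExecuted {
			local = append(local, c)
		}
	}
	return local
}

func main() {
	calls := []ToolCall{
		{ID: "srvtoolu_1", Name: "web_search", ProviderExecuted: true},
		{ID: "toolu_2", Name: "read_file"},
	}
	for _, c := range callsToRunLocally(calls) {
		fmt.Println(c.Name) // read_file
	}
}
```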

Tests

New providers/anthropic/anthropic_provider_operation_test.go covers:

- TestGenerate_WebSearchResponseRejectsDuplicateProviderOperationIDs
- TestStream_WebSearchResponseRejectsOrphanResultBeforeSources (asserts no sources, no provider tool events, exactly one error)
- TestStream_WebSearchResponseSkipsProviderInputDeltas (chunked input_json_delta accumulation, asserts no ToolInputDelta parts)
- TestStream_WebSearchResponseHandlesMultipleProviderOperations (two independent pairs in one stream)
- TestStream_WebSearchResponseRejectsDuplicateProviderOperationIDs
- TestStream_WebSearchResponseSurfacesIncompleteOperationOnStreamError (mid-operation transport error)
- TestStream_WebSearchResponsePreservesProviderMetadata (encrypted content, page age round-trip)
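The input-delta test relies on accumulation: individual input_json_delta fragments are generally not valid JSON, so only the concatenated input is meaningful to a consumer. A minimal sketch of that accumulation, assuming plain string fragments for one content block; `accumulateInput` is a hypothetical name and this is not the Anthropic SDK's implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// accumulateInput concatenates input_json_delta fragments for one content
// block and decodes the result once, after the block has stopped.
func accumulateInput(fragments []string) (map[string]any, error) {
	var b strings.Builder
	for _, f := range fragments {
		b.WriteString(f)
	}
	var input map[string]any
	if err := json.Unmarshal([]byte(b.String()), &input); err != nil {
		return nil, fmt.Errorf("decode accumulated input: %w", err)
	}
	return input, nil
}

func main() {
	// Three deltas that only form valid JSON once joined.
	input, err := accumulateInput([]string{`{"que`, `ry":"fantasy`, ` go"}`})
	fmt.Println(input["query"], err) // fantasy go <nil>
}
```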

Existing anthropic_test.go tests for unpaired operations and adjacency continue to pass.

Upstream intent

This is a fork-only patch in github.com/coder/fantasy pending upstream landing in charm.land/fantasy. The companion patch against charm has been prepared and the test file is byte-identical between the two; only the surrounding hunks differ due to existing fork drift in anthropic.go.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 95f0e0be85

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Anthropic's web-search provider-executed tool is emitted as two adjacent content blocks: a `server_tool_use` block carrying the search query, and a matching `web_search_tool_result` block carrying the citations. Previously this adapter exposed only the `server_tool_use` half to consumers (as a `ToolCallContent` with `ProviderExecuted: true`), which produced persisted message history containing unmatched provider-executed tool calls. Subsequent replay turns then failed the Anthropic API's validation, since a signed assistant turn with a server_tool_use must also contain its result.

Reshape the adapter so the pair is exposed atomically, only after both blocks arrive:

- Buffer the server_tool_use input until its matching web_search_tool_result arrives. On result, emit the existing ToolInputStart / ToolInputEnd / ToolCall sequence with ProviderExecuted: true, immediately followed by ToolResult. Web-search citations are still exposed via StreamPartTypeSource and via WebSearchResultMetadata on the result, but only after the pair is verified complete (sources no longer leak ahead of an orphan result).
- Reject orphan results (a result with no matching server_tool_use), duplicate operation IDs, and operations still waiting for a result when the stream ends. Each is a typed error.
- Hoist the incomplete-buffer check above the EOF / non-EOF stream split so transport errors that occur mid-operation surface both the incomplete-operation error and the transport error wrapped together, rather than the transport error alone.
- Skip input_json_delta for server_tool_use blocks so the Anthropic SDK's own accumulation drives the final JSON, rather than leaking partial deltas to consumers.

Both Generate and Stream paths are covered.

The encoding choice (re-encode through the existing ToolCall/ToolResult API with ProviderExecuted: true rather than add a new public stream-part type) mirrors LiteLLM's fix for the same Anthropic multi-turn failure mode (LiteLLM #17798) and the shape that Vercel AI SDK v5 settled on with providerExecuted: true.

Tests cover: orphan-result rejection before any sources are emitted, chunked input_json_delta accumulation, multiple independent pairs in one stream, duplicate-operation-ID rejection in both Generate and Stream, stream-error mid-operation surfacing the wrapped error, and provider-metadata preservation (encrypted content, page age).

Anthropic pause_turn responses can legitimately contain server_tool_use blocks without a matching web_search_tool_result. That means the provider-executed operation is incomplete, but not invalid: callers must round-trip the assistant response so Anthropic can continue the paused server-side tool turn.

Keep the atomic provider-operation invariant for completed turns, but make it stop-reason-aware:

- Generate appends buffered provider-executed calls instead of returning an incomplete-operation error when the raw stop reason is pause_turn.
- Stream yields buffered provider-executed calls before the finish part when the accumulated stop reason is pause_turn.
- Non-pause turns, and stream transport errors without pause_turn, still report incomplete provider-executed operations.
- Anthropic finish metadata now carries the raw stop_reason so consumers can distinguish pause_turn from a generic stop after mapFinishReason collapses it.

Add Generate and Stream regressions for pause_turn, including the transport-error case where pause_turn should not be reported as an incomplete provider operation.

# Conflicts:
#	providers/anthropic/anthropic.go

The provider-tool pairing branch now targets the coder_2_33 fork branch so Coder can consume it without losing the previously merged replay-fidelity commits. That exposes a few existing coder_2_33 issues under the current lint and test commands. Fix those branch-local issues so the PR branch passes the full fantasy check set:

- Update exported computer-use type comments for revive.
- Avoid Go 1.26-only new(T(value)) expressions while the module language is still Go 1.25.
- Allow function_call_output-only follow-up payloads when previous_response_id is set, while keeping strict function call/output validation for explicit replay prompts.
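For the Go 1.25 item, one common replacement for the `new(T(value))` form is a small generic helper that returns a pointer to a copy of its argument. The name `ptr` and the `effort` type are hypothetical; the fork may inline `&v` or use a differently named helper.

```go
package main

import "fmt"

// ptr returns a pointer to a copy of v. It works on Go 1.25, where new only
// accepts a type, not an expression.
func ptr[T any](v T) *T {
	return &v
}

// effort is a hypothetical named string type, as an option field might use.
type effort string

func main() {
	// Go 1.26 only:  e := new(effort("high"))
	e := ptr(effort("high"))
	n := ptr(int64(42))
	fmt.Println(*e, *n) // high 42
}
```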

ethanndickson · 2026-05-19T07:31:52Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d308c385fe


ethanndickson · 2026-05-19T08:10:32Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0172f5af1d


chatgpt-codex-connector · 2026-05-19T08:13:51Z

```diff
+			}
+			content = operation.appendContent(content)
 			content = append(content, toolResult)
+			delete(providerExecutedOperations, webSearchResult.ToolUseID)
```


Remember completed provider operation IDs

Because this is the only state used for duplicate detection, deleting the entry after the first web_search_tool_result lets a later server_tool_use with the same ID pass as a fresh operation. In a response containing two completed pairs with the same srvtoolu_* ID, Generate will persist duplicate provider-executed tool calls/results instead of returning the duplicate-ID error promised by this change, and replaying that assistant message can be rejected by Anthropic. Keep a separate seen-ID set (and apply the same pattern in the streaming path) so duplicates are rejected even after the first pair is completed.
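The failure mode the review describes can be reproduced with a toy detector that contrasts pending-only tracking against a separate seen set. The `op` type and `firstDuplicate` function are hypothetical and only model (kind, ID) pairs, not the adapter's real state.

```go
package main

import "fmt"

// op is a hypothetical (kind, ID) pair from one assistant response.
type op struct {
	Kind string // "call" for server_tool_use, "result" for web_search_tool_result
	ID   string
}

// firstDuplicate returns the first server_tool_use ID reused within the
// response. With rememberCompleted=false it forgets an ID once its result
// arrives; with true it also consults a seen set that is never cleared.
func firstDuplicate(ops []op, rememberCompleted bool) (string, bool) {
	pending := map[string]bool{}
	seen := map[string]bool{}
	for _, o := range ops {
		switch o.Kind {
		case "call":
			if pending[o.ID] || (rememberCompleted && seen[o.ID]) {
				return o.ID, true
			}
			pending[o.ID] = true
			seen[o.ID] = true
		case "result":
			delete(pending, o.ID) // the deletion the review points at
		}
	}
	return "", false
}

func main() {
	ops := []op{{"call", "srvtoolu_1"}, {"result", "srvtoolu_1"}, {"call", "srvtoolu_1"}, {"result", "srvtoolu_1"}}
	_, found := firstDuplicate(ops, false)
	fmt.Println(found) // false: the reused ID slips through
	id, found := firstDuplicate(ops, true)
	fmt.Println(id, found) // srvtoolu_1 true
}
```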


chatgpt-codex-connector Bot reviewed May 19, 2026


Comment thread: providers/anthropic/anthropic.go (outdated)

ethanndickson force-pushed the chat-sanitize-provider-tool-pairing branch from 95f0e0b to 6ab1650 May 19, 2026 06:46

ethanndickson added 3 commits May 19, 2026 06:56

merge coder_2_33 into provider-tool pairing

cae94e4

# Conflicts:
#	providers/anthropic/anthropic.go

ethanndickson changed the base branch from main to coder_2_33 May 19, 2026 07:13

chatgpt-codex-connector Bot reviewed May 19, 2026


Comment thread: providers/openai/responses_language_model.go (outdated)

fix(providers/openai): validate previous response tool outputs

0172f5a

chatgpt-codex-connector Bot reviewed May 19, 2026


ethanndickson closed this May 19, 2026

ethanndickson wants to merge 5 commits into coder_2_33 from chat-sanitize-provider-tool-pairing
