fix(providers/anthropic): pair provider-executed tools#36
ethanndickson wants to merge 5 commits into
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 95f0e0be85
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Anthropic's web-search provider-executed tool is emitted as two
adjacent content blocks: a `server_tool_use` block carrying the
search query, and a matching `web_search_tool_result` block carrying
the citations. Previously this adapter exposed only the
`server_tool_use` half to consumers (as a `ToolCallContent` with
`ProviderExecuted: true`), which produced persisted message history
containing unmatched provider-executed tool calls. Subsequent replay
turns then failed the Anthropic API's validation, since a signed
assistant turn with a server_tool_use must also contain its result.
Reshape the adapter so the pair is exposed atomically, only after
both blocks arrive:
- Buffer the server_tool_use input until its matching
web_search_tool_result arrives. On result, emit the existing
ToolInputStart / ToolInputEnd / ToolCall sequence with
ProviderExecuted: true, immediately followed by ToolResult.
Web-search citations are still exposed via StreamPartTypeSource
and via WebSearchResultMetadata on the result, but only after
the pair is verified complete (sources no longer leak ahead of
an orphan result).
- Reject orphan results (a result with no matching server_tool_use),
duplicate operation IDs, and operations still awaiting a result
when the stream ends. Each is a typed error.
- Hoist the incomplete-buffer check above the EOF / non-EOF
stream split so transport errors that occur mid-operation
surface both the incomplete-operation error and the transport
error wrapped together, rather than the transport error alone.
- Skip input_json_delta for server_tool_use blocks so the
Anthropic SDK's own accumulation drives the final JSON,
rather than leaking partial deltas to consumers.
Both Generate and Stream paths are covered.
The encoding choice (re-encode through the existing
ToolCall/ToolResult API with ProviderExecuted: true rather than
add a new public stream-part type) mirrors LiteLLM's fix for the
same Anthropic multi-turn failure mode (LiteLLM #17798) and the
shape that Vercel AI SDK v5 settled on with providerExecuted: true.
Tests cover: orphan-result rejection before any sources are emitted,
chunked input_json_delta accumulation, multiple independent pairs
in one stream, duplicate-operation-ID rejection in both Generate
and Stream, stream-error mid-operation surfacing the wrapped error,
and provider-metadata preservation (encrypted content, page age).
95f0e0b to 6ab1650
Anthropic pause_turn responses can legitimately contain server_tool_use blocks without a matching web_search_tool_result. That means the provider-executed operation is incomplete, but not invalid: callers must round-trip the assistant response so Anthropic can continue the paused server-side tool turn.
Keep the atomic provider-operation invariant for completed turns, but make it stop-reason-aware:
- Generate appends buffered provider-executed calls instead of returning an incomplete-operation error when the raw stop reason is pause_turn.
- Stream yields buffered provider-executed calls before the finish part when the accumulated stop reason is pause_turn.
- Non-pause turns and stream transport errors without pause_turn still report incomplete provider-executed operations.
- Anthropic finish metadata now carries the raw stop_reason so consumers can distinguish pause_turn from generic stop after mapFinishReason collapses it.
Add generate and stream regressions for pause_turn, including the transport error case where pause_turn should not be reported as an incomplete provider operation.
# Conflicts:
#	providers/anthropic/anthropic.go
The provider-tool pairing branch now targets the coder_2_33 fork branch so Coder can consume it without losing the previously merged replay-fidelity commits. That exposes a few existing coder_2_33 issues under the current lint and test commands.
Fix those branch-local issues so the PR branch passes the full fantasy check set:
- Update exported computer-use type comments for revive.
- Avoid Go 1.26-only new(T(value)) expressions while the module language is still Go 1.25.
- Allow function_call_output-only follow-up payloads when previous_response_id is set, while keeping strict function call/output validation for explicit replay prompts.
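One common way to avoid `new(T(value))` on a Go 1.25 module is a small generic helper. The sketch below is an assumption about the kind of rewrite involved, not the code in this branch; `ptr` and `Level` are hypothetical names.

```go
package main

// ptr returns a pointer to a copy of v. It needs only generics (Go 1.18+),
// so it compiles while the module language version is still Go 1.25.
func ptr[T any](v T) *T {
	return &v
}

// Level is a hypothetical named type used only for illustration.
type Level int64
```

A call such as `ptr(Level(3))` then stands in for the Go 1.26 form `new(Level(3))`.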
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d308c385fe
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0172f5af1d
    }
    content = operation.appendContent(content)
    content = append(content, toolResult)
    delete(providerExecutedOperations, webSearchResult.ToolUseID)
Remember completed provider operation IDs
Because this is the only state used for duplicate detection, deleting the entry after the first web_search_tool_result lets a later server_tool_use with the same ID pass as a fresh operation. In a response containing two completed pairs with the same srvtoolu_* ID, Generate will persist duplicate provider-executed tool calls/results instead of returning the duplicate-ID error promised by this change, and replaying that assistant message can be rejected by Anthropic. Keep a separate seen-ID set (and apply the same pattern in the streaming path) so duplicates are rejected even after the first pair is completed.
Problem
Anthropic's web-search provider-executed tool is emitted by the API as two adjacent content blocks: a `server_tool_use` block carrying the search query, and a matching `web_search_tool_result` block carrying the citations.
Previously this adapter exposed only the `server_tool_use` half to consumers (as a `ToolCallContent` with `ProviderExecuted: true`), with no corresponding `ToolResultContent`. Sources from the result were emitted as `StreamPartTypeSource` parts, but the tool call itself was never paired with a tool result.
The visible failure mode: persisted assistant messages contained an unmatched provider-executed tool call. When that message was sent back to Anthropic as part of a multi-turn conversation, the API rejected it because a signed assistant turn containing a `server_tool_use` block must also contain its `web_search_tool_result`.
Change
Reshape the Anthropic adapter so the pair is exposed atomically to consumers, only after both blocks arrive.
- Buffer the `server_tool_use` input until its matching `web_search_tool_result` arrives. On result, emit the existing `ToolInputStart` / `ToolInputEnd` / `ToolCall` sequence with `ProviderExecuted: true`, immediately followed by `ToolResult`.
- Web-search citations are still exposed via `StreamPartTypeSource` and via `WebSearchResultMetadata` on the result, but only after the pair is verified complete. Sources no longer leak ahead of an orphan result.
- Skip `input_json_delta` for `server_tool_use` blocks so the Anthropic SDK's own accumulation drives the final JSON, rather than leaking partial deltas to consumers.
Both `Generate` and `Stream` paths are covered.
Encoding choice
This re-encodes provider-executed tools through the existing `ToolCall`/`ToolResult` API with `ProviderExecuted: true`, rather than introducing a new public stream-part type. That mirrors the shape of:
- LiteLLM's fix for the same Anthropic multi-turn failure mode (LiteLLM #17798), which uses `provider_specific_fields`.
- Vercel AI SDK v5, which settled on `providerExecuted: true` and granular lifecycle hooks.
This is more conservative than introducing a `StreamPartTypeProviderExecutedOperation`: no public surface change, no behavior change for non-Anthropic providers, and existing tool-call consumers ignore the `ProviderExecuted: true` flag as before.
Tests
New `providers/anthropic/anthropic_provider_operation_test.go` covers:
- `TestGenerate_WebSearchResponseRejectsDuplicateProviderOperationIDs`
- `TestStream_WebSearchResponseRejectsOrphanResultBeforeSources` (asserts no sources, no provider tool events, exactly one error)
- `TestStream_WebSearchResponseSkipsProviderInputDeltas` (chunked `input_json_delta` accumulation, asserts no `ToolInputDelta` parts)
- `TestStream_WebSearchResponseHandlesMultipleProviderOperations` (two independent pairs in one stream)
- `TestStream_WebSearchResponseRejectsDuplicateProviderOperationIDs`
- `TestStream_WebSearchResponseSurfacesIncompleteOperationOnStreamError` (mid-operation transport error)
- `TestStream_WebSearchResponsePreservesProviderMetadata` (encrypted content, page age round-trip)
Existing `anthropic_test.go` tests for unpaired operations and adjacency continue to pass.
Upstream intent
This is a fork-only patch in `github.com/coder/fantasy` pending upstream landing in `charm.land/fantasy`. The companion patch against charm has been prepared and the test file is byte-identical between the two; only the surrounding hunks differ due to existing fork drift in `anthropic.go`.