Skip to content

feat: add mcp_catalog toolset for on-demand MCP server discovery#2794

Open
dgageot wants to merge 10 commits into
docker:mainfrom
dgageot:mcp-catalog-toolset
Open

feat: add mcp_catalog toolset for on-demand MCP server discovery#2794
dgageot wants to merge 10 commits into
docker:mainfrom
dgageot:mcp-catalog-toolset

Conversation

@dgageot
Copy link
Copy Markdown
Member

@dgageot dgageot commented May 13, 2026

Summary

Adds a new built-in toolset, mcp_catalog, that lets agents discover and on-demand activate remote (streamable-http) MCP servers from the Docker MCP Catalog without declaring each one ahead of time in YAML.

The catalog (45 servers) is embedded at build time from https://desktop.docker.com/mcp/catalog/v3/catalog.json (filtered to type=remote AND remote.transport_type=streamable-http). The toolset surfaces five meta-tools to the model:

  • search_remote_mcp_servers — case-insensitive fuzzy search over id / title / description / category / tags.
  • list_remote_mcp_servers — show currently enabled servers.
  • enable_remote_mcp_server — instantiate an *mcp.Toolset for the server (deferred — no network until tools are next enumerated).
  • disable_remote_mcp_server — stop the toolset and remove its tools.
  • reset_remote_mcp_server_auth — wipe persisted OAuth credentials so the next enable triggers a fresh authorization flow.

How it works

  • Each enabled *mcp.Toolset is wrapped in a tools.NewStartable so it gets the same lazy single-flight Start semantics as YAML-declared toolsets.
  • Tools() lazily Start()s each enabled wrapped toolset on first enumeration. Treats mcp.IsAuthorizationRequired as expected deferral (silent on the non-interactive startup probe; user-facing OAuth dialog on the next interactive turn). Real Start failures get the StartableToolSet de-duplicated warning streak.
  • Elicitation / OAuth-success / managed-OAuth / tools-changed handlers captured by the runtime are re-applied to each new mcp.Toolset on enable so the OAuth dance behaves identically to a YAML-declared mcp.remote toolset.
  • ${ENV_VAR} placeholders in catalog headers (e.g. Apify's Authorization: Bearer ${APIFY_API_KEY}) resolve at enable time via js.Expander — same code path the YAML toolsets use.
  • reset_remote_mcp_server_auth stops the inner toolset (so the live MCP session drops its now-stale token) before notifying the runtime and clearing the keyring entry. The notify fires even when the keyring removal fails, so the runtime's tool list never desyncs from the actual enabled state.

Configuration

agents:
  root:
    model: anthropic/claude-sonnet-4-6
    toolsets:
      - type: mcp_catalog

See examples/mcp_catalog.yaml for a full example.

What's in this branch

  • pkg/tools/builtin/mcpcatalog/{mcpcatalog.go, servers.go, servers.json} — toolset, embedded catalog (45 servers).
  • pkg/teamloader/registry.go — registers the new mcp_catalog toolset type.
  • agent-schema.json — adds mcp_catalog to the toolset-type enums.
  • examples/mcp_catalog.yaml — example agent using anthropic/claude-sonnet-4-6.
  • pkg/tools/builtin/mcpcatalog/mcpcatalog_test.go — 19 tests (load, search, lifecycle, race-friendly concurrent enable/disable, ctx cancellation, lazy-start regression test against a fake MCP server, auth-required deferral, reset_auth happy path / unknown id / no-op for non-oauth / disable-on-reset / error surfacing / notify-on-keyring-failure).

Known limitation

The runtime's MCP-prompt discovery uses tools.As[*mcp.Toolset] to find prompt-capable toolsets. Because mcp_catalog is a container, prompts exposed by servers activated through this catalog are not surfaced via /prompts. Tools (the primary interface) work fine. Plumbing prompt discovery through container toolsets would require a new interface and changes to pkg/runtime/runtime.go — out of scope for this PR.

Validation

  • task build
  • go vet ./...
  • golangci-lint run ./... → 0 issues ✅
  • go test ./pkg/tools/builtin/mcpcatalog/... -race -count=1 → all 19 tests pass ✅

@dgageot dgageot requested a review from a team as a code owner May 13, 2026 16:46
Copy link
Copy Markdown

@docker-agent docker-agent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assessment: 🟡 NEEDS ATTENTION

Two medium-severity findings in pkg/tools/builtin/mcpcatalog/mcpcatalog.go — see inline comments for details.

Comment thread pkg/tools/builtin/mcpcatalog/mcpcatalog.go Outdated
Comment thread pkg/tools/builtin/mcpcatalog/mcpcatalog.go
Comment thread pkg/tools/builtin/mcpcatalog/mcpcatalog.go Outdated
Comment thread pkg/tools/builtin/mcpcatalog/mcpcatalog.go
Comment thread pkg/tools/builtin/mcpcatalog/mcpcatalog.go
Comment thread pkg/tools/builtin/mcpcatalog/mcpcatalog.go
Comment thread pkg/tools/builtin/mcpcatalog/mcpcatalog.go
Comment thread pkg/tools/builtin/mcpcatalog/mcpcatalog.go
Comment thread pkg/tools/builtin/mcpcatalog/mcpcatalog.go
Comment thread pkg/tools/builtin/mcpcatalog/mcpcatalog_test.go
dgageot added a commit to dgageot/cagent that referenced this pull request May 18, 2026
missingAPIKeyEnv and ExpandMap perform external I/O (env.Get hits the
OS keychain, ExpandMap invokes the expander interface) and were being
called while holding t.mu write lock. This blocked all concurrent
readers (Tools, handleSearch, handleList) for the full duration of
the external round-trips.

Both functions operate exclusively on 'server' from t.byID, which is
read-only after construction and requires no mutex protection. Moving
them before t.mu.Lock() eliminates the lock contention.

Addresses review feedback on PR docker#2794.
dgageot added a commit to dgageot/cagent that referenced this pull request May 18, 2026
Tools() snapshots t.enabled under RLock, then iterates outside the lock.
If handleDisable runs concurrently and removes a server from t.enabled
and calls Stop(), the Tools() iteration could still hold a reference to
the same *tools.StartableToolSet and call Start() on it, potentially
restarting a server the user explicitly disabled.

Add a re-check of t.enabled membership before calling Start() to skip
servers that were disabled after the snapshot was taken.

Addresses review feedback on PR docker#2794.
@dgageot dgageot force-pushed the mcp-catalog-toolset branch from 3d9e7f4 to cbbc61d Compare May 18, 2026 08:31
@dgageot
Copy link
Copy Markdown
Member Author

dgageot commented May 18, 2026

/review

Copy link
Copy Markdown

@docker-agent docker-agent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assessment: 🟡 NEEDS ATTENTION

Two medium-severity findings in — both in the newly added toolset code. No issues found in the other changed files.

}

if !e.ts.IsStarted() {
if err := e.ts.Start(ctx); err != nil {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] TOCTOU race: disabled server may be restarted via Start() after Stop() in Tools()

The stillEnabled guard in Tools() has a narrow but real race window:

  1. Tools() acquires RLock, confirms the server is still in t.enabled, then releases the lock.
  2. Before Start() is called on the next line, handleDisable acquires the write lock, removes the server from t.enabled, releases the lock, and calls Stop() on the same *tools.StartableToolSet.
  3. Tools() then calls Start() on the already-stopped object.

The PR comments acknowledge this race ("a concurrent handleDisable could remove the server from t.enabled and call Stop(), but we'd still hold a reference and potentially restart it") but rely on the stillEnabled check to prevent it. That check does not eliminate the window — it only narrows it to after the check passes.

Whether this is harmful depends on StartableToolSet's semantics: if Start() after Stop() reconnects (restarting a server the user just disabled) or panics, this is a real correctness bug. A user-visible symptom would be a server that was disabled continuing to surface its tools for one more Tools() cycle.

Suggested fix: Re-check stillEnabled immediately before calling Start() under a short-lived RLock, or have handleDisable set a "stopped" flag on the StartableToolSet that Start() respects.

msg.WriteString("auth: OAuth — an authorization URL will be elicited on the next turn.\n")
case "api_key":
if len(missing) > 0 {
fmt.Fprintf(&msg, "auth: API key — WARNING: the following env vars are NOT set: %s. Set them and re-enable, otherwise tool calls will fail.\n", strings.Join(missing, ", "))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Misleading message: "Set them and re-enable" won't work without disabling first

When enable_remote_mcp_server is called for an api_key server with missing environment variables, the server is added to t.enabled and then the user is told:

"Set them and re-enable, otherwise tool calls will fail."

However, if the user sets the env vars and calls enable_remote_mcp_server again, they will hit the early-return guard at the top of handleEnable:

if _, exists := t.enabled[id]; exists {
    t.mu.Unlock()
    return tools.ResultSuccess(fmt.Sprintf("server %q is already enabled", id)), nil
}

Because the server is already in t.enabled, the second call returns "already enabled" without creating a new toolset or re-expanding the headers. The env var snapshot from the first call is permanently baked into the existing toolset.

The user would need to call disable_remote_mcp_server first, then re-enable — but the message does not say this.

Suggested fix: Change the message to "Set them, then disable and re-enable this server, otherwise tool calls will fail." to give accurate instructions.

dgageot added 10 commits May 18, 2026 14:29
persist managedOAuth flag so it applies to servers enabled after
SetManagedOAuth was called, honor ctx.Err() when iterating enabled
toolsets in Tools() so cancelled contexts return early, fix package
doc comment that said three meta-tools instead of four, and add
three tests for these fixes.
Adds a fifth tool to the mcp_catalog toolset for clearing persisted OAuth
credentials. Useful when a token is revoked, scopes change, or the user
signs in to the wrong account.

For oauth servers: stops the active MCP session, removes the token from
the keyring, and fires the tools-changed handler if enabled. For api_key/
none servers: no-op. Keyring errors surface as tool errors, not panics.

Includes 5 new tests validating the auth reset behavior.
The new pkg/tools/builtin/mcpcatalog package embeds servers.json via
//go:embed. The .dockerignore default-denies everything and only
whitelists specific paths, so the embedded JSON would be missing from
the Docker build context, causing the build to fail.
missingAPIKeyEnv and ExpandMap perform external I/O (env.Get hits the
OS keychain, ExpandMap invokes the expander interface) and were being
called while holding t.mu write lock. This blocked all concurrent
readers (Tools, handleSearch, handleList) for the full duration of
the external round-trips.

Both functions operate exclusively on 'server' from t.byID, which is
read-only after construction and requires no mutex protection. Moving
them before t.mu.Lock() eliminates the lock contention.

Addresses review feedback on PR docker#2794.
Tools() snapshots t.enabled under RLock, then iterates outside the lock.
If handleDisable runs concurrently and removes a server from t.enabled
and calls Stop(), the Tools() iteration could still hold a reference to
the same *tools.StartableToolSet and call Start() on it, potentially
restarting a server the user explicitly disabled.

Add a re-check of t.enabled membership before calling Start() to skip
servers that were disabled after the snapshot was taken.

Addresses review feedback on PR docker#2794.
- Fix TOCTOU race in Tools(): re-check t.enabled after Start so a
  concurrent disable can no longer leave a started session orphaned.
- Cache catalog parse with sync.OnceValues; Load() returns shallow
  copies so test-only mutations stay isolated.
- Stable iteration order in Tools() (sort by id) to keep prompt caches
  and TUI rendering steady across turns.
- Cap empty-query search at 25 results so the model doesn't get the
  whole catalog dumped into its context window.
- Resolve catalog header ${VAR} placeholders directly through the env
  provider (catalog uses bare ${VAR}, not ${env.VAR}); add a
  post-expansion scan that surfaces any placeholder still left in the
  expanded headers, even for auth.type=none servers.
- Replace misleading 'Set them and re-enable' message with explicit
  disable + enable instructions (a plain second enable would short-circuit).
- Document why we pass nil for *RemoteOAuthConfig in NewRemoteToolset.

Tests added for the new behaviours: empty-query truncation, unresolved
header placeholder warnings, sync.OnceValues isolation, stable Tools()
ordering, and the disable/enable wording in the missing-env message.
@dgageot dgageot force-pushed the mcp-catalog-toolset branch from d9f35e5 to 6cff5b0 Compare May 18, 2026 12:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants