
HttpServletStreamableServerTransportProvider removes session on non-fatal failure #952

@labkey-jeckels

Description


Bug description
We use the MCP Java SDK via our dependency on Spring AI. The CI/CD tests for our MCP implementation regularly fail because of a race condition when exercising intentional error conditions. We could not reproduce the failure locally on dev machines, but through logging we eventually found a workaround for what appears to be a bug in HttpServletStreamableServerTransportProvider.

Environment
Versions 1.1.1, 1.1.2, and 2.0.0-M2 of the Java SDK. Prerelease versions of Spring AI 2.0, including 2.0.0-M5 and 2.0.0-M4. Java 25.

Steps to reproduce

See the attachment for a standalone reproduction. For us the failure showed up as a race condition, but it always took this form:

  java.lang.RuntimeException: MCP session with server terminated
  at io.modelcontextprotocol.spec.McpClientSession.lambda$dismissPendingResponses$1(McpClientSession.java:128)
  at java.base/java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1619)
  at io.modelcontextprotocol.spec.McpClientSession.dismissPendingResponses(McpClientSession.java:126)
  at io.modelcontextprotocol.spec.McpClientSession.close(McpClientSession.java:304)
  at io.modelcontextprotocol.client.LifecycleInitializer$DefaultInitialization.close(LifecycleInitializer.java:225)
  at io.modelcontextprotocol.client.LifecycleInitializer.handleException(LifecycleInitializer.java:257)
  at io.modelcontextprotocol.client.transport.HttpClientStreamableHttpTransport.handleException(HttpClientStreamableHttpTransport.java:231)
  at io.modelcontextprotocol.client.transport.HttpClientStreamableHttpTransport.lambda$sendMessage$33(HttpClientStreamableHttpTransport.java:641)
  at reactor.core.publisher.FluxOnErrorReturn$ReturnSubscriber.onError(FluxOnErrorReturn.java:172)
  at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:107)
  at reactor.core.publisher.Operators.error(Operators.java:198)
  at reactor.core.publisher.FluxError.subscribe(FluxError.java:44)
  at reactor.core.publisher.Flux.subscribe(Flux.java:8888)
  at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:104)
  at reactor.core.publisher.FluxFlatMap$FlatMapMain.checkTerminated(FluxFlatMap.java:847)
  at reactor.core.publisher.FluxFlatMap$FlatMapMain.drainLoop(FluxFlatMap.java:613)
  at reactor.core.publisher.FluxFlatMap$FlatMapMain.drain(FluxFlatMap.java:593)
  at reactor.core.publisher.FluxFlatMap$FlatMapMain.onError(FluxFlatMap.java:456)
  at reactor.core.publisher.SerializedSubscriber.onError(SerializedSubscriber.java:125)
  at reactor.core.publisher.FluxRetryWhen$RetryWhenMainSubscriber.whenError(FluxRetryWhen.java:230)
  at reactor.core.publisher.FluxRetryWhen$RetryWhenOtherSubscriber.onError(FluxRetryWhen.java:282)
  at reactor.core.publisher.FluxFlatMap$FlatMapMain.checkTerminated(FluxFlatMap.java:847)
  at reactor.core.publisher.FluxFlatMap$FlatMapMain.drainLoop(FluxFlatMap.java:613)
  at reactor.core.publisher.FluxFlatMap$FlatMapMain.drain(FluxFlatMap.java:593)
  at reactor.core.publisher.FluxFlatMap$FlatMapMain.onError(FluxFlatMap.java:456)
  at reactor.core.publisher.FluxFlatMap$FlatMapMain.onNext(FluxFlatMap.java:419)
  at reactor.core.publisher.SinkManyEmitterProcessor.drain(SinkManyEmitterProcessor.java:480)
  at reactor.core.publisher.SinkManyEmitterProcessor.tryEmitNext(SinkManyEmitterProcessor.java:278)
  at reactor.core.publisher.SinkManySerialized.tryEmitNext(SinkManySerialized.java:100)
  at reactor.core.publisher.InternalManySink.emitNext(InternalManySink.java:27)
  at reactor.core.publisher.FluxRetryWhen$RetryWhenMainSubscriber.onError(FluxRetryWhen.java:195)
  at reactor.core.publisher.FluxFlatMap$FlatMapMain.checkTerminated(FluxFlatMap.java:847)
  at reactor.core.publisher.FluxFlatMap$FlatMapMain.drainLoop(FluxFlatMap.java:613)
  at reactor.core.publisher.FluxFlatMap$FlatMapMain.drain(FluxFlatMap.java:593)
  at reactor.core.publisher.FluxFlatMap$FlatMapMain.onError(FluxFlatMap.java:456)
  at reactor.core.publisher.FluxFlatMap$FlatMapMain.onNext(FluxFlatMap.java:419)
  at reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner.onNext(MonoFlatMapMany.java:251)
  at reactor.core.publisher.FluxCreate$BufferAsyncSink.drain(FluxCreate.java:887)
  at reactor.core.publisher.FluxCreate$BufferAsyncSink.next(FluxCreate.java:812)
  at reactor.core.publisher.FluxCreate$SerializedFluxSink.next(FluxCreate.java:164)
  at io.modelcontextprotocol.client.transport.ResponseSubscribers$AggregateSubscriber.hookOnComplete(ResponseSubscribers.java:263)
  at reactor.core.publisher.BaseSubscriber.onComplete(BaseSubscriber.java:200)
  at org.reactivestreams.FlowAdapters$FlowToReactiveSubscriber.onComplete(FlowAdapters.java:221)
  at java.net.http/jdk.internal.net.http.LineSubscriberAdapter$LineSubscription.loop(LineSubscriberAdapter.java:430)
  at java.net.http/jdk.internal.net.http.common.SequentialScheduler$LockingRestartableTask.run(SequentialScheduler.java:182)
  at java.net.http/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:149)
  at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:207)
  at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:280)
  at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:233)
  at java.net.http/jdk.internal.net.http.LineSubscriberAdapter$LineSubscription.signalComplete(LineSubscriberAdapter.java:193)
  at java.net.http/jdk.internal.net.http.LineSubscriberAdapter.onComplete(LineSubscriberAdapter.java:114)
  at java.net.http/jdk.internal.net.http.common.HttpBodySubscriberWrapper.complete(HttpBodySubscriberWrapper.java:293)
  at java.net.http/jdk.internal.net.http.common.HttpBodySubscriberWrapper.onComplete(HttpBodySubscriberWrapper.java:401)
  at java.net.http/jdk.internal.net.http.ResponseContent$ChunkedBodyParser.accept(ResponseContent.java:220)
  at java.net.http/jdk.internal.net.http.ResponseContent$ChunkedBodyParser.accept(ResponseContent.java:131)
  at java.net.http/jdk.internal.net.http.Http1Response$BodyReader.handle(Http1Response.java:708)
  at java.net.http/jdk.internal.net.http.Http1Response$BodyReader.handle(Http1Response.java:636)
  at java.net.http/jdk.internal.net.http.Http1Response$Receiver.accept(Http1Response.java:528)
  at java.net.http/jdk.internal.net.http.Http1Response$BodyReader.tryAsyncReceive(Http1Response.java:666)
  at java.net.http/jdk.internal.net.http.Http1AsyncReceiver.flush(Http1AsyncReceiver.java:233)
  at java.net.http/jdk.internal.net.http.common.SequentialScheduler$LockingRestartableTask.run(SequentialScheduler.java:182)
  at java.net.http/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:149)
  at java.net.http/jdk.internal.net.http.common.SequentialScheduler$TryEndDeferredCompleter.complete(SequentialScheduler.java:324)
  at java.net.http/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:151)
  at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:207)
  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1090)
  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614)
  at java.base/java.lang.Thread.run(Thread.java:1474)
  Suppressed: java.lang.Exception: #block terminated with an error
    at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:104)
    at reactor.core.publisher.Mono.block(Mono.java:1773)
    at io.modelcontextprotocol.client.McpSyncClient.callTool(McpSyncClient.java:236)
    at org.labkey.professional.McpServerTest.callTool(McpServerTest.java:358)
    at org.labkey.professional.McpServerTest.callToolExpectingFailure(McpServerTest.java:342)
    at org.labkey.professional.McpServerTest.testBadTableParameters(McpServerTest.java:566)
    at org.labkey.professional.McpServerTest.getSourceForSavedQuery(McpServerTest.java:522)
    at org.labkey.professional.McpServerTest.invokeTools(McpServerTest.java:327)
    at org.labkey.professional.McpServerTest.testAdminAccess(McpServerTest.java:254)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
    at java.base/java.lang.reflect.Method.invoke(Method.java:565)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)

Expected behavior
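A failure detected after a POST response has already been written (for example, the client closing its socket right after reading the response) should not be fatal. The session should stay registered and continue to process subsequent requests from the client.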
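To make the contrast concrete, here is a hypothetical, heavily simplified model in plain Java. It uses no SDK types: `SessionPolicyModel`, `handlePost`, and `writeEventThenCheck` are invented for illustration. It assumes the session table is a map keyed by mcp-session-id and that the failure surfaces only after the event has been written, as this report describes.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, heavily simplified model of a server-side session table.
// None of these names belong to the MCP SDK; they only contrast two policies.
public class SessionPolicyModel {

    private final Map<String, String> sessions = new ConcurrentHashMap<>();
    private final boolean removeOnPostWriteFailure;

    SessionPolicyModel(boolean removeOnPostWriteFailure) {
        this.removeOnPostWriteFailure = removeOnPostWriteFailure;
    }

    void open(String sessionId) {
        sessions.put(sessionId, "active");
    }

    /** Returns an HTTP-style status for a POST on the given session. */
    int handlePost(String sessionId, boolean clientClosedAfterRead) {
        if (!sessions.containsKey(sessionId)) {
            return 404; // "Session not found"
        }
        try {
            writeEventThenCheck(clientClosedAfterRead);
        }
        catch (IOException e) {
            // By this point the event body has already been written to the client.
            if (removeOnPostWriteFailure) {
                sessions.remove(sessionId); // reported behavior: the session is gone for good
            }
        }
        return 200;
    }

    // Stands in for "write + flush, then checkError()"; it fails only in the post-write check.
    private static void writeEventThenCheck(boolean clientClosedAfterRead) throws IOException {
        if (clientClosedAfterRead) {
            throw new IOException("Client disconnected");
        }
    }

    public static void main(String[] args) {
        for (boolean remove : new boolean[] {true, false}) {
            SessionPolicyModel server = new SessionPolicyModel(remove);
            server.open("s1");
            server.handlePost("s1", true); // client closes its socket right after reading
            System.out.println("removeOnPostWriteFailure=" + remove
                    + " -> second POST status " + server.handlePost("s1", false));
        }
    }
}
```

Run as-is, it prints a 404 for the second POST under the "remove on failure" policy and a 200 under the "keep the session" policy, which mirrors the observed and the expected behavior respectively.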

Minimal Complete Reproducible example
See attached.

mcp-sdk-bug-repro.zip

cd mcp-sdk-bug-repro
mvn -q test

The test in src/test/java/io/modelcontextprotocol/bug/SessionRemovalBugReproTest.java:

  1. Builds a real HttpServletStreamableServerTransportProvider and a minimal McpSyncServer with a single tool.
  2. Issues an initialize POST and captures the server-assigned mcp-session-id.
  3. Issues a tools/call POST whose response writer wraps a Writer that accepts writes (so the SDK successfully serializes the SSE event into the response body) but throws IOException on flush(), mimicking an OutputStream backed by a socket the client has already closed. PrintWriter swallows that exception and sets its internal trouble flag, so writer.checkError() returns true.
  4. Asserts that:
    • The full SSE event payload was written to the response (data was delivered).
    • The session was removed from the SDK's sessions map (verified via reflection).
  5. Issues a second tools/call POST with the same session id and asserts that the SDK responds with 404 Session not found. This shows the session is unrecoverable from the client's perspective, even though the prior call's response was actually delivered.

The test deterministically reproduces what is otherwise a flaky race between a fast client closing its socket and the server's post-write checkError().
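The core mechanism the repro relies on can be shown with the JDK alone. This is a minimal sketch, not the attached test: `FailingFlushWriter` is a hypothetical stand-in for a socket-backed stream whose peer has already closed, and the SSE payload shape is illustrative only. It shows that the whole event reaches the underlying Writer, yet `PrintWriter.checkError()` still reports an error, because `PrintWriter` swallows the `IOException` thrown by `flush()` and records it.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.Writer;

public class FlushFailureDemo {

    /** Accepts every write but fails on flush(), like a stream whose socket the peer has closed. */
    static final class FailingFlushWriter extends Writer {
        final StringWriter captured = new StringWriter();

        @Override
        public void write(char[] cbuf, int off, int len) {
            captured.write(cbuf, off, len); // the data is "delivered"
        }

        @Override
        public void flush() throws IOException {
            throw new IOException("Broken pipe (simulated)");
        }

        @Override
        public void close() {
        }
    }

    public static void main(String[] args) {
        FailingFlushWriter target = new FailingFlushWriter();
        PrintWriter writer = new PrintWriter(target);

        // Roughly the shape of an SSE event; the exact payload is illustrative.
        String event = "event: message\ndata: {\"jsonrpc\":\"2.0\",\"id\":2,\"result\":{}}\n\n";
        writer.write(event);
        writer.flush(); // PrintWriter catches the IOException and sets its error flag

        System.out.println("delivered=" + target.captured.toString().equals(event)); // delivered=true
        System.out.println("checkError=" + writer.checkError());                     // checkError=true
    }
}
```

Both lines print `true`: from the writer's point of view the exchange "failed", even though every byte of the event was handed to the underlying stream.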

Workaround
Wrap the response in an HttpServletResponseWrapper whose getWriter() returns a PrintWriter with checkError() always returning false. This suppresses the false positive, at the cost of disabling the SDK's (already unreliable) post-write disconnect detection. Apply the wrapper only on the POST path, and keep checkError() honest for the long-lived GET listening stream.

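    // Assumes imports of java.io.IOException, java.io.PrintWriter, and
    // jakarta.servlet.http.HttpServletResponse / HttpServletResponseWrapper (javax.servlet.* on older
    // containers), plus a Log4j2- or SLF4J-style logger field named LOG in the enclosing class.
    private static class _LoggingResponseWrapper extends HttpServletResponseWrapper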
    {
        private final String requestUri;
        private _LoggingPrintWriter loggingWriter;

        _LoggingResponseWrapper(HttpServletResponse response, String requestUri)
        {
            super(response);
            this.requestUri = requestUri;
        }

        @Override
        public PrintWriter getWriter() throws IOException
        {
            if (loggingWriter == null)
                loggingWriter = new _LoggingPrintWriter(super.getWriter(), requestUri);
            return loggingWriter;
        }
    }

    private static class _LoggingPrintWriter extends PrintWriter
    {
        private final String requestUri;

        _LoggingPrintWriter(PrintWriter delegate, String requestUri)
        {
            super(delegate);
            this.requestUri = requestUri;
        }

        @Override
        public boolean checkError()
        {
            // Workaround for MCP SDK 1.1.2 bug in HttpServletStreamableServerTransportProvider.sendEvent
            // (https://github.com/modelcontextprotocol/java-sdk). After writing/flushing an SSE event,
            // sendEvent calls writer.checkError() and throws "Client disconnected" if it returns true.
            // The catch block in sendMessage then removes the MCP session from the sessions map - even
            // though the SSE event data was already delivered to the client. For streamable HTTP, each
            // POST is a single-event exchange, so a client closing its socket after receiving the
            // response is normal HTTP/1.1 behavior, not an MCP session termination. The race between
            // the client's socket close and the server's checkError() call is exactly what makes this
            // test flaky on TeamCity. Returning false suppresses the false positive; if the client
            // truly lost data, it will retry on a fresh connection.
            boolean actual = super.checkError();
            if (actual)
            {
                LOG.info("MCP checkError suppressed (returning false): uri={}", requestUri);
            }
            return false;
        }
    }
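The effect of the workaround, and of applying it only on POST, can be checked without a servlet container. In this minimal sketch `SuppressingPrintWriter` mirrors `_LoggingPrintWriter` minus the logging, and `writerFor` is a hypothetical helper standing in for whatever wrapper-selection logic sits in your filter or servlet; neither is an SDK API.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Writer;

public class SuppressedCheckErrorDemo {

    /** Same idea as _LoggingPrintWriter: observe the real error state, but report false. */
    static final class SuppressingPrintWriter extends PrintWriter {
        boolean lastActual;

        SuppressingPrintWriter(PrintWriter delegate) {
            super(delegate);
        }

        @Override
        public boolean checkError() {
            lastActual = super.checkError(); // flushes and consults the delegate's error state
            return false;
        }
    }

    /** Hypothetical helper: only POST gets the suppressing writer; GET keeps honest detection. */
    static PrintWriter writerFor(String httpMethod, PrintWriter raw) {
        return "POST".equalsIgnoreCase(httpMethod) ? new SuppressingPrintWriter(raw) : raw;
    }

    public static void main(String[] args) {
        // Simulates a socket the client has already closed: writes succeed, flush fails.
        Writer brokenSocket = new Writer() {
            @Override public void write(char[] cbuf, int off, int len) { }
            @Override public void flush() throws IOException { throw new IOException("Broken pipe (simulated)"); }
            @Override public void close() { }
        };

        PrintWriter post = writerFor("POST", new PrintWriter(brokenSocket));
        post.write("data: {}\n\n");
        System.out.println("POST checkError=" + post.checkError()); // POST checkError=false

        PrintWriter get = writerFor("GET", new PrintWriter(brokenSocket));
        get.write("data: {}\n\n");
        System.out.println("GET checkError=" + get.checkError());   // GET checkError=true
    }
}
```

Keeping the real result in `lastActual` (or logging it, as the original wrapper does) preserves some visibility into how often the suppressed condition actually fires.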
