Skip to content

fix: bound TC-deferral assembly window to first-arrival + max delay#1732

Merged
bdraco merged 4 commits into
python-zeroconf:masterfrom
bluetoothbot:koan/fix-issue-1726
May 20, 2026
Merged

fix: bound TC-deferral assembly window to first-arrival + max delay#1732
bdraco merged 4 commits into
python-zeroconf:masterfrom
bluetoothbot:koan/fix-issue-1726

Conversation

@bluetoothbot
Copy link
Copy Markdown
Contributor

@bluetoothbot bluetoothbot commented May 18, 2026

Summary

AsyncListener.handle_query_or_defer re-armed self._timers[addr] on every new TC packet, so a peer streaming truncated queries faster than the 400–500 ms random delay could slide the flush deadline forward indefinitely and keep the per-addr deferred list growing. RFC 6762 §18.5 frames that delay as a fixed reassembly budget starting at first arrival, not a sliding heartbeat.

The fix tracks first-arrival in _deferred_deadlines and caps each scheduled fire time at first_arrival + _TC_DELAY_RANDOM_INTERVAL[1]. If the next packet would push the timer past that deadline the existing TimerHandle is left in place so the queue still flushes on schedule.

Fixes #1726

Changes

  • _listener.py: add _deferred_deadlines slot, bound handle_query_or_defer reset at the per-addr deadline, clear it in _respond_query.
  • _listener.pxd: declare the new cython.dict and the locals used in the bounded path.
  • tests/test_core.py: regression test that pins random.randint to the max delay and asserts subsequent TC packets do not push _timers[addr].when() past the deadline established by the first packet.

Test plan

  • SKIP_CYTHON=1 poetry run pytest --timeout=60 tests/ — 359 passed, 2 skipped.
  • New tests/test_core.py::test_tc_bit_defer_window_is_bounded fails on master (timer slides forward) and passes with the fix.
  • poetry run ruff check and poetry run ruff format --check clean on the touched files.

Generated by Kōan /fix


Quality Report

Changes: 3 files changed, 65 insertions(+), 3 deletions(-)

Code scan: clean

Tests: passed (4 PASSED)

Branch hygiene: clean

Generated by Kōan post-mission quality pipeline

@codecov
Copy link
Copy Markdown

codecov Bot commented May 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.76%. Comparing base (0e201f7) to head (4b3c475).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1732   +/-   ##
=======================================
  Coverage   99.76%   99.76%           
=======================================
  Files          33       33           
  Lines        3455     3472   +17     
  Branches      478      482    +4     
=======================================
+ Hits         3447     3464   +17     
  Misses          5        5           
  Partials        3        3           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 18, 2026

Merging this PR will not alter performance

✅ 14 untouched benchmarks


Comparing bluetoothbot:koan/fix-issue-1726 (4b3c475) with master (0e201f7)

Open in CodSpeed

@bdraco bdraco requested a review from Copilot May 19, 2026 21:56
Comment thread src/zeroconf/_listener.py Outdated
@bdraco
Copy link
Copy Markdown
Member

bdraco commented May 19, 2026

@bluetoothbot rr

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a denial-of-service vector in the mDNS truncated-query (TC bit) deferral path by preventing the per-source “reassembly window” timer from being continually pushed out by repeated TC packets, aligning behavior with RFC 6762 §18.5’s fixed reassembly budget starting at first arrival.

Changes:

  • Track per-source first-arrival deadlines for TC reassembly and cap subsequent timer scheduling to first_arrival + max_delay.
  • Clear the tracked deadline when deferred packets are flushed/assembled.
  • Add a regression test ensuring subsequent TC packets cannot move the scheduled flush later than the first packet’s deadline.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
src/zeroconf/_listener.py Adds _deferred_deadlines and caps TC deferral timer scheduling to a fixed per-addr deadline.
src/zeroconf/_listener.pxd Updates Cython declarations/locals for the new deadline-tracking logic.
tests/test_core.py Adds a regression test asserting the TC deferral timer does not slide past the first-arrival deadline.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/test_core.py
@bluetoothbot
Copy link
Copy Markdown
Contributor Author

bluetoothbot commented May 19, 2026

PR Review — fix: bound TC-deferral assembly window to first-arrival + max delay

Correct fix for the unbounded TC-deferral assembly window. The helper extraction addresses bdraco's earlier feedback, the _deferred_deadlines slot/pxd declaration is consistent with the existing Cython conventions in this module (matches the _deferred/_timers pattern, paired with proper _respond_query cleanup), and the deadline math is sound: first-arrival fixes the budget at now + _TC_DELAY_RANDOM_INTERVAL[1], subsequent packets are accepted only when now_i + delay < deadline, otherwise the existing timer is left alone via the -1.0 sentinel. The new regression test is well-targeted (pins randint to max so any unbounded behaviour would be unmistakable), and the dedupe branch correctly short-circuits before touching the deadline. Two minor nits: the modified existing test only has 100ms of wall-clock headroom which could be brittle on slower CI matrix entries (3.14t in particular), and a couple of comments lean toward restating the code rather than explaining intent. None of this is blocking — ship it.


🟢 Suggestions

1. Modified existing test relies on <100ms wall-clock headroom between packets (`tests/test_core.py`, L702-736)

Pinning random.randint to _TC_DELAY_RANDOM_INTERVAL[0] (min, 400ms) leaves only max - min = 100ms of headroom before the bounded deadline rejects a timer reset. The assertions timer1.cancelled()/timer2 != timer1, timer4 != timer3, etc. all depend on now_i + 400ms < now_1 + 500ms, i.e. each threadsafe_query call has to land within 100ms of the first one. On loopback this is normally fine, but under a loaded CI runner (especially the 3.14t free-threaded matrix entry, which is consistently slower) the third or fourth packet can drift past the deadline and silently flip these assertions. Two options to harden it: (a) freeze loop.time() via a monkeypatch so the test is wall-clock-independent, or (b) widen the headroom by patching _TC_DELAY_RANDOM_INTERVAL to e.g. (10, 5000) for the duration of the test so the deadline budget far exceeds any plausible test latency. Not blocking, but the slowest matrix entry is exactly where this kind of test tends to flake first.

min_delay_ms = _TC_DELAY_RANDOM_INTERVAL[0]
with patch("zeroconf._listener.random.randint", return_value=min_delay_ms):
    ...

Checklist

  • Correctness of deadline arithmetic (first-arrival anchored, not sliding)
  • Deferred state cleanup paired (_deferred / _deferred_deadlines / _timers)
  • Cython .pxd updated in lockstep with .py signature/slot changes
  • Sentinel value safe against real loop.time() values — suggestion #2
  • Regression test exercises the bounded path
  • Modified existing test stable under CI load — suggestion #1
  • Docstring/comment style matches CLAUDE.md (one-line, why-not-what) — suggestion #3
  • No resource leaks (timers, deferred queues, deadlines all cleaned in _respond_query)
  • No mutable default args / unsafe eval / unsafe deserialization
  • Free-threading-safe (no module-level mutable state added)

Summary

Correct fix for the unbounded TC-deferral assembly window. The helper extraction addresses bdraco's earlier feedback, the _deferred_deadlines slot/pxd declaration is consistent with the existing Cython conventions in this module (matches the _deferred/_timers pattern, paired with proper _respond_query cleanup), and the deadline math is sound: first-arrival fixes the budget at now + _TC_DELAY_RANDOM_INTERVAL[1], subsequent packets are accepted only when now_i + delay < deadline, otherwise the existing timer is left alone via the -1.0 sentinel. The new regression test is well-targeted (pins randint to max so any unbounded behaviour would be unmistakable), and the dedupe branch correctly short-circuits before touching the deadline. Two minor nits: the modified existing test only has 100ms of wall-clock headroom which could be brittle on slower CI matrix entries (3.14t in particular), and a couple of comments lean toward restating the code rather than explaining intent. None of this is blocking — ship it.


Automated review by Kōane0eb9e0
2cad649
ac9c609
5504ee6
c6d555c

@bluetoothbot
Copy link
Copy Markdown
Contributor Author

Rebase with requested adjustments

Branch koan/fix-issue-1726 was rebased onto master and review feedback was applied.

Changes applied

  • Summary

  • Extracted TC-deferral deadline bookkeeping into cdef double _compute_deferred_fire_at(self, object addr, double now, double delay) helper per @bdraco / @bluetoothbot. Returns -1.0 sentinel meaning "keep existing timer"; otherwise returns bounded fire_at. Call site in handle_query_or_defer now collapses to: compute fire_at, bail if < 0.0, else cancel + loop.call_at. RFC 6762 §18.5 reassembly-budget comment moved onto the helper.
  • Added matching cdef declaration plus @cython.locals(deadline=object, fire_at=double) to _listener.pxd. Dropped the now-unused deadline=object local from handle_query_or_defer.
  • Per @Copilot, the regression test no longer hard-codes 500. It imports _TC_DELAY_RANDOM_INTERVAL from zeroconf._listener and derives max_delay_ms = _TC_DELAY_RANDOM_INTERVAL[1], so the test tracks the constant if it changes.
  • CI fix: Applied pre-push CI fix
  • CI fix: Pre-push CI fix applied

Stats

3 files changed, 111 insertions(+), 34 deletions(-)
Actions performed
  • Already-solved check: negative (confidence=high, reasoning=No commit on master touches TC-deferral timer bounding in _listener; recent fixes target NSEC bitmap)
  • Rebased koan/fix-issue-1726 onto upstream/master
  • Applied review feedback
  • Pre-push CI check: previous run #26019837455 failed
  • Applied pre-push CI fix
  • Pre-push CI fix applied
  • Force-pushed koan/fix-issue-1726 to origin
  • CI check enqueued in ## CI (async)

CI status

CI will be checked asynchronously.


Automated by Kōan

Comment thread src/zeroconf/_listener.py
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

@bdraco
Copy link
Copy Markdown
Member

bdraco commented May 20, 2026

@bluetoothbot rebase

@bluetoothbot
Copy link
Copy Markdown
Contributor Author

Rebase with requested adjustments

Branch koan/fix-issue-1726 was rebased onto master and review feedback was applied.

Changes applied

  • ents on the sentinel-return path and on each return of _compute_deferred_fire_at, addressing @bdraco's request.
  • Summary

  • src/zeroconf/_listener.py: added inline comments at the three return paths to explain (per @bdraco) what each one means:
  • At the call site in handle_query_or_defer, the -1.0 sentinel branch now explains that a reset would push the flush past the per-addr reassembly deadline, so the existing TimerHandle is left alone.
  • In _compute_deferred_fire_at, the three returns now each have a one-line comment: -1.0 keeps the existing timer; clamped-deadline covers the edge case where the very first packet already proposes a fire-time past the budget; bare fire_at is the in-budget path.
  • The earlier review items (extract the bounded fire-time into a cdef helper per @bdraco / @bluetoothbot, and derive the patched delay from _TC_DELAY_RANDOM_INTERVAL[1] per @Copilot) were already landed in commits 2cad649 and ac9c609; nothing more to do there.

Stats

3 files changed, 120 insertions(+), 34 deletions(-)
Actions performed
  • Already-solved check: negative (confidence=high, reasoning=No commit on master addresses the TC-deferral sliding-window bug in AsyncListener.handle_query_or_de)
  • Rebased koan/fix-issue-1726 onto upstream/master
  • Applied review feedback
  • Pre-push CI check: previous run passed
  • Force-pushed koan/fix-issue-1726 to origin
  • CI check enqueued in ## CI (async)

CI status

CI will be checked asynchronously.


Automated by Kōan

@bdraco
Copy link
Copy Markdown
Member

bdraco commented May 20, 2026

@bluetoothbot rr

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

A peer streaming truncated queries faster than the 400–500 ms random
delay used to rearm `_timers[addr]` could keep extending the flush
deadline indefinitely, so the per-addr deferred list never drained
(RFC 6762 §18.5 expects the window to be a fixed reassembly budget
starting at first arrival, not a sliding heartbeat).

Track first-arrival in `_deferred_deadlines` and cap each scheduled
fire time at `first_arrival + _TC_DELAY_RANDOM_INTERVAL[1]`. If the
next packet would push the timer past that deadline, leave the
existing TimerHandle in place so the queue flushes on schedule.
@bluetoothbot
Copy link
Copy Markdown
Contributor Author

Rebase with requested adjustments

Branch koan/fix-issue-1726 was rebased onto master and review feedback was applied.

Stats

3 files changed, 120 insertions(+), 34 deletions(-)
Actions performed

CI status

CI will be checked asynchronously.


Automated by Kōan

@bluetoothbot bluetoothbot force-pushed the koan/fix-issue-1726 branch from c6d555c to 4b3c475 Compare May 20, 2026 05:19
@bdraco bdraco marked this pull request as ready for review May 20, 2026 12:18
@bdraco bdraco merged commit a096238 into python-zeroconf:master May 20, 2026
36 checks passed
@bluetoothbot bluetoothbot deleted the koan/fix-issue-1726 branch May 20, 2026 12:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Security: TC-deferral timer is rearmed on every new packet, giving an attacker unbounded growth window

3 participants