8 changes: 8 additions & 0 deletions .server-changes/README.md
@@ -38,6 +38,14 @@ Speed up batch queue processing by removing stalls and fixing retry race

The body text (below the frontmatter) is a one-line description of the change. Keep it concise — it will appear in release notes.

### Writing guidance

These entries are public-facing: they ship verbatim in user-visible release notes. A few rules keep them clean:

- **One sentence is usually enough.** The body is the bullet in the changelog. If you need a paragraph, you're probably describing the implementation rather than the change.
- **Describe behavior, not implementation.** Skip internal scopes, middleware names, library specifics, framework internals. Users care about what's different for them, not how it's wired.
- **Never name internal tools or infra.** Observability stacks, internal services, infra components, monitoring backends, CI surfaces, and AWS specifics do not belong in user-facing notes.

## Lifecycle

1. Engineer adds a `.server-changes/` file in their PR
6 changes: 6 additions & 0 deletions .server-changes/supervisor-compute-traceparent-forwarding.md
@@ -0,0 +1,6 @@
---
area: supervisor
type: improvement
---

Forward `traceparent` headers on outbound calls to the compute provider so distributed traces stay continuous across services.
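For context, `traceparent` is the header defined by the W3C Trace Context spec, shaped `version-traceid-parentid-flags`. The sketch below is a minimal illustration, not the supervisor's implementation. `isValidTraceparent` and `withTraceparent` are hypothetical names, and only version `00` is accepted. It copies a well-formed incoming value onto outbound request headers and drops anything malformed. A full tracer would usually also replace the parent-id with the id of the span making the call; this sketch only forwards the value it was given.

```typescript
// Hypothetical helpers; the supervisor's actual forwarding code is not in this diff.
// W3C Trace Context, version 00 only: "00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>".
const TRACEPARENT_V00 = /^00-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$/;

function isValidTraceparent(value: string): boolean {
  if (!TRACEPARENT_V00.test(value)) return false;
  const [, traceId = "", parentId = ""] = value.split("-");
  // The spec treats an all-zero trace-id or parent-id as invalid.
  return !/^0+$/.test(traceId) && !/^0+$/.test(parentId);
}

// Return a copy of `headers` with `traceparent` attached so the outbound call
// joins the caller's trace. Missing or malformed values are not forwarded.
function withTraceparent(
  headers: Record<string, string>,
  traceparent: string | undefined,
): Record<string, string> {
  if (traceparent === undefined || !isValidTraceparent(traceparent)) {
    return { ...headers };
  }
  return { ...headers, traceparent };
}

// Usage: the value below is the example traceparent from the W3C spec.
const outbound = withTraceparent(
  { "content-type": "application/json" },
  "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
);
console.log(outbound.traceparent);
```

Dropping invalid values, instead of passing them through, keeps a corrupted header from fragmenting traces downstream.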
6 changes: 6 additions & 0 deletions .server-changes/supervisor-op-field-coverage.md
@@ -0,0 +1,6 @@
---
area: supervisor
type: improvement
---

Tag every supervisor structured event with `op` and `kind` fields for consistent filtering and aggregation.
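A minimal sketch of one way an emitter can guarantee that every event carries both keys, assuming events are written as JSON lines to a sink. `eventEmitterFor` and the `dequeue`/`iteration` values are illustrative; the diff does not show the real `op` or `kind` values.

```typescript
// Hypothetical emitter; only the `op` and `kind` field names come from the changelog entry.
interface SupervisorEvent {
  op: string;
  kind: string;
  [field: string]: unknown;
}

type EventSink = (line: string) => void;

// Bind `op` once; every event written through the returned function carries
// both `op` and `kind`, so queries can filter and group on the same two keys.
function eventEmitterFor(op: string, sink: EventSink) {
  return (kind: string, fields: Record<string, unknown> = {}): SupervisorEvent => {
    // Caller fields are spread first so they cannot overwrite `op` or `kind`.
    const event: SupervisorEvent = { ...fields, op, kind };
    sink(JSON.stringify(event));
    return event;
  };
}

// Usage: collect lines in memory instead of writing to stdout.
const lines: string[] = [];
const emitDequeue = eventEmitterFor("dequeue", (line) => lines.push(line));
emitDequeue("iteration", { runs: 3 });
emitDequeue("iteration", { op: "ignored" }); // `op` stays "dequeue"
```

Binding `op` per emitter and spreading caller fields first means a stray `op` in a payload cannot silently break filtering.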
6 changes: 6 additions & 0 deletions .server-changes/supervisor-snapshot-lifecycle-events.md
@@ -0,0 +1,6 @@
---
area: supervisor
type: improvement
---

Add observability events at the schedule, dispatch, and callback phases of the snapshot lifecycle.
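One way to picture these three phases, assuming each event carries a snapshot id and the time elapsed since that snapshot was scheduled. `SnapshotLifecycle`, `SnapshotEvent`, and their fields are hypothetical; only the phase names come from the entry above.

```typescript
// Hypothetical tracker; phase names come from the changelog entry, everything else is illustrative.
type SnapshotPhase = "schedule" | "dispatch" | "callback";

interface SnapshotEvent {
  phase: SnapshotPhase;
  snapshotId: string;
  elapsedMs: number; // time since this snapshot's schedule phase (0 if none was seen)
}

class SnapshotLifecycle {
  private readonly scheduledAt = new Map<string, number>();
  private readonly emit: (event: SnapshotEvent) => void;
  private readonly now: () => number;

  constructor(emit: (event: SnapshotEvent) => void, now: () => number = Date.now) {
    this.emit = emit;
    this.now = now;
  }

  record(phase: SnapshotPhase, snapshotId: string): void {
    const t = this.now();
    if (phase === "schedule") this.scheduledAt.set(snapshotId, t);
    const start = this.scheduledAt.get(snapshotId) ?? t;
    this.emit({ phase, snapshotId, elapsedMs: t - start });
    // The callback ends the lifecycle, so drop the entry; completed snapshots do not accumulate.
    if (phase === "callback") this.scheduledAt.delete(snapshotId);
  }
}

// Usage with a fake clock so the elapsed times are deterministic.
const events: SnapshotEvent[] = [];
let clock = 1000;
const tracker = new SnapshotLifecycle((e) => events.push(e), () => clock);
tracker.record("schedule", "snap_1");
clock = 1250;
tracker.record("dispatch", "snap_1");
clock = 1900;
tracker.record("callback", "snap_1");
```

Correlating the phases by snapshot id is what lets a slow dispatch be told apart from a slow callback.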
6 changes: 6 additions & 0 deletions .server-changes/supervisor-wide-events.md
@@ -0,0 +1,6 @@
---
area: supervisor
type: feature
---

Add wide-event observability for the dequeue loop, workload-server routes, and the run socket lifecycle, off by default behind `TRIGGER_WIDE_EVENTS_ENABLED`.
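The env.ts comments in this PR describe a wide event as one flat-keyed JSON line per natural unit of work. Below is a minimal sketch of that pattern, not the supervisor's code: `WideEvent`, `flatten`, and the `duration_ms` key are hypothetical names. Context is merged in while the unit runs, nested objects are flattened to dotted keys, and a single line is written at the end.

```typescript
// Hypothetical sketch; `WideEvent`, `flatten`, and `duration_ms` are illustrative names.
type Fields = Record<string, unknown>;

function isPlainObject(value: unknown): value is Fields {
  return typeof value === "object" && value !== null && Object.getPrototypeOf(value) === Object.prototype;
}

// Flatten nested plain objects into dotted keys, e.g. { http: { status: 200 } }
// becomes { "http.status": 200 }. Arrays, dates, and primitives are kept as-is.
function flatten(input: Fields, prefix = "", out: Fields = {}): Fields {
  for (const [key, value] of Object.entries(input)) {
    const path = prefix === "" ? key : `${prefix}.${key}`;
    if (isPlainObject(value)) {
      flatten(value, path, out);
    } else {
      out[path] = value;
    }
  }
  return out;
}

class WideEvent {
  private readonly fields: Fields = {};
  private readonly startedAt = Date.now();
  private finished = false;
  private readonly enabled: boolean;
  private readonly write: (line: string) => void;

  constructor(enabled: boolean, write: (line: string) => void) {
    this.enabled = enabled;
    this.write = write;
  }

  // Merge context gathered during the unit of work; later values win on key clashes.
  // When disabled this returns immediately, so the off state does no flattening.
  set(fields: Fields): this {
    if (this.enabled) Object.assign(this.fields, flatten(fields));
    return this;
  }

  // Write exactly one JSON line per unit of work; repeat calls are ignored.
  finish(): void {
    if (!this.enabled || this.finished) return;
    this.finished = true;
    this.write(JSON.stringify({ ...this.fields, duration_ms: Date.now() - this.startedAt }));
  }
}

// Usage: one event for one (illustrative) HTTP request.
const out: string[] = [];
const event = new WideEvent(true, (line) => out.push(line));
event.set({ route: "heartbeat", http: { status: 200 } }).set({ run: { id: "run_123" } });
event.finish();
event.finish(); // ignored
```

Flat keys make each line directly queryable by field name, and emitting once per unit keeps the line count proportional to work done rather than to log statements.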
9 changes: 9 additions & 0 deletions apps/supervisor/src/env.ts
@@ -256,6 +256,15 @@ const Env = z
// Debug
DEBUG: BoolEnv.default(false),
SEND_RUN_DEBUG_LOGS: BoolEnv.default(false),

// Wide-event observability - off by default. Emits one flat-keyed JSON
// line per natural unit of work (dequeue iteration, HTTP request, socket
// lifecycle). High-QPS hotpath, so the kill switch must be honoured.
TRIGGER_WIDE_EVENTS_ENABLED: BoolEnv.default(false),
// When true, also emit wide events for high-frequency HTTP routes
// (heartbeat, snapshots-since, logs/debug). Off in prod to keep event
// volume manageable; on in test environments for full-fidelity debugging.
TRIGGER_WIDE_EVENTS_NOISY_ROUTES: BoolEnv.default(false),
})
.superRefine((data, ctx) => {
if (data.COMPUTE_SNAPSHOTS_ENABLED && !data.TRIGGER_METADATA_URL) {
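The two new flags combine as a master kill switch plus an opt-in for high-frequency routes. The sketch below shows that gating under stated assumptions. `parseBoolEnv` is a hypothetical stand-in for the project's `BoolEnv`, whose exact parsing rules are not shown in this diff. `loadWideEventConfig`, `shouldEmitForRoute`, and the route keys are likewise illustrative, not the real route identifiers.

```typescript
// Hypothetical stand-in for the project's BoolEnv: here only "true" and "1"
// (case-insensitive) count as on; unset or empty falls back to the default.
function parseBoolEnv(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined || value.trim() === "") return fallback;
  return ["true", "1"].includes(value.trim().toLowerCase());
}

interface WideEventConfig {
  enabled: boolean;
  noisyRoutes: boolean;
}

// Both flags default to false, matching the `.default(false)` in env.ts.
function loadWideEventConfig(env: Record<string, string | undefined>): WideEventConfig {
  return {
    enabled: parseBoolEnv(env["TRIGGER_WIDE_EVENTS_ENABLED"], false),
    noisyRoutes: parseBoolEnv(env["TRIGGER_WIDE_EVENTS_NOISY_ROUTES"], false),
  };
}

// Illustrative keys for the high-frequency routes named in the env.ts comments.
const NOISY_ROUTES = new Set(["heartbeat", "snapshots-since", "logs/debug"]);

function shouldEmitForRoute(route: string, config: WideEventConfig): boolean {
  if (!config.enabled) return false; // the master kill switch always wins
  return config.noisyRoutes || !NOISY_ROUTES.has(route);
}

// Usage: wide events on, noisy routes left at their default (off).
const config = loadWideEventConfig({ TRIGGER_WIDE_EVENTS_ENABLED: "true" });
console.log(shouldEmitForRoute("heartbeat", config));
```

In a Node process, `loadWideEventConfig(process.env)` would be called once at startup, so the per-request check reduces to a boolean test and a set lookup.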