[copilot-session-insights] Daily Copilot Agent Session Analysis — 2026-05-18 #32986

2026-05-18T08:13:03Z

github-actions[bot]
Bot May 18, 2026

🤖 Copilot Agent Session Analysis — 2026-05-18

Executive Summary

Sessions Analyzed: 50 (most recent workflow runs)
Analysis Period: 2026-05-18 03:01 → 04:03 UTC
Completion Rate: 22% (success conclusion) — up from 8% on the prior two days
Average Active-Session Duration: 9.2 min (median 8.0 min)
Experimental Strategy: None this run (random roll did not trigger 30% experimental path)

⚠️ Data limitation: Per-session conversation transcripts were not retrievable (gh auth login was required for the transcript export). Behavioral analysis below is derived from workflow-run metadata and aggregate trend signals only. Recommendations that depend on internal-monologue inspection are flagged as [needs transcript].

Key Metrics

Metric	Value	Trend vs prior day
Total Sessions	50	→
Successful Completions	11 (22%)	↑
Failed	2 (4%)	↑
`action_required`	37 (74%)	↓
Active Sessions (non-zero duration)	13 (26%)	↑ (was 4)
Avg Active Duration	9.2 min	↓ (was 13.9)
Median Active Duration	8.0 min	↓ (was 15.1)
Long-running Sessions (>10 min)	6	↑ (was 3)

📈 Session Trends Analysis

Completion Patterns

Completion rate climbed from a flat 8% baseline on 05-16 and 05-17 to 22% today — the first material improvement in the three-day window. Successful runs nearly tripled (4 → 11) while failed/action_required runs dropped from 46 to 39, so the gain is real movement out of the failure bucket, not just more total activity.

Duration & Efficiency

The number of sessions that actually ran (non-zero duration) more than tripled (4 → 13) while median duration fell from 15.1 min to 8.0 min. That is consistent with agents doing more real work and finishing it faster, though the active set also includes short supporting CI jobs, which pull the median down. Long-running (>10 min) sessions doubled in absolute count (3 → 6), but their share of active sessions dropped from 75% to 46%, suggesting fewer stuck or looping runs as a proportion of real work.
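The share figures above follow from dividing long-running sessions by active sessions. A minimal worked sketch of that arithmetic, where the function name `long_running_share` is illustrative and not part of the reporting workflow:

```python
def long_running_share(long_running: int, active: int) -> float:
    """Return long-running sessions as a percentage of active sessions."""
    if active == 0:
        # No active sessions means there is nothing to take a share of.
        return 0.0
    return 100.0 * long_running / active


prior = long_running_share(3, 4)    # prior day: 3 of 4 active sessions
today = long_running_share(6, 13)   # today: 6 of 13 active sessions
print(f"{prior:.0f}% -> {today:.0f}%")  # prints "75% -> 46%"
```

The absolute count doubles while the share falls because the denominator (active sessions) grew faster than the numerator.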

Success Factors ✅

Patterns associated with successful or fast completions today:

The Running Copilot cloud agent workflow completed every time: 4 invocations, each finishing in 11–18 min. These are the main agent-driven sessions, and none entered the action_required bucket.
PR comment response loop is healthy: 4 Addressing comment on PR #XXXXX workflows finished in 8–12 min each, for PRs #32906 (Harden privileged checkout path in q.lock.yml for comment-triggered runs), #32910 (fix(safe-output): prevent silent 422 on PR review submission), #32938 (fix(daily-model-inventory): remove runner-host /reflect pre-step and query reflect in-agent), and #32944 (Revert default firewall/MCP gateway bump from ac0fd258). These are short, focused agent invocations triggered by reviewer feedback; the tight scope plausibly helps.
Smoke CI / CGO / CJS / Doc Build all finished within 7 min, so the supporting checks that complement agent work are stable and fast.

Failure Signals ⚠️

action_required dominates non-active runs: 37 of 50 runs (74%) ended in action_required. Most are zero-duration workflows like Q, Agentic Commands, Label Closed PRs — they trigger but never start work. This pattern has held for 3 days. Action: investigate whether these workflows are firing on events they shouldn't, or whether a required check / approval gate is blocking them by default.
No conversation-log visibility today: The transcript export required OAuth that was not provisioned to this run, so behavior-level loop detection / reasoning-quality scoring could not be performed. The same will recur every run until the auth path is fixed. Action: provision the OAuth token in the copilot-session-data-fetch step or fall back to GitHub Actions API for the per-run logs.
2 outright failure conclusions today (none in prior two days) — small absolute number but worth a follow-up to see whether they share a workflow.
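One way to follow up on both the action_required share and the two failures is to tally conclusions per workflow. The sketch below assumes each run is a dict with `name` and `conclusion` keys, mirroring the fields of the same names on a GitHub Actions workflow-run object; the sample records and the `workflow-*` names are hypothetical and are not data from this report.

```python
from collections import Counter, defaultdict


def conclusions_by_workflow(runs):
    """Map each workflow name to a Counter of its run conclusions."""
    tally = defaultdict(Counter)
    for run in runs:
        tally[run["name"]][run["conclusion"]] += 1
    return tally


# Hypothetical sample records for illustration only.
sample_runs = [
    {"name": "workflow-a", "conclusion": "action_required"},
    {"name": "workflow-a", "conclusion": "action_required"},
    {"name": "workflow-b", "conclusion": "failure"},
    {"name": "workflow-b", "conclusion": "failure"},
    {"name": "workflow-c", "conclusion": "success"},
]

tally = conclusions_by_workflow(sample_runs)
# Workflows that account for the failures; a single name here would mean
# the failures share a workflow.
failing = sorted(name for name, counts in tally.items() if counts["failure"] > 0)
print(failing)  # prints "['workflow-b']"
```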

Prompt Quality Analysis 📝

Quality scoring from agent transcripts requires conversation logs (unavailable this run). Indirect signals from workflow naming and PR titles:

High-quality indicators (today's PR-driven sessions)

Concrete PR references (Addressing comment on PR #32906): 4/4 finished within 12 min. Narrow scope = fast feedback.
Targeted fix branches (copilot/fix-code-scanning-alert-585, copilot/aw-fix-daily-compiler-optimizer): specific, actionable names mapped 1:1 to agent runs.

Low-quality indicators

Generic workflow names (Q, Agentic Commands) with 16 invocations each but most ending in action_required and zero duration — suggests trigger noise rather than purposeful sessions. [needs transcript] to confirm whether these are user-driven or scheduled-but-skipped.

Orphaned Branch Escalation Alerts 🚨

Branches with ≥5 simultaneous gate firings and no Copilot agent assigned for >2 hours.
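The definition above can be read as a two-part predicate. A minimal sketch, assuming a hypothetical `BranchStatus` record; the field names and sample branches are illustrative, and only the two thresholds come from the definition:

```python
from dataclasses import dataclass

GATE_FIRING_THRESHOLD = 5         # ">=5 simultaneous gate firings"
UNASSIGNED_HOURS_THRESHOLD = 2.0  # "no Copilot agent assigned for >2 hours"


@dataclass
class BranchStatus:
    name: str
    simultaneous_gate_firings: int
    hours_without_copilot: float  # 0.0 when Copilot is currently assigned


def is_orphaned(branch: BranchStatus) -> bool:
    """True only when both conditions of the definition hold."""
    return (
        branch.simultaneous_gate_firings >= GATE_FIRING_THRESHOLD
        and branch.hours_without_copilot > UNASSIGNED_HOURS_THRESHOLD
    )


# Hypothetical branches for illustration only.
busy_but_assigned = BranchStatus("example/assigned", 7, 0.0)
busy_and_unassigned = BranchStatus("example/unassigned", 5, 3.5)
print(is_orphaned(busy_but_assigned), is_orphaned(busy_and_unassigned))  # prints "False True"
```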

Summary

Open PRs scanned: 22
In-progress workflow runs (last 6h): 3 — all on main, none on PR branches
Orphaned Branches Today: 0 out of 22 (0%)
Historical Baseline: ~40% orphaned rate
Status: ✅ NORMAL (well below the 50% elevated threshold)

Escalation Candidates

✅ No orphaned branches exceed the escalation threshold today.

The two open PRs that have been waiting longest (#32906 ~11h, #32911 ~9h) both have Copilot already assigned alongside pelikhan, so they are not orphaned. The remaining open PRs are <4h old and below the wait-time threshold.

CI Waste Estimate

Orphaned gate-hours today: 0
Recoverable capacity: 0 minutes (no escalation candidates)

Notable Observations

Loop Detection (proxy: duration > 10 min)

Sessions with long runs: 6 of 13 active (46%)
Longest: Running Copilot cloud agent at 17.6 min
Note: True loop detection requires conversation transcripts. Duration alone may flag legitimate long tasks. [needs transcript]

Workflow Mix

Workflow	Runs	Notes
`Agentic Commands`	16	mostly zero-duration; trigger noise candidate
`Q`	16	mostly zero-duration; trigger noise candidate
`Label Closed PRs`	5	maintenance workflow
`Running Copilot cloud agent`	4	primary agent runs — all completed
`Addressing comment on PR #...`	4	reviewer-feedback loop
`Smoke CI`	2	supporting CI
`CGO` / `CJS` / `Doc Build - Deploy`	1 each	supporting CI

Context Issues

Not assessable without transcripts. [needs transcript]

Experimental Analysis

Standard analysis only — no experimental strategy this run (random roll did not trigger the 30% experimental path).

Actionable Recommendations

For Users Writing Task Descriptions

Reference concrete PR / issue numbers in workflow inputs: today's Addressing comment on PR #XXXXX runs averaged 9 min and 4/4 completed cleanly. Naming the artifact you're acting on is the highest-signal change visible in today's data.
Use targeted fix branches (copilot/fix-..., copilot/aw-fix-...) — these mapped 1:1 to successful agent sessions today and made the scope reviewable from the branch name alone.
[needs transcript] more granular prompt-quality recommendations require behavioral data; revisit once conversation logs are restored.

For System Improvements

Investigate the action_required baseline (74% of runs) — Impact: High. Most of these are zero-duration Q and Agentic Commands runs. Either they're firing on triggers they shouldn't, or a required check/approval is blocking by default. Either way, that's ~37 wasted Actions invocations per scan window.
Restore conversation-log visibility — Impact: High. Today's analysis fell back to workflow metadata because gh auth login was required for the transcript export. Behavioral metrics (loop detection, reasoning quality, context-confusion detection) are blocked until this is fixed.
Track Running Copilot cloud agent duration as the primary efficiency KPI — Impact: Medium. It's the canonical agent-work signal; today's 11–18 min band is a reasonable target.

For Tool Development

Per-session transcript fetcher — Frequency: every run. Replace the OAuth-dependent path with a gh api-based fetch the workflow already has permission for.
Auto-classify zero-duration runs — Frequency: ~37/50 today. A simple duration == 0 filter would let charts focus on real activity and reduce noise in the action_required bucket.
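The zero-duration filter could look roughly like the sketch below. It assumes duration is measured from `run_started_at` to `updated_at` (both are timestamp fields on GitHub Actions workflow-run objects); how this report actually measures duration is not stated, so treat that choice as an assumption. The sample records are hypothetical.

```python
from datetime import datetime


def parse_ts(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2026-05-18T03:01:00Z'."""
    # Replace the trailing 'Z' so older Python versions can parse it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def duration_seconds(run: dict) -> float:
    """Assumed duration: updated_at minus run_started_at."""
    return (parse_ts(run["updated_at"]) - parse_ts(run["run_started_at"])).total_seconds()


def split_by_activity(runs):
    """Split runs into (active, zero_duration) lists."""
    active, zero_duration = [], []
    for run in runs:
        (active if duration_seconds(run) > 0 else zero_duration).append(run)
    return active, zero_duration


# Hypothetical sample records for illustration only.
sample_runs = [
    {"name": "workflow-a", "run_started_at": "2026-05-18T03:01:00Z", "updated_at": "2026-05-18T03:01:00Z"},
    {"name": "workflow-b", "run_started_at": "2026-05-18T03:05:00Z", "updated_at": "2026-05-18T03:13:00Z"},
]

active, zero_duration = split_by_activity(sample_runs)
print(len(active), len(zero_duration))  # prints "1 1"
```

Charts built from `active` alone would then reflect real work, with `zero_duration` reported as a separate count.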

Trends Over Time

Date	Total	Success	Rate	Active	Avg Active (min)
2026-05-16	50	4	8%	4	8.8
2026-05-17	50	4	8%	4	13.9
2026-05-18	50	11	22%	13	9.2

Completion rate: ↑↑ first material improvement (+14 pts) in the three-day window
Active-session count: ↑↑ 3.25× more sessions actually doing work
Median active duration: ↓ from 15.1 → 8.0 min (47% shorter)
Long-running share: ↓ from 75% → 46% of active sessions
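The first three deltas above can be reproduced from the trend table and the key metrics. A short worked sketch; the helper names are illustrative:

```python
def point_change(prev_pct: float, curr_pct: float) -> float:
    """Difference between two percentages, in percentage points."""
    return curr_pct - prev_pct


def growth_ratio(prev: float, curr: float) -> float:
    """How many times larger the current value is than the previous one."""
    return curr / prev


def pct_reduction(prev: float, curr: float) -> float:
    """Relative decrease from prev to curr, as a percentage of prev."""
    return 100.0 * (prev - curr) / prev


print(point_change(8, 22))              # completion rate: prints "14"
print(growth_ratio(4, 13))              # active sessions: prints "3.25"
print(round(pct_reduction(15.1, 8.0)))  # median duration: prints "47"
```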

Statistical Summary

Total Sessions Analyzed:     50
Successful Completions:      11 (22%)
Failed Sessions:             2  (4%)
action_required:             37 (74%)
In-Progress Sessions:        0  (0%)

Active (non-zero) Sessions:  13 (26% of total)
Average Active Duration:     9.2 min
Median Active Duration:      8.0 min
Longest Session:             17.6 min  (Running Copilot cloud agent)
Shortest Active Session:     1.9 min   (CGO)

Long-running (>10 min):      6 active sessions (46%)
Context Issues:              [needs transcript]
Tool Failures:               [needs transcript]

Orphaned Branches:           0 of 22 open PRs (vs ~40% baseline)
In-progress workflow runs:   3 (all on main)

Next Steps

Review the action_required 74%-share root cause with the workflow owners (Q, Agentic Commands)
Restore conversation-log fetch path so behavioral metrics can resume next run
Track whether today's 22% completion rate holds for >3 days before declaring a sustained improvement
Re-run with experimental strategy enabled (semantic clustering or tool-usage patterns) once transcripts are available
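The "holds for >3 days" check in the third step could be sketched as a trailing-streak test. The floor of 22% and the reading of ">3 days" as at least 4 consecutive days are assumptions made for illustration, not rules defined by the workflow.

```python
def trailing_streak(rates, floor):
    """Count consecutive most-recent days with a rate at or above the floor."""
    streak = 0
    for rate in reversed(rates):
        if rate < floor:
            break
        streak += 1
    return streak


def is_sustained(rates, floor, min_days=4):
    """True once the trailing streak reaches min_days (">3 days")."""
    return trailing_streak(rates, floor) >= min_days


# Completion rates (%) from the three-day trend table, oldest first.
observed = [8, 8, 22]
print(trailing_streak(observed, floor=22))  # prints "1"
print(is_sustained(observed, floor=22))     # prints "False"
```

With only one day at 22%, the improvement does not yet qualify as sustained under these assumptions.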

References:

§26020802938

Analysis generated automatically on 2026-05-18
Run ID: 26020802938
Workflow: Copilot Session Insights

Generated by 📊 Copilot Session Insights · ● 12.6M

expires on May 19, 2026, 8:13 AM UTC
