feat(discord): ingest webview transcripts into memory#1993
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughBuilds per-channel Discord transcripts from gateway MESSAGE_CREATE/MESSAGE_UPDATE frames using a session DiscordIngestState; backend parses websocket frames, updates in-memory channel/message caches, emits UI discord_memory_ingest snapshots, and queues async memory upserts. Frontend extends ingest types and routes Discord events to avoid duplicate persistence; tests added for both sides. ChangesDiscord Gateway-Driven Message Ingestion
🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/src-tauri/src/discord_scanner/mod.rs`:
- Around line 1074-1079: The current tokio::spawn block calling
post_memory_doc_ingest(&acct, &payload) can run many concurrent upserts for the
same document and let older snapshots overwrite newer ones; change this to
serialize or coalesce per-document/channel so only one upsert runs at a time and
only the latest snapshot is sent. Implement a per-channel/doc coordination
mechanism (e.g., a shared HashMap keyed by acct or channel id that holds a small
mpsc sender or a Mutex/lock per key) and send payloads into that per-key queue
so a single consumer will debounce/coalesce incoming snapshots and call
post_memory_doc_ingest in order; ensure the code paths that now call
tokio::spawn use that coordinator instead of spawning raw tasks so
post_memory_doc_ingest is invoked serially (or with latest-only coalescing) for
each acct/channel.
- Around line 195-205: The MESSAGE_UPDATE handling currently replaces the cached
DiscordPersistMessage wholesale (DiscordPersistMessage / state.messages), which
drops fields omitted from partial events; change the logic to find the existing
message by id and apply partial updates only to the fields present in the
incoming update (e.g., update body/content, edited timestamp_ms, and source_ref
only when the event provides them) while preserving existing author, author_id,
timestamp_ms and other fields when they are absent; use the event's
Option-wrapped fields to decide which fields to set on the existing message, and
if required data is completely missing (e.g., content missing because
MESSAGE_CONTENT intent is absent), either trigger a REST fetch for the full
message or emit a warning log instead of overwriting the cached entry.
- Around line 992-995: The upsert key currently uses channel_name when
channel_key_looks_clean(channel_name) is true, which can cause collisions across
guilds; change the key logic in the params construction so it uses a stable
identifier that includes channel_id (for example combine channel_id with
channel_name or always use channel_id) instead of relying solely on
channel_name—update the key expression where params is built (referencing
channel_key_looks_clean, channel_name, channel_id, and the
"discord-web:{account_id}" namespace) to ensure uniqueness across channels.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f5015cae-1202-464f-815a-c5fb91c64a03
📒 Files selected for processing (3)
app/src-tauri/src/discord_scanner/mod.rsapp/src/services/__tests__/webviewAccountService.discord.test.tsapp/src/services/webviewAccountService.ts
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/src-tauri/src/discord_scanner/mod.rs`:
- Around line 199-220: The current update unconditionally replaces existing.body
when body is Some, which lets embed/attachment-only updates (from
discord_message_body(value)) overwrite the cached user-visible text; change the
update logic in the message-update block so existing.body is only replaced when
the incoming update explicitly contains the authoritative body fields by
checking discord_message_body_fields_present(value) (i.e., require
discord_message_body_fields_present(value) && body.is_some() before assigning
existing.body = next_body); leave existing.body untouched for partial updates
that lack those body fields and continue to update
author/author_id/timestamp_ms/source_ref as before.
- Around line 1102-1112: The worker currently creates an unbounded mpsc channel
(tx/rx) carrying full transcript Value snapshots and only coalesces on the
receiver side after post_memory_doc_ingest awaits, which allows an unbounded
memory backlog; replace this with a latest-only size-1 handoff (e.g.,
tokio::sync::watch::channel or a custom size-1 overwrite queue) so new snapshots
overwrite the previous pending snapshot before the slow RPC completes.
Concretely: stop using mpsc::unbounded_channel::<Value>(), create a watch
channel (or equivalent) and use tx.send(...) to update the latest snapshot from
the producer side (use clone() of Value as needed), and on the spawned task use
rx.changed().await (or check the slot) then read rx.borrow().clone() into
next_payload and call post_memory_doc_ingest(&account_id_for_task,
&next_payload). Keep worker_key_for_task/account_id_for_task usage the same and
ensure Value is cloned when writing to the shared slot so only one pending
payload is ever buffered.
- Around line 131-138: The branch handling "CHANNEL_CREATE" | "CHANNEL_UPDATE" |
"THREAD_CREATE" | "THREAD_UPDATE" updates cached channel metadata via
apply_channel_meta but returns Vec::new(), so UI/store never sees the updated
title; instead, after calling self.apply_channel_meta(...) produce and return a
refreshed snapshot for the affected channel/thread (use the channel/thread id
from data.get("id") or the appropriate field and the guild_id already extracted)
by calling your existing snapshot/emit helper (e.g.,
self.emit_snapshot_for_channel or self.create_channel_snapshot) or constructing
the snapshot Event and returning vec![snapshot_event] so the updated metadata is
emitted to the UI/memory store.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c74c4b72-390d-4b30-9e58-40cbaea6f675
📒 Files selected for processing (1)
app/src-tauri/src/discord_scanner/mod.rs
Main gained `m.sender` accesses in the discord-ingest PR (tinyhumansai#1993). Without .passthrough(), TypeScript correctly rejects the property access — add sender to the schema to keep parity with the IngestMessage interface and satisfy the type checker. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Summary
discord_scannerinstead of only emitting raw transport envelopes.openhuman.memory_doc_ingestso context is stored even when the React UI is not listening.discord_memory_ingestevent for normalized transcript updates while skipping duplicate frontend memory writes.Problem
Solution
READY,GUILD_CREATE,CHANNEL_*,THREAD_*,MESSAGE_CREATE, andMESSAGE_UPDATEevents.discord_memory_ingestUI event for normalized transcript refreshes and teachwebviewAccountServiceto update account UI state from that event without duplicating the memory write.Submission Checklist
diff-cover) meet the gate enforced by.github/workflows/coverage.yml. Runpnpm test:coverageandpnpm test:rustlocally; PRs below 80% on changed lines will not merge. N/A: relied on CI diff-cover gate; local focused/full validation run below.docs/TEST-COVERAGE-MATRIX.mdreflect this change (orN/A: behaviour-only change) N/A: behaviour-only change.## RelatedN/A: no matrix row changes.docs/RELEASE-MANUAL-SMOKE.md) N/A: no release manual smoke checklist change needed.Closes #NNNin the## Relatedsection N/A: no GitHub issue was provided for closure.Impact
Related
AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
feat/discord-webview-memory6208b36eValidation Run
pnpm --filter openhuman-app format:checkpnpm typecheckpnpm exec vitest run src/services/__tests__/webviewAccountService.discord.test.tsandcargo test --manifest-path app/src-tauri/Cargo.toml discord_scanner -- --nocapture; also ran full app Vitest suite viapnpm testcargo fmt --manifest-path app/src-tauri/Cargo.toml --all --checkandcargo check --manifest-path app/src-tauri/Cargo.tomlcargo fmt --manifest-path app/src-tauri/Cargo.toml --all --checkandcargo check --manifest-path app/src-tauri/Cargo.tomlValidation Blocked
command:N/Aerror:N/Aimpact:N/ABehavior Changes
Parity Contract
discordingestevents no longer trigger generic frontend memory writes; normalizeddiscord_memory_ingestupdates refresh UI state while scanner-side RPC remains the single write path.Duplicate / Superseded PR Handling
Summary by CodeRabbit
New Features
Bug Fixes
Tests