feat(discord): ingest webview transcripts into memory #1993

Merged
senamakel merged 4 commits into tinyhumansai:main from senamakel:feat/discord-webview-memory
May 18, 2026

Conversation

@senamakel
Member

@senamakel senamakel commented May 17, 2026

Summary

  • Add Discord gateway transcript extraction in the Tauri discord_scanner instead of only emitting raw transport envelopes.
  • Persist normalized Discord channel transcripts directly to openhuman.memory_doc_ingest so context is stored even when the React UI is not listening.
  • Keep the webview UI responsive by emitting a dedicated discord_memory_ingest event for normalized transcript updates while skipping duplicate frontend memory writes.
  • Add focused tests for Discord webview event handling and scanner transcript assembly/update behavior.

Problem

  • Discord was already embeddable in the webview rail, but the adapter stopped at transport observation.
  • The scanner emitted raw CDP HTTP/WebSocket events and sidebar snapshots, but it did not turn Discord message activity into durable memory context like the other native-scanned providers.
  • The React listener's generic ingest path was also the wrong place to persist Discord data because it would have stored raw transport payloads as memory documents and only worked while the app window was actively listening.

Solution

  • Added a Discord gateway ingest state machine in the Tauri scanner that consumes READY, GUILD_CREATE, CHANNEL_*, THREAD_*, MESSAGE_CREATE, and MESSAGE_UPDATE events.
  • Normalize those events into per-channel transcript snapshots with stable channel metadata, author labels, timestamps, and permalinks, then upsert them into core memory through authenticated local RPC.
  • Emit a dedicated discord_memory_ingest UI event for normalized transcript refreshes and teach webviewAccountService to update account UI state from that event without duplicating the memory write.
  • Leave raw HTTP/gateway transport observation in place for diagnostics, but keep it out of the generic frontend memory-ingest path.
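
The ingest flow described above can be sketched as a small state machine. This is a minimal illustration only, not the scanner's actual code: the `GatewayEvent` enum, `IngestState`, and `Line` types are hypothetical stand-ins, and the real scanner parses JSON gateway frames and handles more event kinds (READY, GUILD_CREATE, THREAD_*).

```rust
use std::collections::HashMap;

/// Hypothetical, simplified gateway events (the real scanner parses JSON frames).
#[derive(Debug, Clone)]
enum GatewayEvent {
    ChannelCreate { channel_id: String, name: String },
    MessageCreate { channel_id: String, message_id: String, author: String, body: String },
    MessageUpdate { channel_id: String, message_id: String, body: String },
}

/// One transcript line in a channel (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct Line {
    message_id: String,
    author: String,
    body: String,
}

/// Session-scoped state: channel names plus per-channel transcripts.
#[derive(Default)]
struct IngestState {
    channel_names: HashMap<String, String>,
    transcripts: HashMap<String, Vec<Line>>,
}

impl IngestState {
    /// Applies one event and returns the id of the channel whose
    /// transcript changed, if any.
    fn apply(&mut self, event: GatewayEvent) -> Option<String> {
        match event {
            GatewayEvent::ChannelCreate { channel_id, name } => {
                self.channel_names.insert(channel_id, name);
                None
            }
            GatewayEvent::MessageCreate { channel_id, message_id, author, body } => {
                self.transcripts
                    .entry(channel_id.clone())
                    .or_default()
                    .push(Line { message_id, author, body });
                Some(channel_id)
            }
            GatewayEvent::MessageUpdate { channel_id, message_id, body } => {
                // Unknown channel or message: nothing to update.
                let lines = self.transcripts.get_mut(&channel_id)?;
                let line = lines.iter_mut().find(|l| l.message_id == message_id)?;
                line.body = body;
                Some(channel_id)
            }
        }
    }

    /// Renders a channel transcript as plain text, falling back to the
    /// channel id when no name is cached.
    fn snapshot(&self, channel_id: &str) -> String {
        let name = self
            .channel_names
            .get(channel_id)
            .map(String::as_str)
            .unwrap_or(channel_id);
        let mut out = format!("#{name}\n");
        if let Some(lines) = self.transcripts.get(channel_id) {
            for line in lines {
                out.push_str(&format!("{}: {}\n", line.author, line.body));
            }
        }
        out
    }
}
```

In this sketch, a caller would feed each parsed event to `apply` and, whenever it returns a channel id, render `snapshot` for that channel and hand it to the memory upsert path.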

Submission Checklist

  • Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
  • Diff coverage ≥ 80% — changed lines (Vitest + cargo-llvm-cov merged via diff-cover) meet the gate enforced by .github/workflows/coverage.yml. Run pnpm test:coverage and pnpm test:rust locally; PRs below 80% on changed lines will not merge. N/A: relied on CI diff-cover gate; local focused/full validation run below.
  • Coverage matrix updated — added/removed/renamed feature rows in docs/TEST-COVERAGE-MATRIX.md reflect this change (or N/A: behaviour-only change) N/A: behaviour-only change.
  • All affected feature IDs from the matrix are listed in the PR description under ## Related N/A: no matrix row changes.
  • No new external network dependencies introduced (mock backend used per Testing Strategy)
  • Manual smoke checklist updated if this touches release-cut surfaces (docs/RELEASE-MANUAL-SMOKE.md) N/A: no release manual smoke checklist change needed.
  • Linked issue closed via Closes #NNN in the ## Related section N/A: no GitHub issue was provided for closure.

Impact

  • Desktop/Tauri runtime only for the new ingest path; no mobile/web change.
  • Improves memory/context completeness for Discord webview accounts and avoids storing raw CDP transport blobs as user memory documents.
  • No config or migration change; transcripts land in the existing local memory document store under a Discord-specific namespace.

Related

  • Closes: N/A - no linked GitHub issue provided.
  • Follow-up PR(s)/TODOs:
    • Consider response-body backfill if Discord history replay beyond live gateway events becomes necessary.

AI Authored PR Metadata (required for Codex/Linear PRs)

Keep this section for AI-authored PRs. For human-only PRs, mark each field N/A.

Linear Issue

  • Key: N/A
  • URL: N/A

Commit & Branch

  • Branch: feat/discord-webview-memory
  • Commit SHA: 6208b36e

Validation Run

  • pnpm --filter openhuman-app format:check
  • pnpm typecheck
  • Focused tests: pnpm exec vitest run src/services/__tests__/webviewAccountService.discord.test.ts and cargo test --manifest-path app/src-tauri/Cargo.toml discord_scanner -- --nocapture; also ran full app Vitest suite via pnpm test
  • Rust fmt/check (if changed): cargo fmt --manifest-path app/src-tauri/Cargo.toml --all --check and cargo check --manifest-path app/src-tauri/Cargo.toml
  • Tauri fmt/check (if changed): cargo fmt --manifest-path app/src-tauri/Cargo.toml --all --check and cargo check --manifest-path app/src-tauri/Cargo.toml

Validation Blocked

  • command: N/A
  • error: N/A
  • impact: N/A

Behavior Changes

  • Intended behavior change: Discord webview accounts now persist normalized transcript context into memory from the native scanner path.
  • User-visible effect: Discord account activity can be used as durable memory/context, and the accounts UI updates from normalized transcript events instead of raw transport blobs.

Parity Contract

  • Legacy behavior preserved: Discord remains embeddable in the existing webview rail and still surfaces sidebar/account UI updates.
  • Guard/fallback/dispatch parity checks: raw discord ingest events no longer trigger generic frontend memory writes; normalized discord_memory_ingest updates refresh UI state while scanner-side RPC remains the single write path.

Duplicate / Superseded PR Handling

  • Duplicate PR(s): N/A
  • Canonical PR: this PR
  • Resolution (closed/superseded/updated): updated

Summary by CodeRabbit

  • New Features

    • Builds per-channel Discord transcripts with richer channel/guild context and stable scoping for DMs
    • Ingest path refinements to avoid duplicate processing for Discord and to prefer explicit sender/date (with proper timestamp conversion)
  • Bug Fixes

    • Improved message-update handling: body replacement, preserve missing cached fields, and retain text for embed-only updates
  • Tests

    • Added unit tests for gateway state transitions, channel naming, snapshotting, and message update/preservation behaviors


@senamakel senamakel requested a review from a team May 17, 2026 09:22
@coderabbitai
Contributor

coderabbitai Bot commented May 17, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 692bbc4e-e9e5-4ffd-b4bf-13be93da1e5d

📥 Commits

Reviewing files that changed from the base of the PR and between b1ae18f and e24eb5d.

📒 Files selected for processing (1)
  • app/src-tauri/src/discord_scanner/mod.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • app/src-tauri/src/discord_scanner/mod.rs

📝 Walkthrough

Builds per-channel Discord transcripts from gateway MESSAGE_CREATE/MESSAGE_UPDATE frames using a session DiscordIngestState; backend parses websocket frames, updates in-memory channel/message caches, emits UI discord_memory_ingest snapshots, and queues async memory upserts. Frontend extends ingest types and routes Discord events to avoid duplicate persistence; tests added for both sides.

Changes

Discord Gateway-Driven Message Ingestion

| Layer | File(s) | Summary |
| --- | --- | --- |
| Module docs & constants | `app/src-tauri/src/discord_scanner/mod.rs` | Update module documentation for gateway-driven V1 ingest and add MAX_CHANNEL_MESSAGES constant. |
| Backend gateway state and message tracking | `app/src-tauri/src/discord_scanner/mod.rs` | Adds DiscordIngestState, per-channel cached messages, snapshot construction, gateway payload parsing, MESSAGE_CREATE/MESSAGE_UPDATE handling, sorting, and eviction when exceeding MAX_CHANNEL_MESSAGES. |
| CDP pump state threading | `app/src-tauri/src/discord_scanner/mod.rs` | Initialize a persistent DiscordIngestState in the CDP pump_events loop and thread it into dispatch_event so session state persists across events. |
| Stateful dispatch_event & HTTP handling | `app/src-tauri/src/discord_scanner/mod.rs` | Make dispatch_event accept &mut DiscordIngestState; remove UI ingest emission for Network.requestWillBeSent and stop emitting ingest from Network.responseReceived (TODO for backfill). |
| WebSocket frame handling and snapshot emission | `app/src-tauri/src/discord_scanner/mod.rs` | Parse only received opcode=1 websocket frames as gateway payloads, apply them to ingest state, and emit discord_memory_ingest per-channel snapshots derived from updated channel state. |
| Message formatting, permalink, and async persistence | `app/src-tauri/src/discord_scanner/mod.rs` | Add RFC3339 timestamp parsing, author/channel labeling, body/permalink formatting, day-range formatting, deterministic discord_channel_doc_key scoping, and async openhuman.memory_doc_ingest upsert plumbing. |
| Backend gateway ingestion tests | `app/src-tauri/src/discord_scanner/mod.rs` | Unit tests for guild/channel snapshot creation, DM naming, snapshot emission on channel updates, MESSAGE_UPDATE replacement/preservation behaviors, embed-only updates, and doc key scoping. |
| Frontend ingest message and payload type contracts | `app/src/services/webviewAccountService.ts` | IngestMessage gains optional sender and date; IngestPayload gains channelId, channelName, guildId; adds DiscordMemoryIngestPayload requiring channelId. |
| Frontend event handler routing and normalization | `app/src/services/webviewAccountService.ts` | Generic ingest normalizes `from` from `m.from` or `m.sender` and skips queue refresh/persistence for Discord; discord_memory_ingest normalizes author and prefers `m.date`/`m.timestamp` (sec→ms) for timestamps. |
| Frontend event handler test suite and cases | `app/src/services/__tests__/webviewAccountService.discord.test.ts` | Adds Vitest suite mocking Tauri events and core RPC; tests ensure raw Discord ingest bypasses core persistence/queue refresh, non-Discord ingest follows legacy memory path, and discord_memory_ingest refreshes queue and stores messages with expected fields and ms timestamps. |
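
The sorting and eviction behaviour summarized in the table can be illustrated with a small bounded cache. This is a sketch under stated assumptions: the cap value of 3, the `ChannelCache` and `CachedMessage` types, and the oldest-first eviction policy are all hypothetical, since the table names `MAX_CHANNEL_MESSAGES` but shows neither its value nor the eviction order.

```rust
use std::collections::HashMap;

/// Hypothetical cap; the PR adds `MAX_CHANNEL_MESSAGES` but its value is not shown here.
const MAX_CHANNEL_MESSAGES: usize = 3;

/// Hypothetical cached message shape.
#[derive(Debug, Clone, PartialEq)]
struct CachedMessage {
    id: String,
    timestamp_ms: i64,
    body: String,
}

#[derive(Default)]
struct ChannelCache {
    by_channel: HashMap<String, Vec<CachedMessage>>,
}

impl ChannelCache {
    /// Inserts or replaces a message, keeps the channel sorted oldest-first,
    /// and evicts the oldest entries once the cap is exceeded (assumed policy).
    fn upsert(&mut self, channel_id: &str, msg: CachedMessage) {
        let messages = self.by_channel.entry(channel_id.to_string()).or_default();
        if let Some(pos) = messages.iter().position(|m| m.id == msg.id) {
            messages[pos] = msg;
        } else {
            messages.push(msg);
        }
        // Stable sort: messages with equal timestamps keep their relative order.
        messages.sort_by_key(|m| m.timestamp_ms);
        if messages.len() > MAX_CHANNEL_MESSAGES {
            let excess = messages.len() - MAX_CHANNEL_MESSAGES;
            messages.drain(..excess);
        }
    }

    /// Returns the cached message ids for a channel, oldest first.
    fn ids(&self, channel_id: &str) -> Vec<&str> {
        self.by_channel
            .get(channel_id)
            .map(|msgs| msgs.iter().map(|m| m.id.as_str()).collect::<Vec<&str>>())
            .unwrap_or_default()
    }
}
```

Capping each channel bounds memory for long-lived sessions, at the cost of dropping older context from the snapshot once a channel is busy.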

🎯 4 (Complex) | ⏱️ ~75 minutes

"🐰 I hop through frames and threads,
Where messages bloom in cached beds,
Snapshots leap from state to core,
Transcripts saved — I thump for more! 🥕"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 78.13% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat(discord): ingest webview transcripts into memory' directly and concisely summarizes the main change: adding Discord transcript ingestion into the memory store via the webview layer. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/src-tauri/src/discord_scanner/mod.rs`:
- Around line 1074-1079: The current tokio::spawn block calling
post_memory_doc_ingest(&acct, &payload) can run many concurrent upserts for the
same document and let older snapshots overwrite newer ones; change this to
serialize or coalesce per-document/channel so only one upsert runs at a time and
only the latest snapshot is sent. Implement a per-channel/doc coordination
mechanism (e.g., a shared HashMap keyed by acct or channel id that holds a small
mpsc sender or a Mutex/lock per key) and send payloads into that per-key queue
so a single consumer will debounce/coalesce incoming snapshots and call
post_memory_doc_ingest in order; ensure the code paths that now call
tokio::spawn use that coordinator instead of spawning raw tasks so
post_memory_doc_ingest is invoked serially (or with latest-only coalescing) for
each acct/channel.
- Around line 195-205: The MESSAGE_UPDATE handling currently replaces the cached
DiscordPersistMessage wholesale (DiscordPersistMessage / state.messages), which
drops fields omitted from partial events; change the logic to find the existing
message by id and apply partial updates only to the fields present in the
incoming update (e.g., update body/content, edited timestamp_ms, and source_ref
only when the event provides them) while preserving existing author, author_id,
timestamp_ms and other fields when they are absent; use the event's
Option-wrapped fields to decide which fields to set on the existing message, and
if required data is completely missing (e.g., content missing because
MESSAGE_CONTENT intent is absent), either trigger a REST fetch for the full
message or emit a warning log instead of overwriting the cached entry.
- Around line 992-995: The upsert key currently uses channel_name when
channel_key_looks_clean(channel_name) is true, which can cause collisions across
guilds; change the key logic in the params construction so it uses a stable
identifier that includes channel_id (for example combine channel_id with
channel_name or always use channel_id) instead of relying solely on
channel_name—update the key expression where params is built (referencing
channel_key_looks_clean, channel_name, channel_id, and the
"discord-web:{account_id}" namespace) to ensure uniqueness across channels.

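Two of the findings above, merging partial MESSAGE_UPDATE events and keying documents by a stable id, can be sketched as follows. This is an illustrative sketch only: `CachedMessage`, `PartialUpdate`, `merge_update`, and `channel_doc_key` are hypothetical names, and the `guild:channel` key format is an assumption rather than the format the PR settled on.

```rust
/// Hypothetical cached message shape; field names loosely follow the review comment.
#[derive(Debug, Clone, PartialEq)]
struct CachedMessage {
    author: String,
    timestamp_ms: i64,
    body: String,
}

/// Hypothetical partial update: anything a MESSAGE_UPDATE may omit is an Option.
#[derive(Debug, Default)]
struct PartialUpdate {
    author: Option<String>,
    timestamp_ms: Option<i64>,
    body: Option<String>,
}

/// Overwrites only the fields the update actually carries; absent fields
/// leave the cached values untouched.
fn merge_update(existing: &mut CachedMessage, update: PartialUpdate) {
    if let Some(author) = update.author {
        existing.author = author;
    }
    if let Some(ts) = update.timestamp_ms {
        existing.timestamp_ms = ts;
    }
    if let Some(body) = update.body {
        existing.body = body;
    }
}

/// Builds a document key from ids only (assumed format), so two channels that
/// share a display name, such as "general" in two guilds, never collide.
fn channel_doc_key(guild_id: Option<&str>, channel_id: &str) -> String {
    format!("{}:{}", guild_id.unwrap_or("dm"), channel_id)
}
```

For example, merging `PartialUpdate { body: Some(..), ..Default::default() }` into a cached message changes only its body, and `channel_doc_key(None, "42")` yields `dm:42` in this sketch.
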
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f5015cae-1202-464f-815a-c5fb91c64a03

📥 Commits

Reviewing files that changed from the base of the PR and between 2e58438 and 7401d27.

📒 Files selected for processing (3)
  • app/src-tauri/src/discord_scanner/mod.rs
  • app/src/services/__tests__/webviewAccountService.discord.test.ts
  • app/src/services/webviewAccountService.ts

Comment thread app/src-tauri/src/discord_scanner/mod.rs Outdated
Comment thread app/src-tauri/src/discord_scanner/mod.rs
Comment thread app/src-tauri/src/discord_scanner/mod.rs Outdated
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/src-tauri/src/discord_scanner/mod.rs`:
- Around line 199-220: The current update unconditionally replaces existing.body
when body is Some, which lets embed/attachment-only updates (from
discord_message_body(value)) overwrite the cached user-visible text; change the
update logic in the message-update block so existing.body is only replaced when
the incoming update explicitly contains the authoritative body fields by
checking discord_message_body_fields_present(value) (i.e., require
discord_message_body_fields_present(value) && body.is_some() before assigning
existing.body = next_body); leave existing.body untouched for partial updates
that lack those body fields and continue to update
author/author_id/timestamp_ms/source_ref as before.
- Around line 1102-1112: The worker currently creates an unbounded mpsc channel
(tx/rx) carrying full transcript Value snapshots and only coalesces on the
receiver side after post_memory_doc_ingest awaits, which allows an unbounded
memory backlog; replace this with a latest-only size-1 handoff (e.g.,
tokio::sync::watch::channel or a custom size-1 overwrite queue) so new snapshots
overwrite the previous pending snapshot before the slow RPC completes.
Concretely: stop using mpsc::unbounded_channel::<Value>(), create a watch
channel (or equivalent) and use tx.send(...) to update the latest snapshot from
the producer side (use clone() of Value as needed), and on the spawned task use
rx.changed().await (or check the slot) then read rx.borrow().clone() into
next_payload and call post_memory_doc_ingest(&account_id_for_task,
&next_payload). Keep worker_key_for_task/account_id_for_task usage the same and
ensure Value is cloned when writing to the shared slot so only one pending
payload is ever buffered.
- Around line 131-138: The branch handling "CHANNEL_CREATE" | "CHANNEL_UPDATE" |
"THREAD_CREATE" | "THREAD_UPDATE" updates cached channel metadata via
apply_channel_meta but returns Vec::new(), so UI/store never sees the updated
title; instead, after calling self.apply_channel_meta(...) produce and return a
refreshed snapshot for the affected channel/thread (use the channel/thread id
from data.get("id") or the appropriate field and the guild_id already extracted)
by calling your existing snapshot/emit helper (e.g.,
self.emit_snapshot_for_channel or self.create_channel_snapshot) or constructing
the snapshot Event and returning vec![snapshot_event] so the updated metadata is
emitted to the UI/memory store.
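
The latest-only handoff suggested above can be illustrated with a size-1 overwrite slot. Note the substitution: the review comment proposes `tokio::sync::watch`, whereas this sketch uses a plain `std::sync::Mutex<Option<T>>` so that it stays dependency-free and synchronous. `LatestSlot` and its methods are hypothetical names, not part of the PR.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical size-1 slot: publishing overwrites any pending value, so a slow
/// consumer only ever sees the most recent snapshot.
struct LatestSlot<T> {
    inner: Arc<Mutex<Option<T>>>,
}

impl<T> LatestSlot<T> {
    fn new() -> Self {
        Self { inner: Arc::new(Mutex::new(None)) }
    }

    /// Returns another handle to the same slot (for the producer side).
    fn handle(&self) -> Self {
        Self { inner: Arc::clone(&self.inner) }
    }

    /// Stores `value` as the pending snapshot. Returns true if an older
    /// pending snapshot was overwritten before anyone consumed it.
    fn publish(&self, value: T) -> bool {
        let mut slot = self.inner.lock().unwrap();
        slot.replace(value).is_some()
    }

    /// Takes the pending snapshot, leaving the slot empty.
    fn take(&self) -> Option<T> {
        self.inner.lock().unwrap().take()
    }
}
```

Because at most one value is ever buffered, memory stays bounded regardless of how long the consumer's RPC takes. An async worker would additionally need a wake-up signal, which is what a watch channel provides and this sketch omits.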

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c74c4b72-390d-4b30-9e58-40cbaea6f675

📥 Commits

Reviewing files that changed from the base of the PR and between 7401d27 and b1ae18f.

📒 Files selected for processing (1)
  • app/src-tauri/src/discord_scanner/mod.rs

Comment thread app/src-tauri/src/discord_scanner/mod.rs Outdated
Comment thread app/src-tauri/src/discord_scanner/mod.rs
Comment thread app/src-tauri/src/discord_scanner/mod.rs Outdated
@senamakel senamakel merged commit db99318 into tinyhumansai:main May 18, 2026
23 of 24 checks passed
LawyerLyu pushed a commit to LawyerLyu/openhuman that referenced this pull request May 18, 2026
Main gained `m.sender` accesses in the discord-ingest PR (tinyhumansai#1993).
Without .passthrough(), TypeScript correctly rejects the property
access — add sender to the schema to keep parity with the IngestMessage
interface and satisfy the type checker.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>