# Spotify Integration — Part 2: Scheduled Intake & Knowledge Graph (Backend Required)
**Date:** 2026-04-11
**Status:** Spec for future build — requires EC, DOC3 extraction pipeline, DOC72 write path, DOC23 task scheduler
**Scope:** Automated listening data capture, preference extraction, pattern learning, proactive music intelligence
**Prerequisites:** Part 1 complete (MCP server running, Spotify connected), backend infrastructure operational
**Depends on:** DOC3 R11.3 (KnowledgeExtractionBundle), DOC72 R5.6+ (entity graph write path), DOC23 (task scheduler), DOC8 (nightly dream cycle)
---
## 1. What This Adds Beyond Part 1
Part 1 gives Elnor on-demand access to Spotify — ask and Elnor checks. Part 2 gives Elnor continuous, ambient understanding of your music behavior without being asked:
| Capability | Part 1 (On-Demand) | Part 2 (Scheduled Intake) |
|-----------|-------------------|--------------------------|
| "What am I listening to now?" | ✅ Calls API on request | ✅ Same |
| "What have I listened to this week?" | ✅ Calls API, gets last 50 tracks | ✅ Has full week's history in DOC72 |
| "What kind of music do I like?" | ❌ No memory across sessions | ✅ Rich preference model in entity graph |
| "Play something for trial prep" | ⚠️ Generic search, no personal context | ✅ Knows what you played during past trial prep |
| "My listening changed this month" | ❌ Cannot compare over time | ✅ Trend analysis from stored history |
| "Make me a playlist like my Wednesday focus sessions" | ❌ No pattern data | ✅ Temporal listening patterns stored |
---
## 2. Architecture Overview
```
SCHEDULED INTAKE FLOW

DOC23 Task Scheduler
┌───────────────────────────────┐
│ "Spotify Intake"              │
│ Cron: Sun 3:00 AM             │
│ runs weekly (configurable)    │
└───────────────┬───────────────┘
                │
                ▼
┌───────────────────────────────┐
│ Spotify MCP Server            │
│ Fetch:                        │
│  - recently played            │
│  - top tracks                 │
│  - top artists                │
│  - playlists                  │
└───────────────┬───────────────┘
                │
                ▼
┌───────────────────────────────┐
│ DOC3 Extraction Pipeline      │
│ LLM extracts:                 │
│  - entities                   │
│  - preferences                │
│  - patterns                   │
│  - associations               │
└───────────────┬───────────────┘
                │
                ▼
┌───────────────────────────────┐
│ DOC72 Entity Graph            │
│ Stores:                       │
│  - artist nodes               │
│  - genre nodes                │
│  - preference nodes           │
│  - pattern nodes              │
│  - association nodes          │
└───────────────────────────────┘
```
---
## 3. DOC23 Task Definition: Spotify Intake
### 3.1 Task Registration
Register a recurring task in the DOC23 task system:
```typescript
const spotifyIntakeTask: TaskDefinition = {
  task_id: "spotify_weekly_intake",
  display_name: "Spotify Listening Intake",
  description: "Fetches recent listening data from Spotify and extracts preferences, patterns, and entities into the knowledge graph.",
  category: "system_maintenance",
  schedule: {
    type: "cron",
    expression: "0 3 * * 0", // Every Sunday at 3:00 AM
    timezone: "America/Los_Angeles",
  },
  trigger: "scheduled", // Also triggerable manually from Q
  agent_id: "elnor",
  priority: "low", // Background task, non-urgent
  estimated_duration_seconds: 120,
  max_retries: 2,
  retry_delay_seconds: 300,

  // Task steps (DOC23 module chain)
  steps: [
    {
      step_id: "fetch_recent",
      tool: "spotify_get_recently_played",
      params: { limit: 50 }, // 50 is the Spotify API maximum for this endpoint
      output_key: "recent_tracks",
    },
    {
      step_id: "fetch_top_tracks_short",
      tool: "spotify_get_top_tracks",
      params: { time_range: "short_term", limit: 50 },
      output_key: "top_tracks_short",
    },
    {
      step_id: "fetch_top_tracks_medium",
      tool: "spotify_get_top_tracks",
      params: { time_range: "medium_term", limit: 50 },
      output_key: "top_tracks_medium",
    },
    {
      step_id: "fetch_top_artists_short",
      tool: "spotify_get_top_artists",
      params: { time_range: "short_term", limit: 30 },
      output_key: "top_artists_short",
    },
    {
      step_id: "fetch_top_artists_medium",
      tool: "spotify_get_top_artists",
      params: { time_range: "medium_term", limit: 30 },
      output_key: "top_artists_medium",
    },
    {
      step_id: "fetch_playlists",
      tool: "spotify_get_playlists",
      params: { limit: 50 },
      output_key: "playlists",
    },
    {
      step_id: "extract_knowledge",
      tool: "doc3_extract_knowledge",
      params: {
        source_type: "spotify_intake",
        // All fetched data flows in as context
      },
      output_key: "extraction_bundle",
    },
    {
      step_id: "write_to_graph",
      tool: "doc72_write_nodes",
      params: {
        // Write extracted nodes to entity graph
      },
    },
  ],
};
```
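The `steps` array implies that each step's result is stored under its `output_key` and is visible to later steps ("all fetched data flows in as context"). A minimal sketch of that chaining follows. The `Step` and `Tool` types and the `runSteps` helper are hypothetical names introduced for illustration; DOC23's actual executor may work differently.

```typescript
// Hypothetical types: DOC23's real step and tool interfaces may differ.
interface Step {
  step_id: string;
  tool: string;
  params: Record<string, unknown>;
  output_key?: string;
}

type Tool = (
  params: Record<string, unknown>,
  context: Record<string, unknown>,
) => Promise<unknown>;

// Runs steps in order. Each result is stored under its output_key so that
// later steps (e.g. extract_knowledge) can read everything fetched so far.
async function runSteps(
  steps: Step[],
  tools: Record<string, Tool>,
): Promise<Record<string, unknown>> {
  const context: Record<string, unknown> = {};
  for (const step of steps) {
    const tool = tools[step.tool];
    if (!tool) {
      throw new Error(`Unknown tool "${step.tool}" in step "${step.step_id}"`);
    }
    const result = await tool(step.params, context);
    if (step.output_key) {
      context[step.output_key] = result;
    }
  }
  return context;
}
```

Under this sketch, a step without an `output_key` (such as `write_to_graph`) still runs but leaves the context unchanged.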
### 3.2 Schedule Options
Configurable in Q Settings:
```
Settings > Integrations > Spotify > Listening Intelligence
├─ Scheduled intake: ● Enabled ○ Disabled
├─ Frequency
│    ○ Daily (3:00 AM)
│    ● Weekly (Sunday 3:00 AM) - recommended
│    ○ Monthly (1st of month, 3:00 AM)
├─ [Run now] - manual trigger
└─ Last run: Sun Apr 5, 3:01 AM - 142 tracks processed, 8 new entities, 3 preference updates
```
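Each frequency option corresponds to a standard five-field cron expression. The mapping below is a sketch: `IntakeFrequency`, `INTAKE_CRON`, and `scheduleFor` are hypothetical names, and the returned object mirrors the `schedule` shape used in §3.1.

```typescript
// Hypothetical mapping from the Frequency setting to a cron expression.
type IntakeFrequency = "daily" | "weekly" | "monthly";

const INTAKE_CRON: Record<IntakeFrequency, string> = {
  daily: "0 3 * * *",   // every day at 3:00 AM
  weekly: "0 3 * * 0",  // every Sunday at 3:00 AM (matches §3.1)
  monthly: "0 3 1 * *", // 1st of each month at 3:00 AM
};

// Builds a schedule object in the same shape as the §3.1 task definition.
function scheduleFor(frequency: IntakeFrequency, timezone: string) {
  return { type: "cron" as const, expression: INTAKE_CRON[frequency], timezone };
}
```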
### 3.3 Manual Trigger
The user can trigger the intake manually from:
- Q Settings > Integrations > Spotify > Listening Intelligence > [Run now]
- Chat: "Elnor, update your knowledge of my music"
- Floating Palette command: "Sync Spotify data"
---
## 4. DOC3 Extraction Pipeline: Spotify Data
### 4.1 KnowledgeExtractionBundle for Spotify
The fetched Spotify data is packaged as a `KnowledgeExtractionBundle` (DOC3 R11.3) and sent through the standard extraction pipeline. The LLM receives the raw data and extracts structured knowledge.
**Extraction prompt (system-level, appended to DOC3 extraction instructions):**
```
You are analyzing Spotify listening data for knowledge extraction. Extract:

1. ENTITIES: Artists, albums, songs, genres, and playlists that appear frequently
   or are explicitly saved/liked. Include Spotify URIs for precise identification.
2. PREFERENCES: Infer music preferences from listening patterns.
   - Genre preferences (with strength: strong/moderate/weak)
   - Artist affinity (frequency-based)
   - Tempo/energy preferences (from audio features if available)
   - Possible dislikes (genres/artists never appearing despite being popular;
     treat absence as weak evidence)
3. TEMPORAL PATTERNS: When does the user listen to what?
   - Time-of-day patterns (morning music vs evening music)
   - Day-of-week patterns (workday vs weekend)
   - Activity associations (if inferable from playlist names or listening context)
4. CHANGES: Compare with previous intake data (if provided). What's new?
   - New artists discovered
   - Genres gaining or losing share
   - Playlist additions/removals
   - Listening volume changes (more or less music overall)
5. ASSOCIATIONS: Link music entities to other known entities in the user's world.
   - Playlist names referencing cases, projects, or activities
   - Temporal correlation with known calendar events or work periods

Output as a KnowledgeExtractionBundle with typed nodes.
```
### 4.2 Input Data Shape
The extraction pipeline receives this data from the MCP fetches:
```typescript
interface SpotifyIntakeData {
  // From spotify_get_recently_played
  recent_tracks: {
    track: {
      name: string;
      uri: string;
      artists: { name: string; uri: string }[];
      album: { name: string; uri: string };
    };
    played_at: string; // ISO timestamp
  }[];

  // From spotify_get_top_tracks (short + medium term)
  top_tracks_short: { name: string; uri: string; artists: { name: string }[]; popularity: number }[];
  top_tracks_medium: { name: string; uri: string; artists: { name: string }[]; popularity: number }[];

  // From spotify_get_top_artists (short + medium term)
  top_artists_short: { name: string; uri: string; genres: string[]; popularity: number }[];
  top_artists_medium: { name: string; uri: string; genres: string[]; popularity: number }[];

  // From spotify_get_playlists
  playlists: { name: string; uri: string; track_count: number; public: boolean; description: string }[];

  // Previous intake summary (for change detection)
  previous_intake_summary?: string;
}
```
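The `played_at` timestamps are the raw material for the time-of-day and day-of-week patterns named in the extraction prompt. As a rough illustration, a timestamp could be bucketed in the user's time zone before or alongside LLM extraction. `classifyPlay` and the hour boundaries below are assumptions made for this sketch, not part of the DOC3 pipeline.

```typescript
type DayPart = "morning" | "afternoon" | "evening" | "night";

interface ListeningSlot {
  dayPart: DayPart;
  dayType: "workday" | "weekend";
}

// Buckets a played_at timestamp by local time of day and day type.
// The hour boundaries are illustrative assumptions, not part of the spec.
function classifyPlay(playedAt: string, timeZone: string): ListeningSlot {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    weekday: "short",
    hour: "numeric",
    hour12: false,
  }).formatToParts(new Date(playedAt));

  // Some engines report midnight as "24" in 24-hour mode, hence the modulo.
  const hour = Number(parts.find((p) => p.type === "hour")?.value) % 24;
  const weekday = parts.find((p) => p.type === "weekday")?.value ?? "";

  let dayPart: DayPart;
  if (hour >= 5 && hour < 12) dayPart = "morning";
  else if (hour >= 12 && hour < 17) dayPart = "afternoon";
  else if (hour >= 17 && hour < 22) dayPart = "evening";
  else dayPart = "night";

  const dayType = weekday === "Sat" || weekday === "Sun" ? "weekend" : "workday";
  return { dayPart, dayType };
}
```

Passing the task's configured time zone (for example `"America/Los_Angeles"`) keeps the buckets aligned with the user's local day rather than UTC.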
### 4.3 LLM Cost Estimate
The extraction payload is small:
- 50 recent tracks × ~100 tokens each = ~5,000 tokens
- 100 top tracks (50 short-term + 50 medium-term) × ~60 tokens each = ~6,000 tokens
- 60 top artists × ~80 tokens each = ~4,800 tokens
- 50 playlists × ~40 tokens each = ~2,000 tokens
- Extraction prompt: ~500 tokens
- Output: ~2,000-4,000 tokens
**Total per run: ~20,000 tokens input + ~3,000 tokens output.**
At Gemini 2.5 Pro rates, that's roughly $0.02-0.05 per weekly run. Negligible.
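The input estimate can be reproduced with simple arithmetic. The sketch below takes item counts from the fetch limits in §3.1 (2 × 50 top tracks, 2 × 30 top artists) and the rough per-item token weights from the list above. `PayloadPart` and `estimateInputTokens` are illustrative names, and the weights are estimates rather than measurements.

```typescript
// Per-item token weights are the rough figures from the estimate above.
interface PayloadPart {
  label: string;
  items: number;
  tokensPerItem: number;
}

// Sums the payload parts on top of the fixed extraction prompt size.
function estimateInputTokens(parts: PayloadPart[], promptTokens: number): number {
  return parts.reduce((sum, p) => sum + p.items * p.tokensPerItem, promptTokens);
}

// Item counts follow the fetch limits in §3.1.
const intakeParts: PayloadPart[] = [
  { label: "recent tracks", items: 50, tokensPerItem: 100 },
  { label: "top tracks", items: 100, tokensPerItem: 60 },
  { label: "top artists", items: 60, tokensPerItem: 80 },
  { label: "playlists", items: 50, tokensPerItem: 40 },
];

const inputTokens = estimateInputTokens(intakeParts, 500); // 18,300
```

The result, 18,300 tokens, sits within the rounded ~20,000 input figure.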
---
## 5. DOC72 Entity Graph: Music Knowledge Nodes
### 5.1 Node Types for Music Data
All music data maps to DOC72's existing 10 canonical node types. No new node types needed.
| DOC72 Node Type | Music Usage | Example |
|----------------|-------------|---------|
| `entity` | Artists, albums, songs, playlists, genres | `{type: "entity", subtype: "music_artist", name: "John Coltrane", spotify_uri: "spotify:artist:..."}` |
| `preference` | Music taste declarations | `{type: "preference", domain: "music", subject: "jazz_fusion", valence: "positive", strength: 0.85}` |
| `observation` | Listening events and patterns | `{type: "observation", domain: "music", content: "Listened to ambient electronic 80% of morning sessions in March 2026"}` |
| `association` | Links between music and other life entities | `{type: "association", entity_a_ref: "henderson_trial_prep", entity_b_ref: "ambient_electronic", relation: "activity_music"}` |
| `fact` | Stable music facts | `{type: "fact", domain: "music", content: "Will's Spotify account has 47 playlists, 1,200 saved tracks"}` |
| `procedure` | Learned music workflows | `{type: "procedure", trigger: "focus_music_request", steps: "1. Check DOC72 for focus genre preference 2. Search Spotify 3. Play"}` |
### 5.2 Entity Schemas
```typescript
// Artist entity
interface MusicArtistNode {
  node_id: string;
  type: "entity";
  subtype: "music_artist";
  name: string;
  spotify_uri: string;
  genres: string[];
  affinity_score: number;      // 0-1, based on listening frequency
  first_observed: string;      // ISO date
  last_observed: string;       // ISO date
  play_count_estimate: number; // Approximate from intake data
  principal_id: "will";
  scope: "personal";
}

// Genre preference
interface MusicPreferenceNode {
  node_id: string;
  type: "preference";
  domain: "music";
  subject: string;        // "jazz", "ambient_electronic", "classic_rock"
  valence: "positive" | "negative" | "neutral";
  strength: number;       // 0-1, Beta distribution confidence
  context?: string;       // "for_focus", "for_relaxation", "morning", "weekend"
  evidence_count: number; // How many data points support this
  last_updated: string;
  principal_id: "will";
  scope: "personal";
}

// Temporal pattern
interface MusicPatternNode {
  node_id: string;
  type: "observation";
  domain: "music_pattern";
  pattern_type: "temporal" | "activity" | "mood" | "trend";
  content: string; // Natural language description
  data: {
    time_period?: string;                        // "morning", "evening", "weekend"
    genre_distribution?: Record<string, number>; // {"jazz": 0.4, "ambient": 0.3, ...}
    trend_direction?: "increasing" | "decreasing" | "stable";
  };
  observed_period: string; // "2026-W14" or "2026-03"
  principal_id: "will";
  scope: "personal";
}

// Activity association
interface MusicAssociationNode {
  node_id: string;
  type: "association";
  entity_a_ref: string; // DOC72 node_id of activity/case/project
  entity_b_ref: string; // DOC72 node_id of genre/playlist/artist
  relation: "activity_music" | "case_playlist" | "mood_genre";
  strength: number;     // 0-1
  evidence: string;     // "Playlist 'Henderson Prep' played during Henderson case work"
  principal_id: "will";
  scope: "personal";
}
```
### 5.3 Deduplication
The entity graph must deduplicate across intakes. An artist node for "John Coltrane" created in week 1 should be UPDATED in week 2, not duplicated.
**Dedup strategy:**
- `spotify_uri` is the unique key for Spotify entities (artists, tracks, albums, playlists)
- On each intake, check if a node with that `spotify_uri` already exists
- If yes: update `affinity_score`, `last_observed`, `play_count_estimate`
- If no: create new node
- Preference nodes deduplicate on `{domain, subject, context}` tuple
- Pattern nodes are temporal — each intake period gets its own observation, building a time series
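The update-or-create rule for Spotify entities can be sketched as an upsert keyed on `spotify_uri`. This is an in-memory illustration only: `ArtistRecord`, `upsertArtist`, and the `Map` index are hypothetical stand-ins, and the real DOC72 write path is not shown in this spec. Adding the new plays to `play_count_estimate` is also an assumption about how the estimate accumulates.

```typescript
// Minimal stand-in for the artist node in §5.2; only fields touched by dedup.
interface ArtistRecord {
  spotify_uri: string;
  name: string;
  affinity_score: number;
  first_observed: string;
  last_observed: string;
  play_count_estimate: number;
}

interface ArtistSighting {
  spotify_uri: string;
  name: string;
  affinity_score: number;
  plays: number;
}

// Hypothetical in-memory index keyed on spotify_uri.
function upsertArtist(
  index: Map<string, ArtistRecord>,
  seen: ArtistSighting,
  intakeDate: string,
): "created" | "updated" {
  const existing = index.get(seen.spotify_uri);
  if (existing) {
    // Existing node: refresh the mutable fields, keep first_observed intact.
    existing.affinity_score = seen.affinity_score;
    existing.last_observed = intakeDate;
    existing.play_count_estimate += seen.plays;
    return "updated";
  }
  index.set(seen.spotify_uri, {
    spotify_uri: seen.spotify_uri,
    name: seen.name,
    affinity_score: seen.affinity_score,
    first_observed: intakeDate,
    last_observed: intakeDate,
    play_count_estimate: seen.plays,
  });
  return "created";
}
```

Running two intakes for the same URI leaves a single record whose `first_observed` still points at the first intake.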
### 5.4 Confidence Scoring
DOC72 uses Beta distribution confidence scoring. For music preferences:
- A genre that appears in 90% of recent plays with 200+ data points → high confidence (α=180, β=20; mean 0.90)
- A genre that appeared 3 times last week → low confidence (α=3, β=47; mean 0.06)
- Confidence naturally decays if a genre stops appearing in subsequent intakes
This means Elnor can express uncertainty: "You've been listening to a lot of classical lately, but I'm not sure if it's a lasting preference or just a phase."
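For reference, the α/β pairs above can be read through the standard Beta mean and variance formulas. The sketch below also shows why the score drops when a genre stops appearing: plays that do not match the genre accumulate in β. How DOC72 actually maps these quantities onto `strength` is not specified here, so `BetaBelief` and `observe` are illustrative assumptions.

```typescript
interface BetaBelief {
  alpha: number; // evidence for the preference
  beta: number;  // evidence against it
}

// Mean of Beta(alpha, beta): the estimated share or strength.
const betaMean = ({ alpha, beta }: BetaBelief): number => alpha / (alpha + beta);

// Variance of Beta(alpha, beta): shrinks as total evidence grows.
const betaVariance = ({ alpha, beta }: BetaBelief): number => {
  const n = alpha + beta;
  return (alpha * beta) / (n * n * (n + 1));
};

// Assumed update rule: plays matching the genre add to alpha, the rest to beta.
function observe(belief: BetaBelief, hits: number, misses: number): BetaBelief {
  return { alpha: belief.alpha + hits, beta: belief.beta + misses };
}

const strong: BetaBelief = { alpha: 180, beta: 20 };
const afterQuietWeek = observe(strong, 0, 50); // genre absent from 50 plays
```

Here `betaMean(strong)` is 0.9, while `betaMean(afterQuietWeek)` is 180 / 250 = 0.72, which is the decay behavior described above.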
---
## 6. Nightly Dream Cycle Integration (DOC8)
### 6.1 Music in the Dream Cycle
DOC8's nightly dream cycle runs lightweight consolidation tasks. Music knowledge participates:
**Consolidation tasks:**
- Merge redundant artist/genre nodes (e.g., "electronic" and "electronica" → single node)
- Decay affinity scores for artists not listened to in 30+ days
- Promote provisional preferences to confirmed (if evidence_count > threshold)
- Generate weekly/monthly listening summary nodes
- Detect significant changes: "Listening shifted from jazz to classical this month"
**Weekly consolidation (heavier):**
- Compare current top artists/tracks with previous month
- Generate trend analysis: "New discovery: Nils Frahm (first appeared 2 weeks ago, now in top 10)"
- Cross-reference with calendar: "Classical listening spiked during the week of the Henderson motion deadline"
### 6.2 Dream Cycle Task Definition
```typescript
const musicDreamCycleTask: DreamCycleTask = {
  task_id: "music_knowledge_consolidation",
  frequency: "nightly",
  priority: "low",
  steps: [
    "decay_stale_affinities",          // Reduce affinity for artists not seen in 30+ days
    "merge_redundant_genres",          // Deduplicate genre nodes
    "promote_provisional_preferences", // Move from provisional to confirmed
    "generate_period_summary",         // Create observation node for the day/week
  ],
  weekly_extra_steps: [
    "trend_analysis",               // Compare with last month
    "calendar_cross_reference",     // Correlate with DOC72 calendar entities
    "generate_weekly_digest_entry", // Add music section to weekly digest
  ],
};
```
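The `decay_stale_affinities` step is described only as reducing affinity for artists not seen in 30+ days. One possible shape for it is a grace period followed by exponential decay. In the sketch below, `decayAffinity` and the 60-day half-life are assumptions made for illustration; only the 30-day threshold comes from this spec.

```typescript
// Hypothetical decay rule: no change for the first `graceDays` idle days,
// then the score halves every `halfLifeDays`. The half-life is illustrative.
function decayAffinity(
  score: number,
  lastObserved: string, // ISO date, e.g. "2026-01-11"
  today: string,        // ISO date
  graceDays = 30,
  halfLifeDays = 60,
): number {
  const msPerDay = 24 * 60 * 60 * 1000;
  const idleDays = (Date.parse(today) - Date.parse(lastObserved)) / msPerDay;
  if (idleDays <= graceDays) return score;
  return score * Math.pow(0.5, (idleDays - graceDays) / halfLifeDays);
}
```

For example, an artist with affinity 0.8 last observed 90 days ago would drop to 0.4 under these constants, while one observed 22 days ago would keep 0.8.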
---
## 7. What Elnor Can Do With This Knowledge
Once the scheduled intake is running and DOC72 has music data, Elnor can:
### 7.1 Proactive Recommendations
- "You usually listen to ambient during morning focus sessions. Want me to put something on?" (based on temporal pattern nodes)
- "You haven't listened to Coltrane in a while — want me to queue up A Love Supreme?" (based on decaying affinity + historical high affinity)
### 7.2 Context-Aware Playback
- "Play something for trial prep" → Elnor checks DOC72 for association nodes linking trial prep to music → finds "ambient electronic" pattern → searches Spotify → plays
- "Play what I usually listen to on Sunday mornings" → temporal pattern lookup → genre match → play
### 7.3 Trend Awareness
- "How has my music taste changed this year?" → Elnor queries pattern nodes across months → generates trend narrative
- "I feel like I've been in a rut musically" → Elnor can suggest genres/artists outside your recent patterns
### 7.4 Cross-Domain Intelligence
- Elnor notices you play aggressive music before deadlines and calm music after → stores this as a mood/activity pattern
- Elnor can correlate music changes with case milestones: "Your listening got more intense during the Paramount expert discovery phase"
### 7.5 Playlist Intelligence
- "Make me a playlist of my top discoveries from the last 3 months" → Elnor queries DOC72 for artists with `first_observed` in last 90 days + high affinity → creates Spotify playlist
- "What playlist was I playing during the Henderson depo prep?" → DOC72 association lookup → finds playlist entity → plays it
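The "top discoveries" request reduces to a filter on `first_observed` and `affinity_score`. A minimal sketch, assuming the artist nodes have already been loaded from DOC72 into memory; `ArtistSummary`, `recentDiscoveries`, and the 0.5 affinity threshold are hypothetical, and the DOC72 query and Spotify playlist creation are out of scope here.

```typescript
// Subset of the §5.2 artist node fields needed for this query.
interface ArtistSummary {
  name: string;
  spotify_uri: string;
  affinity_score: number;
  first_observed: string; // ISO date
}

// Returns artists first seen within the window, strongest affinity first.
function recentDiscoveries(
  artists: ArtistSummary[],
  today: string,
  windowDays = 90,
  minAffinity = 0.5, // illustrative threshold for "high affinity"
): ArtistSummary[] {
  const cutoff = Date.parse(today) - windowDays * 24 * 60 * 60 * 1000;
  return artists
    .filter((a) => Date.parse(a.first_observed) >= cutoff && a.affinity_score >= minAffinity)
    .sort((a, b) => b.affinity_score - a.affinity_score);
}
```

The resulting `spotify_uri` values would then be handed to whichever Part 1 tool creates playlists.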
---
## 8. Settings (Backend Section)
Additions to the Spotify settings from Part 1:
```
Settings > Integrations > Spotify > Listening Intelligence
├─ Knowledge capture
│  ├─ Scheduled intake: ● Enabled ○ Disabled
│  ├─ Frequency: ○ Daily ● Weekly ○ Monthly
│  ├─ Include in nightly dream cycle: ☑ Yes
│  ├─ [Run now]
│  └─ Last run: Sun Apr 5, 3:01 AM - 142 tracks, 8 entities, 3 preferences
├─ What Elnor remembers
│  ├─ ☑ Artists and genres I listen to
│  ├─ ☑ Listening patterns (time of day, day of week)
│  ├─ ☑ Playlist contents and changes
│  ├─ ☑ Music-activity associations (e.g., trial prep → ambient)
│  └─ ☐ Exact play counts and timestamps (detailed, uses more storage)
├─ Proactive suggestions
│  ├─ ☑ Offer to play music based on time/activity context
│  └─ ☑ Mention music trends in weekly digest
└─ Data management
   ├─ Music entities in knowledge graph: 247 nodes
   ├─ [View music knowledge] - opens DOC72 filtered to music domain
   └─ [Clear all music knowledge] - removes all music nodes from graph
```
---
## 9. Cross-Document Obligations
### 9.1 DOC3 (Semantic Skill Learning)
- Register `spotify_intake` as a recognized `source_type` in the `KnowledgeExtractionBundle` pipeline
- Add music-domain extraction instructions to the skill library (§4.1 prompt above)
- The conversational learning path (Part 1 §5.2) already uses DOC3's standard extraction — no changes needed
### 9.2 DOC72 (Entity Graph)
- No new node types needed — all music data fits existing canonical types
- Add `spotify_uri` as a recognized external identifier field on entity nodes (alongside any existing external ID fields)
- Add `domain: "music"` as a recognized domain tag for filtering/querying
- Deduplication logic must handle `spotify_uri` as a unique key (§5.3)
- Beta confidence scoring applies unchanged (§5.4)
### 9.3 DOC23 (Task System)
- Register `spotify_weekly_intake` as a system task template
- Support `cron` schedule type (if not already supported)
- The task should appear in the Tasks page as a system maintenance task, not a user-created task
- Manual trigger via "Run now" button or chat command
### 9.4 DOC8 (Dream Cycle)
- Register `music_knowledge_consolidation` as a dream cycle participant
- Nightly: affinity decay, dedup, promotion
- Weekly: trend analysis, calendar cross-reference, digest entry
### 9.5 DOC24 (Knowledge Delivery)
- Music knowledge nodes participate in standard DOC24 retrieval (RRF, three-lane retrieval)
- When Elnor receives a music-related query, DOC24's semantic routing should include music domain nodes in the retrieval context
- The `inspect_knowledge_summary` tool (GBrain proposal) should include a music section when queried: "What does Elnor know about my music?"
---
## 10. Implementation Sequence
```
1. Part 1 complete (MCP server running, web player working) ← prerequisite
2. EC + DOC72 write path operational ← prerequisite
3. DOC3 extraction pipeline operational ← prerequisite
4. Register spotify_intake source_type in DOC3 ← 30 min
5. Create SpotifyIntakeService in EC ← 2-3 hours
   - Fetches data via MCP tools
   - Packages as KnowledgeExtractionBundle
   - Sends through DOC3 pipeline
   - Writes results to DOC72
6. Register task in DOC23 ← 30 min
7. Add music consolidation to DOC8 dream cycle ← 1 hour
8. Add Listening Intelligence settings to Q ← 1 hour
9. Test end-to-end: scheduled run → extraction → graph write ← 1-2 hours
10. Test proactive recommendations: "play something for focus" ← testing
```
**Total estimated build time (after prerequisites):** 6-8 hours