Generated 2026-06-09T01:23:58.539Z from commit dbaa25962edc11ab30e8d4ca1715f9ae5bf77331. Worktree: clean.
# Spotify Integration — Part 2: Scheduled Intake & Knowledge Graph (Backend Required)

**Date:** 2026-04-11
**Status:** Spec for future build — requires EC, DOC3 extraction pipeline, DOC72 write path, DOC23 task scheduler
**Scope:** Automated listening data capture, preference extraction, pattern learning, proactive music intelligence
**Prerequisites:** Part 1 complete (MCP server running, Spotify connected), backend infrastructure operational
**Depends on:** DOC3 R11.3 (KnowledgeExtractionBundle), DOC72 R5.6+ (entity graph write path), DOC23 (task scheduler), DOC8 (nightly dream cycle)

---

## 1. What This Adds Beyond Part 1

Part 1 gives Elnor on-demand access to Spotify — ask and Elnor checks. Part 2 gives Elnor continuous, ambient understanding of your music behavior without being asked:

| Capability | Part 1 (On-Demand) | Part 2 (Scheduled Intake) |
|-----------|-------------------|--------------------------|
| "What am I listening to now?" | ✅ Calls API on request | ✅ Same |
| "What have I listened to this week?" | ✅ Calls API, gets last 50 tracks | ✅ Has full week's history in DOC72 |
| "What kind of music do I like?" | ❌ No memory across sessions | ✅ Rich preference model in entity graph |
| "Play something for trial prep" | ⚠️ Generic search, no personal context | ✅ Knows what you played during past trial prep |
| "My listening changed this month" | ❌ Cannot compare over time | ✅ Trend analysis from stored history |
| "Make me a playlist like my Wednesday focus sessions" | ❌ No pattern data | ✅ Temporal listening patterns stored |

---

## 2. Architecture Overview

```
┌─────────────────────────────────────────────────────────────┐
│                    SCHEDULED INTAKE FLOW                      │
│                                                               │
│  DOC23 Task Scheduler                                         │
│  ┌──────────────────┐                                         │
│  │ "Spotify Intake"  │ ── runs weekly (configurable) ──┐      │
│  │ Cron: Sun 3:00 AM │                                 │      │
│  └──────────────────┘                                  ▼      │
│                                              ┌──────────────┐ │
│                                              │ Spotify MCP  │ │
│                                              │ Server       │ │
│                                              │              │ │
│                                              │ Fetch:       │ │
│                                              │ - recently   │ │
│                                              │   played     │ │
│                                              │ - top tracks │ │
│                                              │ - top artists│ │
│                                              │ - playlists  │ │
│                                              └──────┬───────┘ │
│                                                     │         │
│                                                     ▼         │
│                                          ┌──────────────────┐ │
│                                          │ DOC3 Extraction  │ │
│                                          │ Pipeline         │ │
│                                          │                  │ │
│                                          │ LLM extracts:    │ │
│                                          │ - entities       │ │
│                                          │ - preferences    │ │
│                                          │ - patterns       │ │
│                                          │ - associations   │ │
│                                          └──────┬───────────┘ │
│                                                 │             │
│                                                 ▼             │
│                                      ┌────────────────────┐   │
│                                      │ DOC72 Entity Graph │   │
│                                      │                    │   │
│                                      │ Stores:            │   │
│                                      │ - artist nodes     │   │
│                                      │ - genre nodes      │   │
│                                      │ - preference nodes │   │
│                                      │ - pattern nodes    │   │
│                                      │ - association nodes│   │
│                                      └────────────────────┘   │
│                                                               │
└─────────────────────────────────────────────────────────────┘
```

---

## 3. DOC23 Task Definition: Spotify Intake

### 3.1 Task Registration

Register a recurring task in the DOC23 task system:

```typescript
const spotifyIntakeTask: TaskDefinition = {
  task_id: "spotify_weekly_intake",
  display_name: "Spotify Listening Intake",
  description: "Fetches recent listening data from Spotify and extracts preferences, patterns, and entities into the knowledge graph.",
  category: "system_maintenance",
  schedule: {
    type: "cron",
    expression: "0 3 * * 0",  // Every Sunday at 3:00 AM
    timezone: "America/Los_Angeles",
  },
  trigger: "scheduled",       // Also triggerable manually from Q
  agent_id: "elnor",
  priority: "low",            // Background task, non-urgent
  estimated_duration_seconds: 120,
  max_retries: 2,
  retry_delay_seconds: 300,
  
  // Task steps (DOC23 module chain)
  steps: [
    {
      step_id: "fetch_recent",
      tool: "spotify_get_recently_played",
      params: { limit: 50 },
      output_key: "recent_tracks",
    },
    {
      step_id: "fetch_top_tracks_short",
      tool: "spotify_get_top_tracks",
      params: { time_range: "short_term", limit: 50 },
      output_key: "top_tracks_short",
    },
    {
      step_id: "fetch_top_tracks_medium",
      tool: "spotify_get_top_tracks",
      params: { time_range: "medium_term", limit: 50 },
      output_key: "top_tracks_medium",
    },
    {
      step_id: "fetch_top_artists_short",
      tool: "spotify_get_top_artists",
      params: { time_range: "short_term", limit: 30 },
      output_key: "top_artists_short",
    },
    {
      step_id: "fetch_top_artists_medium",
      tool: "spotify_get_top_artists",
      params: { time_range: "medium_term", limit: 30 },
      output_key: "top_artists_medium",
    },
    {
      step_id: "fetch_playlists",
      tool: "spotify_get_playlists",
      params: { limit: 50 },
      output_key: "playlists",
    },
    {
      step_id: "extract_knowledge",
      tool: "doc3_extract_knowledge",
      params: {
        source_type: "spotify_intake",
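        // Receives the six fetched outputs above (recent_tracks, top_tracks_short,
        // top_tracks_medium, top_artists_short, top_artists_medium, playlists)
        // as SpotifyIntakeData (§4.2)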
      },
      output_key: "extraction_bundle",
    },
    {
      step_id: "write_to_graph",
      tool: "doc72_write_nodes",
      params: {
        // Write the nodes in extraction_bundle to the entity graph (deduplicated per §5.3)
      },
    },
  ],
};
```

### 3.2 Schedule Options

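Configurable in Q Settings. Each run fetches at most the 50 most recently played tracks (`limit: 50` in §3.1, which is also the Spotify Web API maximum for that endpoint), so a weekly run misses earlier plays for anyone who listens to more than 50 tracks a week; the daily option captures up to 50 plays per day instead: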

```
Settings > Integrations > Spotify > Listening Intelligence
├─ Scheduled intake: ◉ Enabled  ○ Disabled
├─ Frequency
│   ○ Daily (3:00 AM)
│   ● Weekly (Sunday 3:00 AM) — recommended
│   ○ Monthly (1st of month, 3:00 AM)
├─ [Run now] — manual trigger
└─ Last run: Sun Apr 5, 3:01 AM — 142 tracks processed, 8 new entities, 3 preference updates
```
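The three frequency options map directly onto cron expressions for the `schedule` field of the §3.1 task definition. A minimal sketch, assuming a hypothetical `IntakeFrequency` value stored by the settings screen and reusing the `{ type, expression, timezone }` shape from §3.1:

```typescript
// Hypothetical setting value exposed by Settings > Integrations > Spotify > Listening Intelligence.
type IntakeFrequency = "daily" | "weekly" | "monthly";

// Mirrors the `schedule` object in the §3.1 task definition.
interface CronSchedule {
  type: "cron";
  expression: string; // minute hour day-of-month month day-of-week
  timezone: string;
}

// Every option runs at 3:00 AM local time; only the day fields differ.
const CRON_BY_FREQUENCY: Record<IntakeFrequency, string> = {
  daily: "0 3 * * *",   // every day
  weekly: "0 3 * * 0",  // Sunday (day-of-week 0)
  monthly: "0 3 1 * *", // 1st of each month
};

function scheduleFor(frequency: IntakeFrequency, timezone = "America/Los_Angeles"): CronSchedule {
  return { type: "cron", expression: CRON_BY_FREQUENCY[frequency], timezone };
}
```

Keeping the mapping in one table means the radio buttons and the registered DOC23 task cannot drift apart.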

### 3.3 Manual Trigger

The user can trigger the intake manually from:
- Q Settings > Spotify > [Run now]
- Chat: "Elnor, update your knowledge of my music"
- Floating Palette command: "Sync Spotify data"

---

## 4. DOC3 Extraction Pipeline: Spotify Data

### 4.1 KnowledgeExtractionBundle for Spotify

The fetched Spotify data is packaged as a `KnowledgeExtractionBundle` (DOC3 R11.3) and sent through the standard extraction pipeline. The LLM receives the raw data and extracts structured knowledge.

**Extraction prompt (system-level, appended to DOC3 extraction instructions):**

```
You are analyzing Spotify listening data for knowledge extraction. Extract:

1. ENTITIES: Artists, albums, songs, genres, and playlists that appear frequently 
   or are explicitly saved/liked. Include Spotify URIs for precise identification.

2. PREFERENCES: Infer music preferences from listening patterns.
   - Genre preferences (with strength: strong/moderate/weak)
   - Artist affinity (frequency-based)
   - Tempo/energy preferences (from audio features if available)
   - Inferred dislikes (popular genres/artists that never appear in the data)

3. TEMPORAL PATTERNS: When does the user listen to what?
   - Time-of-day patterns (morning music vs evening music)
   - Day-of-week patterns (workday vs weekend)
   - Activity associations (if inferable from playlist names or listening context)

4. CHANGES: Compare with previous intake data (if provided). What's new?
   - New artists discovered
   - Genres gaining or losing share
   - Playlist additions/removals
   - Listening volume changes (more or less music overall)

5. ASSOCIATIONS: Link music entities to other known entities in the user's world.
   - Playlist names referencing cases, projects, or activities
   - Temporal correlation with known calendar events or work periods

Output as a KnowledgeExtractionBundle with typed nodes.
```
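Item 4 (CHANGES) can also be precomputed deterministically and handed to the LLM with the raw data, so the model narrates changes rather than having to find them. A minimal sketch, assuming the previous intake's artist list is available (the §4.2 shape carries only a text `previous_intake_summary`, so this would mean keeping the prior list or reading it back from DOC72); `ArtistRef` and `diffArtists` are hypothetical names:

```typescript
// Hypothetical minimal artist reference; matches the name/uri fields of top_artists in §4.2.
interface ArtistRef {
  name: string;
  uri: string;
}

interface ArtistDiff {
  added: ArtistRef[];   // present in this intake, absent from the previous one
  dropped: ArtistRef[]; // present in the previous intake, absent now
}

// Set difference keyed on the Spotify URI rather than the display name.
function diffArtists(previous: ArtistRef[], current: ArtistRef[]): ArtistDiff {
  const prevUris = new Set(previous.map((a) => a.uri));
  const currUris = new Set(current.map((a) => a.uri));
  return {
    added: current.filter((a) => !prevUris.has(a.uri)),
    dropped: previous.filter((a) => !currUris.has(a.uri)),
  };
}
```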

### 4.2 Input Data Shape

The extraction pipeline receives this data from the MCP fetches:

```typescript
interface SpotifyIntakeData {
  // From spotify_get_recently_played
  recent_tracks: {
    track: { name: string; uri: string; artists: { name: string; uri: string }[]; album: { name: string; uri: string } };
    played_at: string;  // ISO timestamp
  }[];
  
  // From spotify_get_top_tracks (short + medium term)
  top_tracks_short: { name: string; uri: string; artists: { name: string }[]; popularity: number }[];
  top_tracks_medium: { name: string; uri: string; artists: { name: string }[]; popularity: number }[];
  
  // From spotify_get_top_artists (short + medium term)
  top_artists_short: { name: string; uri: string; genres: string[]; popularity: number }[];
  top_artists_medium: { name: string; uri: string; genres: string[]; popularity: number }[];
  
  // From spotify_get_playlists
  playlists: { name: string; uri: string; track_count: number; public: boolean; description: string }[];
  
  // Previous intake summary (for change detection)
  previous_intake_summary?: string;
}
```
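For the TEMPORAL PATTERNS item, the `played_at` timestamps can be bucketed into local time-of-day periods before extraction. A sketch assuming four hypothetical buckets and the user's timezone from the §3.1 task definition; it uses the standard `Intl.DateTimeFormat` API to convert UTC timestamps to local hours, which accounts for daylight saving time:

```typescript
// Hypothetical time-of-day buckets; boundaries are local hours (0-23).
type TimePeriod = "night" | "morning" | "afternoon" | "evening";

function periodForHour(hour: number): TimePeriod {
  if (hour < 5) return "night";
  if (hour < 12) return "morning";
  if (hour < 17) return "afternoon";
  if (hour < 22) return "evening";
  return "night";
}

// Local hour (0-23) of an ISO timestamp in an IANA timezone.
function localHour(isoTimestamp: string, timeZone: string): number {
  const formatted = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hour: "numeric",
    hour12: false,
  }).format(new Date(isoTimestamp));
  // Some engines render midnight as "24" when hour12 is false; normalize to 0.
  return parseInt(formatted, 10) % 24;
}

// Count recently played tracks per local time-of-day period.
function countByPeriod(
  recent: { played_at: string }[],
  timeZone = "America/Los_Angeles",
): Record<TimePeriod, number> {
  const counts: Record<TimePeriod, number> = { night: 0, morning: 0, afternoon: 0, evening: 0 };
  for (const item of recent) {
    counts[periodForHour(localHour(item.played_at, timeZone))] += 1;
  }
  return counts;
}
```

Counts like these could feed the `genre_distribution` of a `MusicPatternNode` whose `time_period` is the bucket name.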

### 4.3 LLM Cost Estimate

The extraction payload is small:
- 50 recent tracks × ~100 tokens each = ~5,000 tokens
- 100 top tracks (50 short-term + 50 medium-term) × ~60 tokens each = ~6,000 tokens
- 60 top artists × ~80 tokens each = ~4,800 tokens
- 50 playlists × ~40 tokens each = ~2,000 tokens
- Extraction prompt: ~500 tokens
- Output: ~2,000-4,000 tokens

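**Total per run: ~18,000-20,000 tokens input + ~2,000-4,000 tokens output.**

At Gemini 2.5 Pro list pricing (about $1.25 per million input tokens and $10 per million output tokens for prompts under 200K tokens), that's roughly $0.04-0.07 per run, or a few dollars a year on the weekly schedule. Negligible.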
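The same arithmetic as a small function, so the estimate can be rerun if fetch limits or prices change. Per-item token counts are the §4.3 estimates and item counts follow the §3.1 fetch limits (100 top tracks, 60 top artists); the per-million-token prices are assumptions to replace with current rates:

```typescript
// Per-item token estimates from §4.3; counts follow the §3.1 fetch limits.
interface PayloadItem {
  label: string;
  count: number;
  tokensEach: number;
}

const INTAKE_PAYLOAD: PayloadItem[] = [
  { label: "recent tracks", count: 50, tokensEach: 100 },
  { label: "top tracks (short + medium)", count: 100, tokensEach: 60 },
  { label: "top artists (short + medium)", count: 60, tokensEach: 80 },
  { label: "playlists", count: 50, tokensEach: 40 },
];
const PROMPT_TOKENS = 500;

function inputTokens(items: PayloadItem[], promptTokens: number): number {
  return items.reduce((sum, item) => sum + item.count * item.tokensEach, promptTokens);
}

// Prices are USD per million tokens (assumed values; check current rates).
function runCostUsd(inTokens: number, outTokens: number, inPerM: number, outPerM: number): number {
  return (inTokens * inPerM + outTokens * outPerM) / 1_000_000;
}

const intakeInputTokens = inputTokens(INTAKE_PAYLOAD, PROMPT_TOKENS); // 18,300
const costLow = runCostUsd(intakeInputTokens, 2_000, 1.25, 10);      // ~$0.043
const costHigh = runCostUsd(intakeInputTokens, 4_000, 1.25, 10);     // ~$0.063
```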

---

## 5. DOC72 Entity Graph: Music Knowledge Nodes

### 5.1 Node Types for Music Data

All music data maps onto six of DOC72's 10 existing canonical node types. No new node types are needed.

| DOC72 Node Type | Music Usage | Example |
|----------------|-------------|---------|
| `entity` | Artists, albums, songs, playlists, genres | `{type: "entity", subtype: "music_artist", name: "John Coltrane", spotify_uri: "spotify:artist:..."}` |
| `preference` | Music taste declarations | `{type: "preference", domain: "music", subject: "jazz_fusion", valence: "positive", strength: 0.85}` |
| `observation` | Listening events and patterns | `{type: "observation", domain: "music", content: "Listened to ambient electronic 80% of morning sessions in March 2026"}` |
| `association` | Links between music and other life entities | `{type: "association", entity_a: "henderson_trial_prep", entity_b: "ambient_electronic", relation: "activity_music"}` |
| `fact` | Stable music facts | `{type: "fact", domain: "music", content: "Will's Spotify account has 47 playlists, 1,200 saved tracks"}` |
| `procedure` | Learned music workflows | `{type: "procedure", trigger: "focus_music_request", steps: "1. Check DOC72 for focus genre preference 2. Search Spotify 3. Play"}` |

### 5.2 Entity Schemas

```typescript
// Artist entity
interface MusicArtistNode {
  node_id: string;
  type: "entity";
  subtype: "music_artist";
  name: string;
  spotify_uri: string;
  genres: string[];
  affinity_score: number;          // 0-1, based on listening frequency
  first_observed: string;          // ISO date
  last_observed: string;           // ISO date
  play_count_estimate: number;     // Approximate from intake data
  principal_id: "will";
  scope: "personal";
}

// Genre preference
interface MusicPreferenceNode {
  node_id: string;
  type: "preference";
  domain: "music";
  subject: string;                 // "jazz", "ambient_electronic", "classic_rock"
  valence: "positive" | "negative" | "neutral";
  strength: number;                // 0-1, Beta distribution confidence
  context?: string;                // "for_focus", "for_relaxation", "morning", "weekend"
  evidence_count: number;          // How many data points support this
  last_updated: string;
  principal_id: "will";
  scope: "personal";
}

// Temporal pattern
interface MusicPatternNode {
  node_id: string;
  type: "observation";
  domain: "music_pattern";
  pattern_type: "temporal" | "activity" | "mood" | "trend";
  content: string;                 // Natural language description
  data: {
    time_period?: string;          // "morning", "evening", "weekend"
    genre_distribution?: Record<string, number>;  // {"jazz": 0.4, "ambient": 0.3, ...}
    trend_direction?: "increasing" | "decreasing" | "stable";
  };
  observed_period: string;         // "2026-W14" or "2026-03"
  principal_id: "will";
  scope: "personal";
}

// Activity association
interface MusicAssociationNode {
  node_id: string;
  type: "association";
  entity_a_ref: string;            // DOC72 node_id of activity/case/project
  entity_b_ref: string;            // DOC72 node_id of genre/playlist/artist
  relation: "activity_music" | "case_playlist" | "mood_genre";
  strength: number;                // 0-1
  evidence: string;                // "Playlist 'Henderson Prep' played during Henderson case work"
  principal_id: "will";
  scope: "personal";
}
```
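The `genre_distribution` field on `MusicPatternNode.data` can be derived from the intake's top artists. A minimal sketch, assuming each artist contributes one equal unit split across its genres (a hypothetical weighting; rank- or play-weighted schemes are equally plausible) and normalizing so the shares sum to 1:

```typescript
// Fields taken from the top_artists entries in the §4.2 input shape.
interface ArtistWithGenres {
  uri: string;
  genres: string[];
}

// Each artist contributes one unit, split evenly across its genres; result sums to 1.
function genreDistribution(artists: ArtistWithGenres[]): Record<string, number> {
  const weights: Record<string, number> = {};
  let total = 0;
  for (const artist of artists) {
    if (artist.genres.length === 0) continue; // Spotify can return an empty genres array
    const share = 1 / artist.genres.length;
    for (const genre of artist.genres) {
      weights[genre] = (weights[genre] ?? 0) + share;
    }
    total += 1;
  }
  const distribution: Record<string, number> = {};
  for (const [genre, weight] of Object.entries(weights)) {
    distribution[genre] = weight / total;
  }
  return distribution;
}
```

Artists that appear in both the short- and medium-term lists would be counted twice unless the caller deduplicates by `uri` first.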

### 5.3 Deduplication

The entity graph must deduplicate across intakes. An artist node for "John Coltrane" created in week 1 should be UPDATED in week 2, not duplicated.

**Dedup strategy:**
- `spotify_uri` is the unique key for Spotify entities (artists, tracks, albums, playlists)
- On each intake, check if a node with that `spotify_uri` already exists
- If yes: update `affinity_score`, `last_observed`, `play_count_estimate`
- If no: create new node
- Preference nodes deduplicate on `{domain, subject, context}` tuple
- Pattern nodes are temporal — each intake period gets its own observation, building a time series
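These rules reduce to upserts on two different keys. A minimal in-memory sketch, assuming a `Map` stands in for the DOC72 lookup (the real write path would query the graph); the node shape is a trimmed `MusicArtistNode` from §5.2, and `upsertArtist` and `preferenceKey` are hypothetical names:

```typescript
// Trimmed from MusicArtistNode (§5.2).
interface ArtistNode {
  spotify_uri: string;
  name: string;
  affinity_score: number;
  first_observed: string;
  last_observed: string;
  play_count_estimate: number;
}

// Update-or-create keyed on spotify_uri. Existing nodes keep first_observed;
// affinity, last_observed, and play count take this intake's values.
function upsertArtist(graph: Map<string, ArtistNode>, incoming: ArtistNode): ArtistNode {
  const existing = graph.get(incoming.spotify_uri);
  if (!existing) {
    graph.set(incoming.spotify_uri, incoming);
    return incoming;
  }
  const updated: ArtistNode = {
    ...existing,
    affinity_score: incoming.affinity_score,
    last_observed: incoming.last_observed,
    play_count_estimate: incoming.play_count_estimate,
  };
  graph.set(incoming.spotify_uri, updated);
  return updated;
}

// Preference nodes dedupe on the {domain, subject, context} tuple.
// A missing context is normalized to null so the key is stable.
function preferenceKey(p: { domain: string; subject: string; context?: string }): string {
  return JSON.stringify([p.domain, p.subject, p.context ?? null]);
}
```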

### 5.4 Confidence Scoring

DOC72 uses Beta distribution confidence scoring. For music preferences:

- A genre that appears in 90% of recent plays with 200+ data points → high confidence (α=180, β=20)
- A genre that appeared 3 times last week → low confidence (α=3, β=47)
- Confidence naturally decays if a genre stops appearing in subsequent intakes

This means Elnor can express uncertainty: "You've been listening to a lot of classical lately, but I'm not sure if it's a lasting preference or just a phase."
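The two examples above correspond to Beta means of 0.90 and 0.06. A sketch of how those numbers and the decay could be computed, assuming plays in a genre count as successes and other plays as failures, and that older evidence is discounted by a hypothetical forgetting factor `lambda` before each intake's counts are added (DOC72's actual update rule may differ):

```typescript
interface BetaEstimate {
  alpha: number; // pseudo-count of plays in the genre
  beta: number;  // pseudo-count of plays outside the genre
}

// Posterior mean: expected share of plays in this genre (preference strength).
function betaMean({ alpha, beta }: BetaEstimate): number {
  return alpha / (alpha + beta);
}

// Posterior variance: shrinks as total evidence (alpha + beta) grows (certainty).
function betaVariance({ alpha, beta }: BetaEstimate): number {
  const n = alpha + beta;
  return (alpha * beta) / (n * n * (n + 1));
}

// Discount old evidence by lambda (0 < lambda <= 1), then add this intake's counts.
// If the genre stops appearing, hits = 0 and the mean drifts downward.
function updateBeta(prior: BetaEstimate, hits: number, misses: number, lambda = 0.9): BetaEstimate {
  return {
    alpha: lambda * prior.alpha + hits,
    beta: lambda * prior.beta + misses,
  };
}
```

The mean and the variance answer different questions: the mean says how much Elnor thinks you like a genre, and the variance says how sure it is, which is what the "not sure if it's a lasting preference" phrasing draws on.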

---

## 6. Nightly Dream Cycle Integration (DOC8)

### 6.1 Music in the Dream Cycle

DOC8's nightly dream cycle runs lightweight consolidation tasks. Music knowledge participates:

**Consolidation tasks:**
- Merge redundant artist/genre nodes (e.g., "electronic" and "electronica" → single node)
- Decay affinity scores for artists not listened to in 30+ days
- Promote provisional preferences to confirmed (if evidence_count > threshold)
- Generate weekly/monthly listening summary nodes
- Detect significant changes: "Listening shifted from jazz to classical this month"

**Weekly consolidation (heavier):**
- Compare current top artists/tracks with previous month
- Generate trend analysis: "New discovery: Nils Frahm (first appeared 2 weeks ago, now in top 10)"
- Cross-reference with calendar: "Classical listening spiked during the week of the Henderson motion deadline"
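The affinity-decay and promotion steps are simple per-node rules. A sketch assuming exponential decay that starts after the 30-day grace period with a hypothetical 60-day half-life, and a hypothetical promotion threshold of 5; the `provisional`/`confirmed` status field is also an assumption, since the §5.2 schemas do not define it:

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Days between two ISO dates (may be fractional).
function daysBetween(fromIso: string, toIso: string): number {
  return (Date.parse(toIso) - Date.parse(fromIso)) / MS_PER_DAY;
}

// No decay inside the grace period; after it, affinity halves every halfLifeDays.
function decayedAffinity(
  affinity: number,
  lastObserved: string,
  now: string,
  graceDays = 30,
  halfLifeDays = 60,
): number {
  const idleDays = daysBetween(lastObserved, now) - graceDays;
  if (idleDays <= 0) return affinity;
  return affinity * Math.pow(0.5, idleDays / halfLifeDays);
}

type PreferenceStatus = "provisional" | "confirmed";

// Promote once evidence_count exceeds the threshold; confirmed stays confirmed.
function promote(current: PreferenceStatus, evidenceCount: number, threshold = 5): PreferenceStatus {
  return current === "provisional" && evidenceCount > threshold ? "confirmed" : current;
}
```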

### 6.2 Dream Cycle Task Definition

```typescript
const musicDreamCycleTask: DreamCycleTask = {
  task_id: "music_knowledge_consolidation",
  frequency: "nightly",
  priority: "low",
  steps: [
    "decay_stale_affinities",          // Reduce affinity for artists not seen in 30+ days
    "merge_redundant_genres",          // Deduplicate genre nodes
    "promote_provisional_preferences", // Move from provisional to confirmed
    "generate_period_summary",         // Create observation node for the day/week
  ],
  weekly_extra_steps: [
    "trend_analysis",                  // Compare with last month
    "calendar_cross_reference",        // Correlate with DOC72 calendar entities
    "generate_weekly_digest_entry",    // Add music section to weekly digest
  ],
};
```

---

## 7. What Elnor Can Do With This Knowledge

Once the scheduled intake is running and DOC72 has music data, Elnor can:

### 7.1 Proactive Recommendations
- "You usually listen to ambient during morning focus sessions. Want me to put something on?" (based on temporal pattern nodes)
- "You haven't listened to Coltrane in a while — want me to queue up A Love Supreme?" (based on decaying affinity + historical high affinity)

### 7.2 Context-Aware Playback
- "Play something for trial prep" → Elnor checks DOC72 for association nodes linking trial prep to music → finds "ambient electronic" pattern → searches Spotify → plays
- "Play what I usually listen to on Sunday mornings" → temporal pattern lookup → genre match → play
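The association lookup in the first example amounts to picking the strongest `activity_music` link for the activity. A sketch over the `MusicAssociationNode` fields from §5.2, assuming the candidate nodes have already been retrieved from DOC72; `bestMusicFor` and its `minStrength` cutoff are hypothetical:

```typescript
// Subset of MusicAssociationNode (§5.2).
interface AssociationNode {
  entity_a_ref: string; // activity/case/project node_id
  entity_b_ref: string; // genre/playlist/artist node_id
  relation: "activity_music" | "case_playlist" | "mood_genre";
  strength: number;     // 0-1
}

// Strongest music node linked to the activity, or undefined if none qualifies.
function bestMusicFor(
  activityRef: string,
  associations: AssociationNode[],
  minStrength = 0.5,
): string | undefined {
  let best: AssociationNode | undefined;
  for (const a of associations) {
    if (a.entity_a_ref !== activityRef || a.relation !== "activity_music") continue;
    if (a.strength < minStrength) continue;
    if (!best || a.strength > best.strength) best = a;
  }
  return best?.entity_b_ref;
}
```

When the lookup returns `undefined`, Elnor can fall back to Part 1's generic search.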

### 7.3 Trend Awareness
- "How has my music taste changed this year?" → Elnor queries pattern nodes across months → generates trend narrative
- "I feel like I've been in a rut musically" → Elnor can suggest genres/artists outside your recent patterns

### 7.4 Cross-Domain Intelligence
- Elnor notices you play aggressive music before deadlines and calm music after → stores this as a mood/activity pattern
- Elnor can correlate music changes with case milestones: "Your listening got more intense during the Paramount expert discovery phase"

### 7.5 Playlist Intelligence
- "Make me a playlist of my top discoveries from the last 3 months" → Elnor queries DOC72 for artists with `first_observed` in last 90 days + high affinity → creates Spotify playlist
- "What playlist was I playing during the Henderson depo prep?" → DOC72 association lookup → finds playlist entity → plays it
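The "top discoveries" request is a filter over artist nodes. A sketch using the `first_observed` and `affinity_score` fields from `MusicArtistNode`, assuming a hypothetical affinity cutoff of 0.6 and a cap of 20 artists; creating the Spotify playlist from the result is not shown:

```typescript
// Subset of MusicArtistNode (§5.2).
interface ArtistSummary {
  spotify_uri: string;
  name: string;
  affinity_score: number; // 0-1
  first_observed: string; // ISO date
}

// Artists first seen within `days` of `now` with affinity at or above the cutoff,
// strongest first.
function recentDiscoveries(
  artists: ArtistSummary[],
  now: string,
  days = 90,
  minAffinity = 0.6,
  limit = 20,
): ArtistSummary[] {
  const cutoff = Date.parse(now) - days * 24 * 60 * 60 * 1000;
  return artists
    .filter((a) => Date.parse(a.first_observed) >= cutoff && a.affinity_score >= minAffinity)
    .sort((x, y) => y.affinity_score - x.affinity_score)
    .slice(0, limit);
}
```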

---

## 8. Settings (Backend Section)

Additions to the Spotify settings from Part 1:

```
Settings > Integrations > Spotify > Listening Intelligence
├─ Knowledge capture
│   ├─ Scheduled intake: ◉ Enabled  ○ Disabled
│   ├─ Frequency: ○ Daily  ● Weekly  ○ Monthly
│   ├─ Include in nightly dream cycle: ☑ Yes
│   ├─ [Run now]
│   └─ Last run: Sun Apr 5, 3:01 AM — 142 tracks, 8 entities, 3 preferences
├─ What Elnor remembers
│   ├─ ☑ Artists and genres I listen to
│   ├─ ☑ Listening patterns (time of day, day of week)
│   ├─ ☑ Playlist contents and changes
│   ├─ ☑ Music-activity associations (e.g., trial prep → ambient)
│   └─ ☐ Exact play counts and timestamps (detailed — more storage)
├─ Proactive suggestions
│   ├─ ☑ Offer to play music based on time/activity context
│   └─ ☑ Mention music trends in weekly digest
└─ Data management
    ├─ Music entities in knowledge graph: 247 nodes
    ├─ [View music knowledge] — opens DOC72 filtered to music domain
    └─ [Clear all music knowledge] — removes all music nodes from graph
```

---

## 9. Cross-Document Obligations

### 9.1 DOC3 (Semantic Skill Learning)

- Register `spotify_intake` as a recognized `source_type` in the `KnowledgeExtractionBundle` pipeline
- Add music-domain extraction instructions to the skill library (§4.1 prompt above)
- The conversational learning path (Part 1 §5.2) already uses DOC3's standard extraction — no changes needed

### 9.2 DOC72 (Entity Graph)

- No new node types needed — all music data fits existing canonical types
- Add `spotify_uri` as a recognized external identifier field on entity nodes (alongside any existing external ID fields)
- Add `domain: "music"` as a recognized domain tag for filtering/querying
- Deduplication logic must handle `spotify_uri` as a unique key (§5.3)
- Beta confidence scoring applies unchanged (§5.4)

### 9.3 DOC23 (Task System)

- Register `spotify_weekly_intake` as a system task template
- Support `cron` schedule type (if not already supported)
- The task should appear in the Tasks page as a system maintenance task, not a user-created task
- Manual trigger via "Run now" button or chat command

### 9.4 DOC8 (Dream Cycle)

- Register `music_knowledge_consolidation` as a dream cycle participant
- Nightly: affinity decay, dedup, promotion
- Weekly: trend analysis, calendar cross-reference, digest entry

### 9.5 DOC24 (Knowledge Delivery)

- Music knowledge nodes participate in standard DOC24 retrieval (RRF, i.e. reciprocal rank fusion; three-lane retrieval)
- When Elnor receives a music-related query, DOC24's semantic routing should include music domain nodes in the retrieval context
- The `inspect_knowledge_summary` tool (GBrain proposal) should include a music section when queried: "What does Elnor know about my music?"

---

## 10. Implementation Sequence

```
1. Part 1 complete (MCP server running, web player working)        ← prerequisite
2. EC + DOC72 write path operational                                ← prerequisite
3. DOC3 extraction pipeline operational                             ← prerequisite
4. Register spotify_intake source_type in DOC3                      ← 30 min
5. Create SpotifyIntakeService in EC                                ← 2-3 hours
   - Fetches data via MCP tools
   - Packages as KnowledgeExtractionBundle
   - Sends through DOC3 pipeline
   - Writes results to DOC72
6. Register task in DOC23                                           ← 30 min
7. Add music consolidation to DOC8 dream cycle                     ← 1 hour
8. Add Listening Intelligence settings to Q                         ← 1 hour
9. Test end-to-end: scheduled run → extraction → graph write        ← 1-2 hours
10. Test proactive recommendations: "play something for focus"      ← testing
```

**Total estimated build time (after prerequisites):** 6-8 hours
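Step 5 is the only substantial new component. A sketch of its orchestration, assuming three injected interfaces that stand in for the MCP client, the DOC3 pipeline, and the DOC72 write path; `McpClient`, `Extractor`, `GraphWriter`, and their method names are hypothetical, while the tool names, parameters, and output keys come from §3.1:

```typescript
// Hypothetical seams; the real EC, DOC3, and DOC72 interfaces may differ.
interface McpClient {
  callTool(name: string, params: Record<string, unknown>): Promise<unknown>;
}
interface Extractor {
  extract(sourceType: string, data: Record<string, unknown>): Promise<unknown[]>;
}
interface GraphWriter {
  writeNodes(nodes: unknown[]): Promise<number>; // resolves to the number of nodes written
}

// Fetch plan mirroring the §3.1 steps: [output_key, tool, params].
const FETCHES: [string, string, Record<string, unknown>][] = [
  ["recent_tracks", "spotify_get_recently_played", { limit: 50 }],
  ["top_tracks_short", "spotify_get_top_tracks", { time_range: "short_term", limit: 50 }],
  ["top_tracks_medium", "spotify_get_top_tracks", { time_range: "medium_term", limit: 50 }],
  ["top_artists_short", "spotify_get_top_artists", { time_range: "short_term", limit: 30 }],
  ["top_artists_medium", "spotify_get_top_artists", { time_range: "medium_term", limit: 30 }],
  ["playlists", "spotify_get_playlists", { limit: 50 }],
];

class SpotifyIntakeService {
  private readonly mcp: McpClient;
  private readonly extractor: Extractor;
  private readonly graph: GraphWriter;

  constructor(mcp: McpClient, extractor: Extractor, graph: GraphWriter) {
    this.mcp = mcp;
    this.extractor = extractor;
    this.graph = graph;
  }

  // fetch -> extract -> write. Fetches run sequentially for simplicity;
  // any rejection propagates so DOC23's max_retries/retry_delay can apply.
  async run(): Promise<number> {
    const data: Record<string, unknown> = {};
    for (const [key, tool, params] of FETCHES) {
      data[key] = await this.mcp.callTool(tool, params);
    }
    const nodes = await this.extractor.extract("spotify_intake", data);
    return this.graph.writeNodes(nodes);
  }
}
```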