06 / Curator Engine - Process Visualization

06 / CURATOR ENGINE

Multimodal Indexing Blueprint

STEP 1: SMART EXTRACTION FUNNEL

1_detect_scenes.py

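                        from scenedetect import detect, ContentDetector

                        video_path = "lecture.mp4"  # hypothetical input path

                        # Stop blindly grabbing a frame every 60 seconds.
                        # Capture a frame only when the picture changes significantly (e.g., a slide change).
                        scene_list = detect(video_path, ContentDetector(threshold=30.0))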

2_remove_dupes.py

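                        import imagehash
                        from PIL import Image

                        # Compute a perceptual hash (fingerprint) per image. If two fingerprints are too
                        # similar, the lecturer only moved the mouse: the frame is a duplicate, so drop it.
                        def check_duplicate(path1, path2):  # helper name is illustrative
                            hash1, hash2 = (imagehash.phash(Image.open(p)) for p in (path1, path2))
                            if (hash1 - hash2) < 5:
                                return "DUPLICATE"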

RAW VIDEO (120 mins)

↓

SCENE DETECT (Capture Changes)

↓

pHASH (Remove Duplicates)

↓

40-50 HIGH-VALUE SLIDES

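Smart Extraction Funnel: solves the problems of duplicate frames and frames with no real content.
Fixed-interval screenshots produce a lot of junk (for example, frames where the lecturer is idle). We apply three filtering layers: 1. Scene detection (capture only the moment a slide changes) 2. Perceptual hashing (remove near-identical frames) 3. Content filtering (use AI to judge whether a frame is a black screen or a meaningless transition).
This distills a 2-hour video into 40-50 high-value slides, with neither redundant frames nor missing ones.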
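
The perceptual-hash layer of the funnel can be sketched without any imaging library. This is a minimal illustration, assuming each frame has already been reduced to an integer hash (for example, a 64-bit pHash); the names `hamming` and `drop_near_duplicates` are hypothetical, and comparing each frame only against the most recently kept one is an assumption, not something the pipeline above specifies.

```python
from typing import List


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def drop_near_duplicates(hashes: List[int], max_distance: int = 5) -> List[int]:
    """Return the indices of the frames to keep.

    A frame is dropped when its hash differs from the most recently kept
    frame by fewer than `max_distance` bits (e.g., only the cursor moved).
    """
    kept: List[int] = []
    for i, h in enumerate(hashes):
        if not kept or hamming(hashes[kept[-1]], h) >= max_distance:
            kept.append(i)
    return kept
```

Comparing against the last kept frame keeps the pass linear in the number of frames; a slide that reappears later in the lecture would be kept again under this scheme.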

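Frames can be extracted with ffmpeg or OpenCV, either at a fixed interval (for example, every 60 seconds) or by scene-change detection (Scene Detection). These images become the visual anchors for later memory retrieval.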
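
For the fixed-interval baseline, one possible approach is ffmpeg's `fps` filter with a fractional rate. The sketch below only builds the argument list; the function name `fixed_interval_cmd` and the file paths in the usage note are illustrative assumptions.

```python
from typing import List


def fixed_interval_cmd(video: str, out_pattern: str, every_s: int = 60) -> List[str]:
    """Build an ffmpeg argv that saves one frame every `every_s` seconds."""
    if every_s <= 0:
        raise ValueError("every_s must be positive")
    # fps=1/N asks the fps filter to emit one frame per N seconds of video.
    return ["ffmpeg", "-i", video, "-vf", f"fps=1/{every_s}", out_pattern]
```

It could be run with something like `subprocess.run(fixed_interval_cmd("lecture.mp4", "frames/%04d.png"), check=True)`, assuming ffmpeg is on the PATH and the output directory exists.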

STEP 2: MULTIMODAL CONTEXT BINDING (GEMINI/GPT-4V)

bind_context.js

                        const analyzeFrame = async (img, transcript) => {
                          // Vision + Text Context
                          const prompt = `Describe this image contextually based on the transcript segment: "${transcript}"`;
                          // `model` is assumed to be a multimodal client initialized elsewhere.
                          return await model.generate([img, prompt]);
                        };

IMG

TEXT

→

GEMINI

→

METADATA

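Multimodal Context Binding: giving the image a soul.
An image on its own is mute. We send each image, together with the transcript from just before and after its timestamp, to a multimodal LLM (such as Gemini 1.5 Pro). The AI analyzes how the frame relates to the lecture content and generates semantically rich metadata. From then on, the image is indexable.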
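
Selecting the transcript "just before and after" a frame's timestamp could look like the following sketch. The `Segment` type, the `transcript_window` name, and the 30-second default radius are assumptions for illustration; the source does not state how wide the window is.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float  # seconds from the start of the video
    end: float
    text: str


def transcript_window(segments: List[Segment], t: float, radius: float = 30.0) -> str:
    """Join the text of every segment that overlaps [t - radius, t + radius]."""
    lo, hi = t - radius, t + radius
    return " ".join(s.text for s in segments if s.end >= lo and s.start <= hi)
```

The returned string would then be passed as the transcript argument alongside the frame when calling the multimodal model.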

STEP 3: MARKDOWN INJECTION & MEMORY TRIGGER

final_note.md

                        ## 04. Transformer Architecture

                        In this section, we discuss the self-attention mechanism...

                        ![Diagram of Attention Head](assets/0045.png)

                        <!-- AI Note: This visualizes the Q/K/V matrix multiplication discussed above. -->

                        This allows the model to weigh the importance...

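Visual Trigger and Memory Recall: completing the automated curation.
A script automatically inserts the processed images into the matching paragraphs of the Markdown notes. When you later review the notes, each image acts as a "Visual Trigger" that instantly recalls the dynamic memory and context of watching the video, reinforcing the knowledge twice over.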
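
The injection step might be sketched as below. It is a minimal illustration that assumes the target section is identified by its exact heading line and that the image belongs after the first paragraph under that heading, mirroring the layout of the `final_note.md` sample; `inject_image` is a hypothetical helper, not the actual script.

```python
def inject_image(note: str, heading: str, image_path: str, alt: str, ai_note: str) -> str:
    """Insert an image and an HTML-comment note after the first paragraph under `heading`."""
    lines = note.split("\n")
    try:
        i = lines.index(heading) + 1
    except ValueError:
        return note  # heading not found: leave the note unchanged
    # Skip blank lines between the heading and its first paragraph.
    while i < len(lines) and not lines[i].strip():
        i += 1
    # Advance to the end of that paragraph.
    while i < len(lines) and lines[i].strip():
        i += 1
    block = ["", f"![{alt}]({image_path})", f"<!-- AI Note: {ai_note} -->"]
    return "\n".join(lines[:i] + block + lines[i:])
```

Matching on the exact heading text is the simplest option; a fuller version would more likely map each frame's timestamp to a section.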