File: data-flow-simplified-router-mode.md

```mermaid
%% ROUTER Mode Data Flow (multi-model)
%% Detailed flows: ./flows/server-flow.mmd, ./flows/models-flow.mmd, ./flows/chat-flow.mmd

sequenceDiagram
    participant User as 👤 User
    participant UI as 🧩 UI
    participant Stores as đŸ—„ī¸ Stores
    participant DB as 💾 IndexedDB
    participant API as 🌐 llama-server

    Note over User,API: 🚀 Initialization (see: server-flow.mmd, models-flow.mmd)

    UI->>Stores: initialize()
    Stores->>DB: load conversations
    Stores->>API: GET /props
    API-->>Stores: {role: "router"}
    Stores->>API: GET /v1/models
    API-->>Stores: models[] with status (loaded/available)
    loop each loaded model
        Stores->>API: GET /props?model=X
        API-->>Stores: modalities (vision/audio)
    end

    Note over User,API: 🔄 Model Selection (see: models-flow.mmd)

    User->>UI: select model
    alt model not loaded
        Stores->>API: POST /models/load
        loop poll status
            Stores->>API: GET /v1/models
            API-->>Stores: check if loaded
        end
        Stores->>API: GET /props?model=X
        API-->>Stores: cache modalities
    end
    Stores->>Stores: validate modalities vs conversation
    alt valid
        Stores->>Stores: select model
    else invalid
        Stores->>API: POST /models/unload
        UI->>User: show error toast
    end

    Note over User,API: đŸ’Ŧ Chat Flow (see: chat-flow.mmd)

    User->>UI: send message
    UI->>Stores: sendMessage()
    Stores->>DB: save user message
    Stores->>API: POST /v1/chat/completions {model: X}
    Note right of API: router forwards to model
    loop streaming
        API-->>Stores: SSE chunks + model info
        Stores-->>UI: reactive update
    end
    API-->>Stores: done + timings
    Stores->>DB: save assistant message + model used

    Note over User,API: 🔁 Regenerate (optional: different model)

    User->>UI: regenerate
    Stores->>Stores: validate modalities up to this message
    Stores->>DB: create message branch
    Note right of Stores: same streaming flow

    Note over User,API: â¯šī¸ Stop

    User->>UI: stop
    Stores->>Stores: abort stream
    Stores->>DB: save partial response

    Note over User,API: đŸ—‘ī¸ LRU Unloading

    Note right of API: Server auto-unloads LRU models<br/>when cache full
    User->>UI: select unloaded model
    Note right of Stores: triggers load flow again
```
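
The model-selection branch above (load, then poll `GET /v1/models` until the status flips) can be sketched client-side as follows. This is a minimal sketch, not llama-server's actual webui code: the endpoint paths come from the diagram, but the response shapes, the `"loaded"`/`"available"` status values, and all function names here are illustrative assumptions.

```typescript
// Assumed model entry shape for router mode; the diagram only says
// "models[] with status (loaded/available)".
interface RouterModel {
  id: string;
  status: "loaded" | "available";
}

// Pure helper: is the requested model currently resident on the server?
function isModelLoaded(models: RouterModel[], id: string): boolean {
  return models.some((m) => m.id === id && m.status === "loaded");
}

// Assumption: GET /v1/models may wrap the list OpenAI-style ({data: [...]}).
async function fetchModels(baseUrl: string): Promise<RouterModel[]> {
  const json = await (await fetch(`${baseUrl}/v1/models`)).json();
  return (json.data ?? json) as RouterModel[];
}

// "model not loaded" branch: POST /models/load, then poll until loaded.
async function ensureModelLoaded(baseUrl: string, id: string, pollMs = 500): Promise<void> {
  if (isModelLoaded(await fetchModels(baseUrl), id)) return;
  await fetch(`${baseUrl}/models/load`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: id }),
  });
  while (!isModelLoaded(await fetchModels(baseUrl), id)) {
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
}
```

Per the diagram, once the model reports loaded, the stores would issue one `GET /props?model=X` and cache the returned modalities before validating them against the conversation.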
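
The streaming loop in the chat flow ("SSE chunks + model info") can likewise be sketched with a small line parser. This assumes OpenAI-style server-sent events, i.e. `data: {...}` lines carrying `choices[0].delta.content` and a terminating `data: [DONE]`; the diagram does not pin down the chunk schema, so these field names are assumptions.

```typescript
// Parsed result for one SSE line of the streaming chat response.
interface SseEvent {
  done: boolean;     // true on the terminating [DONE] sentinel
  content: string;   // text delta, if any
  model?: string;    // router mode reports which model served the request
}

// Returns null for lines that carry no data (blank lines, ": comment" keep-alives).
function parseSseLine(line: string): SseEvent | null {
  if (!line.startsWith("data: ")) return null;
  const payload = line.slice("data: ".length).trim();
  if (payload === "[DONE]") return { done: true, content: "" };
  const chunk = JSON.parse(payload);
  return {
    done: false,
    content: chunk.choices?.[0]?.delta?.content ?? "",
    model: chunk.model,
  };
}
```

In the diagram's terms, each non-null event with `done === false` drives a reactive store update in the UI, and the `[DONE]` event (plus the final timings payload) triggers saving the assistant message, together with the model that produced it, to IndexedDB.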