Architecture Review: Legacy vs New — Critical Infrastructure Improvements
Pre-implementation review. This system controls traffic/tunnel cameras in critical infrastructure. Every failure mode must be addressed. The system may run on Windows, Linux, or Android tablets in the future.
1. Side-by-Side Failure Mode Comparison
1.1 Camera Server Unreachable
| Aspect | Legacy (WPF) | New (Flutter) | Verdict |
|---|---|---|---|
| Detection | Driver `IsConnected` check every 2 seconds | HTTP timeout (5s) | Legacy better — faster detection |
| Recovery | `CameraServerDriverReconnectService` retries every 2s | None — user must click retry button | Critical gap |
| Partial failure | Skips disconnected drivers, other servers still work | Each bridge is independent — OK | Equal |
| State on reconnect | Reloads media channels, fires `DriverConnected` event | No state resync after reconnect | Gap |
1.2 Coordination Layer Down (AppServer / PRIMARY)
| Aspect | Legacy (WPF) | New (Flutter) | Verdict |
|---|---|---|---|
| Detection | SignalR built-in disconnect detection | Not implemented yet | Equal (both need this) |
| Recovery | SignalR auto-reconnect: 0s, 5s, 10s, 15s fixed delays | Not implemented yet | To be built |
| Degraded mode | CrossSwitch/PTZ work, locks/sequences don't | Same design — correct | Equal |
| State on reconnect | Hub client calls `GetLockedCameraIds()`, `GetRunningSequences()` | Not implemented yet | Must match |
1.3 Network Failure
| Aspect | Legacy (WPF) | New (Flutter) | Verdict |
|---|---|---|---|
| Detection | `NetworkAvailabilityWorker` polls every 5s (checks NIC status) | None — no network detection | Critical gap |
| UI feedback | `NetworkAvailabilityState` updates UI commands | Connection status bar (manual) | Gap |
| Recovery | Automatic — reconnect services activate when NIC comes back | Manual only — user clicks retry | Critical gap |
1.4 Bridge Process Crash
| Aspect | Legacy (WPF) | New (Flutter) | Verdict |
|---|---|---|---|
| Detection | N/A (SDK was in-process) | HTTP timeout → connection status false | OK |
| Recovery | N/A (app restarts) | None — bridge stays dead | Critical gap |
| Prevention | N/A | Process supervision needed | Must add |
1.5 Flutter App Crash
| Aspect | Legacy (WPF) | New (Flutter) | Verdict |
|---|---|---|---|
| Recovery | App restarts, reconnects in ~5s | App restarts, must reinitialize | Equal |
| State recovery | Queries AppServer for locks, sequences, viewer states | Queries bridges for monitor states, alarms | Equal |
| Lock state | Restored via `GetLockedCameraIds()` | Restored from coordination service | Equal |
2. Critical Improvements Required
2.1 Automatic Reconnection (MUST HAVE)
The legacy system reconnects automatically at every level. Our Flutter app does not. For tunnel/traffic camera control, an operator cannot be expected to click a retry button during an emergency.
Required reconnection layers:
Layer 1: Bridge Health Polling
Flutter → periodic GET /health to each bridge
If bridge was down and comes back → auto-reconnect WebSocket + resync state
Layer 2: WebSocket Auto-Reconnect
On disconnect → exponential backoff retry (1s, 2s, 4s, 8s, max 30s)
On reconnect → resync state from bridge
Layer 3: Coordination Auto-Reconnect
On PRIMARY disconnect → retry connection with backoff
After 6s → STANDBY promotion (if configured)
On reconnect to (new) PRIMARY → resync lock/sequence state
Layer 4: Network Change Detection
Monitor network interface status
On network restored → trigger reconnection at all layers
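As a sketch of Layer 2's retry schedule, an exponential backoff generator might look like the following (Python for illustration; the 1s/2s/4s/8s/max-30s schedule comes from the design above, and the jitter fraction is an assumption borrowed from the "with jitter" note in section 5):

```python
import random

def backoff_delays(base=1.0, cap=30.0, jitter=0.2):
    """Yield reconnect delays: base, 2*base, 4*base, ... capped at `cap`,
    randomized by +/- `jitter` so multiple keyboards that lose the same
    server do not all reconnect in lockstep."""
    delay = base
    while True:
        yield delay * (1 + random.uniform(-jitter, jitter))
        delay = min(delay * 2, cap)
```

On each WebSocket disconnect the app would walk this generator, sleeping between attempts, and discard it (resetting the schedule) once a connection succeeds.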
Legacy equivalent:
- Camera drivers: 2-second reconnect loop (`CameraServerDriverReconnectService`)
- SignalR: built-in auto-reconnect with `HubRetryPolicy` (0s, 5s, 10s, 15s)
- Network: 5-second NIC polling (`NetworkAvailabilityWorker`)
2.2 Process Supervision (MUST HAVE)
Every .NET process (bridges + coordination service) must auto-restart on crash. An operator should never have to SSH into a machine to restart a bridge.
| Platform | Supervision Method |
|---|---|
| Windows | Windows Service (via Microsoft.Extensions.Hosting.WindowsServices) or NSSM |
| Linux | systemd units with Restart=always |
| Docker | restart: always policy |
| Android tablet | Bridges run on server, not locally |
Proposed process tree:
LattePanda Sigma (per keyboard)
├── copilot-geviscope-bridge.service (auto-restart)
├── copilot-gcore-bridge.service (auto-restart)
├── copilot-geviserver-bridge.service (auto-restart)
├── copilot-coordinator.service (auto-restart, PRIMARY only)
└── copilot-keyboard.service (auto-restart, Flutter desktop) or browser tab (Flutter web)
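A sketch of the Linux row from the supervision table above: a systemd unit with `Restart=always` for one bridge. The install path, binary name, and user are assumptions; only the restart policy comes from the table.

```ini
# /etc/systemd/system/copilot-geviscope-bridge.service (illustrative paths)
[Unit]
Description=Copilot GeViScope Bridge
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/opt/copilot/geviscope-bridge/CopilotGeviscopeBridge
Restart=always
RestartSec=2
User=copilot

[Install]
WantedBy=multi-user.target
```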
2.3 Health Monitoring Dashboard (SHOULD HAVE)
The operator must see at a glance what's working and what's not.
┌──────────────────────────────────────────────────────────┐
│ System Status │
│ ┌────────────┐ ┌────────────┐ ┌────────────────────┐ │
│ │ GeViScope │ │ G-Core │ │ Coordination │ │
│ │ ● Online │ │ ● Online │ │ ● PRIMARY active │ │
│ │ 12 cams │ │ 8 cams │ │ 2 keyboards │ │
│ │ 6 viewers │ │ 4 viewers │ │ 1 lock active │ │
│ └────────────┘ └────────────┘ └────────────────────┘ │
│ │
│ ⚠ G-Core bridge reconnecting (attempt 3/∞) │
└──────────────────────────────────────────────────────────┘
2.4 Command Retry with Idempotency (SHOULD HAVE)
Critical commands (CrossSwitch) should retry on transient failure:
```dart
Future<bool> viewerConnectLive(int viewer, int channel) async {
  for (int attempt = 1; attempt <= 3; attempt++) {
    try {
      final response = await _client.post('/viewer/connect-live', ...);
      if (response.statusCode == 200) return true;
    } catch (e) {
      if (attempt == 3) rethrow;
      // Linear backoff between attempts: 200ms, then 400ms.
      await Future.delayed(Duration(milliseconds: 200 * attempt));
    }
  }
  return false;
}
```
PTZ commands should NOT retry (they're continuous — a stale retry would cause unexpected movement).
2.5 State Verification After Reconnection (MUST HAVE)
After any reconnection event, the app must not trust its cached state:
On bridge reconnect:
1. Query GET /monitors → rebuild monitor state
2. Query GET /alarms/active → rebuild alarm state
3. Re-subscribe WebSocket events
On coordination reconnect:
1. Query locks → rebuild lock state
2. Query running sequences → update sequence state
3. Re-subscribe lock/sequence change events
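The bridge-reconnect steps above can be sketched as a single resync routine (Python for illustration; the endpoint paths come from the list above, while the `client` object and its method names are assumptions):

```python
# Hedged sketch: after a reconnect, cached state is discarded and rebuilt
# from authoritative queries before any events are processed.
def resync_bridge(client):
    state = {}
    state["monitors"] = client.get("/monitors")      # 1. rebuild monitor state
    state["alarms"] = client.get("/alarms/active")   # 2. rebuild alarm state
    client.resubscribe_events()                      # 3. re-attach WebSocket handlers
    return state
```

The coordination-side resync would follow the same pattern with the lock and sequence queries.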
Legacy does this: ViewerStatesInitWorker rebuilds viewer state on startup/reconnect. ConfigurationService.OnChangeAvailability resyncs config when AppServer comes back.
3. Platform Independence Analysis
3.1 Current Platform Assumptions
| Component | Current Assumption | Future Need |
|---|---|---|
| C# Bridges | Run locally on Windows (LattePanda) | Linux, Docker, remote server |
| Flutter App | Windows desktop or browser | Linux, Android tablet, browser |
| Coordination | Runs on PRIMARY keyboard (Windows) | Linux, Docker, any host |
| Hardware I/O | USB Serial + HID on local machine | Remote keyboard via network, or Bluetooth |
| Bridge URLs | `http://localhost:7720` | `http://192.168.x.y:7720` (already configurable) |
3.2 Architecture for Platform Independence
```mermaid
graph TB
    subgraph "Deployment A: LattePanda (Current)"
        LP_App["Flutter Desktop"]
        LP_Bridge1["GeViScope Bridge"]
        LP_Bridge2["G-Core Bridge"]
        LP_Coord["Coordinator"]
        LP_Serial["USB Serial/HID"]
        LP_App --> LP_Bridge1
        LP_App --> LP_Bridge2
        LP_App --> LP_Coord
        LP_Serial --> LP_App
    end
    subgraph "Deployment B: Android Tablet (Future)"
        AT_App["Flutter Android"]
        AT_BT["Bluetooth Keyboard"]
        AT_App -->|"HTTP over WiFi"| Remote_Bridge1["Bridge on Server"]
        AT_App -->|"HTTP over WiFi"| Remote_Bridge2["Bridge on Server"]
        AT_App -->|"WebSocket"| Remote_Coord["Coordinator on Server"]
        AT_BT --> AT_App
    end
    subgraph "Deployment C: Linux Kiosk (Future)"
        LX_App["Flutter Linux"]
        LX_Bridge1["GeViScope Bridge"]
        LX_Bridge2["G-Core Bridge"]
        LX_Coord["Coordinator"]
        LX_Serial["USB Serial/HID"]
        LX_App --> LX_Bridge1
        LX_App --> LX_Bridge2
        LX_App --> LX_Coord
        LX_Serial --> LX_App
    end
    Remote_Bridge1 --> CS1["Camera Server 1"]
    Remote_Bridge2 --> CS2["Camera Server 2"]
    LP_Bridge1 --> CS1
    LP_Bridge2 --> CS2
    LX_Bridge1 --> CS1
    LX_Bridge2 --> CS2
```
3.3 Key Design Rules for Platform Independence
-
Flutter app never assumes bridges are on localhost. Bridge URLs come from
servers.json. Already the case. -
Bridges are deployable anywhere .NET 8 runs. Currently Windows x86/x64. Must also build for Linux x64 and linux-arm64.
-
Coordination service is just another network service. Flutter app connects to it like a bridge — via configured URL.
-
Hardware I/O is abstracted behind a service interface.
KeyboardServiceinterface has platform-specific implementations:NativeSerialKeyboardService(desktop with USB)WebSerialKeyboardService(browser with Web Serial API)BluetoothKeyboardService(tablet with BT keyboard, future)EmulatedKeyboardService(development/testing)
-
No platform-specific code in business logic. All platform differences are in the service layer, injected via DI.
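The service-interface rule can be illustrated with a minimal abstraction (Python stand-in for the Dart interface; the class names come from the list above, the method names are assumptions):

```python
from abc import ABC, abstractmethod

class KeyboardService(ABC):
    """Platform-neutral keyboard input interface injected via DI."""

    @abstractmethod
    def start(self) -> None:
        """Open the underlying transport (serial port, BT socket, ...)."""

    @abstractmethod
    def key_events(self):
        """Yield decoded key events from the physical (or fake) keyboard."""

class EmulatedKeyboardService(KeyboardService):
    """Development/testing implementation: replays a scripted event list."""
    def __init__(self, script):
        self._script = list(script)

    def start(self) -> None:
        pass  # nothing to open; real impls would open a serial/BT port here

    def key_events(self):
        yield from self._script
```

BLoCs consume only the `KeyboardService` interface, so swapping the emulated implementation for a serial or Bluetooth one requires no business-logic changes.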
4. Coordination Service Design (Option B)
4.1 Service Overview
A minimal .NET 8 ASP.NET Core application (~400 lines) running on the PRIMARY keyboard:
copilot-coordinator/
├── Program.cs # Minimal API setup, WebSocket, endpoints
├── Services/
│ ├── LockManager.cs # Camera lock state (ported from legacy CameraLocksService)
│ ├── SequenceRunner.cs # Sequence execution (ported from legacy SequenceService)
│ └── KeyboardRegistry.cs # Track connected keyboards
├── Models/
│ ├── CameraLock.cs # Lock state model
│ ├── SequenceState.cs # Running sequence model
│ └── Messages.cs # WebSocket message types
└── appsettings.json # Lock timeout, heartbeat interval config
4.2 REST API
GET /health → Service health
GET /status → Connected keyboards, active locks, sequences
POST /locks/try {cameraId, keyboardId, priority} → Acquire lock
POST /locks/release {cameraId, keyboardId} → Release lock
POST /locks/takeover {cameraId, keyboardId, priority} → Request takeover
POST /locks/confirm {cameraId, keyboardId, confirm} → Confirm/reject takeover
POST /locks/reset {cameraId, keyboardId} → Reset expiration
GET /locks → All active locks
GET /locks/{keyboardId} → Locks held by keyboard
POST /sequences/start {viewerId, sequenceId} → Start sequence
POST /sequences/stop {viewerId} → Stop sequence
GET /sequences/running → Active sequences
WS /ws → Real-time events
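The core semantics behind `POST /locks/try` and `POST /locks/release` can be sketched as a minimal in-memory manager (Python for illustration; the 300-second TTL mirrors the ≤5-minute expiry in section 4.4, while takeover and priority handling belong to the `/locks/takeover` flow and are omitted here):

```python
import time

class LockManager:
    """Hedged sketch of camera-lock state: one holder per camera, TTL-based
    expiry, re-acquire by the holder refreshes the expiry."""
    def __init__(self, ttl=300.0, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._locks = {}  # cameraId -> (keyboardId, expiresAt)

    def try_acquire(self, camera_id, keyboard_id):
        now = self._clock()
        held = self._locks.get(camera_id)
        if held and held[1] > now and held[0] != keyboard_id:
            return False  # held by another keyboard and not yet expired
        self._locks[camera_id] = (keyboard_id, now + self._ttl)
        return True

    def release(self, camera_id, keyboard_id):
        held = self._locks.get(camera_id)
        if held and held[0] == keyboard_id:
            del self._locks[camera_id]
            return True
        return False
```

The real `LockManager.cs` would additionally broadcast `lock_acquired` / `lock_released` WebSocket events on every state change.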
4.3 WebSocket Events (broadcast to all connected keyboards)
{"type": "lock_acquired", "cameraId": 5, "keyboardId": "KB1", "expiresAt": "..."}
{"type": "lock_released", "cameraId": 5}
{"type": "lock_expiring", "cameraId": 5, "keyboardId": "KB1", "expiresIn": 60}
{"type": "lock_takeover", "cameraId": 5, "from": "KB1", "to": "KB2"}
{"type": "sequence_started", "viewerId": 1001, "sequenceId": 3}
{"type": "sequence_stopped", "viewerId": 1001}
{"type": "keyboard_online", "keyboardId": "KB2"}
{"type": "keyboard_offline", "keyboardId": "KB2"}
{"type": "heartbeat"}
4.4 Failover (Configured STANDBY)
keyboards.json:
```json
{
  "keyboards": [
    {"id": "KB1", "role": "PRIMARY", "coordinatorPort": 8090},
    {"id": "KB2", "role": "STANDBY", "coordinatorPort": 8090}
  ]
}
```
- PRIMARY starts the coordinator service on `:8090`
- STANDBY monitors PRIMARY's `/health` endpoint
- If PRIMARY is unreachable for 6 seconds → STANDBY starts its own coordinator
- When the old PRIMARY recovers → checks if another coordinator is running → defers (becomes STANDBY)
- Lock state after failover: empty (locks expire naturally in ≤5 minutes, same as legacy AppServer restart behavior)
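The promotion rule above can be sketched as a small state machine fed by health probes (Python for illustration; the 6-second threshold comes from the list above, the probe cadence is an assumption):

```python
class StandbyMonitor:
    """Hedged sketch: promote STANDBY only after the PRIMARY /health probe
    has failed continuously for `promote_after` seconds."""
    def __init__(self, promote_after=6.0):
        self._promote_after = promote_after
        self._down_since = None  # timestamp of first failed probe in a row

    def observe(self, now, primary_healthy):
        """Feed one probe result; returns True when the STANDBY should
        start its own coordinator."""
        if primary_healthy:
            self._down_since = None  # any success resets the failure window
            return False
        if self._down_since is None:
            self._down_since = now
        return (now - self._down_since) >= self._promote_after
```

A single transient probe failure therefore never triggers failover; only a continuous 6-second outage does.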
5. Improvement Summary: Legacy vs New
What the New System Does BETTER
| Improvement | Detail |
|---|---|
| No central server hardware | Coordinator runs on keyboard, not separate machine |
| Alarm reliability | Query + Subscribe + Periodic sync (legacy had event-only + hourly refresh) |
| Direct command path | CrossSwitch/PTZ bypass coordinator entirely (legacy routed some through AppServer) |
| Multiplatform | Flutter + .NET 8 run on Windows, Linux, Android. Legacy was Windows-only WPF |
| No SDK dependency in UI | Bridges abstract SDKs behind REST. UI never touches native code |
| Independent operation | Each keyboard works standalone for critical ops. Legacy needed AppServer for several features |
| Deployable anywhere | Bridges + coordinator can run on any server, not just the keyboard |
What the New System Must MATCH (Currently Missing)
| Legacy Feature | Legacy Implementation | New Implementation Needed |
|---|---|---|
| Auto-reconnect to camera servers | 2-second periodic retry service | Bridge health polling + WebSocket auto-reconnect |
| Auto-reconnect to AppServer | SignalR built-in (0s, 5s, 10s, 15s) | Coordinator WebSocket auto-reconnect with backoff |
| Network detection | 5-second NIC polling worker | connectivity_plus package or periodic health checks |
| State resync on reconnect | `ViewerStatesInitWorker`, config resync on availability change | Query bridges + coordinator on any reconnect event |
| Graceful partial failure | `Parallel.ForEach` with per-driver try-catch | Already OK (each bridge independent) |
| Process watchdog | Windows Service | systemd / Windows Service / Docker restart policy |
| Media channel refresh | 10-minute periodic refresh | Periodic bridge status query |
What the New System Should Do BETTER THAN Legacy
| Improvement | Legacy Gap | New Approach |
|---|---|---|
| Exponential backoff | Fixed delays (0, 5, 10, 15s) — no backoff | Exponential: 1s, 2s, 4s, 8s, max 30s with jitter |
| Circuit breaker | None — retries forever even if server is gone | After N failures, back off to slow polling (60s) |
| Command retry | None — single attempt | Retry critical commands (CrossSwitch) 3x with 200ms delay |
| Health visibility | Hidden in logs | Operator-facing status dashboard in UI |
| Structured logging | Basic ILogger | JSON structured logging → ELK (already in design) |
| Graceful degradation UI | Commands silently disabled | Clear visual indicator: "Degraded mode — locks unavailable" |
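The circuit-breaker row can be sketched as a tiny failure counter that switches the poll cadence (Python for illustration; the 60-second slow interval comes from the table, the failure threshold and fast interval are assumptions):

```python
class CircuitBreaker:
    """Hedged sketch: after `threshold` consecutive failures, back off from
    fast retries to slow polling; any success snaps back to fast."""
    def __init__(self, threshold=5, fast_interval=2.0, slow_interval=60.0):
        self._threshold = threshold
        self._fast = fast_interval
        self._slow = slow_interval
        self._failures = 0

    def record(self, success):
        self._failures = 0 if success else self._failures + 1

    def next_interval(self):
        return self._slow if self._failures >= self._threshold else self._fast
```

Wiring this into the health poller keeps a permanently dead server from being hammered every few seconds while still noticing within a minute when it returns.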
6. Proposed Resilience Architecture
```mermaid
graph TB
    subgraph "Flutter App"
        UI["UI Layer"]
        BLoCs["BLoC Layer"]
        RS["ReconnectionService"]
        HS["HealthService"]
        BS["BridgeService"]
        CS["CoordinationClient"]
        KS["KeyboardService"]
    end
    subgraph "Health & Reconnection"
        RS -->|"periodic /health"| Bridge1["GeViScope Bridge"]
        RS -->|"periodic /health"| Bridge2["G-Core Bridge"]
        RS -->|"periodic /health"| Coord["Coordinator"]
        RS -->|"on failure"| BS
        RS -->|"on failure"| CS
        HS -->|"status stream"| BLoCs
    end
    subgraph "Normal Operation"
        BS -->|"REST commands"| Bridge1
        BS -->|"REST commands"| Bridge2
        BS -->|"WebSocket events"| Bridge1
        BS -->|"WebSocket events"| Bridge2
        CS -->|"REST + WebSocket"| Coord
    end
    BLoCs --> UI
    KS -->|"Serial/HID"| BLoCs
```
New services needed in Flutter app:
| Service | Responsibility |
|---|---|
| `ReconnectionService` | Polls bridge `/health` endpoints, auto-reconnects WebSocket, triggers state resync |
| `HealthService` | Aggregates health of all bridges + coordinator, exposes stream to UI |
| `CoordinationClient` | REST + WebSocket client to coordinator (locks, sequences, heartbeat) |
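The `HealthService` aggregation can be sketched as a pure function over per-component health (Python for illustration; the component names and the three status levels are assumptions matching the dashboard mock-up in section 2.3):

```python
def aggregate_health(components):
    """components: dict of component name -> bool (healthy).
    Returns 'online' when all are healthy, 'degraded' when only some are,
    'offline' when none are."""
    up = sum(1 for healthy in components.values() if healthy)
    if up == len(components):
        return "online"
    return "degraded" if up > 0 else "offline"
```

The real service would recompute this on every status change and push it through a stream the BLoC layer renders as the status dashboard.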
7. Action Items Before Implementation
- Create coordination service (.NET 8 minimal API, ~400 lines)
- Add `ReconnectionService` to Flutter app (exponential backoff, health polling)
- Add `HealthService` to Flutter app (status aggregation for UI)
- Add `CoordinationClient` to Flutter app (locks, sequences)
- Fix WebSocket auto-reconnect in `BridgeService`
- Add command retry for CrossSwitch (3x with backoff)
- Add bridge process supervision (systemd/Windows Service configs)
- Add state resync on every reconnect event
- Build health status UI component
- Update `servers.json` schema to include coordinator URL
- Build for Linux — verify .NET 8 bridges compile for linux-x64
- Abstract keyboard input behind `KeyboardService` interface with platform impls