Architecture Review: Legacy vs New — Critical Infrastructure Improvements
Pre-implementation review. This system controls traffic/tunnel cameras in critical infrastructure. Every failure mode must be addressed. The system may run on Windows, Linux, or Android tablets in the future.
1. Side-by-Side Failure Mode Comparison
1.1 Camera Server Unreachable
| Aspect | Legacy (WPF) | New (Flutter) | Verdict |
|---|---|---|---|
| Detection | Driver `IsConnected` check every 2 seconds | HTTP timeout (5s) | Legacy better — faster detection |
| Recovery | `CameraServerDriverReconnectService` retries every 2s | None — user must click retry button | Critical gap |
| Partial failure | Skips disconnected drivers, other servers still work | Each bridge is independent — OK | Equal |
| State on reconnect | Reloads media channels, fires `DriverConnected` event | No state resync after reconnect | Gap |
1.2 Coordination Layer Down (AppServer / PRIMARY)
| Aspect | Legacy (WPF) | New (Flutter) | Verdict |
|---|---|---|---|
| Detection | SignalR built-in disconnect detection | Not implemented yet | Equal (both need this) |
| Recovery | SignalR auto-reconnect: 0s, 5s, 10s, 15s fixed delays | Not implemented yet | To be built |
| Degraded mode | CrossSwitch/PTZ work, locks/sequences don't | Same design — correct | Equal |
| State on reconnect | Hub client calls `GetLockedCameraIds()`, `GetRunningSequences()` | Not implemented yet | Must match |
1.3 Network Failure
| Aspect | Legacy (WPF) | New (Flutter) | Verdict |
|---|---|---|---|
| Detection | `NetworkAvailabilityWorker` polls every 5s (checks NIC status) | None — no network detection | Critical gap |
| UI feedback | `NetworkAvailabilityState` updates UI commands | Connection status bar (manual) | Gap |
| Recovery | Automatic — reconnect services activate when NIC comes back | Manual only — user clicks retry | Critical gap |
1.4 Bridge Process Crash
| Aspect | Legacy (WPF) | New (Flutter) | Verdict |
|---|---|---|---|
| Detection | N/A (SDK was in-process) | HTTP timeout → connection status false | OK |
| Recovery | N/A (app restarts) | None — bridge stays dead | Critical gap |
| Prevention | N/A | Process supervision needed | Must add |
1.5 Flutter App Crash
| Aspect | Legacy (WPF) | New (Flutter) | Verdict |
|---|---|---|---|
| Recovery | App restarts, reconnects in ~5s | App restarts, must reinitialize | Equal |
| State recovery | Queries AppServer for locks, sequences, viewer states | Queries bridges for monitor states, alarms | Equal |
| Lock state | Restored via `GetLockedCameraIds()` | Restored from coordination service | Equal |
2. Critical Improvements Required
2.1 Automatic Reconnection (MUST HAVE)
The legacy system reconnects automatically at every level. Our Flutter app does not. For tunnel/traffic camera control, an operator cannot be expected to click a retry button during an emergency.
Required reconnection layers:
Layer 1: Bridge Health Polling
Flutter → periodic GET /health to each bridge
If bridge was down and comes back → auto-reconnect WebSocket + resync state
Layer 2: WebSocket Auto-Reconnect
On disconnect → exponential backoff retry (1s, 2s, 4s, 8s, max 30s)
On reconnect → resync state from bridge
Layer 3: Coordination Auto-Reconnect
On PRIMARY disconnect → retry connection with backoff
After 6s → STANDBY promotion (if configured)
On reconnect to (new) PRIMARY → resync lock/sequence state
Layer 4: Network Change Detection
Monitor network interface status
On network restored → trigger reconnection at all layers
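As a sketch of Layer 2's retry schedule, an exponential backoff generator might look like the following (Python for illustration; the 1s/2s/4s/8s/max-30s schedule comes from the design above, and the jitter fraction is an assumption borrowed from the "with jitter" note in section 5):

```python
import random

def backoff_delays(base=1.0, cap=30.0, jitter=0.2):
    """Yield reconnect delays: base, 2*base, 4*base, ... capped at `cap`,
    randomized by +/- `jitter` so multiple keyboards that lose the same
    server do not all reconnect in lockstep."""
    delay = base
    while True:
        yield delay * (1 + random.uniform(-jitter, jitter))
        delay = min(delay * 2, cap)
```

On each WebSocket disconnect the app would walk this generator, sleeping between attempts, and discard it (resetting the schedule) once a connection succeeds.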
Legacy equivalent:
- Camera drivers: 2-second reconnect loop (`CameraServerDriverReconnectService`)
- SignalR: built-in auto-reconnect with `HubRetryPolicy` (0s, 5s, 10s, 15s)
- Network: 5-second NIC polling (`NetworkAvailabilityWorker`)
2.2 Process Supervision (MUST HAVE)
Every .NET process (bridges + coordination service) must auto-restart on crash. An operator should never have to SSH into a machine to restart a bridge.
| Platform | Supervision Method |
|---|---|
| Windows | Windows Service (via Microsoft.Extensions.Hosting.WindowsServices) or NSSM |
| Linux | systemd units with Restart=always |
| Docker | restart: always policy |
| Android tablet | Bridges run on server, not locally |
Proposed process tree:
LattePanda Sigma (per keyboard)
├── copilot-geviscope-bridge.service (auto-restart)
├── copilot-gcore-bridge.service (auto-restart)
├── copilot-geviserver-bridge.service (auto-restart)
├── copilot-coordinator.service (auto-restart, PRIMARY only)
└── copilot-keyboard.service (auto-restart, Flutter desktop) or browser tab (Flutter web)
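A sketch of the Linux row from the supervision table above: a systemd unit with `Restart=always` for one bridge. The install path, binary name, and user are assumptions; only the restart policy comes from the table.

```ini
# /etc/systemd/system/copilot-geviscope-bridge.service (illustrative paths)
[Unit]
Description=Copilot GeViScope Bridge
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/opt/copilot/geviscope-bridge/CopilotGeviscopeBridge
Restart=always
RestartSec=2
User=copilot

[Install]
WantedBy=multi-user.target
```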
2.3 Health Monitoring Dashboard (SHOULD HAVE)
The operator must see at a glance what's working and what's not.
┌──────────────────────────────────────────────────────────┐
│ System Status │
│ ┌────────────┐ ┌────────────┐ ┌────────────────────┐ │
│ │ GeViScope │ │ G-Core │ │ Coordination │ │
│ │ ● Online │ │ ● Online │ │ ● PRIMARY active │ │
│ │ 12 cams │ │ 8 cams │ │ 2 keyboards │ │
│ │ 6 viewers │ │ 4 viewers │ │ 1 lock active │ │
│ └────────────┘ └────────────┘ └────────────────────┘ │
│ │
│ ⚠ G-Core bridge reconnecting (attempt 3/∞) │
└──────────────────────────────────────────────────────────┘
2.4 Command Retry with Idempotency (SHOULD HAVE)
Critical commands (CrossSwitch) should retry on transient failure:
```dart
Future<bool> viewerConnectLive(int viewer, int channel) async {
  for (int attempt = 1; attempt <= 3; attempt++) {
    try {
      final response = await _client.post('/viewer/connect-live', ...);
      if (response.statusCode == 200) return true;
    } catch (e) {
      if (attempt == 3) rethrow;
      // Linear backoff between attempts: 200ms, then 400ms.
      await Future.delayed(Duration(milliseconds: 200 * attempt));
    }
  }
  return false;
}
```
PTZ commands should NOT retry (they're continuous — a stale retry would cause unexpected movement).
2.5 State Verification After Reconnection (MUST HAVE)
After any reconnection event, the app must not trust its cached state:
On bridge reconnect:
1. Query GET /monitors → rebuild monitor state
2. Query GET /alarms/active → rebuild alarm state
3. Re-subscribe WebSocket events
On coordination reconnect:
1. Query locks → rebuild lock state
2. Query running sequences → update sequence state
3. Re-subscribe lock/sequence change events
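The bridge-reconnect steps above can be sketched as a single resync routine (Python for illustration; the endpoint paths come from the list above, while the `client` object and its method names are assumptions):

```python
# Hedged sketch: after a reconnect, cached state is discarded and rebuilt
# from authoritative queries before any events are processed.
def resync_bridge(client):
    state = {}
    state["monitors"] = client.get("/monitors")      # 1. rebuild monitor state
    state["alarms"] = client.get("/alarms/active")   # 2. rebuild alarm state
    client.resubscribe_events()                      # 3. re-attach WebSocket handlers
    return state
```

The coordination-side resync would follow the same pattern with the lock and sequence queries.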
Legacy does this: ViewerStatesInitWorker rebuilds viewer state on startup/reconnect. ConfigurationService.OnChangeAvailability resyncs config when AppServer comes back.
3. Platform Independence Analysis
3.1 Current Platform Assumptions
| Component | Current Assumption | Future Need |
|---|---|---|
| C# Bridges | Run locally on Windows (LattePanda) | Linux, Docker, remote server |
| Flutter App | Windows desktop or browser | Linux, Android tablet, browser |
| Coordination | Runs on PRIMARY keyboard (Windows) | Linux, Docker, any host |
| Hardware I/O | USB Serial + HID on local machine | Remote keyboard via network, or Bluetooth |
| Bridge URLs | `http://localhost:7720` | `http://192.168.x.y:7720` (already configurable) |
3.2 Architecture for Platform Independence
```mermaid
graph TB
    subgraph "Deployment A: LattePanda (Current)"
        LP_App["Flutter Desktop"]
        LP_Bridge1["GeViScope Bridge"]
        LP_Bridge2["G-Core Bridge"]
        LP_Coord["Coordinator"]
        LP_Serial["USB Serial/HID"]
        LP_App --> LP_Bridge1
        LP_App --> LP_Bridge2
        LP_App --> LP_Coord
        LP_Serial --> LP_App
    end
    subgraph "Deployment B: Android Tablet (Future)"
        AT_App["Flutter Android"]
        AT_BT["Bluetooth Keyboard"]
        AT_App -->|"HTTP over WiFi"| Remote_Bridge1["Bridge on Server"]
        AT_App -->|"HTTP over WiFi"| Remote_Bridge2["Bridge on Server"]
        AT_App -->|"WebSocket"| Remote_Coord["Coordinator on Server"]
        AT_BT --> AT_App
    end
    subgraph "Deployment C: Linux Kiosk (Future)"
        LX_App["Flutter Linux"]
        LX_Bridge1["GeViScope Bridge"]
        LX_Bridge2["G-Core Bridge"]
        LX_Coord["Coordinator"]
        LX_Serial["USB Serial/HID"]
        LX_App --> LX_Bridge1
        LX_App --> LX_Bridge2
        LX_App --> LX_Coord
        LX_Serial --> LX_App
    end
    Remote_Bridge1 --> CS1["Camera Server 1"]
    Remote_Bridge2 --> CS2["Camera Server 2"]
    LP_Bridge1 --> CS1
    LP_Bridge2 --> CS2
    LX_Bridge1 --> CS1
    LX_Bridge2 --> CS2
```
3.3 Key Design Rules for Platform Independence
-
Flutter app never assumes bridges are on localhost. Bridge URLs come from
servers.json. Already the case. -
Bridges are deployable anywhere .NET 8 runs. Currently Windows x86/x64. Must also build for Linux x64 and linux-arm64.
-
Coordination service is just another network service. Flutter app connects to it like a bridge — via configured URL.
-
Hardware I/O is abstracted behind a service interface.
KeyboardServiceinterface has platform-specific implementations:NativeSerialKeyboardService(desktop with USB)WebSerialKeyboardService(browser with Web Serial API)BluetoothKeyboardService(tablet with BT keyboard, future)EmulatedKeyboardService(development/testing)
-
No platform-specific code in business logic. All platform differences are in the service layer, injected via DI.
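The service-interface rule can be illustrated with a minimal abstraction (Python stand-in for the Dart interface; the class names come from the list above, the method names are assumptions):

```python
from abc import ABC, abstractmethod

class KeyboardService(ABC):
    """Platform-neutral keyboard input interface injected via DI."""

    @abstractmethod
    def start(self) -> None:
        """Open the underlying transport (serial port, BT socket, ...)."""

    @abstractmethod
    def key_events(self):
        """Yield decoded key events from the physical (or fake) keyboard."""

class EmulatedKeyboardService(KeyboardService):
    """Development/testing implementation: replays a scripted event list."""
    def __init__(self, script):
        self._script = list(script)

    def start(self) -> None:
        pass  # nothing to open; real impls would open a serial/BT port here

    def key_events(self):
        yield from self._script
```

BLoCs consume only the `KeyboardService` interface, so swapping the emulated implementation for a serial or Bluetooth one requires no business-logic changes.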
4. Coordination Service Design (Option B)
4.1 Service Overview
A minimal .NET 8 ASP.NET Core application (~400 lines) running on the PRIMARY keyboard:
copilot-coordinator/
├── Program.cs # Minimal API setup, WebSocket, endpoints
├── Services/
│ ├── LockManager.cs # Camera lock state (ported from legacy CameraLocksService)
│ ├── SequenceRunner.cs # Sequence execution (ported from legacy SequenceService)
│ └── KeyboardRegistry.cs # Track connected keyboards
├── Models/
│ ├── CameraLock.cs # Lock state model
│ ├── SequenceState.cs # Running sequence model
│ └── Messages.cs # WebSocket message types
└── appsettings.json # Lock timeout, heartbeat interval config
4.2 REST API
GET /health → Service health
GET /status → Connected keyboards, active locks, sequences
POST /locks/try {cameraId, keyboardId, priority} → Acquire lock
POST /locks/release {cameraId, keyboardId} → Release lock
POST /locks/takeover {cameraId, keyboardId, priority} → Request takeover
POST /locks/confirm {cameraId, keyboardId, confirm} → Confirm/reject takeover
POST /locks/reset {cameraId, keyboardId} → Reset expiration
GET /locks → All active locks
GET /locks/{keyboardId} → Locks held by keyboard
POST /sequences/start {viewerId, sequenceId} → Start sequence
POST /sequences/stop {viewerId} → Stop sequence
GET /sequences/running → Active sequences
WS /ws → Real-time events
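The core semantics behind `POST /locks/try` and `POST /locks/release` can be sketched as a minimal in-memory manager (Python for illustration; the 300-second TTL mirrors the ≤5-minute expiry in section 4.4, while takeover and priority handling belong to the `/locks/takeover` flow and are omitted here):

```python
import time

class LockManager:
    """Hedged sketch of camera-lock state: one holder per camera, TTL-based
    expiry, re-acquire by the holder refreshes the expiry."""
    def __init__(self, ttl=300.0, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._locks = {}  # cameraId -> (keyboardId, expiresAt)

    def try_acquire(self, camera_id, keyboard_id):
        now = self._clock()
        held = self._locks.get(camera_id)
        if held and held[1] > now and held[0] != keyboard_id:
            return False  # held by another keyboard and not yet expired
        self._locks[camera_id] = (keyboard_id, now + self._ttl)
        return True

    def release(self, camera_id, keyboard_id):
        held = self._locks.get(camera_id)
        if held and held[0] == keyboard_id:
            del self._locks[camera_id]
            return True
        return False
```

The real `LockManager.cs` would additionally broadcast `lock_acquired` / `lock_released` WebSocket events on every state change.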
4.3 WebSocket Events (broadcast to all connected keyboards)
{"type": "lock_acquired", "cameraId": 5, "keyboardId": "KB1", "expiresAt": "..."}
{"type": "lock_released", "cameraId": 5}
{"type": "lock_expiring", "cameraId": 5, "keyboardId": "KB1", "expiresIn": 60}
{"type": "lock_takeover", "cameraId": 5, "from": "KB1", "to": "KB2"}
{"type": "sequence_started", "viewerId": 1001, "sequenceId": 3}
{"type": "sequence_stopped", "viewerId": 1001}
{"type": "keyboard_online", "keyboardId": "KB2"}
{"type": "keyboard_offline", "keyboardId": "KB2"}
{"type": "heartbeat"}
4.4 Failover (Configured STANDBY)
keyboards.json:
```json
{
  "keyboards": [
    {"id": "KB1", "role": "PRIMARY", "coordinatorPort": 8090},
    {"id": "KB2", "role": "STANDBY", "coordinatorPort": 8090}
  ]
}
```
- PRIMARY starts the coordinator service on `:8090`
- STANDBY monitors PRIMARY's `/health` endpoint
- If PRIMARY is unreachable for 6 seconds → STANDBY starts its own coordinator
- When the old PRIMARY recovers → checks if another coordinator is running → defers (becomes STANDBY)
- Lock state after failover: empty (locks expire naturally in ≤5 minutes, same as legacy AppServer restart behavior)
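The promotion rule above can be sketched as a small state machine fed by health probes (Python for illustration; the 6-second threshold comes from the list above, the probe cadence is an assumption):

```python
class StandbyMonitor:
    """Hedged sketch: promote STANDBY only after the PRIMARY /health probe
    has failed continuously for `promote_after` seconds."""
    def __init__(self, promote_after=6.0):
        self._promote_after = promote_after
        self._down_since = None  # timestamp of first failed probe in a row

    def observe(self, now, primary_healthy):
        """Feed one probe result; returns True when the STANDBY should
        start its own coordinator."""
        if primary_healthy:
            self._down_since = None  # any success resets the failure window
            return False
        if self._down_since is None:
            self._down_since = now
        return (now - self._down_since) >= self._promote_after
```

A single transient probe failure therefore never triggers failover; only a continuous 6-second outage does.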
5. Improvement Summary: Legacy vs New
What the New System Does BETTER
| Improvement | Detail |
|---|---|
| No central server hardware | Coordinator runs on keyboard, not separate machine |
| Alarm reliability | Query + Subscribe + Periodic sync (legacy had event-only + hourly refresh) |
| Direct command path | CrossSwitch/PTZ bypass coordinator entirely (legacy routed some through AppServer) |
| Multiplatform | Flutter + .NET 8 run on Windows, Linux, Android. Legacy was Windows-only WPF |
| No SDK dependency in UI | Bridges abstract SDKs behind REST. UI never touches native code |
| Independent operation | Each keyboard works standalone for critical ops. Legacy needed AppServer for several features |
| Deployable anywhere | Bridges + coordinator can run on any server, not just the keyboard |
What the New System Must MATCH (Currently Missing)
| Legacy Feature | Legacy Implementation | New Implementation Needed |
|---|---|---|
| Auto-reconnect to camera servers | 2-second periodic retry service | Bridge health polling + WebSocket auto-reconnect |
| Auto-reconnect to AppServer | SignalR built-in (0s, 5s, 10s, 15s) | Coordinator WebSocket auto-reconnect with backoff |
| Network detection | 5-second NIC polling worker | connectivity_plus package or periodic health checks |
| State resync on reconnect | `ViewerStatesInitWorker`, config resync on availability change | Query bridges + coordinator on any reconnect event |
| Graceful partial failure | `Parallel.ForEach` with per-driver try-catch | Already OK (each bridge independent) |
| Process watchdog | Windows Service | systemd / Windows Service / Docker restart policy |
| Media channel refresh | 10-minute periodic refresh | Periodic bridge status query |
What the New System Should Do BETTER THAN Legacy
| Improvement | Legacy Gap | New Approach |
|---|---|---|
| Exponential backoff | Fixed delays (0, 5, 10, 15s) — no backoff | Exponential: 1s, 2s, 4s, 8s, max 30s with jitter |
| Circuit breaker | None — retries forever even if server is gone | After N failures, back off to slow polling (60s) |
| Command retry | None — single attempt | Retry critical commands (CrossSwitch) 3x with 200ms delay |
| Health visibility | Hidden in logs | Operator-facing status dashboard in UI |
| Structured logging | Basic ILogger | JSON structured logging → ELK (already in design) |
| Graceful degradation UI | Commands silently disabled | Clear visual indicator: "Degraded mode — locks unavailable" |
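The circuit-breaker row can be sketched as a tiny failure counter that switches the poll cadence (Python for illustration; the 60-second slow interval comes from the table, the failure threshold and fast interval are assumptions):

```python
class CircuitBreaker:
    """Hedged sketch: after `threshold` consecutive failures, back off from
    fast retries to slow polling; any success snaps back to fast."""
    def __init__(self, threshold=5, fast_interval=2.0, slow_interval=60.0):
        self._threshold = threshold
        self._fast = fast_interval
        self._slow = slow_interval
        self._failures = 0

    def record(self, success):
        self._failures = 0 if success else self._failures + 1

    def next_interval(self):
        return self._slow if self._failures >= self._threshold else self._fast
```

Wiring this into the health poller keeps a permanently dead server from being hammered every few seconds while still noticing within a minute when it returns.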
6. Proposed Resilience Architecture
```mermaid
graph TB
    subgraph "Flutter App"
        UI["UI Layer"]
        BLoCs["BLoC Layer"]
        RS["ReconnectionService"]
        HS["HealthService"]
        BS["BridgeService"]
        CS["CoordinationClient"]
        KS["KeyboardService"]
    end
    subgraph "Health & Reconnection"
        RS -->|"periodic /health"| Bridge1["GeViScope Bridge"]
        RS -->|"periodic /health"| Bridge2["G-Core Bridge"]
        RS -->|"periodic /health"| Coord["Coordinator"]
        RS -->|"on failure"| BS
        RS -->|"on failure"| CS
        HS -->|"status stream"| BLoCs
    end
    subgraph "Normal Operation"
        BS -->|"REST commands"| Bridge1
        BS -->|"REST commands"| Bridge2
        BS -->|"WebSocket events"| Bridge1
        BS -->|"WebSocket events"| Bridge2
        CS -->|"REST + WebSocket"| Coord
    end
    BLoCs --> UI
    KS -->|"Serial/HID"| BLoCs
```
New services needed in Flutter app:
| Service | Responsibility |
|---|---|
| `ReconnectionService` | Polls bridge `/health` endpoints, auto-reconnects WebSocket, triggers state resync |
| `HealthService` | Aggregates health of all bridges + coordinator, exposes stream to UI |
| `CoordinationClient` | REST + WebSocket client to coordinator (locks, sequences, heartbeat) |
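The `HealthService` aggregation can be sketched as a pure function over per-component health (Python for illustration; the component names and the three status levels are assumptions matching the dashboard mock-up in section 2.3):

```python
def aggregate_health(components):
    """components: dict of component name -> bool (healthy).
    Returns 'online' when all are healthy, 'degraded' when only some are,
    'offline' when none are."""
    up = sum(1 for healthy in components.values() if healthy)
    if up == len(components):
        return "online"
    return "degraded" if up > 0 else "offline"
```

The real service would recompute this on every status change and push it through a stream the BLoC layer renders as the status dashboard.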
7. Action Items Before Implementation
- Create coordination service (.NET 8 minimal API, ~400 lines)
- Add `ReconnectionService` to Flutter app (exponential backoff, health polling)
- Add `HealthService` to Flutter app (status aggregation for UI)
- Add `CoordinationClient` to Flutter app (locks, sequences)
- Fix WebSocket auto-reconnect in `BridgeService`
- Add command retry for CrossSwitch (3x with backoff)
- Add bridge process supervision (systemd/Windows Service configs)
- Add state resync on every reconnect event
- Build health status UI component
- Update `servers.json` schema to include coordinator URL
- Build for Linux — verify .NET 8 bridges compile for linux-x64
- Abstract keyboard input behind `KeyboardService` interface with platform impls