Files
AI_portal/AI_PORTAL_HANDOFF.md
Ondřej Glaser 48cef99257 Initial portal commit: landing + 9 AI-powered apps
Apps:
- dwg-rooms: extract room numbers from DWG/DXF
- dwg-counting: count symbols in PDF drawings (OpenCV template matching)
- contract-check: review PDF contracts against a checklist (Claude vision + Tesseract OCR fallback)
- email-drafter: bullet notes → polished Czech/English business emails
- invoice-extractor: PDF/image invoice → structured data → Excel
- translator: Czech-first translator across 19 languages with tone control
- vv-check: find inconsistent unit prices across VV sheets in one workbook
- vv-compare: diff original vs new VV files (changes / added / removed)
- feature-request: portal users submit ideas + sample files

Infrastructure:
- LiteLLM gateway with per-app virtual keys + budgets
- Langfuse observability
- Geist font, shared theme, cross-subdomain back link + theme sync via cookie/URL
- Caddy reverse proxy on *.klas.chat
2026-05-13 15:25:04 +02:00

340 lines
21 KiB
Markdown

# Internal AI Portal — Project Handoff
**Status:** Architecture & vendor selection complete. Ready for build.
**Audience:** Local coding agent + project owner.
**Goal:** Build an internal AI landing page for company employees that combines guided "AI apps" (tile-based workflows) with a full-featured chat fallback, all routed through company API keys with per-user cost tracking and budget enforcement.
---
## 1. Product vision
A single internal URL employees go to. They see:
- **A grid of tiles**, each a guided "AI app" for a specific task (legal-notice check, contract summarizer, invoice extractor, translator, etc.). These are for colleagues who don't want to write prompts — fill a form, get an answer.
- **One special tile labeled "Open Chat"** that drops them into a full ChatGPT/Claude-equivalent chat: file uploads (Excel, PDF, images), code execution sandbox, multi-turn, switch between models mid-conversation.
- All API calls go through company-owned keys. Employees don't pay; the company does. In return, every call is logged, costed, and budget-capped per user.
**Non-goals (for v1):**
- Customer-facing product (internal only).
- Multi-tenant SaaS.
- Replacing existing tools that aren't AI-related.
---
## 2. Architecture overview
Composition of three FOSS components plus a thin custom landing page. No single FOSS project covers all requirements — combining them is the production-ready path.
```
┌────────────────────────────────────┐
│ Custom landing page (tiles UI) │
│ - Auth via company SSO (OIDC) │
│ - Renders tiles + "Open Chat" │
└────────────┬───────────────────────┘
┌──────────────────────┼──────────────────────┐
│ │ │
▼ ▼ ▼
┌────────────┐ ┌────────────┐ ┌──────────────┐
│ Dify │ │ Open WebUI │ │ Other tile │
│ (workflow │ │ (chat + │ │ apps (links │
│ apps) │ │ sandbox) │ │ or iframes)│
└─────┬──────┘ └─────┬──────┘ └──────┬───────┘
│ │ │
└──────────────────────┼───────────────────────┘
┌──────────────────────────────┐
│ LiteLLM proxy gateway │
│ - Unified OpenAI-format │
│ - Per-user virtual keys │
│ - Budget enforcement │
│ - Cost & token logging │
└──────────────┬───────────────┘
┌──────────────────────────────┐
│ Langfuse (observability) │
│ - Traces, dashboards │
│ - Per-user analytics │
│ (consumes LiteLLM logs) │
└──────────────┬───────────────┘
OpenAI / Anthropic / Google / Azure APIs
```
### Component responsibilities
| Component | Role | License |
|-----------|------|---------|
| **Landing page** | Tile UI, SSO entry point, hands users off to specific tools | Custom (we own it) |
| **Dify** | Hosts the predefined "apps" — visual workflow builder, each app is a workflow | Apache 2.0 + extras (logo restriction; OK for internal single-tenant) |
| **Open WebUI** | The "Open Chat" experience — full ChatGPT-equivalent UI with file uploads & code interpreter | BSD-3-Clause |
| **gVisor code execution add-on** | Sandboxed Python/Bash for Open WebUI (Excel processing, charts, etc.) | Apache 2.0 |
| **LiteLLM** | Single API gateway for all providers, virtual keys per user, budget enforcement, cost tracking | MIT |
| **Langfuse** | Observability dashboards on top of LiteLLM | MIT (with self-host caveats — see §9) |
### Why this split
- **Dify alone** has tile-based apps but its chat is weaker than dedicated chat UIs and its code interpreter doesn't match ChatGPT's Excel-processing experience.
- **Open WebUI alone** is chat-first; building 50 guided "apps" inside it means custom Functions/Pipelines code per app, which is harder to maintain than Dify's visual workflows.
- **LibreChat** was considered but rejected: its built-in code interpreter is a paid SaaS service, which fails the "as capable as official chat interfaces" requirement out of the box. Open WebUI's gVisor sandbox is fully free.
- **LiteLLM** is the only piece that gives us per-user budgets and cost tracking across both Dify and Open WebUI, since both can be configured to use it as their "OpenAI-compatible" provider.
---
## 3. User flows
### Flow A — colleague who doesn't want to write prompts
1. Opens portal → sees tile grid.
2. Clicks "Legal Notice Check" tile → opens Dify-hosted app in iframe or new tab.
3. Form: paste notice text, select jurisdiction, click Submit.
4. Dify workflow runs → output displayed.
5. All LLM calls inside the workflow went through LiteLLM → cost attributed to this user.
### Flow B — colleague who wants a real chat
1. Opens portal → clicks "Open Chat" tile.
2. Lands in Open WebUI session (already authenticated via SSO).
3. Uploads `q3-sales.xlsx`, asks "summarize regional performance and draw a bar chart."
4. Model writes Python → gVisor sandbox executes → result + chart appear inline.
5. User can switch model mid-conversation (Claude → GPT-5 → local).
6. All calls flow through LiteLLM → costed to this user, budget enforced.
### Flow C — admin
1. Opens LiteLLM admin UI → sees per-user spend, sets/adjusts budgets.
2. Opens Langfuse → sees traces, prompt analytics, error rates.
3. Opens Dify admin → adds a new workflow → it appears as a new tile on the landing page (via config update).
---
## 4. Repository layout (proposed)
```
ai-portal/
├── README.md
├── docker-compose.yml # All services
├── .env.example
├── landing/ # Custom landing page (Next.js or similar)
│ ├── package.json
│ ├── src/
│ │ ├── pages/index.tsx # Tile grid
│ │ ├── lib/auth.ts # OIDC client
│ │ └── data/apps.json # Tile definitions
│ └── Dockerfile
├── litellm/
│ ├── config.yaml # Models, virtual keys, budgets
│ └── README.md
├── openwebui/
│ ├── env.example
│ ├── functions/ # Custom OWUI functions if needed
│ └── tools/
│ └── run_code.py # gVisor sandbox tool (from EtiennePerot/safe-code-execution)
├── dify/
│ ├── env.example
│ ├── workflows/ # Exported workflow DSL files (one per tile app)
│ │ ├── legal-notice-check.yml
│ │ └── ...
│ └── README.md
├── langfuse/
│ └── env.example
├── infra/
│ ├── traefik/ # Reverse proxy + TLS
│ │ └── traefik.yml
│ └── backup/
│ └── backup.sh
└── docs/
├── ARCHITECTURE.md # This document, eventually
├── ADDING_NEW_APP.md # Runbook for adding a tile
└── ONBOARDING.md # End-user docs
```
---
## 5. Build phases
### Phase 0 — Local dev environment (Day 1)
Goal: all four services running locally via docker-compose, talking to each other.
1. Bootstrap repo with the layout above.
2. Write `docker-compose.yml` with services: `litellm`, `langfuse`, `openwebui`, `dify-api`, `dify-web`, `dify-worker`, `redis`, `postgres`, `traefik`, `landing`.
3. Each service gets its own subnet + `.env` file. No secrets in git.
4. Verify: each service reachable at `https://<service>.localhost` via Traefik with self-signed certs.
**Acceptance:** `docker compose up` brings everything up clean. Each service's UI loads.
### Phase 1 — LiteLLM gateway (Day 2)
Goal: all model calls go through LiteLLM. Cost tracking works.
1. Configure `litellm/config.yaml` with at minimum: OpenAI, Anthropic, Google. Add Azure if used.
2. Set up a master key + per-user virtual keys (for now: 2 test users).
3. Set test budget: $5/user/month, hard cap.
4. Smoke test: `curl` a chat completion through LiteLLM → verify it appears in LiteLLM's spend log.
5. Verify budget enforcement: temporarily lower limit, blast 1000 tokens, confirm 429.
**Acceptance:** LiteLLM admin UI shows per-user spend in real time. Budget limits actually block.
LiteLLM docs: https://docs.litellm.ai/docs/proxy/quick_start
### Phase 2 — Open WebUI as the chat tile (Days 3-4)
Goal: full chat UI with file upload + Excel processing.
1. Configure Open WebUI to use LiteLLM as its OpenAI-compatible endpoint.
2. Set up SSO (OIDC) so user identity flows through.
3. Map OWUI users → LiteLLM virtual keys (so per-user budgets work). Two options:
- Single LiteLLM key, OWUI passes user ID as `X-LiteLLM-User-Id` header — requires LiteLLM tag-based tracking.
- Per-user LiteLLM keys, OWUI feature for per-user provider keys. Investigate which is cleaner.
4. Install the gVisor code execution function and tool from `EtiennePerot/safe-code-execution`. Configure the OWUI container with sandboxing prerequisites (privileged or specific capabilities — see that repo's setup docs).
5. Test: upload an XLSX, ask the model to analyze and chart it. Verify code runs in sandbox, chart appears inline, no code escapes the sandbox.
**Acceptance:** A non-technical user can upload a file and ask "what's in this spreadsheet" and get a useful answer with a chart. Cost shows up in LiteLLM tied to that user.
References:
- Open WebUI docs: https://docs.openwebui.com/
- Code exec add-on: https://github.com/EtiennePerot/safe-code-execution
- Pyodide alternative (no Docker privilege needed but limited libs): https://docs.openwebui.com/features/chat-conversations/chat-features/code-execution/python/
### Phase 3 — Dify for workflow apps (Days 5-7)
Goal: at least 3 working "apps" hosted in Dify and reachable via shareable URLs.
1. Configure Dify's model providers to point at LiteLLM (NOT directly at OpenAI/Anthropic). This is critical — otherwise Dify calls bypass cost tracking.
2. Create the first 3 workflows as proof of concept:
- **Legal Notice Check** (input: text + jurisdiction → output: risk summary)
- **Document Summarizer** (input: file → output: bullet summary)
- **Email Drafter** (input: bullets → output: polished email)
3. For each workflow, enable the "share as web app" feature, get a URL.
4. Decide tile-to-workflow URL mapping format: store in `landing/src/data/apps.json`.
**Acceptance:** Each app works end-to-end via its share URL. Each invocation shows up in LiteLLM spend log attributed to a user (this requires Dify to forward user identity — research how, may need a custom API gateway proxy in front of Dify share URLs).
References:
- Dify self-hosting: https://docs.dify.ai/getting-started/install-self-hosted/docker-compose
- Dify license: https://github.com/langgenius/dify/blob/main/LICENSE (note logo restriction)
### Phase 4 — Landing page (Days 8-10)
Goal: the front door.
1. Next.js (or Astro, SvelteKit — pick what the team knows) app with:
- OIDC login.
- Tile grid, data-driven from `apps.json`.
- Each tile: icon, title, 1-line description, click-through to either a Dify share URL or `/chat` (Open WebUI).
- Search/filter for when there are 50 tiles.
2. Make `apps.json` editable without redeploy: read it from a mounted volume or fetch from Dify's app list API on render.
3. Branding: company logo, color scheme.
4. Add a simple "Costs" link visible to admins only → embeds Langfuse or LiteLLM dashboard.
**Acceptance:** A new employee with SSO access lands on the page, sees tiles, can click into a workflow or open chat without any extra login prompt.
### Phase 5 — Observability & ops (Days 11-12)
1. Stand up Langfuse, configure LiteLLM to ship traces to it.
2. Build a couple of saved dashboards: total spend per day, top 10 users, top 10 apps.
3. Set up alerts: Slack/email when monthly spend hits 80% of cap.
4. Backup script for: Postgres (Dify, Langfuse, Open WebUI), Redis snapshots, Dify workflow exports.
5. Restore drill: spin up fresh stack from backups in <30 min.
### Phase 6 — Hardening & rollout (Days 13-15)
1. Move from self-signed to real TLS (Let's Encrypt via Traefik).
2. Lock down: only company SSO group X can authenticate.
3. Audit log review: confirm every LLM call has user attribution.
4. Doc: `ONBOARDING.md` for end users, `ADDING_NEW_APP.md` for the team that maintains workflows.
5. Pilot with 5-10 users for a week. Iterate. Then announce internally.
---
## 6. Critical decisions to make before coding
These should be answered before Phase 0:
1. **Where will this run?** On-prem, AWS, Azure, GCP? Affects networking, secrets management, and TLS.
2. **Which SSO?** Microsoft Entra ID (Azure AD), Google Workspace, Okta, Keycloak? Open WebUI, Dify, and Langfuse all support OIDC; landing page will too.
3. **What's the realistic budget for v1?** Affects: how many models we expose (GPT-5 + Claude Opus = expensive), default per-user budget, alert thresholds.
4. **Who owns this long-term?** The team adding new workflows in Dify is different from the team maintaining the platform. Make this explicit.
5. **Data sensitivity?** If employees may paste PII or confidential data, need to: (a) prefer Anthropic/OpenAI Zero Data Retention agreements, (b) consider Azure OpenAI for stronger contractual posture, (c) document an acceptable use policy.
6. **Logo customization for Dify.** Internal-only single-tenant deployments are fine keeping the Dify logo. If branding is required, pricing requires contacting business@dify.ai directly. Recommend: keep logo for v1, revisit if leadership pushes back.
---
## 7. Risks & mitigations
| Risk | Mitigation |
|------|------------|
| Cost runs away on day 1 | LiteLLM hard budgets per user enforced from the start, low default ($10-20/user/month) |
| Sandbox escape in code interpreter | gVisor is what ChatGPT uses; combined with container isolation, lowest-feasible risk for this use case. No internet egress from sandbox by default. |
| Dify license violation | Stay single-tenant, keep logo (or pay for license). Document this in `LICENSE_NOTES.md` |
| Sensitive data leakage to providers | Configure providers with ZDR / no-training. Add a banner on the landing page reminding users not to paste secrets. Optionally: a content filter at the LiteLLM layer (regex for credit cards, secrets) that strips/blocks. |
| Vendor lock-in to Dify | Workflows are exportable as YAML. Periodic export commit to git keeps them portable. |
| User identity drift across services | Single OIDC issuer, all services configured to use it, scripted "sync user LiteLLM virtual key" on first login. |
| Open WebUI gVisor sandbox needs privileged Docker | If host policy blocks privileged containers, fall back to Pyodide (browser-based, no privilege needed, narrower library set). Document the trade-off. |
---
## 8. Acceptance criteria for v1
The v1 ships when all of these are true:
- [ ] An employee with SSO can reach the portal at `https://ai.<company>.<tld>` and see tiles.
- [ ] At least 5 working "apps" as tiles, plus the chat tile.
- [ ] Chat supports: Excel/CSV/PDF upload, code execution, image input, model switching mid-conversation.
- [ ] Every LLM call appears in LiteLLM logs with the calling user's identity.
- [ ] Per-user monthly budget enforces (verified by integration test).
- [ ] Admin can see a per-user, per-app spend dashboard.
- [ ] Adding a new "app" tile takes <30 minutes (build workflow in Dify, add row to `apps.json`).
- [ ] Documented disaster recovery: backups + restore drill executed once.
- [ ] Acceptable use policy linked from the landing page.
---
## 9. Known unknowns / research items
These need a spike before committing:
1. **User identity propagation Dify → LiteLLM.** Dify's "share as web app" mode may not pass end-user identity to its LLM calls. May need to either: (a) put a small reverse proxy in front of Dify share URLs that injects a header, or (b) provision per-user Dify accounts and per-user provider keys in Dify (expensive ops cost). Spike this in Phase 3.
2. **Open WebUI per-user provider keys vs tag-based attribution.** Both work; pick based on which has cleaner UX for budget reporting.
3. **Langfuse self-hosting license.** Langfuse self-hosted has tiers; the FOSS core covers traces but some advanced features are commercial. Confirm the FOSS-core features are sufficient before depending on it. If not, fallback is LiteLLM's built-in dashboard (less pretty but free and sufficient for v1).
4. **Where do uploaded files live?** Open WebUI stores them; depending on data classification, may need to mount an encrypted volume or shorten retention.
5. **Rate limits per provider.** With many users behind one company key, OpenAI/Anthropic org-level RPM/TPM limits may bite. LiteLLM supports load balancing across multiple keys plan for this if rollout is wide.
---
## 10. Quick start for the local agent
When you sit down with the local agent, suggested first prompt:
> Read `AI_PORTAL_HANDOFF.md` in this repo. We're building Phase 0 today: the docker-compose skeleton. Set up the directory structure from Section 4, create a `docker-compose.yml` that starts LiteLLM, Open WebUI, Dify (api + web + worker), Postgres, Redis, Langfuse, Traefik, and a placeholder landing page service. Use environment variables for all secrets, with a `.env.example` checked in. Don't configure model providers yet — just get all containers healthy and reachable through Traefik with self-signed TLS at `*.localhost`. Stop and ask before proceeding to Phase 1.
After Phase 0 works, hand the agent Phase 1, 2, etc. one at a time. Don't let it run more than one phase ahead these are integration-heavy and a bad assumption early on will cascade.
---
## Appendix A — Reference links
- Dify: https://github.com/langgenius/dify
- Open WebUI: https://github.com/open-webui/open-webui
- Open WebUI code exec docs: https://docs.openwebui.com/features/chat-conversations/chat-features/code-execution/
- gVisor sandbox add-on: https://github.com/EtiennePerot/safe-code-execution
- LiteLLM: https://github.com/BerriAI/litellm
- LiteLLM proxy docs: https://docs.litellm.ai/docs/proxy/quick_start
- Langfuse: https://github.com/langfuse/langfuse
## Appendix B — Why not LibreChat
Considered. Rejected because LibreChat's built-in Code Interpreter is a paid SaaS service rather than a self-hostable component. The "as capable as official chat interfaces" requirement specifically calls for in-chat code execution and file processing; making that a paid dependency contradicts the "free and open source" goal. Open WebUI's gVisor add-on covers the same ground and is fully Apache-2.0.
## Appendix C — Why not building from scratch
Considered. Rejected for v1 because:
- A production-quality multi-provider chat UI with file uploads, model switching, conversation history, and a sandboxed code interpreter is roughly a year of focused engineering. Open WebUI gives us this for free.
- A workflow builder that non-engineers can use to add new "apps" is similarly large in scope. Dify gives us this for free.
- Custom code is justified only where no FOSS option exists: the landing page (small), the auth glue, and the per-user billing reconciliation.
If after running v1 for a few months we find the FOSS pieces don't fit, we can replace one component at a time without throwing out the whole stack that's the benefit of the gateway architecture.