Apps: - dwg-rooms: extract room numbers from DWG/DXF - dwg-counting: count symbols in PDF drawings (OpenCV template matching) - contract-check: review PDF contracts against a checklist (Claude vision + Tesseract OCR fallback) - email-drafter: bullet notes → polished Czech/English business emails - invoice-extractor: PDF/image invoice → structured data → Excel - translator: Czech-first translator across 19 languages with tone control - vv-check: find inconsistent unit prices across VV sheets in one workbook - vv-compare: diff original vs new VV files (changes / added / removed) - feature-request: portal users submit ideas + sample files Infrastructure: - LiteLLM gateway with per-app virtual keys + budgets - Langfuse observability - Geist font, shared theme, cross-subdomain back link + theme sync via cookie/URL - Caddy reverse proxy on *.klas.chat
21 KiB
Internal AI Portal — Project Handoff
Status: Architecture & vendor selection complete. Ready for build. Audience: Local coding agent + project owner. Goal: Build an internal AI landing page for company employees that combines guided "AI apps" (tile-based workflows) with a full-featured chat fallback, all routed through company API keys with per-user cost tracking and budget enforcement.
1. Product vision
A single internal URL employees go to. They see:
- A grid of tiles, each a guided "AI app" for a specific task (legal-notice check, contract summarizer, invoice extractor, translator, etc.). These are for colleagues who don't want to write prompts — fill a form, get an answer.
- One special tile labeled "Open Chat" that drops them into a full ChatGPT/Claude-equivalent chat: file uploads (Excel, PDF, images), code execution sandbox, multi-turn, switch between models mid-conversation.
- All API calls go through company-owned keys. Employees don't pay; the company does. In return, every call is logged, costed, and budget-capped per user.
Non-goals (for v1):
- Customer-facing product (internal only).
- Multi-tenant SaaS.
- Replacing existing tools that aren't AI-related.
2. Architecture overview
Composition of three FOSS components plus a thin custom landing page. No single FOSS project covers all requirements — combining them is the production-ready path.
┌────────────────────────────────────┐
│ Custom landing page (tiles UI) │
│ - Auth via company SSO (OIDC) │
│ - Renders tiles + "Open Chat" │
└────────────┬───────────────────────┘
│
┌──────────────────────┼──────────────────────┐
│ │ │
▼ ▼ ▼
┌────────────┐ ┌────────────┐ ┌──────────────┐
│ Dify │ │ Open WebUI │ │ Other tile │
│ (workflow │ │ (chat + │ │ apps (links │
│ apps) │ │ sandbox) │ │ or iframes)│
└─────┬──────┘ └─────┬──────┘ └──────┬───────┘
│ │ │
└──────────────────────┼───────────────────────┘
▼
┌──────────────────────────────┐
│ LiteLLM proxy gateway │
│ - Unified OpenAI-format │
│ - Per-user virtual keys │
│ - Budget enforcement │
│ - Cost & token logging │
└──────────────┬───────────────┘
│
▼
┌──────────────────────────────┐
│ Langfuse (observability) │
│ - Traces, dashboards │
│ - Per-user analytics │
│ (consumes LiteLLM logs) │
└──────────────┬───────────────┘
│
▼
OpenAI / Anthropic / Google / Azure APIs
Component responsibilities
| Component | Role | License |
|---|---|---|
| Landing page | Tile UI, SSO entry point, hands users off to specific tools | Custom (we own it) |
| Dify | Hosts the predefined "apps" — visual workflow builder, each app is a workflow | Apache 2.0 + extras (logo restriction; OK for internal single-tenant) |
| Open WebUI | The "Open Chat" experience — full ChatGPT-equivalent UI with file uploads & code interpreter | BSD-3-Clause |
| gVisor code execution add-on | Sandboxed Python/Bash for Open WebUI (Excel processing, charts, etc.) | Apache 2.0 |
| LiteLLM | Single API gateway for all providers, virtual keys per user, budget enforcement, cost tracking | MIT |
| Langfuse | Observability dashboards on top of LiteLLM | MIT (with self-host caveats — see §9) |
Why this split
- Dify alone has tile-based apps but its chat is weaker than dedicated chat UIs and its code interpreter doesn't match ChatGPT's Excel-processing experience.
- Open WebUI alone is chat-first; building 50 guided "apps" inside it means custom Functions/Pipelines code per app, which is harder to maintain than Dify's visual workflows.
- LibreChat was considered but rejected: its built-in code interpreter is a paid SaaS service, which fails the "as capable as official chat interfaces" requirement out of the box. Open WebUI's gVisor sandbox is fully free.
- LiteLLM is the only piece that gives us per-user budgets and cost tracking across both Dify and Open WebUI, since both can be configured to use it as their "OpenAI-compatible" provider.
3. User flows
Flow A — colleague who doesn't want to write prompts
- Opens portal → sees tile grid.
- Clicks "Legal Notice Check" tile → opens Dify-hosted app in iframe or new tab.
- Form: paste notice text, select jurisdiction, click Submit.
- Dify workflow runs → output displayed.
- All LLM calls inside the workflow went through LiteLLM → cost attributed to this user.
Flow B — colleague who wants a real chat
- Opens portal → clicks "Open Chat" tile.
- Lands in Open WebUI session (already authenticated via SSO).
- Uploads
q3-sales.xlsx, asks "summarize regional performance and draw a bar chart." - Model writes Python → gVisor sandbox executes → result + chart appear inline.
- User can switch model mid-conversation (Claude → GPT-5 → local).
- All calls flow through LiteLLM → costed to this user, budget enforced.
Flow C — admin
- Opens LiteLLM admin UI → sees per-user spend, sets/adjusts budgets.
- Opens Langfuse → sees traces, prompt analytics, error rates.
- Opens Dify admin → adds a new workflow → it appears as a new tile on the landing page (via config update).
4. Repository layout (proposed)
ai-portal/
├── README.md
├── docker-compose.yml # All services
├── .env.example
├── landing/ # Custom landing page (Next.js or similar)
│ ├── package.json
│ ├── src/
│ │ ├── pages/index.tsx # Tile grid
│ │ ├── lib/auth.ts # OIDC client
│ │ └── data/apps.json # Tile definitions
│ └── Dockerfile
├── litellm/
│ ├── config.yaml # Models, virtual keys, budgets
│ └── README.md
├── openwebui/
│ ├── env.example
│ ├── functions/ # Custom OWUI functions if needed
│ └── tools/
│ └── run_code.py # gVisor sandbox tool (from EtiennePerot/safe-code-execution)
├── dify/
│ ├── env.example
│ ├── workflows/ # Exported workflow DSL files (one per tile app)
│ │ ├── legal-notice-check.yml
│ │ └── ...
│ └── README.md
├── langfuse/
│ └── env.example
├── infra/
│ ├── traefik/ # Reverse proxy + TLS
│ │ └── traefik.yml
│ └── backup/
│ └── backup.sh
└── docs/
├── ARCHITECTURE.md # This document, eventually
├── ADDING_NEW_APP.md # Runbook for adding a tile
└── ONBOARDING.md # End-user docs
5. Build phases
Phase 0 — Local dev environment (Day 1)
Goal: all four services running locally via docker-compose, talking to each other.
- Bootstrap repo with the layout above.
- Write
docker-compose.ymlwith services:litellm,langfuse,openwebui,dify-api,dify-web,dify-worker,redis,postgres,traefik,landing. - Each service gets its own subnet +
.envfile. No secrets in git. - Verify: each service reachable at
https://<service>.localhostvia Traefik with self-signed certs.
Acceptance: docker compose up brings everything up clean. Each service's UI loads.
Phase 1 — LiteLLM gateway (Day 2)
Goal: all model calls go through LiteLLM. Cost tracking works.
- Configure
litellm/config.yamlwith at minimum: OpenAI, Anthropic, Google. Add Azure if used. - Set up a master key + per-user virtual keys (for now: 2 test users).
- Set test budget: $5/user/month, hard cap.
- Smoke test:
curla chat completion through LiteLLM → verify it appears in LiteLLM's spend log. - Verify budget enforcement: temporarily lower limit, blast 1000 tokens, confirm 429.
Acceptance: LiteLLM admin UI shows per-user spend in real time. Budget limits actually block.
LiteLLM docs: https://docs.litellm.ai/docs/proxy/quick_start
Phase 2 — Open WebUI as the chat tile (Days 3-4)
Goal: full chat UI with file upload + Excel processing.
- Configure Open WebUI to use LiteLLM as its OpenAI-compatible endpoint.
- Set up SSO (OIDC) so user identity flows through.
- Map OWUI users → LiteLLM virtual keys (so per-user budgets work). Two options:
- Single LiteLLM key, OWUI passes user ID as
X-LiteLLM-User-Idheader — requires LiteLLM tag-based tracking. - Per-user LiteLLM keys, OWUI feature for per-user provider keys. Investigate which is cleaner.
- Single LiteLLM key, OWUI passes user ID as
- Install the gVisor code execution function and tool from
EtiennePerot/safe-code-execution. Configure the OWUI container with sandboxing prerequisites (privileged or specific capabilities — see that repo's setup docs). - Test: upload an XLSX, ask the model to analyze and chart it. Verify code runs in sandbox, chart appears inline, no code escapes the sandbox.
Acceptance: A non-technical user can upload a file and ask "what's in this spreadsheet" and get a useful answer with a chart. Cost shows up in LiteLLM tied to that user.
References:
- Open WebUI docs: https://docs.openwebui.com/
- Code exec add-on: https://github.com/EtiennePerot/safe-code-execution
- Pyodide alternative (no Docker privilege needed but limited libs): https://docs.openwebui.com/features/chat-conversations/chat-features/code-execution/python/
Phase 3 — Dify for workflow apps (Days 5-7)
Goal: at least 3 working "apps" hosted in Dify and reachable via shareable URLs.
- Configure Dify's model providers to point at LiteLLM (NOT directly at OpenAI/Anthropic). This is critical — otherwise Dify calls bypass cost tracking.
- Create the first 3 workflows as proof of concept:
- Legal Notice Check (input: text + jurisdiction → output: risk summary)
- Document Summarizer (input: file → output: bullet summary)
- Email Drafter (input: bullets → output: polished email)
- For each workflow, enable the "share as web app" feature, get a URL.
- Decide tile-to-workflow URL mapping format: store in
landing/src/data/apps.json.
Acceptance: Each app works end-to-end via its share URL. Each invocation shows up in LiteLLM spend log attributed to a user (this requires Dify to forward user identity — research how, may need a custom API gateway proxy in front of Dify share URLs).
References:
- Dify self-hosting: https://docs.dify.ai/getting-started/install-self-hosted/docker-compose
- Dify license: https://github.com/langgenius/dify/blob/main/LICENSE (note logo restriction)
Phase 4 — Landing page (Days 8-10)
Goal: the front door.
- Next.js (or Astro, SvelteKit — pick what the team knows) app with:
- OIDC login.
- Tile grid, data-driven from
apps.json. - Each tile: icon, title, 1-line description, click-through to either a Dify share URL or
/chat(Open WebUI). - Search/filter for when there are 50 tiles.
- Make
apps.jsoneditable without redeploy: read it from a mounted volume or fetch from Dify's app list API on render. - Branding: company logo, color scheme.
- Add a simple "Costs" link visible to admins only → embeds Langfuse or LiteLLM dashboard.
Acceptance: A new employee with SSO access lands on the page, sees tiles, can click into a workflow or open chat without any extra login prompt.
Phase 5 — Observability & ops (Days 11-12)
- Stand up Langfuse, configure LiteLLM to ship traces to it.
- Build a couple of saved dashboards: total spend per day, top 10 users, top 10 apps.
- Set up alerts: Slack/email when monthly spend hits 80% of cap.
- Backup script for: Postgres (Dify, Langfuse, Open WebUI), Redis snapshots, Dify workflow exports.
- Restore drill: spin up fresh stack from backups in <30 min.
Phase 6 — Hardening & rollout (Days 13-15)
- Move from self-signed to real TLS (Let's Encrypt via Traefik).
- Lock down: only company SSO group X can authenticate.
- Audit log review: confirm every LLM call has user attribution.
- Doc:
ONBOARDING.mdfor end users,ADDING_NEW_APP.mdfor the team that maintains workflows. - Pilot with 5-10 users for a week. Iterate. Then announce internally.
6. Critical decisions to make before coding
These should be answered before Phase 0:
- Where will this run? On-prem, AWS, Azure, GCP? Affects networking, secrets management, and TLS.
- Which SSO? Microsoft Entra ID (Azure AD), Google Workspace, Okta, Keycloak? Open WebUI, Dify, and Langfuse all support OIDC; landing page will too.
- What's the realistic budget for v1? Affects: how many models we expose (GPT-5 + Claude Opus = expensive), default per-user budget, alert thresholds.
- Who owns this long-term? The team adding new workflows in Dify is different from the team maintaining the platform. Make this explicit.
- Data sensitivity? If employees may paste PII or confidential data, need to: (a) prefer Anthropic/OpenAI Zero Data Retention agreements, (b) consider Azure OpenAI for stronger contractual posture, (c) document an acceptable use policy.
- Logo customization for Dify. Internal-only single-tenant deployments are fine keeping the Dify logo. If branding is required, pricing requires contacting business@dify.ai directly. Recommend: keep logo for v1, revisit if leadership pushes back.
7. Risks & mitigations
| Risk | Mitigation |
|---|---|
| Cost runs away on day 1 | LiteLLM hard budgets per user enforced from the start, low default ($10-20/user/month) |
| Sandbox escape in code interpreter | gVisor is what ChatGPT uses; combined with container isolation, lowest-feasible risk for this use case. No internet egress from sandbox by default. |
| Dify license violation | Stay single-tenant, keep logo (or pay for license). Document this in LICENSE_NOTES.md |
| Sensitive data leakage to providers | Configure providers with ZDR / no-training. Add a banner on the landing page reminding users not to paste secrets. Optionally: a content filter at the LiteLLM layer (regex for credit cards, secrets) that strips/blocks. |
| Vendor lock-in to Dify | Workflows are exportable as YAML. Periodic export commit to git keeps them portable. |
| User identity drift across services | Single OIDC issuer, all services configured to use it, scripted "sync user → LiteLLM virtual key" on first login. |
| Open WebUI gVisor sandbox needs privileged Docker | If host policy blocks privileged containers, fall back to Pyodide (browser-based, no privilege needed, narrower library set). Document the trade-off. |
8. Acceptance criteria for v1
The v1 ships when all of these are true:
- An employee with SSO can reach the portal at
https://ai.<company>.<tld>and see tiles. - At least 5 working "apps" as tiles, plus the chat tile.
- Chat supports: Excel/CSV/PDF upload, code execution, image input, model switching mid-conversation.
- Every LLM call appears in LiteLLM logs with the calling user's identity.
- Per-user monthly budget enforces (verified by integration test).
- Admin can see a per-user, per-app spend dashboard.
- Adding a new "app" tile takes <30 minutes (build workflow in Dify, add row to
apps.json). - Documented disaster recovery: backups + restore drill executed once.
- Acceptable use policy linked from the landing page.
9. Known unknowns / research items
These need a spike before committing:
- User identity propagation Dify → LiteLLM. Dify's "share as web app" mode may not pass end-user identity to its LLM calls. May need to either: (a) put a small reverse proxy in front of Dify share URLs that injects a header, or (b) provision per-user Dify accounts and per-user provider keys in Dify (expensive ops cost). Spike this in Phase 3.
- Open WebUI per-user provider keys vs tag-based attribution. Both work; pick based on which has cleaner UX for budget reporting.
- Langfuse self-hosting license. Langfuse self-hosted has tiers; the FOSS core covers traces but some advanced features are commercial. Confirm the FOSS-core features are sufficient before depending on it. If not, fallback is LiteLLM's built-in dashboard (less pretty but free and sufficient for v1).
- Where do uploaded files live? Open WebUI stores them; depending on data classification, may need to mount an encrypted volume or shorten retention.
- Rate limits per provider. With many users behind one company key, OpenAI/Anthropic org-level RPM/TPM limits may bite. LiteLLM supports load balancing across multiple keys — plan for this if rollout is wide.
10. Quick start for the local agent
When you sit down with the local agent, suggested first prompt:
Read
AI_PORTAL_HANDOFF.mdin this repo. We're building Phase 0 today: the docker-compose skeleton. Set up the directory structure from Section 4, create adocker-compose.ymlthat starts LiteLLM, Open WebUI, Dify (api + web + worker), Postgres, Redis, Langfuse, Traefik, and a placeholder landing page service. Use environment variables for all secrets, with a.env.examplechecked in. Don't configure model providers yet — just get all containers healthy and reachable through Traefik with self-signed TLS at*.localhost. Stop and ask before proceeding to Phase 1.
After Phase 0 works, hand the agent Phase 1, 2, etc. one at a time. Don't let it run more than one phase ahead — these are integration-heavy and a bad assumption early on will cascade.
Appendix A — Reference links
- Dify: https://github.com/langgenius/dify
- Open WebUI: https://github.com/open-webui/open-webui
- Open WebUI code exec docs: https://docs.openwebui.com/features/chat-conversations/chat-features/code-execution/
- gVisor sandbox add-on: https://github.com/EtiennePerot/safe-code-execution
- LiteLLM: https://github.com/BerriAI/litellm
- LiteLLM proxy docs: https://docs.litellm.ai/docs/proxy/quick_start
- Langfuse: https://github.com/langfuse/langfuse
Appendix B — Why not LibreChat
Considered. Rejected because LibreChat's built-in Code Interpreter is a paid SaaS service rather than a self-hostable component. The "as capable as official chat interfaces" requirement specifically calls for in-chat code execution and file processing; making that a paid dependency contradicts the "free and open source" goal. Open WebUI's gVisor add-on covers the same ground and is fully Apache-2.0.
Appendix C — Why not building from scratch
Considered. Rejected for v1 because:
- A production-quality multi-provider chat UI with file uploads, model switching, conversation history, and a sandboxed code interpreter is roughly a year of focused engineering. Open WebUI gives us this for free.
- A workflow builder that non-engineers can use to add new "apps" is similarly large in scope. Dify gives us this for free.
- Custom code is justified only where no FOSS option exists: the landing page (small), the auth glue, and the per-user billing reconciliation.
If after running v1 for a few months we find the FOSS pieces don't fit, we can replace one component at a time without throwing out the whole stack — that's the benefit of the gateway architecture.