Files
AI_portal/AI_PORTAL_HANDOFF.md
Ondřej Glaser 48cef99257 Initial portal commit: landing + 9 AI-powered apps
Apps:
- dwg-rooms: extract room numbers from DWG/DXF
- dwg-counting: count symbols in PDF drawings (OpenCV template matching)
- contract-check: review PDF contracts against a checklist (Claude vision + Tesseract OCR fallback)
- email-drafter: bullet notes → polished Czech/English business emails
- invoice-extractor: PDF/image invoice → structured data → Excel
- translator: Czech-first translator across 19 languages with tone control
- vv-check: find inconsistent unit prices across VV sheets in one workbook
- vv-compare: diff original vs new VV files (changes / added / removed)
- feature-request: portal users submit ideas + sample files

Infrastructure:
- LiteLLM gateway with per-app virtual keys + budgets
- Langfuse observability
- Geist font, shared theme, cross-subdomain back link + theme sync via cookie/URL
- Caddy reverse proxy on *.klas.chat
2026-05-13 15:25:04 +02:00

21 KiB

Internal AI Portal — Project Handoff

Status: Architecture & vendor selection complete. Ready for build. Audience: Local coding agent + project owner. Goal: Build an internal AI landing page for company employees that combines guided "AI apps" (tile-based workflows) with a full-featured chat fallback, all routed through company API keys with per-user cost tracking and budget enforcement.


1. Product vision

A single internal URL employees go to. They see:

  • A grid of tiles, each a guided "AI app" for a specific task (legal-notice check, contract summarizer, invoice extractor, translator, etc.). These are for colleagues who don't want to write prompts — fill a form, get an answer.
  • One special tile labeled "Open Chat" that drops them into a full ChatGPT/Claude-equivalent chat: file uploads (Excel, PDF, images), code execution sandbox, multi-turn, switch between models mid-conversation.
  • All API calls go through company-owned keys. Employees don't pay; the company does. In return, every call is logged, costed, and budget-capped per user.

Non-goals (for v1):

  • Customer-facing product (internal only).
  • Multi-tenant SaaS.
  • Replacing existing tools that aren't AI-related.

2. Architecture overview

Composition of three FOSS components plus a thin custom landing page. No single FOSS project covers all requirements — combining them is the production-ready path.

                        ┌────────────────────────────────────┐
                        │   Custom landing page (tiles UI)   │
                        │   - Auth via company SSO (OIDC)    │
                        │   - Renders tiles + "Open Chat"    │
                        └────────────┬───────────────────────┘
                                     │
              ┌──────────────────────┼──────────────────────┐
              │                      │                      │
              ▼                      ▼                      ▼
      ┌────────────┐         ┌────────────┐         ┌──────────────┐
      │   Dify     │         │ Open WebUI │         │  Other tile  │
      │ (workflow  │         │  (chat +   │         │  apps (links │
      │   apps)    │         │   sandbox) │         │   or iframes)│
      └─────┬──────┘         └─────┬──────┘         └──────┬───────┘
            │                      │                       │
            └──────────────────────┼───────────────────────┘
                                   ▼
                     ┌──────────────────────────────┐
                     │   LiteLLM proxy gateway      │
                     │   - Unified OpenAI-format    │
                     │   - Per-user virtual keys    │
                     │   - Budget enforcement       │
                     │   - Cost & token logging     │
                     └──────────────┬───────────────┘
                                    │
                                    ▼
                     ┌──────────────────────────────┐
                     │  Langfuse (observability)    │
                     │  - Traces, dashboards        │
                     │  - Per-user analytics        │
                     │  (consumes LiteLLM logs)     │
                     └──────────────┬───────────────┘
                                    │
                                    ▼
                     OpenAI / Anthropic / Google / Azure APIs

Component responsibilities

Component Role License
Landing page Tile UI, SSO entry point, hands users off to specific tools Custom (we own it)
Dify Hosts the predefined "apps" — visual workflow builder, each app is a workflow Apache 2.0 + extras (logo restriction; OK for internal single-tenant)
Open WebUI The "Open Chat" experience — full ChatGPT-equivalent UI with file uploads & code interpreter BSD-3-Clause
gVisor code execution add-on Sandboxed Python/Bash for Open WebUI (Excel processing, charts, etc.) Apache 2.0
LiteLLM Single API gateway for all providers, virtual keys per user, budget enforcement, cost tracking MIT
Langfuse Observability dashboards on top of LiteLLM MIT (with self-host caveats — see §9)

Why this split

  • Dify alone has tile-based apps but its chat is weaker than dedicated chat UIs and its code interpreter doesn't match ChatGPT's Excel-processing experience.
  • Open WebUI alone is chat-first; building 50 guided "apps" inside it means custom Functions/Pipelines code per app, which is harder to maintain than Dify's visual workflows.
  • LibreChat was considered but rejected: its built-in code interpreter is a paid SaaS service, which fails the "as capable as official chat interfaces" requirement out of the box. Open WebUI's gVisor sandbox is fully free.
  • LiteLLM is the only piece that gives us per-user budgets and cost tracking across both Dify and Open WebUI, since both can be configured to use it as their "OpenAI-compatible" provider.

3. User flows

Flow A — colleague who doesn't want to write prompts

  1. Opens portal → sees tile grid.
  2. Clicks "Legal Notice Check" tile → opens Dify-hosted app in iframe or new tab.
  3. Form: paste notice text, select jurisdiction, click Submit.
  4. Dify workflow runs → output displayed.
  5. All LLM calls inside the workflow went through LiteLLM → cost attributed to this user.

Flow B — colleague who wants a real chat

  1. Opens portal → clicks "Open Chat" tile.
  2. Lands in Open WebUI session (already authenticated via SSO).
  3. Uploads q3-sales.xlsx, asks "summarize regional performance and draw a bar chart."
  4. Model writes Python → gVisor sandbox executes → result + chart appear inline.
  5. User can switch model mid-conversation (Claude → GPT-5 → local).
  6. All calls flow through LiteLLM → costed to this user, budget enforced.

Flow C — admin

  1. Opens LiteLLM admin UI → sees per-user spend, sets/adjusts budgets.
  2. Opens Langfuse → sees traces, prompt analytics, error rates.
  3. Opens Dify admin → adds a new workflow → it appears as a new tile on the landing page (via config update).

4. Repository layout (proposed)

ai-portal/
├── README.md
├── docker-compose.yml              # All services
├── .env.example
├── landing/                        # Custom landing page (Next.js or similar)
│   ├── package.json
│   ├── src/
│   │   ├── pages/index.tsx         # Tile grid
│   │   ├── lib/auth.ts             # OIDC client
│   │   └── data/apps.json          # Tile definitions
│   └── Dockerfile
├── litellm/
│   ├── config.yaml                 # Models, virtual keys, budgets
│   └── README.md
├── openwebui/
│   ├── env.example
│   ├── functions/                  # Custom OWUI functions if needed
│   └── tools/
│       └── run_code.py             # gVisor sandbox tool (from EtiennePerot/safe-code-execution)
├── dify/
│   ├── env.example
│   ├── workflows/                  # Exported workflow DSL files (one per tile app)
│   │   ├── legal-notice-check.yml
│   │   └── ...
│   └── README.md
├── langfuse/
│   └── env.example
├── infra/
│   ├── traefik/                    # Reverse proxy + TLS
│   │   └── traefik.yml
│   └── backup/
│       └── backup.sh
└── docs/
    ├── ARCHITECTURE.md             # This document, eventually
    ├── ADDING_NEW_APP.md           # Runbook for adding a tile
    └── ONBOARDING.md               # End-user docs

5. Build phases

Phase 0 — Local dev environment (Day 1)

Goal: all four services running locally via docker-compose, talking to each other.

  1. Bootstrap repo with the layout above.
  2. Write docker-compose.yml with services: litellm, langfuse, openwebui, dify-api, dify-web, dify-worker, redis, postgres, traefik, landing.
  3. Each service gets its own subnet + .env file. No secrets in git.
  4. Verify: each service reachable at https://<service>.localhost via Traefik with self-signed certs.

Acceptance: docker compose up brings everything up clean. Each service's UI loads.

Phase 1 — LiteLLM gateway (Day 2)

Goal: all model calls go through LiteLLM. Cost tracking works.

  1. Configure litellm/config.yaml with at minimum: OpenAI, Anthropic, Google. Add Azure if used.
  2. Set up a master key + per-user virtual keys (for now: 2 test users).
  3. Set test budget: $5/user/month, hard cap.
  4. Smoke test: curl a chat completion through LiteLLM → verify it appears in LiteLLM's spend log.
  5. Verify budget enforcement: temporarily lower limit, blast 1000 tokens, confirm 429.

Acceptance: LiteLLM admin UI shows per-user spend in real time. Budget limits actually block.

LiteLLM docs: https://docs.litellm.ai/docs/proxy/quick_start

Phase 2 — Open WebUI as the chat tile (Days 3-4)

Goal: full chat UI with file upload + Excel processing.

  1. Configure Open WebUI to use LiteLLM as its OpenAI-compatible endpoint.
  2. Set up SSO (OIDC) so user identity flows through.
  3. Map OWUI users → LiteLLM virtual keys (so per-user budgets work). Two options:
    • Single LiteLLM key, OWUI passes user ID as X-LiteLLM-User-Id header — requires LiteLLM tag-based tracking.
    • Per-user LiteLLM keys, OWUI feature for per-user provider keys. Investigate which is cleaner.
  4. Install the gVisor code execution function and tool from EtiennePerot/safe-code-execution. Configure the OWUI container with sandboxing prerequisites (privileged or specific capabilities — see that repo's setup docs).
  5. Test: upload an XLSX, ask the model to analyze and chart it. Verify code runs in sandbox, chart appears inline, no code escapes the sandbox.

Acceptance: A non-technical user can upload a file and ask "what's in this spreadsheet" and get a useful answer with a chart. Cost shows up in LiteLLM tied to that user.

References:

Phase 3 — Dify for workflow apps (Days 5-7)

Goal: at least 3 working "apps" hosted in Dify and reachable via shareable URLs.

  1. Configure Dify's model providers to point at LiteLLM (NOT directly at OpenAI/Anthropic). This is critical — otherwise Dify calls bypass cost tracking.
  2. Create the first 3 workflows as proof of concept:
    • Legal Notice Check (input: text + jurisdiction → output: risk summary)
    • Document Summarizer (input: file → output: bullet summary)
    • Email Drafter (input: bullets → output: polished email)
  3. For each workflow, enable the "share as web app" feature, get a URL.
  4. Decide tile-to-workflow URL mapping format: store in landing/src/data/apps.json.

Acceptance: Each app works end-to-end via its share URL. Each invocation shows up in LiteLLM spend log attributed to a user (this requires Dify to forward user identity — research how, may need a custom API gateway proxy in front of Dify share URLs).

References:

Phase 4 — Landing page (Days 8-10)

Goal: the front door.

  1. Next.js (or Astro, SvelteKit — pick what the team knows) app with:
    • OIDC login.
    • Tile grid, data-driven from apps.json.
    • Each tile: icon, title, 1-line description, click-through to either a Dify share URL or /chat (Open WebUI).
    • Search/filter for when there are 50 tiles.
  2. Make apps.json editable without redeploy: read it from a mounted volume or fetch from Dify's app list API on render.
  3. Branding: company logo, color scheme.
  4. Add a simple "Costs" link visible to admins only → embeds Langfuse or LiteLLM dashboard.

Acceptance: A new employee with SSO access lands on the page, sees tiles, can click into a workflow or open chat without any extra login prompt.

Phase 5 — Observability & ops (Days 11-12)

  1. Stand up Langfuse, configure LiteLLM to ship traces to it.
  2. Build a couple of saved dashboards: total spend per day, top 10 users, top 10 apps.
  3. Set up alerts: Slack/email when monthly spend hits 80% of cap.
  4. Backup script for: Postgres (Dify, Langfuse, Open WebUI), Redis snapshots, Dify workflow exports.
  5. Restore drill: spin up fresh stack from backups in <30 min.

Phase 6 — Hardening & rollout (Days 13-15)

  1. Move from self-signed to real TLS (Let's Encrypt via Traefik).
  2. Lock down: only company SSO group X can authenticate.
  3. Audit log review: confirm every LLM call has user attribution.
  4. Doc: ONBOARDING.md for end users, ADDING_NEW_APP.md for the team that maintains workflows.
  5. Pilot with 5-10 users for a week. Iterate. Then announce internally.

6. Critical decisions to make before coding

These should be answered before Phase 0:

  1. Where will this run? On-prem, AWS, Azure, GCP? Affects networking, secrets management, and TLS.
  2. Which SSO? Microsoft Entra ID (Azure AD), Google Workspace, Okta, Keycloak? Open WebUI, Dify, and Langfuse all support OIDC; landing page will too.
  3. What's the realistic budget for v1? Affects: how many models we expose (GPT-5 + Claude Opus = expensive), default per-user budget, alert thresholds.
  4. Who owns this long-term? The team adding new workflows in Dify is different from the team maintaining the platform. Make this explicit.
  5. Data sensitivity? If employees may paste PII or confidential data, need to: (a) prefer Anthropic/OpenAI Zero Data Retention agreements, (b) consider Azure OpenAI for stronger contractual posture, (c) document an acceptable use policy.
  6. Logo customization for Dify. Internal-only single-tenant deployments are fine keeping the Dify logo. If branding is required, pricing requires contacting business@dify.ai directly. Recommend: keep logo for v1, revisit if leadership pushes back.

7. Risks & mitigations

Risk Mitigation
Cost runs away on day 1 LiteLLM hard budgets per user enforced from the start, low default ($10-20/user/month)
Sandbox escape in code interpreter gVisor is what ChatGPT uses; combined with container isolation, lowest-feasible risk for this use case. No internet egress from sandbox by default.
Dify license violation Stay single-tenant, keep logo (or pay for license). Document this in LICENSE_NOTES.md
Sensitive data leakage to providers Configure providers with ZDR / no-training. Add a banner on the landing page reminding users not to paste secrets. Optionally: a content filter at the LiteLLM layer (regex for credit cards, secrets) that strips/blocks.
Vendor lock-in to Dify Workflows are exportable as YAML. Periodic export commit to git keeps them portable.
User identity drift across services Single OIDC issuer, all services configured to use it, scripted "sync user → LiteLLM virtual key" on first login.
Open WebUI gVisor sandbox needs privileged Docker If host policy blocks privileged containers, fall back to Pyodide (browser-based, no privilege needed, narrower library set). Document the trade-off.

8. Acceptance criteria for v1

The v1 ships when all of these are true:

  • An employee with SSO can reach the portal at https://ai.<company>.<tld> and see tiles.
  • At least 5 working "apps" as tiles, plus the chat tile.
  • Chat supports: Excel/CSV/PDF upload, code execution, image input, model switching mid-conversation.
  • Every LLM call appears in LiteLLM logs with the calling user's identity.
  • Per-user monthly budget enforces (verified by integration test).
  • Admin can see a per-user, per-app spend dashboard.
  • Adding a new "app" tile takes <30 minutes (build workflow in Dify, add row to apps.json).
  • Documented disaster recovery: backups + restore drill executed once.
  • Acceptable use policy linked from the landing page.

9. Known unknowns / research items

These need a spike before committing:

  1. User identity propagation Dify → LiteLLM. Dify's "share as web app" mode may not pass end-user identity to its LLM calls. May need to either: (a) put a small reverse proxy in front of Dify share URLs that injects a header, or (b) provision per-user Dify accounts and per-user provider keys in Dify (expensive ops cost). Spike this in Phase 3.
  2. Open WebUI per-user provider keys vs tag-based attribution. Both work; pick based on which has cleaner UX for budget reporting.
  3. Langfuse self-hosting license. Langfuse self-hosted has tiers; the FOSS core covers traces but some advanced features are commercial. Confirm the FOSS-core features are sufficient before depending on it. If not, fallback is LiteLLM's built-in dashboard (less pretty but free and sufficient for v1).
  4. Where do uploaded files live? Open WebUI stores them; depending on data classification, may need to mount an encrypted volume or shorten retention.
  5. Rate limits per provider. With many users behind one company key, OpenAI/Anthropic org-level RPM/TPM limits may bite. LiteLLM supports load balancing across multiple keys — plan for this if rollout is wide.

10. Quick start for the local agent

When you sit down with the local agent, suggested first prompt:

Read AI_PORTAL_HANDOFF.md in this repo. We're building Phase 0 today: the docker-compose skeleton. Set up the directory structure from Section 4, create a docker-compose.yml that starts LiteLLM, Open WebUI, Dify (api + web + worker), Postgres, Redis, Langfuse, Traefik, and a placeholder landing page service. Use environment variables for all secrets, with a .env.example checked in. Don't configure model providers yet — just get all containers healthy and reachable through Traefik with self-signed TLS at *.localhost. Stop and ask before proceeding to Phase 1.

After Phase 0 works, hand the agent Phase 1, 2, etc. one at a time. Don't let it run more than one phase ahead — these are integration-heavy and a bad assumption early on will cascade.


Appendix B — Why not LibreChat

Considered. Rejected because LibreChat's built-in Code Interpreter is a paid SaaS service rather than a self-hostable component. The "as capable as official chat interfaces" requirement specifically calls for in-chat code execution and file processing; making that a paid dependency contradicts the "free and open source" goal. Open WebUI's gVisor add-on covers the same ground and is fully Apache-2.0.

Appendix C — Why not building from scratch

Considered. Rejected for v1 because:

  • A production-quality multi-provider chat UI with file uploads, model switching, conversation history, and a sandboxed code interpreter is roughly a year of focused engineering. Open WebUI gives us this for free.
  • A workflow builder that non-engineers can use to add new "apps" is similarly large in scope. Dify gives us this for free.
  • Custom code is justified only where no FOSS option exists: the landing page (small), the auth glue, and the per-user billing reconciliation.

If after running v1 for a few months we find the FOSS pieces don't fit, we can replace one component at a time without throwing out the whole stack — that's the benefit of the gateway architecture.