# Internal AI Portal: Project Handoff

**Status:** Architecture & vendor selection complete. Ready for build.

**Audience:** Local coding agent + project owner.

**Goal:** Build an internal AI landing page for company employees that combines guided "AI apps" (tile-based workflows) with a full-featured chat fallback, all routed through company API keys with per-user cost tracking and budget enforcement.

---

## 1. Product vision

A single internal URL employees go to. They see:

- **A grid of tiles**, each a guided "AI app" for a specific task (legal-notice check, contract summarizer, invoice extractor, translator, etc.). These are for colleagues who don't want to write prompts: fill a form, get an answer.
- **One special tile labeled "Open Chat"** that drops them into a full ChatGPT/Claude-equivalent chat: file uploads (Excel, PDF, images), code execution sandbox, multi-turn, switch between models mid-conversation.
- All API calls go through company-owned keys. Employees don't pay; the company does. In return, every call is logged, costed, and budget-capped per user.

**Non-goals (for v1):**

- Customer-facing product (internal only).
- Multi-tenant SaaS.
- Replacing existing tools that aren't AI-related.

---

## 2. Architecture overview

Composition of four FOSS components (Dify, Open WebUI, LiteLLM, Langfuse) plus a thin custom landing page. No single FOSS project covers all requirements; combining them is the production-ready path.

```
                  ┌────────────────────────────────────┐
                  │  Custom landing page (tiles UI)    │
                  │  - Auth via company SSO (OIDC)     │
                  │  - Renders tiles + "Open Chat"     │
                  └────────────┬───────────────────────┘
                               │
        ┌──────────────────────┼──────────────────────┐
        │                      │                      │
        ▼                      ▼                      ▼
  ┌────────────┐         ┌────────────┐        ┌──────────────┐
  │   Dify     │         │ Open WebUI │        │  Other tile  │
  │ (workflow  │         │  (chat +   │        │ apps (links  │
  │   apps)    │         │  sandbox)  │        │  or iframes) │
  └─────┬──────┘         └─────┬──────┘        └──────┬───────┘
        │                      │                      │
        └──────────────────────┼──────────────────────┘
                               ▼
                ┌──────────────────────────────┐
                │  LiteLLM proxy gateway       │
                │  - Unified OpenAI-format     │
                │  - Per-user virtual keys     │
                │  - Budget enforcement        │
                │  - Cost & token logging      │
                └──────────────┬───────────────┘
                               │
                               ▼
                ┌──────────────────────────────┐
                │  Langfuse (observability)    │
                │  - Traces, dashboards        │
                │  - Per-user analytics        │
                │  (consumes LiteLLM logs)     │
                └──────────────┬───────────────┘
                               │
                               ▼
           OpenAI / Anthropic / Google / Azure APIs
```

### Component responsibilities

| Component | Role | License |
|-----------|------|---------|
| **Landing page** | Tile UI, SSO entry point, hands users off to specific tools | Custom (we own it) |
| **Dify** | Hosts the predefined "apps": visual workflow builder, each app is a workflow | Apache 2.0 + extras (logo restriction; OK for internal single-tenant) |
| **Open WebUI** | The "Open Chat" experience: full ChatGPT-equivalent UI with file uploads & code interpreter | BSD-3-Clause based (newer releases add branding terms; check the current LICENSE) |
| **gVisor code execution add-on** | Sandboxed Python/Bash for Open WebUI (Excel processing, charts, etc.) | Apache 2.0 |
| **LiteLLM** | Single API gateway for all providers, virtual keys per user, budget enforcement, cost tracking | MIT |
| **Langfuse** | Observability dashboards on top of LiteLLM | MIT (with self-host caveats; see §9) |

### Why this split

- **Dify alone** has tile-based apps, but its chat is weaker than dedicated chat UIs and its code interpreter doesn't match ChatGPT's Excel-processing experience.
- **Open WebUI alone** is chat-first; building 50 guided "apps" inside it means custom Functions/Pipelines code per app, which is harder to maintain than Dify's visual workflows.
- **LibreChat** was considered but rejected: its built-in code interpreter is a paid SaaS service, which fails the "as capable as official chat interfaces" requirement out of the box. Open WebUI's gVisor sandbox is fully free.
- **LiteLLM** is the only piece that gives us per-user budgets and cost tracking across both Dify and Open WebUI, since both can be configured to use it as their "OpenAI-compatible" provider.

---

## 3. User flows

### Flow A: colleague who doesn't want to write prompts

1. Opens portal → sees tile grid.
2. Clicks "Legal Notice Check" tile → opens Dify-hosted app in iframe or new tab.
3. Form: paste notice text, select jurisdiction, click Submit.
4. Dify workflow runs → output displayed.
5. All LLM calls inside the workflow went through LiteLLM → cost attributed to this user.

### Flow B: colleague who wants a real chat

1. Opens portal → clicks "Open Chat" tile.
2. Lands in Open WebUI session (already authenticated via SSO).
3. Uploads `q3-sales.xlsx`, asks "summarize regional performance and draw a bar chart."
4. Model writes Python → gVisor sandbox executes → result + chart appear inline.
5. User can switch model mid-conversation (Claude → GPT-5 → local).
6. All calls flow through LiteLLM → costed to this user, budget enforced.

### Flow C: admin

1. Opens LiteLLM admin UI → sees per-user spend, sets/adjusts budgets.
2. Opens Langfuse → sees traces, prompt analytics, error rates.
3. Opens Dify admin → adds a new workflow → it appears as a new tile on the landing page (via config update).

---

## 4. Repository layout (proposed)

```
ai-portal/
├── README.md
├── docker-compose.yml          # All services
├── .env.example
├── landing/                    # Custom landing page (Next.js or similar)
│   ├── package.json
│   ├── src/
│   │   ├── pages/index.tsx     # Tile grid
│   │   ├── lib/auth.ts         # OIDC client
│   │   └── data/apps.json      # Tile definitions
│   └── Dockerfile
├── litellm/
│   ├── config.yaml             # Models, virtual keys, budgets
│   └── README.md
├── openwebui/
│   ├── env.example
│   ├── functions/              # Custom OWUI functions if needed
│   └── tools/
│       └── run_code.py         # gVisor sandbox tool (from EtiennePerot/safe-code-execution)
├── dify/
│   ├── env.example
│   ├── workflows/              # Exported workflow DSL files (one per tile app)
│   │   ├── legal-notice-check.yml
│   │   └── ...
│   └── README.md
├── langfuse/
│   └── env.example
├── infra/
│   ├── traefik/                # Reverse proxy + TLS
│   │   └── traefik.yml
│   └── backup/
│       └── backup.sh
└── docs/
    ├── ARCHITECTURE.md         # This document, eventually
    ├── ADDING_NEW_APP.md       # Runbook for adding a tile
    └── ONBOARDING.md           # End-user docs
```

---

## 5. Build phases

### Phase 0: Local dev environment (Day 1)

Goal: all four services running locally via docker-compose, talking to each other.

1. Bootstrap repo with the layout above.
2. Write `docker-compose.yml` with services: `litellm`, `langfuse`, `openwebui`, `dify-api`, `dify-web`, `dify-worker`, `redis`, `postgres`, `traefik`, `landing`.
3. Each service gets its own subnet + `.env` file. No secrets in git.
4. Verify: each service reachable at `https://.localhost` via Traefik with self-signed certs.

**Acceptance:** `docker compose up` brings everything up clean. Each service's UI loads.

### Phase 1: LiteLLM gateway (Day 2)

Goal: all model calls go through LiteLLM. Cost tracking works.

1. Configure `litellm/config.yaml` with at minimum: OpenAI, Anthropic, Google. Add Azure if used.
2. Set up a master key + per-user virtual keys (for now: 2 test users).
3. Set test budget: $5/user/month, hard cap.
4. Smoke test: `curl` a chat completion through LiteLLM → verify it appears in LiteLLM's spend log.
5. Verify budget enforcement: temporarily lower the limit, send roughly 1000 tokens of traffic, and confirm the request is rejected (expected: HTTP 429).

**Acceptance:** LiteLLM admin UI shows per-user spend in real time. Budget limits actually block.

LiteLLM docs: https://docs.litellm.ai/docs/proxy/quick_start

### Phase 2: Open WebUI as the chat tile (Days 3-4)

Goal: full chat UI with file upload + Excel processing.

1. Configure Open WebUI to use LiteLLM as its OpenAI-compatible endpoint.
2. Set up SSO (OIDC) so user identity flows through.
3. Map OWUI users → LiteLLM virtual keys (so per-user budgets work). Two options; investigate which is cleaner:
   - Single LiteLLM key, with OWUI passing the user ID in a header such as `X-LiteLLM-User-Id` (exact header name to be confirmed); requires LiteLLM tag-based tracking.
   - Per-user LiteLLM keys, using the OWUI feature for per-user provider keys.
4. Install the gVisor code execution function and tool from `EtiennePerot/safe-code-execution`. Configure the OWUI container with sandboxing prerequisites (privileged mode or specific capabilities; see that repo's setup docs).
5. Test: upload an XLSX, ask the model to analyze and chart it. Verify code runs in sandbox, chart appears inline, no code escapes the sandbox.

**Acceptance:** A non-technical user can upload a file and ask "what's in this spreadsheet" and get a useful answer with a chart. Cost shows up in LiteLLM tied to that user.

References:

- Open WebUI docs: https://docs.openwebui.com/
- Code exec add-on: https://github.com/EtiennePerot/safe-code-execution
- Pyodide alternative (no Docker privilege needed but limited libs): https://docs.openwebui.com/features/chat-conversations/chat-features/code-execution/python/

### Phase 3: Dify for workflow apps (Days 5-7)

Goal: at least 3 working "apps" hosted in Dify and reachable via shareable URLs.

1. Configure Dify's model providers to point at LiteLLM (NOT directly at OpenAI/Anthropic). This is critical; otherwise Dify calls bypass cost tracking.
2. Create the first 3 workflows as proof of concept:
   - **Legal Notice Check** (input: text + jurisdiction → output: risk summary)
   - **Document Summarizer** (input: file → output: bullet summary)
   - **Email Drafter** (input: bullets → output: polished email)
3. For each workflow, enable the "share as web app" feature and get a URL.
4. Decide the tile-to-workflow URL mapping format and store it in `landing/src/data/apps.json`.

**Acceptance:** Each app works end-to-end via its share URL. Each invocation shows up in the LiteLLM spend log attributed to a user. This requires Dify to forward user identity; research how, since it may need a custom API gateway proxy in front of Dify share URLs.

References:

- Dify self-hosting: https://docs.dify.ai/getting-started/install-self-hosted/docker-compose
- Dify license: https://github.com/langgenius/dify/blob/main/LICENSE (note logo restriction)

### Phase 4: Landing page (Days 8-10)

Goal: the front door.

1. Next.js (or Astro, SvelteKit; pick what the team knows) app with:
   - OIDC login.
   - Tile grid, data-driven from `apps.json`.
   - Each tile: icon, title, 1-line description, click-through to either a Dify share URL or `/chat` (Open WebUI).
   - Search/filter for when there are 50 tiles.
2. Make `apps.json` editable without redeploy: read it from a mounted volume or fetch from Dify's app list API on render.
3. Branding: company logo, color scheme.
4. Add a simple "Costs" link visible to admins only → embeds Langfuse or LiteLLM dashboard.

**Acceptance:** A new employee with SSO access lands on the page, sees tiles, and can click into a workflow or open chat without any extra login prompt.

### Phase 5: Observability & ops (Days 11-12)

1. Stand up Langfuse, configure LiteLLM to ship traces to it.
2. Build a couple of saved dashboards: total spend per day, top 10 users, top 10 apps.
3. Set up alerts: Slack/email when monthly spend hits 80% of cap.
4. Backup script for: Postgres (Dify, Langfuse, Open WebUI), Redis snapshots, Dify workflow exports.
5. Restore drill: spin up fresh stack from backups in <30 min.

### Phase 6: Hardening & rollout (Days 13-15)

1. Move from self-signed to real TLS (Let's Encrypt via Traefik).
2. Lock down: only company SSO group X can authenticate.
3. Audit log review: confirm every LLM call has user attribution.
4. Doc: `ONBOARDING.md` for end users, `ADDING_NEW_APP.md` for the team that maintains workflows.
5. Pilot with 5-10 users for a week. Iterate. Then announce internally.

---

## 6. Critical decisions to make before coding

These should be answered before Phase 0:

1. **Where will this run?** On-prem, AWS, Azure, GCP? Affects networking, secrets management, and TLS.
2. **Which SSO?** Microsoft Entra ID (Azure AD), Google Workspace, Okta, Keycloak? Open WebUI, Dify, and Langfuse are all expected to support OIDC (confirm for the self-hosted editions we deploy); the landing page will too.
3. **What's the realistic budget for v1?** Affects: how many models we expose (GPT-5 + Claude Opus = expensive), default per-user budget, alert thresholds.
4. **Who owns this long-term?** The team adding new workflows in Dify is different from the team maintaining the platform. Make this explicit.
5. **Data sensitivity?** If employees may paste PII or confidential data, we need to: (a) prefer Anthropic/OpenAI Zero Data Retention agreements, (b) consider Azure OpenAI for stronger contractual posture, (c) document an acceptable use policy.
6. **Logo customization for Dify.** Internal-only single-tenant deployments are fine keeping the Dify logo. If branding is required, pricing requires contacting business@dify.ai directly. Recommend: keep logo for v1, revisit if leadership pushes back.

---

## 7. Risks & mitigations

| Risk | Mitigation |
|------|------------|
| Cost runs away on day 1 | LiteLLM hard budgets per user enforced from the start, low default ($10-20/user/month) |
| Sandbox escape in code interpreter | gVisor is reportedly the same sandbox technology ChatGPT uses; combined with container isolation, lowest-feasible risk for this use case. No internet egress from sandbox by default. |
| Dify license violation | Stay single-tenant, keep logo (or pay for license). Document this in `LICENSE_NOTES.md` |
| Sensitive data leakage to providers | Configure providers with ZDR / no-training. Add a banner on the landing page reminding users not to paste secrets. Optionally: a content filter at the LiteLLM layer (regex for credit cards, secrets) that strips/blocks. |
| Vendor lock-in to Dify | Workflows are exportable as YAML. Periodic export commit to git keeps them portable. |
| User identity drift across services | Single OIDC issuer, all services configured to use it, scripted "sync user → LiteLLM virtual key" on first login. |
| Open WebUI gVisor sandbox needs privileged Docker | If host policy blocks privileged containers, fall back to Pyodide (browser-based, no privilege needed, narrower library set). Document the trade-off. |

---

## 8. Acceptance criteria for v1

The v1 ships when all of these are true:

- [ ] An employee with SSO can reach the portal at `https://ai..` and see tiles.
- [ ] At least 5 working "apps" as tiles, plus the chat tile.
- [ ] Chat supports: Excel/CSV/PDF upload, code execution, image input, model switching mid-conversation.
- [ ] Every LLM call appears in LiteLLM logs with the calling user's identity.
- [ ] Per-user monthly budget is enforced (verified by integration test).
- [ ] Admin can see a per-user, per-app spend dashboard.
- [ ] Adding a new "app" tile takes <30 minutes (build workflow in Dify, add row to `apps.json`).
- [ ] Documented disaster recovery: backups + restore drill executed once.
- [ ] Acceptable use policy linked from the landing page.

---

## 9. Known unknowns / research items

These need a spike before committing:

1. **User identity propagation Dify → LiteLLM.** Dify's "share as web app" mode may not pass end-user identity to its LLM calls.
   May need to either: (a) put a small reverse proxy in front of Dify share URLs that injects a header, or (b) provision per-user Dify accounts and per-user provider keys in Dify (expensive ops cost). Spike this in Phase 3.
2. **Open WebUI per-user provider keys vs tag-based attribution.** Both work; pick based on which has cleaner UX for budget reporting.
3. **Langfuse self-hosting license.** Langfuse self-hosted has tiers; the FOSS core covers traces but some advanced features are commercial. Confirm the FOSS-core features are sufficient before depending on it. If not, fallback is LiteLLM's built-in dashboard (less pretty but free and sufficient for v1).
4. **Where do uploaded files live?** Open WebUI stores them; depending on data classification, may need to mount an encrypted volume or shorten retention.
5. **Rate limits per provider.** With many users behind one company key, OpenAI/Anthropic org-level RPM/TPM limits may bite. LiteLLM supports load balancing across multiple keys; plan for this if rollout is wide.

---

## 10. Quick start for the local agent

When you sit down with the local agent, suggested first prompt:

> Read `AI_PORTAL_HANDOFF.md` in this repo. We're building Phase 0 today: the docker-compose skeleton. Set up the directory structure from Section 4, create a `docker-compose.yml` that starts LiteLLM, Open WebUI, Dify (api + web + worker), Postgres, Redis, Langfuse, Traefik, and a placeholder landing page service. Use environment variables for all secrets, with a `.env.example` checked in. Don't configure model providers yet; just get all containers healthy and reachable through Traefik with self-signed TLS at `*.localhost`. Stop and ask before proceeding to Phase 1.

After Phase 0 works, hand the agent Phase 1, 2, etc. one at a time. Don't let it run more than one phase ahead; these are integration-heavy and a bad assumption early on will cascade.
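
The Phase 0 goal in the prompt above ("all containers healthy and reachable through Traefik") could be checked with a small script. The following is a minimal sketch, not part of the planned repo: the hostnames in `SERVICE_HOSTS` are hypothetical (the handoff only specifies `*.localhost`), and `probe_https`, `check_services`, and `unhealthy` are illustrative names. It assumes self-signed certificates, so TLS verification is disabled for local dev only.

```python
import ssl
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, List

# Hypothetical hostnames: the handoff only says "*.localhost", so adjust
# these to whatever routers are actually defined in Traefik.
SERVICE_HOSTS = [
    "landing.localhost",
    "litellm.localhost",
    "openwebui.localhost",
    "dify.localhost",
    "langfuse.localhost",
]


def probe_https(host: str, timeout: float = 5.0) -> bool:
    """Return True if https://<host>/ looks like a live, routed service."""
    # Self-signed certs in local dev: skip verification here, never in production.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(
            f"https://{host}/", timeout=timeout, context=ctx
        ):
            # urlopen only returns normally for successful responses
            # (redirects are followed automatically).
            return True
    except urllib.error.HTTPError as err:
        # 401/403 mean a service answered but wants a login. A 404 is
        # treated as a failure because Traefik itself answers 404 when
        # no router matches the host.
        return err.code in (401, 403)
    except (urllib.error.URLError, OSError):
        # DNS failure, connection refused, TLS error, or timeout.
        return False


def check_services(
    hosts: Iterable[str], probe: Callable[[str], bool] = probe_https
) -> Dict[str, bool]:
    """Probe every host once and report which ones are reachable."""
    return {host: probe(host) for host in hosts}


def unhealthy(results: Dict[str, bool]) -> List[str]:
    """List the hosts that failed the probe, sorted for stable output."""
    return sorted(host for host, ok in results.items() if not ok)
```

Calling `unhealthy(check_services(SERVICE_HOSTS))` after `docker compose up` would return an empty list once every service is routed. The `probe` parameter is injectable so the logic can be exercised without a running stack.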

---

## Appendix A: Reference links

- Dify: https://github.com/langgenius/dify
- Open WebUI: https://github.com/open-webui/open-webui
- Open WebUI code exec docs: https://docs.openwebui.com/features/chat-conversations/chat-features/code-execution/
- gVisor sandbox add-on: https://github.com/EtiennePerot/safe-code-execution
- LiteLLM: https://github.com/BerriAI/litellm
- LiteLLM proxy docs: https://docs.litellm.ai/docs/proxy/quick_start
- Langfuse: https://github.com/langfuse/langfuse

## Appendix B: Why not LibreChat

Considered. Rejected because LibreChat's built-in Code Interpreter is a paid SaaS service rather than a self-hostable component. The "as capable as official chat interfaces" requirement specifically calls for in-chat code execution and file processing; making that a paid dependency contradicts the "free and open source" goal. Open WebUI's gVisor add-on covers the same ground and is fully Apache-2.0.

## Appendix C: Why not build from scratch

Considered. Rejected for v1 because:

- A production-quality multi-provider chat UI with file uploads, model switching, conversation history, and a sandboxed code interpreter is roughly a year of focused engineering. Open WebUI gives us this for free.
- A workflow builder that non-engineers can use to add new "apps" is similarly large in scope. Dify gives us this for free.
- Custom code is justified only where no FOSS option exists: the landing page (small), the auth glue, and the per-user billing reconciliation.

If after running v1 for a few months we find the FOSS pieces don't fit, we can replace one component at a time without throwing out the whole stack. That's the benefit of the gateway architecture.
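
Appendix C names per-user billing reconciliation as one of the few custom pieces. As a rough illustration of what that could involve, here is a minimal sketch under stated assumptions: `SpendRow` and its fields are hypothetical and do not mirror LiteLLM's actual spend-log schema, the $10 default cap is taken from the low end of the range in the risks table, and the 80% alert threshold from Phase 5 is applied per user (the plan may intend it for total spend). Actual enforcement stays in LiteLLM; this only shows the reporting side.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

ALERT_THRESHOLD = 0.80  # Phase 5: alert when spend hits 80% of the cap


@dataclass(frozen=True)
class SpendRow:
    """One costed LLM call. Field names are illustrative only."""

    user_id: str
    app: str  # e.g. "open-chat" or a Dify workflow slug
    cost_usd: float


def spend_per_user(rows: Iterable[SpendRow]) -> Dict[str, float]:
    """Sum the cost of all calls per user."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row.user_id] += row.cost_usd
    return dict(totals)


def budget_status(spent: float, cap: float) -> str:
    """Classify spend against a hard monthly cap."""
    if spent >= cap:
        return "blocked"
    if spent >= ALERT_THRESHOLD * cap:
        return "alert"
    return "ok"


def reconcile(
    rows: Iterable[SpendRow],
    caps: Dict[str, float],
    default_cap: float = 10.0,  # assumed default, in USD per month
) -> List[Tuple[str, float, str]]:
    """Return (user, total spend, status) tuples sorted by user id."""
    report = []
    for user, spent in sorted(spend_per_user(rows).items()):
        cap = caps.get(user, default_cap)
        report.append((user, round(spent, 4), budget_status(spent, cap)))
    return report
```

A report like this could feed the admin "Costs" view or the Slack/email alerts, with the rows exported from whichever spend log the gateway exposes.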