AI_portal/AI_PORTAL_HANDOFF.md

# Internal AI Portal — Project Handoff

**Status:** Architecture & vendor selection complete. Ready for build.
**Audience:** Local coding agent + project owner.
**Goal:** Build an internal AI landing page for company employees that combines guided "AI apps" (tile-based workflows) with a full-featured chat fallback, all routed through company API keys with per-user cost tracking and budget enforcement.

---

## 1. Product vision

A single internal URL employees go to. They see:

- **A grid of tiles**, each a guided "AI app" for a specific task (legal-notice check, contract summarizer, invoice extractor, translator, etc.). These are for colleagues who don't want to write prompts — fill a form, get an answer.
- **One special tile labeled "Open Chat"** that drops them into a full ChatGPT/Claude-equivalent chat: file uploads (Excel, PDF, images), code execution sandbox, multi-turn, switch between models mid-conversation.
- All API calls go through company-owned keys. Employees don't pay; the company does. In return, every call is logged, costed, and budget-capped per user.

**Non-goals (for v1):**
- Customer-facing product (internal only).
- Multi-tenant SaaS.
- Replacing existing tools that aren't AI-related.

---

## 2. Architecture overview

Composition of three FOSS components plus a thin custom landing page. No single FOSS project covers all requirements — combining them is the production-ready path.

```
                        ┌────────────────────────────────────┐
                        │   Custom landing page (tiles UI)   │
                        │   - Auth via company SSO (OIDC)    │
                        │   - Renders tiles + "Open Chat"    │
                        └────────────┬───────────────────────┘
                                     │
              ┌──────────────────────┼──────────────────────┐
              │                      │                      │
              ▼                      ▼                      ▼
      ┌────────────┐         ┌────────────┐         ┌──────────────┐
      │   Dify     │         │ Open WebUI │         │  Other tile  │
      │ (workflow  │         │  (chat +   │         │  apps (links │
      │   apps)    │         │   sandbox) │         │   or iframes)│
      └─────┬──────┘         └─────┬──────┘         └──────┬───────┘
            │                      │                       │
            └──────────────────────┼───────────────────────┘
                                   ▼
                     ┌──────────────────────────────┐
                     │   LiteLLM proxy gateway      │
                     │   - Unified OpenAI-format    │
                     │   - Per-user virtual keys    │
                     │   - Budget enforcement       │
                     │   - Cost & token logging     │
                     └──────────────┬───────────────┘
                                    │
                                    ▼
                     ┌──────────────────────────────┐
                     │  Langfuse (observability)    │
                     │  - Traces, dashboards        │
                     │  - Per-user analytics        │
                     │  (consumes LiteLLM logs)     │
                     └──────────────┬───────────────┘
                                    │
                                    ▼
                     OpenAI / Anthropic / Google / Azure APIs
```

### Component responsibilities

| Component | Role | License |
|-----------|------|---------|
| **Landing page** | Tile UI, SSO entry point, hands users off to specific tools | Custom (we own it) |
| **Dify** | Hosts the predefined "apps" — visual workflow builder, each app is a workflow | Apache 2.0 + extras (logo restriction; OK for internal single-tenant) |
| **Open WebUI** | The "Open Chat" experience — full ChatGPT-equivalent UI with file uploads & code interpreter | BSD-3-Clause |
| **gVisor code execution add-on** | Sandboxed Python/Bash for Open WebUI (Excel processing, charts, etc.) | Apache 2.0 |
| **LiteLLM** | Single API gateway for all providers, virtual keys per user, budget enforcement, cost tracking | MIT |
| **Langfuse** | Observability dashboards on top of LiteLLM | MIT (with self-host caveats — see §9) |

### Why this split

- **Dify alone** has tile-based apps but its chat is weaker than dedicated chat UIs and its code interpreter doesn't match ChatGPT's Excel-processing experience.
- **Open WebUI alone** is chat-first; building 50 guided "apps" inside it means custom Functions/Pipelines code per app, which is harder to maintain than Dify's visual workflows.
- **LibreChat** was considered but rejected: its built-in code interpreter is a paid SaaS service, which fails the "as capable as official chat interfaces" requirement out of the box. Open WebUI's gVisor sandbox is fully free.
- **LiteLLM** is the only piece that gives us per-user budgets and cost tracking across both Dify and Open WebUI, since both can be configured to use it as their "OpenAI-compatible" provider.

---

## 3. User flows

### Flow A — colleague who doesn't want to write prompts

1. Opens portal → sees tile grid.
2. Clicks "Legal Notice Check" tile → opens Dify-hosted app in iframe or new tab.
3. Form: paste notice text, select jurisdiction, click Submit.
4. Dify workflow runs → output displayed.
5. All LLM calls inside the workflow went through LiteLLM → cost attributed to this user.

### Flow B — colleague who wants a real chat

1. Opens portal → clicks "Open Chat" tile.
2. Lands in Open WebUI session (already authenticated via SSO).
3. Uploads `q3-sales.xlsx`, asks "summarize regional performance and draw a bar chart."
4. Model writes Python → gVisor sandbox executes → result + chart appear inline.
5. User can switch model mid-conversation (Claude → GPT-5 → local).
6. All calls flow through LiteLLM → costed to this user, budget enforced.

### Flow C — admin

1. Opens LiteLLM admin UI → sees per-user spend, sets/adjusts budgets.
2. Opens Langfuse → sees traces, prompt analytics, error rates.
3. Opens Dify admin → adds a new workflow → it appears as a new tile on the landing page (via config update).

---

## 4. Repository layout (proposed)

```
ai-portal/
├── README.md
├── docker-compose.yml              # All services
├── .env.example
├── landing/                        # Custom landing page (Next.js or similar)
│   ├── package.json
│   ├── src/
│   │   ├── pages/index.tsx         # Tile grid
│   │   ├── lib/auth.ts             # OIDC client
│   │   └── data/apps.json          # Tile definitions
│   └── Dockerfile
├── litellm/
│   ├── config.yaml                 # Models, virtual keys, budgets
│   └── README.md
├── openwebui/
│   ├── env.example
│   ├── functions/                  # Custom OWUI functions if needed
│   └── tools/
│       └── run_code.py             # gVisor sandbox tool (from EtiennePerot/safe-code-execution)
├── dify/
│   ├── env.example
│   ├── workflows/                  # Exported workflow DSL files (one per tile app)
│   │   ├── legal-notice-check.yml
│   │   └── ...
│   └── README.md
├── langfuse/
│   └── env.example
├── infra/
│   ├── traefik/                    # Reverse proxy + TLS
│   │   └── traefik.yml
│   └── backup/
│       └── backup.sh
└── docs/
    ├── ARCHITECTURE.md             # This document, eventually
    ├── ADDING_NEW_APP.md           # Runbook for adding a tile
    └── ONBOARDING.md               # End-user docs
```

---

## 5. Build phases

### Phase 0 — Local dev environment (Day 1)

Goal: all four services running locally via docker-compose, talking to each other.

1. Bootstrap repo with the layout above.
2. Write `docker-compose.yml` with services: `litellm`, `langfuse`, `openwebui`, `dify-api`, `dify-web`, `dify-worker`, `redis`, `postgres`, `traefik`, `landing`.
3. Each service gets its own subnet + `.env` file. No secrets in git.
4. Verify: each service reachable at `https://<service>.localhost` via Traefik with self-signed certs.

**Acceptance:** `docker compose up` brings everything up clean. Each service's UI loads.

### Phase 1 — LiteLLM gateway (Day 2)

Goal: all model calls go through LiteLLM. Cost tracking works.

1. Configure `litellm/config.yaml` with at minimum: OpenAI, Anthropic, Google. Add Azure if used.
2. Set up a master key + per-user virtual keys (for now: 2 test users).
3. Set test budget: $5/user/month, hard cap.
4. Smoke test: `curl` a chat completion through LiteLLM → verify it appears in LiteLLM's spend log.
5. Verify budget enforcement: temporarily lower limit, blast 1000 tokens, confirm 429.

**Acceptance:** LiteLLM admin UI shows per-user spend in real time. Budget limits actually block.

LiteLLM docs: https://docs.litellm.ai/docs/proxy/quick_start

### Phase 2 — Open WebUI as the chat tile (Days 3-4)

Goal: full chat UI with file upload + Excel processing.

1. Configure Open WebUI to use LiteLLM as its OpenAI-compatible endpoint.
2. Set up SSO (OIDC) so user identity flows through.
3. Map OWUI users → LiteLLM virtual keys (so per-user budgets work). Two options:
   - Single LiteLLM key, OWUI passes user ID as `X-LiteLLM-User-Id` header — requires LiteLLM tag-based tracking.
   - Per-user LiteLLM keys, OWUI feature for per-user provider keys. Investigate which is cleaner.
4. Install the gVisor code execution function and tool from `EtiennePerot/safe-code-execution`. Configure the OWUI container with sandboxing prerequisites (privileged or specific capabilities — see that repo's setup docs).
5. Test: upload an XLSX, ask the model to analyze and chart it. Verify code runs in sandbox, chart appears inline, no code escapes the sandbox.

**Acceptance:** A non-technical user can upload a file and ask "what's in this spreadsheet" and get a useful answer with a chart. Cost shows up in LiteLLM tied to that user.

References:
- Open WebUI docs: https://docs.openwebui.com/
- Code exec add-on: https://github.com/EtiennePerot/safe-code-execution
- Pyodide alternative (no Docker privilege needed but limited libs): https://docs.openwebui.com/features/chat-conversations/chat-features/code-execution/python/

### Phase 3 — Dify for workflow apps (Days 5-7)

Goal: at least 3 working "apps" hosted in Dify and reachable via shareable URLs.

1. Configure Dify's model providers to point at LiteLLM (NOT directly at OpenAI/Anthropic). This is critical — otherwise Dify calls bypass cost tracking.
2. Create the first 3 workflows as proof of concept:
   - **Legal Notice Check** (input: text + jurisdiction → output: risk summary)
   - **Document Summarizer** (input: file → output: bullet summary)
   - **Email Drafter** (input: bullets → output: polished email)
3. For each workflow, enable the "share as web app" feature, get a URL.
4. Decide tile-to-workflow URL mapping format: store in `landing/src/data/apps.json`.

**Acceptance:** Each app works end-to-end via its share URL. Each invocation shows up in LiteLLM spend log attributed to a user (this requires Dify to forward user identity — research how, may need a custom API gateway proxy in front of Dify share URLs).

References:
- Dify self-hosting: https://docs.dify.ai/getting-started/install-self-hosted/docker-compose
- Dify license: https://github.com/langgenius/dify/blob/main/LICENSE (note logo restriction)

### Phase 4 — Landing page (Days 8-10)

Goal: the front door.

1. Next.js (or Astro, SvelteKit — pick what the team knows) app with:
   - OIDC login.
   - Tile grid, data-driven from `apps.json`.
   - Each tile: icon, title, 1-line description, click-through to either a Dify share URL or `/chat` (Open WebUI).
   - Search/filter for when there are 50 tiles.
2. Make `apps.json` editable without redeploy: read it from a mounted volume or fetch from Dify's app list API on render.
3. Branding: company logo, color scheme.
4. Add a simple "Costs" link visible to admins only → embeds Langfuse or LiteLLM dashboard.

**Acceptance:** A new employee with SSO access lands on the page, sees tiles, can click into a workflow or open chat without any extra login prompt.

### Phase 5 — Observability & ops (Days 11-12)

1. Stand up Langfuse, configure LiteLLM to ship traces to it.
2. Build a couple of saved dashboards: total spend per day, top 10 users, top 10 apps.
3. Set up alerts: Slack/email when monthly spend hits 80% of cap.
4. Backup script for: Postgres (Dify, Langfuse, Open WebUI), Redis snapshots, Dify workflow exports.
5. Restore drill: spin up fresh stack from backups in <30 min.

### Phase 6 — Hardening & rollout (Days 13-15)

1. Move from self-signed to real TLS (Let's Encrypt via Traefik).
2. Lock down: only company SSO group X can authenticate.
3. Audit log review: confirm every LLM call has user attribution.
4. Doc: `ONBOARDING.md` for end users, `ADDING_NEW_APP.md` for the team that maintains workflows.
5. Pilot with 5-10 users for a week. Iterate. Then announce internally.

---

## 6. Critical decisions to make before coding

These should be answered before Phase 0:

1. **Where will this run?** On-prem, AWS, Azure, GCP? Affects networking, secrets management, and TLS.
2. **Which SSO?** Microsoft Entra ID (Azure AD), Google Workspace, Okta, Keycloak? Open WebUI, Dify, and Langfuse all support OIDC; landing page will too.
3. **What's the realistic budget for v1?** Affects: how many models we expose (GPT-5 + Claude Opus = expensive), default per-user budget, alert thresholds.
4. **Who owns this long-term?** The team adding new workflows in Dify is different from the team maintaining the platform. Make this explicit.
5. **Data sensitivity?** If employees may paste PII or confidential data, need to: (a) prefer Anthropic/OpenAI Zero Data Retention agreements, (b) consider Azure OpenAI for stronger contractual posture, (c) document an acceptable use policy.
6. **Logo customization for Dify.** Internal-only single-tenant deployments are fine keeping the Dify logo. If branding is required, pricing requires contacting business@dify.ai directly. Recommend: keep logo for v1, revisit if leadership pushes back.

---

## 7. Risks & mitigations

| Risk | Mitigation |
|------|------------|
| Cost runs away on day 1 | LiteLLM hard budgets per user enforced from the start, low default ($10-20/user/month) |
| Sandbox escape in code interpreter | gVisor is what ChatGPT uses; combined with container isolation, lowest-feasible risk for this use case. No internet egress from sandbox by default. |
| Dify license violation | Stay single-tenant, keep logo (or pay for license). Document this in `LICENSE_NOTES.md` |
| Sensitive data leakage to providers | Configure providers with ZDR / no-training. Add a banner on the landing page reminding users not to paste secrets. Optionally: a content filter at the LiteLLM layer (regex for credit cards, secrets) that strips/blocks. |
| Vendor lock-in to Dify | Workflows are exportable as YAML. Periodic export commit to git keeps them portable. |
| User identity drift across services | Single OIDC issuer, all services configured to use it, scripted "sync user → LiteLLM virtual key" on first login. |
| Open WebUI gVisor sandbox needs privileged Docker | If host policy blocks privileged containers, fall back to Pyodide (browser-based, no privilege needed, narrower library set). Document the trade-off. |

---

## 8. Acceptance criteria for v1

The v1 ships when all of these are true:

- [ ] An employee with SSO can reach the portal at `https://ai.<company>.<tld>` and see tiles.
- [ ] At least 5 working "apps" as tiles, plus the chat tile.
- [ ] Chat supports: Excel/CSV/PDF upload, code execution, image input, model switching mid-conversation.
- [ ] Every LLM call appears in LiteLLM logs with the calling user's identity.
- [ ] Per-user monthly budget enforces (verified by integration test).
- [ ] Admin can see a per-user, per-app spend dashboard.
- [ ] Adding a new "app" tile takes <30 minutes (build workflow in Dify, add row to `apps.json`).
- [ ] Documented disaster recovery: backups + restore drill executed once.
- [ ] Acceptable use policy linked from the landing page.

---

## 9. Known unknowns / research items

These need a spike before committing:

1. **User identity propagation Dify → LiteLLM.** Dify's "share as web app" mode may not pass end-user identity to its LLM calls. May need to either: (a) put a small reverse proxy in front of Dify share URLs that injects a header, or (b) provision per-user Dify accounts and per-user provider keys in Dify (expensive ops cost). Spike this in Phase 3.
2. **Open WebUI per-user provider keys vs tag-based attribution.** Both work; pick based on which has cleaner UX for budget reporting.
3. **Langfuse self-hosting license.** Langfuse self-hosted has tiers; the FOSS core covers traces but some advanced features are commercial. Confirm the FOSS-core features are sufficient before depending on it. If not, fallback is LiteLLM's built-in dashboard (less pretty but free and sufficient for v1).
4. **Where do uploaded files live?** Open WebUI stores them; depending on data classification, may need to mount an encrypted volume or shorten retention.
5. **Rate limits per provider.** With many users behind one company key, OpenAI/Anthropic org-level RPM/TPM limits may bite. LiteLLM supports load balancing across multiple keys — plan for this if rollout is wide.

---

## 10. Quick start for the local agent

When you sit down with the local agent, suggested first prompt:

> Read `AI_PORTAL_HANDOFF.md` in this repo. We're building Phase 0 today: the docker-compose skeleton. Set up the directory structure from Section 4, create a `docker-compose.yml` that starts LiteLLM, Open WebUI, Dify (api + web + worker), Postgres, Redis, Langfuse, Traefik, and a placeholder landing page service. Use environment variables for all secrets, with a `.env.example` checked in. Don't configure model providers yet — just get all containers healthy and reachable through Traefik with self-signed TLS at `*.localhost`. Stop and ask before proceeding to Phase 1.

After Phase 0 works, hand the agent Phase 1, 2, etc. one at a time. Don't let it run more than one phase ahead — these are integration-heavy and a bad assumption early on will cascade.

---

## Appendix A — Reference links

- Dify: https://github.com/langgenius/dify
- Open WebUI: https://github.com/open-webui/open-webui
- Open WebUI code exec docs: https://docs.openwebui.com/features/chat-conversations/chat-features/code-execution/
- gVisor sandbox add-on: https://github.com/EtiennePerot/safe-code-execution
- LiteLLM: https://github.com/BerriAI/litellm
- LiteLLM proxy docs: https://docs.litellm.ai/docs/proxy/quick_start
- Langfuse: https://github.com/langfuse/langfuse

## Appendix B — Why not LibreChat

Considered. Rejected because LibreChat's built-in Code Interpreter is a paid SaaS service rather than a self-hostable component. The "as capable as official chat interfaces" requirement specifically calls for in-chat code execution and file processing; making that a paid dependency contradicts the "free and open source" goal. Open WebUI's gVisor add-on covers the same ground and is fully Apache-2.0.

## Appendix C — Why not building from scratch

Considered. Rejected for v1 because:
- A production-quality multi-provider chat UI with file uploads, model switching, conversation history, and a sandboxed code interpreter is roughly a year of focused engineering. Open WebUI gives us this for free.
- A workflow builder that non-engineers can use to add new "apps" is similarly large in scope. Dify gives us this for free.
- Custom code is justified only where no FOSS option exists: the landing page (small), the auth glue, and the per-user billing reconciliation.

If after running v1 for a few months we find the FOSS pieces don't fit, we can replace one component at a time without throwing out the whole stack — that's the benefit of the gateway architecture.