II. Clinical AI & Health Platforms · supporting · contributor · client anonymised

Deutsch / Popper Bench

Turborepo monorepo for clinical AI evaluation (US healthtech platform): runs LLM agents over clinical vignettes with anti-overfitting methodology. The user built the platform layer — Elysia API, auth/RBAC, queues, dashboard, and AWS deploy.

Status
active
Period
2026-02-11 → 2026-02-23
AI sessions
Stack
Languages
TypeScript
Frameworks · Infra
Bun · Turborepo · Elysia.js · Next.js 16 · Drizzle ORM · Better Auth · BullMQ · TimescaleDB · Vercel AI SDK
§01

Overview

  • What it is: internal product of US healthtech platform — "Clinical Validation Bench" (package name regain-clinical-validation). Benchmark harness that runs clinical AI systems (codenames Deutsch / Popper / Hermes) over a corpus of clinical vignettes, scores responses (judge/oracle), compares runs, tracks regressions, and fights overfitting (strict anti-overfitting methodology, anchored to AHA/ACC/ESC guidelines). Multi-provider model registry (Cerebras/Vertex/Azure) via Vercel AI SDK 6.
  • Type / status / role: api/platform (monorepo: API + dashboard + queues + runner) / active / contributor (substantial). Primary author: Anton Kim <anton@US healthtech platform> (154 commits, clinical engine and methodology). The user, Davron, made 9 commits that are large in volume (~32,000 lines) and built the entire application layer (API, auth, queues, dashboard, deploy).
  • Activity period: 2026-02-11 → 2026-02-23 (~2 weeks of intensive work), 163 commits.
§02

Stack

  • Languages: TypeScript (Bun runtime).
  • Monorepo: Turborepo + Bun workspaces (@workspace/*). apps/: api (Elysia :3001), web (Next.js 16 :3002), runner (CLI bench), queue (BullMQ worker), dashboard (legacy). packages/: db (Drizzle+TimescaleDB), auth (Better Auth), ui (shadcn/Tailwind v4), harness, judge, oracle, analyzer, vignettes, cli.
  • Frameworks/libraries: Elysia.js (API), Next.js 16 + React 19 (dashboard), Drizzle ORM + pg (PostgreSQL/TimescaleDB), Better Auth (+admin/RBAC), BullMQ + Bun.redis (queues), shadcn/ui + Tailwind v4, Vercel AI SDK 6 (createProviderRegistry), @regain/hermes 2.0. Lint/format — Biome.
  • Infra/deploy (user contribution): multi-arch Docker (ARM64/Graviton), Bun build --compile to standalone binary; GitHub Actions → AWS ECS (regain-production), OIDC role (id-token), build matrix for 3 services into ECR.
  • Data: PostgreSQL/TimescaleDB (Drizzle, packages/db schema), Redis (queues), vignette corpus (packages/vignettes, data/, vignettes/), reports (reports/).
§03

What was shipped

Project overall (Anton): clinical bench engine — harness/judge/oracle/analyzer, vignette corpus, anti-overfitting methodology, model registry, bench scripts (history/compare/baseline/changelog/traces/control-conformance).

User's contribution (9 commits, verified via git log --author, ~32k lines total):

  • dae8dbb (184 files, +9679): implemented the Elysia.js API with authentication and export (effectively brought up apps/api from scratch).
  • a4d0988 (62 files, +17816): extended controllers + new features (analytics, corpus, export, generalization, improvements, queue, runs, vignettes).
  • e87a280 (30 files, +5009): dashboard pages, queue infrastructure, ARPA targets.
  • 5a11f85: Dockerfiles + CI/CD for AWS (ECS deploy, ecs-deploy.sh, deploy-us.yml).
  • 4d84d5e: auth middleware + dashboard layout improvement.
  • c6b6906: fix for the cross-subdomain cookie (infinite redirect loop on auth).
  • dc665de: type-error fix after merge. The remaining 2 of the 9 commits are merge commits.
  • Net contribution: user owns the entire platform layer (API + auth/RBAC + queues + dashboard + deploy); Anton — the clinical engine/science.
§04

Technical challenges

Confirmed by code (user's files).

  • RBAC as an Elysia macro (apps/api/src/lib/rbac.ts) → rbacPlugin with isAuthenticated and rbac({permission}) macros; delegates permission checks to Better Auth (auth.api.userHasPermission), clean 401/403. Declarative route protection at the framework level.
  • BullMQ isolation from Elysia types (modules/v1/queue/queue-service.ts) → thin wrapper returning plain objects (JobInfo/JobCounts/JobDetail) so BullMQ types don't "leak" into Elysia's type chain. Mature architectural decision — understanding how TS types propagate across the API layer.
  • Standalone binary for Graviton (apps/api/Dockerfile) → multi-stage, bun build --compile --minify --target bun-linux-${TARGETARCH} → single executable in production with no separately installed runtime (the Bun runtime is embedded in the binary); layered dependency caching across all workspace packages. Senior-level containerization.
  • OIDC deploy to AWS ECS (.github/workflows/deploy-us.yml) → triggered on successful CI (workflow_run), id-token: write + configure-aws-credentials (no long-lived keys), build matrix for bench-api/web/worker into ECR, cluster regain-production. Modern, secure CD.
  • Domain API → 8 controllers (runs, vignettes, analytics, corpus, export, generalization, improvements, queue) — REST surface over the clinical bench.
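The 401/403 split described in the RBAC bullet can be sketched without any framework. This is a minimal illustration and not the code in apps/api/src/lib/rbac.ts: `Session`, `Permission`, `PermissionCheck`, and `authorize` are hypothetical names, and the injected `check` function only stands in for the Better Auth permission call.

```typescript
// Hypothetical types for illustration; not taken from the repository.
type Session = { userId: string } | null;
type Permission = Record<string, string[]>; // e.g. { runs: ["read"] }
type PermissionCheck = (userId: string, permission: Permission) => Promise<boolean>;

type AuthResult =
  | { ok: true; userId: string }
  | { ok: false; status: 401 | 403; message: string };

// 401 when nobody is signed in, 403 when the signed-in user lacks the permission.
async function authorize(
  session: Session,
  permission: Permission,
  check: PermissionCheck,
): Promise<AuthResult> {
  if (session === null) {
    return { ok: false, status: 401, message: "Unauthorized" };
  }
  const allowed = await check(session.userId, permission);
  if (!allowed) {
    return { ok: false, status: 403, message: "Forbidden" };
  }
  return { ok: true, userId: session.userId };
}
```

Injecting the check keeps the decision logic testable on its own. In a route-level macro, the same function would presumably run before the handler and short-circuit with the returned status.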
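The queue-isolation bullet amounts to mapping library objects onto plain data before they reach a route handler. A minimal sketch, under stated assumptions: the names JobInfo and JobCounts come from the text, but their fields, the `JobLike` stand-in for the queue library's job object, and both mapper functions are hypothetical.

```typescript
// Plain DTOs exposed to the API layer. Field choices are assumptions.
interface JobInfo {
  id: string;
  name: string;
  state: string;
  progress: number;
  createdAt: string; // ISO 8601 timestamp
}

interface JobCounts {
  waiting: number;
  active: number;
  completed: number;
  failed: number;
}

// Hypothetical structural stand-in for the queue library's job object,
// so this sketch needs no external imports.
interface JobLike {
  id?: string;
  name: string;
  progress: unknown;
  timestamp: number; // creation time, ms since epoch
}

// Copy only primitive fields; no library type appears in the return type.
function toJobInfo(job: JobLike, state: string): JobInfo {
  return {
    id: job.id ?? "",
    name: job.name,
    state,
    progress: typeof job.progress === "number" ? job.progress : 0,
    createdAt: new Date(job.timestamp).toISOString(),
  };
}

// Normalise a loose record of counts into a fixed shape with zero defaults.
function toJobCounts(raw: Record<string, number>): JobCounts {
  return {
    waiting: raw["waiting"] ?? 0,
    active: raw["active"] ?? 0,
    completed: raw["completed"] ?? 0,
    failed: raw["failed"] ?? 0,
  };
}
```

Because handlers only ever return JobInfo or JobCounts, the response types the framework infers stay small and serialisable, which is the point the bullet makes about keeping queue types out of the API's type chain.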
§05

AI-assisted development

  • Sessions found: the Claude Code sessions directory for this project exists but contains 0 `.jsonl` transcripts (cleared/not saved).
  • Indirectly: very detailed CLAUDE.md (12 KB) with strict rules (Bun-only, no dynamic import, TanStack Query required, anti-overfitting), .cursor-like conventions — development clearly AI-assisted (Cursor/Claude Code). The canonical dev machine path is /Users/gsizm/ (Anton).
  • AI workflow patterns: no transcripts for details; but CLAUDE.md is an excellent example of an engineering "AI repo guide".
§06

Achievements & metrics

  • Monorepo: 5 apps + 11 packages, Turborepo orchestration.
  • User: ~32k lines in 9 commits — entire backend/infra layer.
  • API: 8 domain controllers + RBAC + export.
  • Deploy: 3 container services on AWS ECS (Graviton/ARM64), OIDC CI/CD.
  • Bench: corpus of ~29 vignettes (smoke) — several cardio conditions (HFrEF/HFpEF/post-MI); multi-provider registry (6+ Cerebras, 6+ Vertex models).
§07

Contributors

git shortlog · all branches

  1. Dave93 · 9
  2. Anton Kim · 154
2 contributors · 163 commits total