§01
Overview
- What it is: "Clinical reasoning engine for cardiovascular domain (TA1)" from a US healthtech platform. It takes a patient's clinical context and, through multi-agent debate, generates, critiques, and synthesizes hypotheses, each with an epistemic quality-of-explanation score. Includes RAG over clinical guidelines, multi-tenancy, and audit. Part of the Deutsch (TA1) / Popper (TA2) / Hermes / PHI-service ecosystem.
- Type / status / role: api/engine (Bun monorepo) / active / lead. User Davron Yuldashev `<yul.davron.93@gmail.com>` = 170 commits (123 as "Davron Yuldashev" + 47 as "Dave93") out of 292 (~58%), and the top author of the core engine package (28 edits to core vs 18 by Anton Kim, 17 by Harsh, 11 by aniashev). Team: Anton Kim, Harsh Manwani, Anna Shevtsova (aniashev).
- Activity window: 2026-01-26 → 2026-03-11 (~1.5 months of intense work), 292 commits.
§02
Stack
- Languages: TypeScript (Bun runtime).
- Monorepo (Turborepo + Bun workspaces, `@deutsch/*`): `apps/`: api (Elysia), queue. `packages/`: core (the reasoning engine), cartridges/cvd (cardiology domain cartridge), db (Drizzle + TimescaleDB + pgvectorscale), client (TS SDK), adapters (popper TA2, phi), `config-*`.
- AI: Vercel AI SDK 6 with a multi-provider registry and HIPAA-aware failover: Vercel Gateway → Azure OpenAI (BAA) → AWS Bedrock (BAA) → Anthropic → OpenAI; BYOK. Model aliases (reasoning-primary = Claude Sonnet 4.5, reasoning-fast/response = Haiku 4.5, embeddings = text-embedding-3-large, 3072d). `model-registry.json` with rate limits, `switch-provider.sh` / `model-query.sh`, presets for Cerebras/Vertex.
- Data: PostgreSQL 17 + TimescaleDB (hypertables: `audit_events` with 6-year retention, `session_activity` with 90 days, compression/retention policies) + pgvectorscale/pgvector (RAG: `guideline_embeddings`, `interaction_embeddings`). Multi-tenant (`tenants`, `sessions`).
- Infra/deploy: Docker (`apps/api/Dockerfile`), KSA deploy (`docker-compose.ksa.yml`, Saudi Arabia), GitHub Actions CI (lint/typecheck/test/build/docker), Biome.
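The failover order listed above can be illustrated with a minimal sketch. Everything in it is hypothetical: the `Provider` type, `generateWithFailover`, the `containsPhi` flag, and the stubbed `call` functions are not taken from the repository, and the rule "requests carrying PHI only go to providers marked (BAA)" is an assumption about what "HIPAA-aware" means here. Real calls would go through the AI SDK rather than stubs.

```typescript
// Hypothetical sketch of an ordered, HIPAA-aware failover chain.
// Names and the PHI-filtering rule are assumptions, not the project's API.
type Provider = {
  name: string;
  hasBaa: boolean; // true only for providers marked "(BAA)" in the Stack list
  call: (prompt: string) => Promise<string>;
};

async function generateWithFailover(
  chain: Provider[],
  prompt: string,
  opts: { containsPhi: boolean },
): Promise<{ provider: string; text: string }> {
  // Assumption: PHI-bearing requests skip providers without a BAA.
  const eligible = opts.containsPhi ? chain.filter((p) => p.hasBaa) : chain;
  const errors: string[] = [];
  for (const provider of eligible) {
    try {
      const text = await provider.call(prompt);
      return { provider: provider.name, text };
    } catch (err) {
      // Record the failure and fall through to the next provider in order.
      errors.push(`${provider.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all providers failed: ${errors.join("; ")}`);
}

// Stub that always fails, used to simulate an outage.
const failing = (name: string) => async (_prompt: string): Promise<string> => {
  throw new Error(`${name} unavailable`);
};

// Stub chain in the documented order; the first two providers are "down".
const exampleChain: Provider[] = [
  { name: "vercel-gateway", hasBaa: false, call: failing("vercel-gateway") },
  { name: "azure-openai", hasBaa: true, call: failing("azure-openai") },
  { name: "aws-bedrock", hasBaa: true, call: async (p) => `bedrock:${p}` },
  { name: "anthropic", hasBaa: false, call: async (p) => `anthropic:${p}` },
  { name: "openai", hasBaa: false, call: async (p) => `openai:${p}` },
];
```

With this stub chain, a PHI-bearing request never reaches the gateway, Anthropic, or OpenAI entries; if both BAA providers fail, the call rejects instead of falling back to a provider without a BAA.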
§03
What was shipped
The user is the lead developer; owns both the platform and a significant part of the engine.
- ArgMed engine core (`packages/core/src/engine/`), top author: debate-orchestrator, proposal-generator, claim-classifier, contradiction-detector, survivor-selector, confidence-calculator, htv-scorer, bold-rating, counter-hypothesis, idk-trigger, mode-enforcer, diversity-analyzer, snapshot-validator, session-manager, context-builder, output-validator.
- AI layer (`packages/core/src/ai`): providers, embeddings, client, ArgMed Zod schemas (Generator/Verifier/Reasoner Output, HTVScore, ClaimType).
- CVD cartridge (cardiology domain knowledge), top author.
- API/DB/queues: Elysia API, Drizzle + TimescaleDB schema, queue package.
- Volume: 170/292 commits (~58%), including the non-trivial engine algorithms.
§04
Technical challenges
Confirmed by code (packages/core/src/engine/*).
- ArgMed: three-agent debate (`debate-orchestrator.ts`) → Generator→Verifier→Reasoner pipeline with configuration: `htvThreshold` (minimum acceptance score for a hypothesis), `maxRounds`, claim-type coverage check with retry (`validateClaimTypeCoverage`, `retryAttempted`/`retrySucceeded`), framework-agnostic metrics callbacks (`PhaseMetricsCallback`, for Prometheus/OTel) and SSE progress (`PhaseStartCallback`). Mature LLM-agent orchestration.
- HTV scoring (Hard-To-Vary) (`htv-scorer.ts`) → implementation of David Deutsch's good-explanation criterion: measurement across Interdependence / Specificity / Parsimony / Falsifiability axes; thresholds `refutation 0.3 / idk 0.4 / good 0.7 / excellent 0.85`. A hypothesis below threshold is refuted or triggers the IDK protocol ("I don't know" instead of hallucination). This is rare, deliberate epistemic engineering (Popper/Deutsch in production).
- Anti-hallucination through falsification → contradiction-detector, counter-hypothesis, survivor-selector, idk-trigger: the system rejects poorly grounded hypotheses instead of lying confidently. Critical for clinical settings.
- HIPAA-aware multi-provider → failover chain with BAA providers (Azure/Bedrock), BYOK; the `deutschGenerateText/Object/Embed` abstraction with `purpose`-based routing.
- Time-series + vector in one Postgres → TimescaleDB hypertables (retention/compression policies) + pgvectorscale for RAG; a single database instead of a storage zoo.
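To make the HTV item above concrete, here is a minimal sketch of how four axis scores could be turned into a verdict. Only the axis names and the threshold values (0.3 / 0.4 / 0.7 / 0.85) come from the text; the type and function names, the unweighted-mean composite, the `"moderate"` band between 0.4 and 0.7, and the survivor rule are all assumptions for illustration and may differ from what `htv-scorer.ts` and `survivor-selector.ts` actually do.

```typescript
// Hypothetical sketch: only the axes and threshold values are from the text.
type HtvAxes = {
  interdependence: number; // each axis is assumed to be scored in [0, 1]
  specificity: number;
  parsimony: number;
  falsifiability: number;
};

type HtvVerdict = "refuted" | "idk" | "moderate" | "good" | "excellent";

const HTV_THRESHOLDS = { refutation: 0.3, idk: 0.4, good: 0.7, excellent: 0.85 } as const;

// Assumption: the composite score is the unweighted mean of the four axes.
function compositeHtv(axes: HtvAxes): number {
  const values = [axes.interdependence, axes.specificity, axes.parsimony, axes.falsifiability];
  for (const v of values) {
    if (!Number.isFinite(v) || v < 0 || v > 1) {
      throw new RangeError(`axis score out of [0, 1]: ${v}`);
    }
  }
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Assumption: thresholds are lower bounds of consecutive bands.
function classifyHtv(score: number): HtvVerdict {
  if (score < HTV_THRESHOLDS.refutation) return "refuted";
  if (score < HTV_THRESHOLDS.idk) return "idk";
  if (score < HTV_THRESHOLDS.good) return "moderate";
  if (score < HTV_THRESHOLDS.excellent) return "good";
  return "excellent";
}

// Assumption: hypotheses at or above htvThreshold survive, best first;
// an empty survivor set signals the IDK protocol instead of a forced answer.
function selectSurvivors<T extends { score: number }>(
  hypotheses: T[],
  htvThreshold: number,
): { survivors: T[]; idk: boolean } {
  const survivors = hypotheses
    .filter((h) => h.score >= htvThreshold)
    .sort((a, b) => b.score - a.score);
  return { survivors, idk: survivors.length === 0 };
}
```

The point of the last function is the design choice described in the list: when no hypothesis clears the bar, the sketch returns an explicit "I don't know" flag rather than promoting the least-bad candidate.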
§05
AI-assisted development
- Sessions found: the local Claude Code sessions directory for this project exists, but contains 0 `.jsonl` transcripts (cleared/not saved).
- Indirectly: a detailed `CLAUDE.md` (11 KB), `CONTRIBUTING.md`, and dense documentation (`docs/01-architecture/02-argmed-framework.md`, referenced in code) point to AI-assisted development done with engineering discipline.
- AI workflow patterns: no transcripts; but the repo is a textbook example of "AI-driven but strictly specified" development (Zod schemas, tests per engine module: `*.test.ts` for htv-scorer, debate, claim-classifier, survivor-selector, etc.).
§06
Achievements & metrics
- The user: 170 commits (~58%), lead author of the core engine.
- Engine: 15+ reasoning modules + 3-agent debate + HTV scoring on 4 axes + IDK protocol.
- DB: 6 tables, 2 hypertables (6-year audit), 2 RAG vector tables, multi-tenant.
- AI: 5-level HIPAA failover, 6 model aliases, registry with rate limits, Cerebras/Vertex/Azure/Bedrock presets.
- Engine test coverage (bun:test per module).
- KSA region deployment.