II. Clinical AI & Health Platforms · showcase · lead · client anonymised

ArgMed — Clinical Reasoning Engine

Clinical reasoning engine for cardiology, built for a US healthtech platform. Its core is the three-agent ArgMed debate (Generator → Verifier → Reasoner), which scores hypotheses by David Deutsch's "hard-to-vary" criterion. The user is the lead author, including of the engine core.

Status
active
Period
2026-01-26 → 2026-03-11
AI sessions
Stack
Languages
TypeScript
Frameworks · Infra
Bun · Turborepo · Elysia.js · Vercel AI SDK 6 · Drizzle ORM · TimescaleDB · pgvectorscale · Zod
§01

Overview

  • What it is: "Clinical reasoning engine for cardiovascular domain (TA1)" for a US healthtech platform. It takes a patient's clinical context and, through multi-agent debate, generates, critiques, and synthesizes hypotheses, each with an epistemic quality-of-explanation score. Includes RAG over clinical guidelines, multi-tenancy, and audit. Part of the Deutsch (TA1) / Popper (TA2) / Hermes / PHI-service ecosystem.
  • Type / status / role: api/engine (Bun monorepo) / active / lead. The user, Davron Yuldashev <yul.davron.93@gmail.com>, authored 170 of 292 commits (~58%; 123 as "Davron Yuldashev" + 47 as "Dave93") and is the top author of the core engine package (28 edits to core vs 18 by Anton Kim, 17 by Harsh, 11 by aniashev). Team: Anton Kim, Harsh Manwani, Anna Shevtsova (aniashev).
  • Activity window: 2026-01-26 → 2026-03-11 (~1.5 months of intense work), 292 commits.
§02

Stack

  • Languages: TypeScript (Bun runtime).
  • Monorepo (Turborepo + Bun workspaces @deutsch/*): `apps/`: api (Elysia), queue. `packages/`: core (the reasoning engine), cartridges/cvd (cardiology domain cartridge), db (Drizzle + TimescaleDB + pgvectorscale), client (TS SDK), adapters (popper TA2, phi), config-*.
  • AI: Vercel AI SDK 6 with a multi-provider registry and HIPAA-aware failover: Vercel Gateway → Azure OpenAI (BAA) → AWS Bedrock (BAA) → Anthropic → OpenAI; BYOK. Model aliases (reasoning-primary=Claude Sonnet 4.5, reasoning-fast/response=Haiku 4.5, embeddings=text-embedding-3-large, 3072d). model-registry.json with rate limits, switch-provider.sh/model-query.sh, presets for Cerebras/Vertex.
  • Data: PostgreSQL 17 + TimescaleDB (hypertables: audit_events 6-year retention, session_activity 90 days, compression/retention policies) + pgvectorscale/pgvector (RAG: guideline_embeddings, interaction_embeddings). Multi-tenant (tenants, sessions).
  • Infra/deploy: Docker (apps/api/Dockerfile), KSA deploy (docker-compose.ksa.yml — Saudi Arabia), GitHub Actions CI (lint/typecheck/test/build/docker), Biome.
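
The failover chain described in the AI bullet above can be illustrated with a minimal sketch. Everything in it is hypothetical: the `Provider` shape, the `baa` flag, the `requireBaa` option, and the function name `generateWithFailover` are assumptions made for illustration, not the project's actual registry API. In particular, treating "HIPAA-aware" as "skip providers without a BAA when the request may carry PHI" is one plausible reading, not something the source confirms.

```typescript
// Hypothetical sketch of an ordered provider failover chain.
// None of these names come from the project's codebase.
type Provider = {
  name: string;
  baa: boolean; // assumed flag: a Business Associate Agreement covers this provider
  call: (prompt: string) => Promise<string>;
};

type FailoverResult = {
  provider: string;   // provider that produced the answer
  text: string;       // generated text
  attempts: string[]; // providers tried, in order
};

async function generateWithFailover(
  chain: Provider[],
  prompt: string,
  opts: { requireBaa: boolean },
): Promise<FailoverResult> {
  // Assumption: PHI-bearing requests may only reach BAA-covered providers.
  const eligible = opts.requireBaa ? chain.filter((p) => p.baa) : chain;
  const attempts: string[] = [];
  const errors: string[] = [];

  for (const provider of eligible) {
    attempts.push(provider.name);
    try {
      const text = await provider.call(prompt);
      return { provider: provider.name, text, attempts };
    } catch (err) {
      // Record the failure and fall through to the next provider in the chain.
      errors.push(err instanceof Error ? err.message : String(err));
    }
  }

  throw new Error(
    `all providers failed (${attempts.join(" -> ")}): ${errors.join("; ")}`,
  );
}
```

Keeping the chain as an ordered array means the priority order is data rather than control flow, so switching the primary provider would be a configuration change in a design like this.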
§03

What was shipped

The user is the lead developer and owns both the platform and a significant part of the engine.

  • ArgMed engine core (packages/core/src/engine/) — top author: debate-orchestrator, proposal-generator, claim-classifier, contradiction-detector, survivor-selector, confidence-calculator, htv-scorer, bold-rating, counter-hypothesis, idk-trigger, mode-enforcer, diversity-analyzer, snapshot-validator, session-manager, context-builder, output-validator.
  • AI layer (packages/core/src/ai): providers, embeddings, client, ArgMed Zod schemas (Generator/Verifier/Reasoner Output, HTVScore, ClaimType).
  • CVD cartridge (cardiology domain knowledge) — top author.
  • API/DB/queues: Elysia API, Drizzle + TimescaleDB schema, queue package.
  • Volume: 170/292 commits (~58%), including the non-trivial engine algorithms.
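
To make the roles of the engine modules listed above concrete, here is a minimal, synchronous sketch of a Generator → Verifier → Reasoner loop. It is an illustration under assumptions, not the project's debate-orchestrator: the real pipeline calls LLMs asynchronously, and the `Agents` interface, the result shape, and the "return IDK when no hypothesis survives" rule are hypothetical. Only the parameter names `htvThreshold` and `maxRounds` are taken from the project description.

```typescript
// Hypothetical sketch of a three-agent debate loop.
type Hypothesis = { id: string; statement: string };
type Scored = Hypothesis & { score: number };

type DebateConfig = {
  htvThreshold: number; // minimum acceptance score for a hypothesis
  maxRounds: number;    // upper bound on generate/verify cycles
};

type Agents = {
  // Generator: proposes hypotheses, seeing what was rejected so far.
  generate: (round: number, rejected: Scored[]) => Hypothesis[];
  // Verifier: critiques a hypothesis and returns a score (assumed in [0, 1]).
  verify: (h: Hypothesis) => number;
  // Reasoner: chooses or synthesizes a final answer from the survivors.
  reason: (survivors: Scored[]) => Scored;
};

type DebateResult =
  | { kind: "answer"; best: Scored; rounds: number }
  | { kind: "idk"; rejected: Scored[]; rounds: number };

function runDebate(agents: Agents, config: DebateConfig): DebateResult {
  const rejected: Scored[] = [];

  for (let round = 1; round <= config.maxRounds; round++) {
    const scored = agents
      .generate(round, rejected)
      .map((h) => ({ ...h, score: agents.verify(h) }));

    // Only hypotheses at or above the threshold survive this round.
    const survivors = scored.filter((h) => h.score >= config.htvThreshold);
    if (survivors.length > 0) {
      return { kind: "answer", best: agents.reason(survivors), rounds: round };
    }
    rejected.push(...scored);
  }

  // Assumption: with no survivor after maxRounds, answer "I don't know".
  return { kind: "idk", rejected, rounds: config.maxRounds };
}
```

Passing the rejected hypotheses back to the generator is a design choice of this sketch: it lets each round avoid repeating explanations that have already been refuted.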
§04

Technical challenges

Confirmed by code (packages/core/src/engine/*).

  • ArgMed: three-agent debate (debate-orchestrator.ts) → Generator→Verifier→Reasoner pipeline with configuration: htvThreshold (minimum acceptance score for a hypothesis), maxRounds, claim-type coverage check with retry (validateClaimTypeCoverage, retryAttempted/retrySucceeded), framework-agnostic metrics callbacks (PhaseMetricsCallback — Prometheus/OTel) and SSE progress (PhaseStartCallback). Mature LLM-agent orchestration.
  • HTV scoring (Hard-To-Vary) (htv-scorer.ts) → implementation of David Deutsch's good-explanation criterion: measurement across Interdependence / Specificity / Parsimony / Falsifiability axes; thresholds refutation 0.3 / idk 0.4 / good 0.7 / excellent 0.85. A hypothesis below threshold → refuted or triggers the IDK protocol ("I don't know" instead of hallucination). This is rare, deliberate epistemic engineering (Popper/Deutsch in production).
  • Anti-hallucination through falsification → contradiction-detector, counter-hypothesis, survivor-selector, idk-trigger — the system rejects poorly grounded hypotheses instead of lying confidently. Critical for clinical settings.
  • HIPAA-aware multi-provider → failover chain with BAA providers (Azure/Bedrock), BYOK; the deutschGenerateText/Object/Embed abstraction with purpose-based routing.
  • Time-series + vector in one Postgres → TimescaleDB hypertables (retention/compression policies) + pgvectorscale for RAG; a single database instead of a storage zoo.
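
The HTV thresholds quoted above can be read as bands on a composite score. The sketch below shows one possible interpretation. The threshold values and the four axis names come from the description above; everything else is an assumption: the unweighted mean as the aggregation rule, the [0, 1] range per axis, the band boundaries being inclusive at the lower edge, and the "moderate" label for scores between the idk and good thresholds (the source does not name that band).

```typescript
// Hypothetical sketch of HTV ("hard-to-vary") scoring and banding.
// This is not the project's htv-scorer.ts.
type HtvAxes = {
  interdependence: number; // each axis is assumed to lie in [0, 1]
  specificity: number;
  parsimony: number;
  falsifiability: number;
};

type HtvBand = "refuted" | "idk" | "moderate" | "good" | "excellent";

// Threshold values as quoted in the project description.
const HTV_THRESHOLDS = {
  refutation: 0.3,
  idk: 0.4,
  good: 0.7,
  excellent: 0.85,
} as const;

// Assumption: the composite score is the unweighted mean of the four axes.
function htvScore(axes: HtvAxes): number {
  const values = [
    axes.interdependence,
    axes.specificity,
    axes.parsimony,
    axes.falsifiability,
  ];
  for (const v of values) {
    if (!Number.isFinite(v) || v < 0 || v > 1) {
      throw new RangeError(`HTV axis out of range [0, 1]: ${v}`);
    }
  }
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Assumption: bands are half-open intervals, checked from the extremes inward.
function classifyHtv(score: number): HtvBand {
  if (score < HTV_THRESHOLDS.refutation) return "refuted";
  if (score < HTV_THRESHOLDS.idk) return "idk";
  if (score >= HTV_THRESHOLDS.excellent) return "excellent";
  if (score >= HTV_THRESHOLDS.good) return "good";
  return "moderate"; // band name invented for this sketch
}
```

Under these assumptions, a hypothesis that is highly specific and falsifiable but loosely connected to the rest of the clinical picture can still land in the idk band, which is the behaviour the IDK protocol is meant to surface instead of a confident but weak answer.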
§05

AI-assisted development

  • Sessions found: the local Claude Code sessions directory for this project exists, but contains 0 `.jsonl` transcripts (cleared/not saved).
  • Indirectly: a detailed CLAUDE.md (11 KB), CONTRIBUTING.md, dense documentation (docs/01-architecture/02-argmed-framework.md referenced in code) — AI-assisted development, engineering-disciplined.
  • AI workflow patterns: no transcripts; but the repo is a textbook example of "AI-driven but strictly specified" development (Zod schemas, tests per engine module: *.test.ts for htv-scorer, debate, claim-classifier, survivor-selector, etc.).
§06

Achievements & metrics

  • The user: 170 commits (~58%), lead author of the core engine.
  • Engine: 15+ reasoning modules + 3-agent debate + HTV scoring on 4 axes + IDK protocol.
  • DB: 6 tables, 2 hypertables (6-year audit), 2 RAG vector tables, multi-tenant.
  • AI: 5-level HIPAA failover, 6 model aliases, registry with rate limits, Cerebras/Vertex/Azure/Bedrock presets.
  • Engine test coverage (bun:test per module).
  • KSA region deployment.
§07

Contributors

git shortlog · all branches

  1. Dave93 · 170
  2. aniashev · 64
  3. Anton Kim · 53
  4. Harsh Manwani · 30
4 contributors · 317 commits total
Currently

Open to Senior / Staff engineering roles and selective freelance — production AI, platform, and full-stack work.
