II. Clinical AI & Health Platforms · supporting · contributor · client anonymised

Anon Service — PII ↔ PHI Bridge

Internal microservice that maintains 1:1 correspondence between "canonical" PII user identifiers and anonymous UUIDs for PHI records; encryption via HashiCorp Vault, service-to-service authorization via Keycloak.

Status
active
Period
2025-10-13 → 2025-12-09
AI sessions
0
Stack
Languages
TypeScript
Frameworks · Infra
Bun · Elysia · Drizzle ORM · Postgres · HashiCorp Vault (node-vault) · Keycloak/jose JWT · Biome
§01

Overview

  • What it is: a privacy/anonymization microservice within the healthcare-platform ecosystem (realm healthcare-platform-internal-service). It accepts a piiUserId and returns a deterministic anonymous UUID (anonUserId); it also supports strictly limited reverse re-identification, rotation, and GDPR deletion. It interfaces with the sibling services pii-service and phi-service (also in the analysis queue: #46, #44).
  • Type / status / role: api (internal REST microservice) / active (last commit 2025-12-09, CI/deploy in place) / contributor. The core domain logic was written by Ramiro (28+1 commits); the user, Davron Yuldashev, made 11 commits, mostly DevOps/infra.
  • Active period: 2025-10-13 → 2025-12-09 (~2 months), 40 commits, active development with a fix/health-endpoints branch and merge flow.
§02

Stack

  • Languages: TypeScript (strict), runtime Bun.
  • Frameworks/libraries (from `package.json` and code):
      • Elysia 1.4 (+ @elysiajs/cors, @elysiajs/openapi/Swagger): HTTP framework on Bun.
      • Drizzle ORM 0.44 + drizzle-kit + postgres (postgres.js): Postgres access and migrations.
      • HashiCorp Vault via node-vault: Transit encryption (AES-256-GCM) + KV + AppRole.
      • Keycloak + jose 6: RS256 JWT verification via JWKS for service-to-service auth.
      • Biome: lint/format (instead of ESLint/Prettier).
  • Infra/deploy: Docker multi-stage (bun:1.3.3-slim, non-root user), separate Dockerfile.migrate, docker-compose.yml + docker-compose-dev.yml with healthchecks. CI: GitLab CI (.gitlab/ci/_common.yml, build.yml, deploy.yml) + GitHub workflows. Vault host https://vt.ksa.healthcare-platform.com.
  • Data: PostgreSQL. 2 tables (drizzle/schema.ts): id_mappings (ciphertexts + bytea blind index, status enum active/rotated/deleted, version, soft-delete deletedAt) and mapping_audit (audit log: caller, action, realm, indices only — no plaintext). Custom bytea type via customType (Drizzle doesn't ship it out of the box).
  • Notable tooling: OpenAPI spec (docs/openapi.yaml), architectural docs with Mermaid (docs/ARCHITECTURE.md, DATA_FLOW.md, SEQUENCE_DIAGRAMS.md).
§03

What was shipped

Project-wide chronology (for context), followed separately by the user's own contribution.

Project overall (mostly Ramiro):

  • Service initialization, core: app/db/vault/health (cd4658a, 366813a).
  • Auth + encryption + metrics: JWT verification, HMAC, Transit encrypt/decrypt (b8f608f).
  • Mappings controller → renamed to identities (GDPR semantics), endpoints create/lookup/delete (f8fa0c2, e8c5c3d).
  • Service-to-service auth via Keycloak + RBAC plugins (f08945b), CORS + internal routes (e97f8ff).
  • Vault/Keycloak config, init/status scripts (d0935fe); custom bytea + boolean success + index on created_at (4b67b1a).
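The service-to-service auth step in the chronology above can be illustrated with a minimal sketch of the claim checks that run after a token's signature has been verified. This is not the repository's middleware: the function names, the `audienceClient` parameter, and the example role strings are hypothetical. Only the `service-account-` username prefix, the issuer check, and the use of realm plus client roles come from the description of middleware/service-auth.ts; `realm_access` and `resource_access` follow Keycloak's standard access-token layout.

```typescript
// Only the claims this sketch reads; the interface name is hypothetical.
interface ServiceTokenClaims {
  iss?: string;
  preferred_username?: string;
  realm_access?: { roles?: string[] };
  resource_access?: Record<string, { roles?: string[] }>;
}

interface ServiceIdentity {
  clientId: string;
  roles: string[];
}

const SERVICE_ACCOUNT_PREFIX = "service-account-";

// Checks claims of a token whose RS256 signature is assumed to be verified
// already (for example with jose), then returns the calling service identity.
function toServiceIdentity(
  claims: ServiceTokenClaims,
  expectedIssuer: string,
  audienceClient: string, // assumption: the client id that owns the client roles
): ServiceIdentity {
  if (claims.iss !== expectedIssuer) {
    throw new Error("unexpected issuer");
  }
  const username = claims.preferred_username ?? "";
  if (!username.startsWith(SERVICE_ACCOUNT_PREFIX)) {
    throw new Error("caller is not a service account");
  }
  const realmRoles = claims.realm_access?.roles ?? [];
  const clientRoles = claims.resource_access?.[audienceClient]?.roles ?? [];
  return {
    clientId: username.slice(SERVICE_ACCOUNT_PREFIX.length),
    // Merge realm and client roles, dropping duplicates.
    roles: Array.from(new Set([...realmRoles, ...clientRoles])),
  };
}

// Role gate that a route-level guard could call before running a handler.
function hasRole(identity: ServiceIdentity, role: string): boolean {
  return identity.roles.includes(role);
}
```

A route guard would then reject the request when, say, `hasRole(identity, "anon:read")` is false; the exact role strings are an assumption, since the source only names the `anon:*` family.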

User's contribution (git log --author="Davron", 11 commits — DevOps/infra-leaning):

  • CI/CD + Docker dev environment (1d99301, 8 files): GitLab CI pipeline (build/deploy stages), docker-compose-dev.yml with Postgres, Dockerfile.migrate, multi-stage Dockerfile.
  • Container hardening (37c72d8, b9081c0): non-root user (groupadd/useradd), ownership fix, bun:1.3-slim for builder and prod.
  • Health probes (f0ae025, 385+/1492−): liveness/readiness endpoints with typed response schemas + OpenAPI descriptions; dependency updates (rebuild of bun.lock).
  • Swagger/config (e674d63, c2b1e1a, 47bbfc2), docker-compose depends_on/healthcheck (bc95012), final CI/deploy tweaks (ae86fe6).
  • Vault path + debug logging (4a1a524) — ⚠️ see "Technical challenges".
  • Volume: 11/40 commits (~28%), concentrated in infrastructure and operational readiness, not the domain crypto logic.
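To make the liveness/readiness split mentioned above concrete, here is a minimal sketch of how the two probes typically differ. It is not the code from commit f0ae025: the dependency names, response shapes, and function names are assumptions chosen for illustration. Liveness answers without touching any dependency, while readiness reports ready only when every dependency check has passed.

```typescript
// Assumed dependency set, based on the three systems the service integrates with.
type DependencyName = "postgres" | "vault" | "jwks";
type DependencyState = "up" | "down";

interface ReadinessReport {
  status: "ready" | "not_ready";
  httpStatus: 200 | 503;
  checks: Record<DependencyName, DependencyState>;
}

// Liveness: the process is running and able to respond; no dependency calls.
function liveness(): { status: "alive" } {
  return { status: "alive" };
}

const toState = (ok: boolean): DependencyState => (ok ? "up" : "down");

// Readiness: takes the outcome of each dependency check (performed elsewhere)
// and maps it to a response body plus the HTTP status an orchestrator reads.
function readiness(results: Record<DependencyName, boolean>): ReadinessReport {
  const checks: Record<DependencyName, DependencyState> = {
    postgres: toState(results.postgres),
    vault: toState(results.vault),
    jwks: toState(results.jwks),
  };
  const allUp = results.postgres && results.vault && results.jwks;
  return allUp
    ? { status: "ready", httpStatus: 200, checks }
    : { status: "not_ready", httpStatus: 503, checks };
}
```

Keeping the aggregation pure like this makes the probe logic easy to unit-test; the actual dependency pings would live in the route handler.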
§04

Technical challenges

Only what is confirmed in code (with paths/hashes).

  • Deterministic search over encrypted data → blind index pattern: pii_user_id is stored encrypted (Vault Transit, crypto.ts/mapping-service.ts:62), and for lookup an HMAC-SHA256(key, value) is computed (services/hmac.ts:23) and placed into a bytea column. Lookup goes through that index (mapping-service.ts:41-44), so no plaintext is stored in the DB. A strong privacy-engineering solution (Senior level). *Ramiro's authorship.*
  • Audit without PII leakage → the mapping_audit table writes only blind indices and metadata (caller, action, success), not the identifiers themselves (schema.ts:43-55, mapping-service.ts:8-28). GDPR-friendly. *Ramiro's authorship.* ⚠️ audit() swallows errors with an empty catch {} — audit-write failures go unnoticed.
  • Service-to-service RBAC on Keycloak → middleware/service-auth.ts: RS256 verification via createRemoteJWKSet, iss check, requirement of a service-account (preferred_username starts with service-account-), extracting realm + client roles; composable Elysia plugins requireServiceRole/requireServiceClient/requireServiceClientWithRole. Clean and well-typed. *Ramiro's authorship.*
  • Soft-delete + status machine → enum active/rotated/deleted, versioning, deletedAt (GDPR deletion as a marker, mapping-service.ts:120-173).
  • User's contribution — operational readiness: containerization (multi-stage, non-root), CI/CD pipeline (GitLab), Kubernetes-style liveness/readiness probes with OpenAPI schemas, healthcheck dependencies in compose. Real DevOps/platform skill for a security-critical service.
  • ⚠️ Security issues found (honest assessment; do NOT show publicly as-is):
      1. config/vault.ts:31-33 → strictSSL: false / rejectUnauthorized: false: TLS verification to Vault is disabled.
      2. services/hmac.ts:8,11,19-22 and mapping-service.ts:35-37 → console.log prints the HMAC key, plaintext `piiUserId`, blind index and Vault response. This leaks into logs exactly the secrets/PII the service exists to protect; the README explicitly declares "No raw IDs in logs". These debug logs were added by the user in commit 4a1a524 ("Added console logging ... of piiUserId and computed indices", "Enhanced logging in hmac service"). A real bug + reputational risk.
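The blind index pattern and the logging issue described in this list can be sketched together. This is a minimal illustration using Node's built-in crypto module (also available in Bun), not the contents of services/hmac.ts: the function names and the list of sensitive field names are hypothetical. It shows why an HMAC works as a lookup key (same key and value always give the same bytes) and one way to keep secrets out of log output.

```typescript
import { createHmac } from "node:crypto";

// Blind index: HMAC-SHA256(key, value), returned as raw bytes for a bytea column.
// The same key and value always yield the same 32 bytes, so equality lookups
// work on the index while the identifier itself is stored only as ciphertext.
function blindIndex(key: Buffer, piiUserId: string): Buffer {
  return createHmac("sha256", key).update(piiUserId, "utf8").digest();
}

// Hypothetical field names; values under these names never reach the log output.
const SENSITIVE_FIELDS = ["hmacKey", "piiUserId", "blindIndex", "vaultResponse"];

// Returns a copy of the log payload with sensitive values masked.
function redactForLog(fields: Record<string, unknown>): Record<string, unknown> {
  const safe: Record<string, unknown> = {};
  for (const name of Object.keys(fields)) {
    safe[name] = SENSITIVE_FIELDS.indexOf(name) >= 0 ? "[redacted]" : fields[name];
  }
  return safe;
}
```

With a helper like this, a debug line such as `console.log(redactForLog({ action: "lookup", piiUserId }))` would record the action while masking the identifier, which is in keeping with the README's "No raw IDs in logs" rule.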
§05

AI-assisted development

  • Sessions found: 0. Verified via the full-path normalization key for this project — no matches. (The commit-message style and .cursor/ in the repo hint at possible Cursor use, but there is no direct evidence in the local Claude Code sessions directory.)
  • What was done with AI: no data.
  • AI-workflow patterns: none.
§06

Achievements & metrics

From code/docs, no speculation:

  • 5 role-based identity-mapping operations (write/read/reidentify/delete/rotate), tied to Keycloak roles anon:*.
  • 2 tables + 3 migrations; enum status machine + versioning + soft-delete.
  • Full set of health probes: /health/live, /health/ready, /api/v1/internal/health (+ JWKS status).
  • Integration with 3 systems: Vault (Transit+KV+AppRole), Keycloak (JWKS/JWT), Postgres.
  • CI/CD on 2 platforms (GitLab + GitHub Actions), Docker multi-stage + migration image.
  • No load/scale metrics (internal proprietary service).
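The enum status machine with soft-delete listed above can be sketched as a small transition function. The source confirms only the three states (active/rotated/deleted) and the deletedAt marker; the transition table below, the row shape, and the function name are assumptions for illustration, and versioning is left out because the source does not say when the version changes.

```typescript
type MappingStatus = "active" | "rotated" | "deleted";

// Assumed transition table: active can be rotated or deleted,
// rotated can only be deleted, and deleted is terminal.
const ALLOWED_TRANSITIONS: Record<MappingStatus, readonly MappingStatus[]> = {
  active: ["rotated", "deleted"],
  rotated: ["deleted"],
  deleted: [],
};

// Hypothetical subset of an id_mappings row.
interface MappingState {
  status: MappingStatus;
  deletedAt: Date | null;
}

// Returns the next state or throws on an illegal transition.
// Deletion is a marker (soft-delete): the row stays, deletedAt is set.
function transition(
  row: MappingState,
  next: MappingStatus,
  now: Date = new Date(),
): MappingState {
  if (!ALLOWED_TRANSITIONS[row.status].includes(next)) {
    throw new Error(`illegal transition ${row.status} -> ${next}`);
  }
  return {
    status: next,
    deletedAt: next === "deleted" ? now : row.deletedAt,
  };
}
```

Encoding the allowed transitions as data keeps the rule set in one place, so a stricter or looser policy is a one-line change to the table.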
§07

Contributors

git shortlog · all branches

  1. Dave93 · 11 commits
  2. Ramiro · 29 commits
2 contributors · 40 commits total