Clinica

deployed

Realtime Clinical Voice Translator

Python, FastAPI, React, TypeScript, LiteRT Gemma, TTS, Cloudflare Tunnel
Key Metrics: EN↔TH realtime speech-to-speech; LiteRT Gemma unified ASR+MT; ~2s end-to-end latency

Problem

Foreign-language patients in Thai emergency departments face significant language barriers during clinical encounters. Generic translation tools (Google Translate, Pocketalk) lack clinical domain tuning: symptoms, negation, dosages, and drug names are frequently mistranslated. Sending patient audio off the premises for cloud translation also creates compliance risk under Thailand's Personal Data Protection Act (PDPA).

Solution

Clinica is a purpose-built clinical voice translator for clinician-patient conversations. Push-to-talk, hear the translation. Designed for bedside and ward workflows.

Key design decisions:

  • Unified ASR+MT pipeline via LiteRT Gemma — one model does speech recognition and translation in a single pass
  • Edge-first trajectory — models run locally or on-premise, no patient audio leaves the hospital
  • Turn-taking UX built for clinical workflows, not tourists
  • Privacy by design — local transcript logging for audit trail
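To make the turn-taking and local-logging decisions concrete, here is a minimal sketch of an append-only transcript log kept on local disk. It is an illustration only, not Clinica's actual code: the `Turn` fields, the JSON Lines format, and the function names `target_language`, `append_turn`, and `read_turns` are all assumptions.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Turn:
    """One push-to-talk turn. All field names are hypothetical."""
    speaker: str       # e.g. "clinician" or "patient"
    source_lang: str   # "en" or "th"
    target_lang: str   # the language the listener hears
    translation: str   # text produced by the unified ASR+MT pass
    timestamp: float   # seconds since the epoch


def target_language(source_lang: str) -> str:
    """EN<->TH pairing: the listener always hears the other language."""
    if source_lang not in ("en", "th"):
        raise ValueError(f"unsupported language: {source_lang}")
    return "th" if source_lang == "en" else "en"


def append_turn(log_path: Path, turn: Turn) -> None:
    """Append one turn as a JSON line to a local file (no network I/O)."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as f:
        # ensure_ascii=False keeps Thai text human-readable in the log.
        f.write(json.dumps(asdict(turn), ensure_ascii=False) + "\n")


def read_turns(log_path: Path) -> list[Turn]:
    """Read the log back in order, e.g. for an audit review."""
    with log_path.open(encoding="utf-8") as f:
        return [Turn(**json.loads(line)) for line in f if line.strip()]
```

An append-only file with one record per line is easy to inspect and rotate. A real audit trail would probably also need access controls and a retention policy, which this sketch leaves out.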

Architecture

Mic → LiteRT Gemma (ASR+MT) → TTS → Browser playback

        FastAPI server

     Cloudflare Tunnel + Access

The frontend is a React 18 PWA with push-to-talk. The backend is FastAPI behind Cloudflare Access for email-OTP gating. Target deployment is fully on-device/edge.
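The data flow above (audio in, one ASR+MT pass, TTS, audio out) can be sketched as a single server-side function. This is a hedged illustration under stated assumptions: `handle_turn`, `TurnResult`, and the two injected callables are hypothetical names, and the `fake_*` stubs stand in for the real LiteRT Gemma and TTS calls, whose actual interfaces are not shown in this document.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical signatures for the two model stages.
SpeechToTranslation = Callable[[bytes, str, str], str]  # (audio, src, tgt) -> text
Synthesize = Callable[[str, str], bytes]                # (text, lang) -> audio


@dataclass
class TurnResult:
    translation: str  # translated text from the unified ASR+MT pass
    audio: bytes      # synthesized speech returned to the browser


def handle_turn(
    audio_in: bytes,
    source_lang: str,
    speech_to_translation: SpeechToTranslation,
    synthesize: Synthesize,
) -> TurnResult:
    """Run one push-to-talk turn: audio -> translated text -> audio."""
    if source_lang not in ("en", "th"):
        raise ValueError(f"unsupported language: {source_lang}")
    target_lang = "th" if source_lang == "en" else "en"
    # Single pass: no intermediate source-language transcript is produced.
    text = speech_to_translation(audio_in, source_lang, target_lang)
    return TurnResult(translation=text, audio=synthesize(text, target_lang))


# Stand-in stubs so the sketch runs without any model installed.
def fake_speech_to_translation(audio: bytes, src: str, tgt: str) -> str:
    return f"[{src}->{tgt}] {len(audio)} bytes of speech"


def fake_synthesize(text: str, lang: str) -> bytes:
    return text.encode("utf-8")


result = handle_turn(b"\x00" * 4, "en", fake_speech_to_translation, fake_synthesize)
print(result.translation)
```

Passing the model stages in as callables keeps the turn logic testable and would let the same function serve both a hosted pilot and a fully on-device deployment.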

Status

Deployed and iterating. Benchmark methodology under active refinement. Currently serving pilot users behind Cloudflare Access.
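As one possible building block for such a benchmark, the sketch below times a turn handler over a set of audio clips and reports median and nearest-rank 95th-percentile latency. It is an assumption about how latency could be measured, not the project's actual methodology; `measure_latency` and `summarize` are hypothetical names, and this measures server-side processing time only, not the full microphone-to-speaker path.

```python
import math
import statistics
import time
from typing import Callable, Sequence


def measure_latency(
    run_turn: Callable[[bytes], object], clips: Sequence[bytes]
) -> list[float]:
    """Return wall-clock seconds taken by run_turn for each clip."""
    latencies = []
    for clip in clips:
        start = time.perf_counter()
        run_turn(clip)
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies: Sequence[float]) -> dict[str, float]:
    """Median, nearest-rank p95, and max of a non-empty latency sample."""
    if not latencies:
        raise ValueError("need at least one measurement")
    ordered = sorted(latencies)
    # Nearest-rank percentile: smallest value with >= 95% of samples at or below it.
    rank = math.ceil(0.95 * len(ordered))
    return {
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
        "max": ordered[-1],
    }
```

Reporting a tail percentile alongside the median matters for a conversational tool, since occasional slow turns are what users notice.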

Tech Stack

  • Runtime: Python 3.12, FastAPI, Uvicorn
  • Frontend: React 18, TypeScript, Vite PWA
  • Model: LiteRT Gemma 4 (E4B-it)
  • Deploy: Docker Compose, systemd, Cloudflare Tunnel + Access
