Clinica

deployed

Realtime Clinical Voice Translator

Python · FastAPI · React · TypeScript · LiteRT Gemma · TTS · Cloudflare Tunnel

Key Metrics

EN↔TH realtime speech-to-speech translation, LiteRT Gemma unified ASR+MT, ~2 s end-to-end latency

Problem

Foreign-language patients in Thai emergency departments face significant language barriers during clinical encounters. Generic translation tools (Google Translate, Pocketalk) lack clinical domain tuning, so symptoms, negation, dosages, and drug names are frequently mistranslated. Sending patient audio off the premises for cloud translation also creates compliance risk under Thailand's Personal Data Protection Act (PDPA).

Solution

Clinica is a purpose-built clinical voice translator for clinician-patient conversations. A speaker holds a push-to-talk button and speaks, and the other party hears the translation. It is designed for bedside and ward workflows.

Key design decisions:

Unified ASR+MT pipeline via LiteRT Gemma: one model performs speech recognition and translation in a single pass
Edge-first trajectory: models run locally or on-premises, so no patient audio leaves the hospital
Turn-taking UX built for clinical workflows rather than tourist use
Privacy by design: transcripts are logged locally to provide an audit trail
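
Local transcript logging for an audit trail could be as simple as an append-only JSON Lines file on the server's own disk. The sketch below is an assumption about what such a log might look like. The record fields and the `log_turn` and `read_log` helpers are hypothetical and do not come from the Clinica codebase.

```python
import json
import time
from pathlib import Path


def log_turn(log_path: Path, speaker: str, src_lang: str, tgt_lang: str,
             source_text: str, translation: str) -> dict:
    """Hypothetical helper: append one translated turn to a local JSON Lines audit file."""
    record = {
        "ts": time.time(),  # Unix timestamp when the turn was logged
        "speaker": speaker,
        "src_lang": src_lang,
        "tgt_lang": tgt_lang,
        "source_text": source_text,
        "translation": translation,
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode with one JSON object per line leaves earlier records untouched.
    # ensure_ascii=False keeps Thai text readable in the file instead of \u escapes.
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def read_log(log_path: Path) -> list[dict]:
    """Load every record for review, in the order it was written."""
    with log_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Because the file never leaves the host, this fits the on-premises goal. Its location, retention period, and file permissions would still need to follow the hospital's own PDPA policies.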

Architecture

Mic → LiteRT Gemma (ASR+MT) → TTS → Browser playback
              ↕
        FastAPI server
              ↕
     Cloudflare Tunnel + Access

The frontend is a React 18 PWA with push-to-talk. The backend is a FastAPI server reached through Cloudflare Tunnel, with Cloudflare Access gating sign-in by email OTP. The target deployment is fully on-device or edge.
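
Because ASR and MT happen in a single model call, each turn in the diagram has only two server-side inference stages. The sketch below shows how one turn might be run and timed. The `asr_mt` and `tts` callables, the `run_turn` function, and the `TurnResult` type are hypothetical stand-ins. The real LiteRT Gemma and TTS APIs are not shown here.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class TurnResult:
    translation: str               # translated text produced by the ASR+MT stage
    audio: bytes                   # synthesized speech for browser playback
    timings_ms: dict[str, float]   # per-stage wall-clock time in milliseconds


def run_turn(
    audio: bytes,
    src: str,
    tgt: str,
    asr_mt: Callable[[bytes, str, str], str],  # hypothetical: speech in, translated text out
    tts: Callable[[str, str], bytes],          # hypothetical: text and language in, audio out
) -> TurnResult:
    """Run one push-to-talk turn through ASR+MT and then TTS, timing each stage."""
    t0 = time.perf_counter()
    translation = asr_mt(audio, src, tgt)  # one call covers recognition and translation
    t1 = time.perf_counter()
    speech = tts(translation, tgt)
    t2 = time.perf_counter()
    timings = {
        "asr_mt": (t1 - t0) * 1000,
        "tts": (t2 - t1) * 1000,
        "total": (t2 - t0) * 1000,
    }
    return TurnResult(translation, speech, timings)


if __name__ == "__main__":
    # Stub stages so the sketch runs without any model installed.
    fake_asr_mt = lambda audio, src, tgt: "Where does it hurt?"
    fake_tts = lambda text, lang: text.encode("utf-8")
    result = run_turn(b"\x00" * 320, "th", "en", fake_asr_mt, fake_tts)
    print(result.translation, result.timings_ms)
```

These timings cover only server-side inference. The ~2 s end-to-end figure presumably also includes audio upload, transit through Cloudflare Tunnel, and the start of browser playback, so those would need to be measured separately.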

Status

Deployed and iterating. Benchmark methodology under active refinement. Currently serving pilot users behind Cloudflare Access.

Tech Stack

Runtime: Python 3.12, FastAPI, Uvicorn
Frontend: React 18, TypeScript, Vite PWA
Model: LiteRT Gemma 4 (E4B-it)
Deploy: Docker Compose, systemd, Cloudflare Tunnel + Access
