10/26/2025
Built at Gator Hack IV
Clinicians spend up to half of their workday manually typing or dictating notes into electronic medical records (EMRs).
This not only consumes time that could be spent with patients but also contributes to clinician burnout and incomplete documentation.
Traditional speech-to-text tools convert voice to plain text, not to the structured clinical notes that EMRs require.
DocScribe bridges this gap by automatically turning a clinician’s spoken narrative into a structured, validated SOAP note, helping providers document care in seconds.
What is your idea?
DocScribe is an AI-powered voice-to-note assistant for healthcare providers.
It listens to a clinician’s dictation and automatically produces a structured, explainable medical note with fields like chief complaint, assessment, diagnosis, plan, and follow-up.
How does it fix the problem?
By combining OpenAI’s Whisper for speech-to-text and Google’s Flan-T5 for clinical extraction, DocScribe transforms disorganized speech into a consistent, EMR-ready format — reducing administrative time and minimizing transcription errors.
1. Frontend: Streamlit web app for audio upload, mic recording, and note display.
2. Backend:
• ASR Module (asr_whisper.py) — Converts speech to text using OpenAI Whisper.
• Extractor (extract_clinical.py) — Uses Flan-T5 to parse structured note fields.
• Composer (compose_note.py) — Assembles SOAP and summary outputs.
3. Evaluation: Lightweight field-level F1 benchmarking to measure extraction accuracy.
4. Storage: Notes export as .json or .txt, making them easy to integrate with EMR APIs.
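The evaluation step can be sketched as a token-overlap F1 between each predicted field and its reference. This is a minimal illustration, not the project's actual benchmark code, and the sample field values below are made up:

```python
from collections import Counter

def token_f1(predicted: str, reference: str) -> float:
    """Token-overlap F1 between a predicted field value and its reference."""
    pred = [t.strip(",.") for t in predicted.lower().split()]
    ref = [t.strip(",.") for t in reference.lower().split()]
    if not pred or not ref:
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

# Mean F1 over the structured fields of one case (sample values, not benchmark data).
predicted = {"chief_complaint": "productive cough and fever",
             "plan": "start amoxicillin, follow up in one week"}
reference = {"chief_complaint": "productive cough with fever",
             "plan": "start amoxicillin and follow up in one week"}
mean_f1 = sum(token_f1(predicted[k], reference[k]) for k in reference) / len(reference)
```

Averaging this per-field score across cases yields a single benchmark number like the mean F1 reported below.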
Data Flow:
🎙️ Voice Input → Whisper ASR → Transcript
→ Flan-T5 Extraction → Structured JSON → SOAP Formatter → Download / Export
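The last step of the flow, assembling the extracted JSON into a SOAP note, can be sketched as follows. The field names mirror the ones listed above; the exact schema and logic of compose_note.py may differ, and the sample values are illustrative:

```python
import json

# Illustrative structured output from the extraction stage.
extracted = {
    "chief_complaint": "Productive cough and fever for three days",
    "assessment": "Findings consistent with community-acquired pneumonia",
    "diagnosis": "Community-acquired pneumonia",
    "plan": "Start oral antibiotics; rest and hydration",
    "follow_up": "Return in one week or sooner if symptoms worsen",
}

def compose_soap(fields: dict) -> str:
    """Map the structured fields onto the four SOAP sections."""
    sections = {
        "Subjective": fields.get("chief_complaint", ""),
        "Objective": fields.get("findings", "Not documented"),
        "Assessment": " / ".join(filter(None, [fields.get("assessment"),
                                               fields.get("diagnosis")])),
        "Plan": "; ".join(filter(None, [fields.get("plan"),
                                        fields.get("follow_up")])),
    }
    return "\n".join(f"{name}: {text}" for name, text in sections.items())

soap_note = compose_soap(extracted)
# The same dict can be serialized directly for the .json download path.
json_export = json.dumps(extracted, indent=2)
```

The same structured dict feeds both the SOAP text view and the JSON export, which keeps the two output formats consistent.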
Challenges we ran into:
• Latency trade-offs: Balancing speed against accuracy between small and large Flan-T5 models.
• Prompt grounding: Ensuring that outputs only contain text present in the transcript (no hallucinations).
• Audio variability: Handling background noise and pauses in real-world recordings.
• Formatting alignment: Ensuring consistency across SOAP, JSON, and summary outputs.
How we overcame them:
• Cached model pipelines with @st.cache_data to reduce load time.
• Added rule-based validators and regex filters for faithful extraction.
• Integrated live mic recording with auto-stop detection for smoother demos.
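A minimal version of the grounding check described above: accept an extracted field only if most of its tokens actually occur in the source transcript. The helper name and the 0.8 threshold are hypothetical, not the exact logic in extract_clinical.py:

```python
import re

def is_grounded(field_value: str, transcript: str, threshold: float = 0.8) -> bool:
    """Accept a field only if most of its tokens occur in the source transcript.

    The threshold is an illustrative choice, not the project's tuned value.
    """
    tokens = re.findall(r"[a-z0-9]+", field_value.lower())
    source = set(re.findall(r"[a-z0-9]+", transcript.lower()))
    if not tokens:
        return False
    hits = sum(t in source for t in tokens)
    return hits / len(tokens) >= threshold

transcript = "Patient reports a productive cough and fever for three days."
is_grounded("productive cough and fever", transcript)       # → True (grounded)
is_grounded("chest pain radiating to the arm", transcript)  # → False (hallucinated)
```

Fields that fail the check can be dropped or flagged for review rather than written into the note, which is one simple way to keep model outputs faithful to the dictation.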
• Built a fully functional end-to-end Streamlit app in under 48 hours.
• Achieved 0.74 mean F1 across clinical case benchmarks (CAP, UTI, sprain).
• Integrated speech recognition, language modeling, and clinical structure extraction into one pipeline.
• Gained a deeper understanding of prompt engineering and responsible AI in clinical contexts.
What's next:
• Fine-tune Flan-T5 or Bio_ClinicalBERT on de-identified clinical transcripts for higher precision.
• Deploy DocScribe Cloud API for EMR integration and hospital pilots.
• Add real-time summarization and multi-speaker diarization for team notes.
• Integrate Voice Activity Detection (VAD) for hands-free real-time dictation.
• Extend explainability overlays to visualize which transcript segments influenced each field.