10-Week Program
Curriculum
Three tracks, one cohort. Every track shares the same clinical cases and core modules, but with different depth, tools, and deliverables tailored to your background.
Three Tracks, One Learning System
Not three separate courses — a unified medical AI learning system with three entry points. Track A feeds into B, B feeds into C, and C’s governance constraints become A/B’s design boundaries.
Track A — AI Principles
Code-First
- Who: Pre-med, CS, STEM undergrads, technical learners
- Focus: Hands-on ML/DL with medical datasets using Colab, PyTorch, and Claude Code
- Capstone: Build and evaluate a clinical AI model or multi-agent workflow
Track B — Clinical Applications
Evaluate & Apply
- Who: Medical students, residents, nurses, pharmacists, researchers
- Focus: AI evaluation, paper critique, deployment readiness assessment
- Capstone: Clinical utility memo, paper critique, or deployment recommendation
Track C — Executive & Implementation
Decide & Deploy
- Who: Department heads, innovation teams, CMOs/CIOs, clinical leaders
- Focus: AI governance, procurement, ROI modeling, organizational adoption
- Capstone: Board-ready AI strategy deck with vendor evaluation and governance plan
Weekly Rhythm
Every week follows the same arc: open with a clinical case, ground it in AI principles, back it with current papers, and close with discussion.
Clinical case opening
AI principles deep-dive
Clinical application & limitations
Paper spotlight (latest research)
Dual-track breakout / discussion
10-Week Syllabus
Week 01 -- AI in Medicine: History, Hype & the Agent Era
Track A: Build
Lecture
From Symbolic AI to the Agent Era
- AI evolution: symbolic -> ML -> DL -> transformer -> LLM -> agent
- Medical AI milestones: MYCIN -> CheXNet -> AlphaFold -> Med-PaLM -> agentic workflows
- 2026 SOTA landscape: GPT-5, Claude, Gemini, Llama 4, Qwen 3 -- open vs closed
Lab
Lab A1: AI Timeline + First LLM Interaction
Build an interactive AI/medical AI timeline with Claude. Compare 3 LLMs on the same clinical question.
Assignment
One-page reflection: AI's greatest potential and greatest risk in medicine
Suggested Reading
Topol, Deep Medicine (2019) -- Ch. 1 overview
Track B: Judge
Lecture
AI in Your Clinic -- Hype vs Reality
- Clinical perspective: MYCIN (1976) -> CDSS -> CheXNet -> Med-PaLM -> 2026 agents
- AI milestones vs actual clinical adoption gap
- Hype cycle psychology: we overestimate short-term impact and underestimate long-term impact
Lab
Workshop B1: LLM Clinical Task Experience
Same clinical vignette across Claude, ChatGPT, Gemini. Compare DDx lists, recommended tests, and dangerous omissions.
Assignment
One-page reflection on first LLM clinical task observation
Suggested Reading
Topol, Deep Medicine (2019) -- clinician perspective on AI
Track C: Deploy
Lecture
The AI Hype Cycle -- Lessons from $62B in Failures
- IBM Watson Health full case study: $4B investment, promises vs delivery, why it collapsed
- 2026 landscape: $22B+ market, top funded companies, M&A signals
- Where is your hospital on the hype cycle?
Lab
Decision Lab C1: IBM Watson Health Post-Mortem
Analyze timeline, investment decisions, and org failures. Produce a 1-page decision error chain analysis.
Assignment
Read IBM Watson case + write: the most likely AI procurement mistake at your hospital
Suggested Reading
Strickland, IBM Watson Health's Rocky Journey (IEEE Spectrum)
Week 02 -- Healthcare Data: Not a Clean CSV
Track A: Build
Lecture
Healthcare Data Reality
- EHR structure: FHIR, ICD, CPT, LOINC
- Four data challenges: missingness, label noise, dataset shift, temporal leakage
- HIPAA / de-identification basics
Lab
Lab A2: Medical Data EDA
Explore MIMIC-IV demo subset in Colab. Find missing patterns, plot distributions, identify dataset shift signs.
Assignment
Data quality memo: 3 EDA issues found + suggested remediation
Track B: Judge
Lecture
What AI Sees vs What You See
- EHR pitfalls: note bloat, copy-forward, coding drift, missingness patterns
- Dataset shift & population mismatch: why AI works at Stanford but breaks at community hospitals
- Image data traps: scanner variance, label quality, selection bias
Lab
Workshop B2: Paper Data Source Audit
Audit a medical AI paper's data: source, labels, inclusion criteria, external validation, bias risks.
Assignment
Completed data audit table + summary: data quality score (1-10) with justification
Track C: Deploy
Lecture
Data Governance -- The Boring Foundation
- Data governance maturity model: chaos -> managed -> optimized
- Legal & commercial barriers: BAA, de-identification, data licensing
- Case: Cleveland Clinic's 3-year journey to AI-ready data
Lab
Decision Lab C2: Vendor Data Due Diligence
Evaluate a radiology AI vendor's data sheet (designed with gaps). Produce a due diligence report.
Assignment
Data due diligence checklist + pass / flag / reject decision
Week 03 -- Classical ML & Clinical Prediction
Track A: Build
Lecture
Baselines That Actually Work
- Logistic regression, decision trees, random forest, gradient boosting
- Medical prediction tasks: readmission, sepsis risk, triage priority
- Evaluation metrics in clinical context: sensitivity vs specificity vs PPV vs NPV
Lab
Lab A3: Build a Readmission Predictor
Train 30-day readmission models with scikit-learn. Run ROC, confusion matrix, calibration plot. Choose clinical threshold.
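To get a feel for what this lab involves, here is a minimal scikit-learn sketch on synthetic data -- the features, coefficients, and event rate are illustrative stand-ins, not the course dataset:

```python
# Minimal readmission-baseline sketch: logistic regression on synthetic data.
# All numbers here are illustrative; the lab uses real EHR-derived features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))                      # stand-ins for age, LOS, prior admits...
logit = 1.2 * X[:, 0] - 0.8 * X[:, 1] - 1.0      # synthetic "true" risk
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)
p = model.predict_proba(X_te)[:, 1]
print("AUC:", round(roc_auc_score(y_te, p), 3))

# Clinical threshold choice: trade sensitivity against alert volume, not accuracy.
for t in (0.3, 0.5):
    tn, fp, fn, tp = confusion_matrix(y_te, (p >= t).astype(int)).ravel()
    print(f"t={t}: sensitivity={tp/(tp+fn):.2f}, PPV={tp/(tp+fp):.2f}")
```

Lowering the threshold buys sensitivity at the cost of more false alarms -- exactly the trade-off the memo asks you to defend.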
Assignment
Technical memo: model results + threshold selection rationale + 3 failure modes
Suggested Reading
Roberts et al., Common Pitfalls in ML for Healthcare (Nat Med 2021)
Track B: Judge
Lecture
Metrics That Matter (and Metrics That Lie)
- Sensitivity/specificity/PPV/NPV shift with prevalence
- AUC myth: high AUC does not equal clinical utility
- Net benefit & decision curve analysis: beyond accuracy
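The net benefit idea above reduces to a one-line formula; here is a quick sketch with illustrative counts (not drawn from any real study):

```python
# Net benefit at decision threshold t:  NB = TP/N - FP/N * t/(1-t)
# The threshold encodes how many false positives you'd accept per true positive.
def net_benefit(tp, fp, n, t):
    return tp / n - fp / n * (t / (1 - t))

n = 1000                    # patients (illustrative)
t = 0.10                    # act if predicted risk >= 10%
model_nb = net_benefit(tp=80, fp=200, n=n, t=t)
treat_all_nb = net_benefit(tp=100, fp=900, n=n, t=t)  # 10% prevalence, flag everyone

print(f"model: {model_nb:.3f}, treat-all: {treat_all_nb:.3f}")
# model: 0.058, treat-all: 0.000
```

A model only earns its keep when its net benefit beats both "treat all" and "treat none" at a clinically sensible threshold -- which no accuracy or AUC number tells you.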
Lab
Workshop B3: Threshold Trade-off Exercise
Sepsis model at AUC=0.85: pick thresholds for ICU attending vs ED triage nurse. Analyze CDSS alert fatigue scenario.
Assignment
Threshold decision worksheet + alert fatigue case analysis + written rationale
Suggested Reading
Van Calster et al., Calibration: the Achilles Heel of Predictive Analytics (BMC Med 2019)
Track C: Deploy
Lecture
AI Metrics -- What the Dashboard Should Show You
- Non-technical explanation of sensitivity, specificity, PPV, NPV
- Why 95% accuracy can be useless; AUC in 1 minute
- Metrics that matter: clinical impact, workflow efficiency, error rate reduction
Lab
Decision Lab C3: From High-AUC to Procurement Decision
Sepsis AI with AUC=0.88. Recalculate PPV at your prevalence, assess workflow impact, write procurement recommendation.
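The PPV recalculation at the heart of this lab is a few lines of arithmetic. The sensitivity and specificity values below are illustrative stand-ins for an AUC≈0.88 model, not vendor numbers:

```python
# PPV collapses as prevalence falls, even with sensitivity/specificity fixed.
# Sens/spec here are illustrative; plug in the vendor's reported values.
def ppv(sens, spec, prev):
    tp = sens * prev
    fp = (1 - spec) * (1 - prev)
    return tp / (tp + fp)

for prev in (0.20, 0.05, 0.01):   # trial population vs your ward vs your ED
    print(f"prevalence {prev:>4.0%}: PPV = {ppv(0.85, 0.80, prev):.2f}")
# prevalence  20%: PPV = 0.52
# prevalence   5%: PPV = 0.18
# prevalence   1%: PPV = 0.04
```

At 1% prevalence, roughly 24 of every 25 alerts are false -- the workflow-impact number that should drive the procurement memo.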
Assignment
Procurement decision memo: buy / defer / reject with conditions
Suggested Reading
Van Calster et al., Calibration (BMC Med 2019) -- executive-accessible version
Week 04 -- Deep Learning & Medical Imaging
Track A: Build
Lecture
From CheXNet to Foundation Models
- CNN intuition: convolution, pooling, feature maps; transfer learning
- CheXNet (2017) -> CheXpert -> MIMIC-CXR -> 2026 radiology foundation models
- Common pitfalls: shortcut learning, label leakage, scanner-specific bias
Lab
Lab A4: CXR Classification Notebook
Fine-tune a pre-trained model for CXR binary classification. Run Grad-CAM to visualize model attention.
Assignment
Grad-CAM screenshots + analysis: is the model learning the right features?
Suggested Reading
Rajpurkar et al., CheXNet (2017) + follow-up critique
Track B: Judge
Lecture
Radiology AI -- Promise and Pitfalls
- Shortcut learning: models read labels, not pathology
- External validation reality: NIH performance vs Mumbai deployment
- Deployment evidence: radiologist+AI vs radiologist alone vs AI alone
Lab
Workshop B4: Paper Critique #1 -- CXR AI
Structured critique of a CXR AI paper: problem definition, data quality, method, results, overstated conclusions.
Assignment
Complete paper critique report using provided template
Suggested Reading
Seyyed-Kalantari et al., Underdiagnosis Bias in CXR AI (Nat Med 2021)
Track C: Deploy
Lecture
Deploying Radiology AI -- The First 100 Days
- 200+ FDA-cleared radiology AI products, but how many are truly deployed?
- Pilot design 101: population, duration, success/exit criteria
- Monitoring: model drift, performance degradation, feedback loops
Lab
Decision Lab C4: CXR AI Pilot Charter + CDSS Comparison
Design pilot charters for episodic (CXR triage) vs continuous (CDSS drug interaction) AI. Compare KPIs and rollback plans.
Assignment
Two pilot charters (CXR + CDSS) + comparison analysis: episodic vs continuous AI deployment
Suggested Reading
Mayo Clinic AI Deployment Framework
Week 05 -- Transformers, LLMs & Clinical NLP
Track A: Build
Lecture
Language Models in Medicine
- Attention mechanism -> transformer architecture
- Pre-training -> fine-tuning -> RLHF -> instruction tuning -> tool use
- Clinical NLP: note summarization, ICD coding, patient education; hallucination danger
Lab
Lab A5: Clinical Text Pipeline + CDSS Prototype
Build clinical note extraction pipeline + LLM-based medication safety checker via Claude API. Compare with rule-based checker.
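The rule-based side of this comparison can be as simple as a lookup table. The interaction table below is a tiny illustrative sample, not a clinical reference:

```python
# Rule-based medication safety checker: the deterministic baseline the lab
# compares against the LLM. Interaction entries are illustrative examples only.
INTERACTIONS = {
    frozenset({"warfarin", "aspirin"}): "increased bleeding risk",
    frozenset({"sildenafil", "nitroglycerin"}): "severe hypotension",
    frozenset({"methotrexate", "trimethoprim"}): "bone marrow suppression",
}

def check(med_list):
    meds = [m.lower() for m in med_list]
    alerts = []
    for i, a in enumerate(meds):
        for b in meds[i + 1:]:
            risk = INTERACTIONS.get(frozenset({a, b}))
            if risk:
                alerts.append((a, b, risk))
    return alerts

print(check(["Warfarin", "Aspirin", "Metformin"]))
# [('warfarin', 'aspirin', 'increased bleeding risk')]
```

The baseline never hallucinates but only knows what is in its table; the LLM generalizes but can invent interactions -- the lab's audit table quantifies that trade-off.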
Assignment
Pipeline code + 3-case hallucination audit table + CDSS alert accuracy analysis
Suggested Reading
Singhal et al., Med-PaLM 2 (2023)
Track B: Judge
Lecture
Clinical LLMs -- Capabilities, Failures & Hallucination
- USMLE scores vs bedside gap: exams != clinical care
- Hallucination taxonomy: fabricated citations, plausible-but-wrong reasoning
- Automation bias: why you unconsciously trust AI
Lab
Workshop B5: LLM Head-to-Head Clinical Comparison
5 clinical + 2 patient-facing cases across 3 LLMs. Score DDx completeness, hallucination, safety, and patient-friendliness.
Assignment
7-case LLM comparison table + summary: should clinician-facing vs patient-facing AI have different standards?
Suggested Reading
Singhal et al., Med-PaLM 2 (2023) + benchmark vs bedside critique
Track C: Deploy
Lecture
Ambient Scribe & Documentation AI -- The $18B Question
- Market: Nuance DAX, Abridge, Nabla, Suki -- real value is workflow redesign
- Evidence gap: which products have RCTs vs testimonials only
- Risk: hallucination in clinical notes, liability, patient consent, data residency
Lab
Decision Lab C5: Documentation AI + Patient Chatbot Vendor Evaluation
Score 3 ambient scribe vendors + 1 patient chatbot vendor. Use LLM to find red flags. Calculate TCO.
Assignment
Vendor evaluation scorecard (documentation AI + patient chatbot) + final recommendation memo
Suggested Reading
LLM Chatbot for Care Transitions (Nature Medicine 2026)
Week 06 -- The Agent Era: Coding Agents & Multi-Agent Systems
Track A: Build
Lecture
The Agent Era -- Beyond Prompting
- Prompting -> tool use -> coding agents -> multi-agent orchestration
- Claude Code, Codex CLI, Cursor, Windsurf: positioning & capabilities
- OpenClaw architecture: agent definition, memory, context engineering, task routing
Lab
Lab A6: Build a Medical AI Project with Claude Code
Use natural language + Claude Code to build from scratch: sepsis warning pipeline OR LLM-CDSS with RAG + review UI.
Assignment
Claude Code session log + final project + 1-page reflection: where the agent helped vs where it misled you
Suggested Reading
Anthropic, Vibe Physics (2026) -- AI as research collaborator, eager-to-please problem
Track B: Judge
Lecture
The Clinician as AI Director
- You don't need to code -- you need to describe problems precisely
- Vibe Physics case: Harvard professor uses Claude Code for theoretical physics
- Medical scenarios: natural language -> clinical calculator prototype
Lab
Workshop B6: Natural Language -> Clinical Prototype
Direct Claude Code via natural language to build a CHA2DS2-VASc calculator, drug interaction checker, or discharge summary generator.
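When reviewing the agent's output, it helps to have a hand-written reference in mind. A sketch of the CHA2DS2-VASc score is below -- validate against your local scoring guide before any clinical use:

```python
# CHA2DS2-VASc stroke-risk score: the kind of reference implementation you
# review line-by-line against what the coding agent produced.
def cha2ds2_vasc(age, female, chf, htn, diabetes, stroke_tia, vascular):
    score = 0
    score += 2 if age >= 75 else (1 if age >= 65 else 0)   # age: 2 pts >=75, 1 pt 65-74
    score += 2 if stroke_tia else 0                        # prior stroke/TIA: 2 pts
    score += sum([female, chf, htn, diabetes, vascular])   # 1 pt each
    return score

# 76-year-old woman with hypertension and prior TIA:
print(cha2ds2_vasc(76, True, False, True, False, True, False))   # 6
```

The review notes in the assignment are exactly this comparison: which scoring rules the agent got right, which it silently got wrong, and how you caught them.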
Assignment
Clinical tool specification (natural language) + agent output review notes: what's right, wrong, and how you fixed it
Suggested Reading
Schwartz, Vibe Physics (Anthropic 2026)
Track C: Deploy
Lecture
The AI Stack -- What Executives Must Understand
- AI stack: foundation model -> application -> workflow -> governance layer
- Open-source vs closed-source strategy: cost, control, compliance
- From chatbot -> coding agent -> multi-agent: Claude Code, Codex, OpenClaw
Lab
Decision Lab C6: Hospital AI Tooling Stack Workshop
Map your hospital's 4-layer AI stack: model, application, workflow, governance. Define build vs buy vs partner decisions.
Assignment
Hospital AI tooling stack diagram + build/buy/partner decision matrix
Week 07 -- Model Evaluation, Reproducibility & Failure Modes
Track A: Build
Lecture
When Good Metrics Go Bad
- AUC trap: high AUC != clinically useful; calibration & net benefit
- Subgroup performance: disparities across age, sex, race
- Reproducibility crisis: why paper results fail in your hands
Lab
Lab A7: Reproduce & Critique
Subgroup analysis, calibration plot, and failure mode identification on your Week 3/4 models. Compare against published results.
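The calibration check in this lab needs nothing beyond binning. Here is a sketch on synthetic data that simulates an overconfident model (all data is generated, purely for illustration):

```python
# Calibration check by binning: compare each bin's mean predicted risk to its
# observed event rate. True risk is pulled toward 0.5, so the model's extreme
# predictions are overconfident -- the plot you'd see as a table.
import random

random.seed(0)
preds = [random.random() for _ in range(5000)]
labels = [1 if random.random() < 0.25 + 0.5 * p else 0 for p in preds]

bins = [[] for _ in range(5)]
for p, y in zip(preds, labels):
    bins[min(int(p * 5), 4)].append((p, y))

for i, b in enumerate(bins):
    mean_p = sum(p for p, _ in b) / len(b)
    obs = sum(y for _, y in b) / len(b)
    print(f"bin {i}: predicted {mean_p:.2f} vs observed {obs:.2f}")
```

A well-calibrated model prints matching columns; here the low-risk bins under-predict and the high-risk bins over-predict -- the kind of miscalibration a headline AUC hides completely.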
Assignment
Evaluation report: subgroup results + calibration + failure mode + improvement suggestions
Suggested Reading
Roberts et al., Common Pitfalls in ML for Healthcare (Nat Med 2021)
Track B: Judge
Lecture
Advanced Failure Modes in Medical AI
- Leakage, shortcut learning, subgroup disparity
- Reproducibility: why you can't replicate paper results
- p-hacking in ML: model/metric/dataset selection degrees of freedom
Lab
Workshop B7: Paper Critique #2 -- Find the Flaw
Critique 2 traditional + 2 CDSS/patient chatbot papers (Nature Medicine 2026, NEJM AI 2025). Find hidden flaws.
Assignment
Full critique of 2+ papers (including 1 CDSS/chatbot) + 3 major questions for authors
Suggested Reading
LLM Chatbot for Mental Health Treatment (NEJM AI 2025, RCT)
Track C: Deploy
Lecture
AI ROI -- Beyond the Vendor Slide Deck
- ROI structure: cost avoidance vs revenue vs quality vs risk reduction
- Hidden costs: integration, training, workflow redesign, ongoing monitoring
- Exit strategy: when to shut down an AI tool
Lab
Decision Lab C7: ROI Calculator Workshop
Calculate direct/indirect ROI for a 6-month sepsis AI pilot. Run sensitivity analysis on false positive rate changes.
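The sensitivity analysis boils down to a small function you re-run under different false-positive assumptions. Every figure below is an illustrative assumption, not a vendor or course number:

```python
# ROI sketch for a 6-month sepsis-AI pilot. All inputs are assumptions you
# replace with your own: alert volume, per-case savings, review cost, license.
def pilot_roi(alerts_per_day, ppv, saving_per_tp, cost_per_fp, license_cost, days=180):
    tp = alerts_per_day * ppv * days
    fp = alerts_per_day * (1 - ppv) * days
    benefit = tp * saving_per_tp
    cost = license_cost + fp * cost_per_fp      # FP cost = clinician review time
    return benefit - cost

base = dict(alerts_per_day=20, saving_per_tp=1500, cost_per_fp=100,
            license_cost=250_000)
# Sensitivity analysis: net ROI as PPV (i.e. the false-positive rate) shifts
for ppv in (0.30, 0.20, 0.10):
    print(f"PPV {ppv:.0%}: net ROI = ${pilot_roi(ppv=ppv, **base):,.0f}")
```

With these assumptions the pilot flips from strongly positive at 30% PPV to negative at 10% -- which is why the go/no-go recommendation must state the PPV it depends on.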
Assignment
ROI worksheet + sensitivity analysis + go/no-go recommendation
Suggested Reading
Kaiser Permanente AI ROI Framework
Week 08 -- Multi-Agent Systems & Hospital Automation
Track A: Build
Lecture
Multi-Agent Systems for Healthcare
- Single agent vs multi-agent: when to decompose tasks
- OpenClaw deep dive: agent definitions, memory model, context management
- Medical multi-agent: literature review + analysis + report generation pipelines
Lab
Lab A8: OpenClaw Medical AI Pipeline Demo
Interact with a pre-built 4-agent pipeline: paper intake -> method extraction -> PubMed search -> structured review report.
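OpenClaw's internals aren't reproduced here, but the decompose-and-route idea behind the pipeline can be sketched with plain functions standing in for agents. All names and stub behaviors below are hypothetical:

```python
# Toy sketch of the 4-stage idea: each "agent" is one function with one
# responsibility, chained in order. In the real lab each stage is an LLM call
# with its own prompt, memory, and tools; these stubs only show the shape.
def intake_agent(paper_text):
    return {"title": paper_text.splitlines()[0], "body": paper_text}

def methods_agent(doc):
    doc["methods"] = [l for l in doc["body"].splitlines() if "method" in l.lower()]
    return doc

def search_agent(doc):
    doc["related"] = f"pubmed query: {doc['title']}"   # stub for a real search call
    return doc

def report_agent(doc):
    return f"# Review of {doc['title']}\nMethods found: {len(doc['methods'])}"

PIPELINE = [intake_agent, methods_agent, search_agent, report_agent]

def run(paper_text):
    state = paper_text
    for agent in PIPELINE:
        state = agent(state)
    return state

print(run("CheXNet revisited\nMethods: CNN fine-tuning on CXR"))
```

The design point carries over directly: narrow, inspectable stages make it obvious where to insert human review, which is the question the assignment asks you to answer for your own workflow.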
Assignment
Design your own 3-agent medical workflow: text description + agent definitions + expected I/O
Suggested Reading
Anthropic, Long-Running Claude for Scientific Computing (2026)
Track B: Judge
Lecture
Multi-Agent Systems -- What Clinicians Need to Know
- Why one AI isn't enough: specialized agents for evidence synthesis
- OpenClaw: methodology agent + bias agent + clinical agent collaboration
- Human-in-the-loop: what to automate vs what requires your eyes
Lab
Workshop B8: OpenClaw Evidence Synthesis Demo
Live demo of multi-agent paper review. Modify an agent prompt and observe output changes. Can this replace your journal club?
Assignment
Describe a multi-agent clinical workflow you want + why multiple agents + where human review is mandatory
Track C: Deploy
Lecture
Multi-Agent AI in Hospital Operations
- Agentic workflow scenarios: QA routing, prior auth, clinical trial matching, bed management
- CDSS agentic pipeline: order intake -> RAG search -> LLM reasoning -> severity routing
- Risk & governance: what can be fully automated vs human-approved
Lab
Decision Lab C8: OpenClaw Hospital Automation Demo
Live demo of QA event routing pipeline. Design an agentic workflow for your hospital process. Estimate FTE replacement ROI.
Assignment
Agentic workflow design + cost-benefit sketch
Suggested Reading
Anthropic, Long-Running Claude for Scientific Computing (2026)
Week 09 -- Regulation, Ethics & Safe Deployment
Track A: Build
Lecture
Building Within Boundaries
- FDA SaMD classification: 510(k) / De Novo / PMA pathways
- EU AI Act high-risk classification; WHO 2025 guidance on health LLMs
- Liability, model cards, datasheets for datasets, algorithmic impact assessments
Lab
Lab A9: Compliance Constraint Checklist
Apply compliance checklist to your Week 6 Claude Code project: de-identification, intended use, FDA level, monitoring plan.
Assignment
Completed compliance checklist + revised project scope
Suggested Reading
FDA SaMD Framework + WHO Guidance on Health LLMs (2025)
Track B: Judge
Lecture
Using AI Safely in Clinical Practice
- FDA SaMD framework: what level is your AI tool?
- WHO's 6 principles for health LLMs; EU AI Act implications
- Liability: if AI is wrong, who is responsible -- you or the vendor?
Lab
Workshop B9: Draft a Safe-Use Protocol
Write a safe-use protocol for an AI clinical note summarizer: intended use, human review nodes, incident reporting, exit criteria.
Assignment
Completed safe-use protocol using provided template
Suggested Reading
WHO Guidance on Ethics & Governance of LLMs in Health (2025)
Track C: Deploy
Lecture
Building an AI Governance Program
- The 4 pillars of governance: policy, process, people, technology
- Committee structure: AI governance board, clinical AI review, IT security
- Patient chatbot governance: consent, scope limits, emergency escalation, adverse event reporting
Lab
Decision Lab C9: Governance Playbook Workshop
Build a full AI governance playbook: RACI matrix, risk classification, incident response, patient chatbot governance checklist.
Assignment
Governance playbook + RACI matrix + patient chatbot governance checklist
Suggested Reading
FDA SaMD Framework + EU AI Act + WHO LLM Guidance (2025)
Week 10 -- Capstone: Demo Day
Track A: Build
Lecture
Capstone Presentations
- 5-min demo: data -> model -> evaluation pipeline + Claude Code / OpenClaw process
- Technical memo (2-3 pages): problem, data, methods, results, limitations, compliance
- Peer review: evaluate another student's project using Track B frameworks
Lab
Lab A10: Demo Day + Peer Review
Present your end-to-end medical AI project. Receive structured peer feedback across technical correctness, clinical reasoning, and agent tool usage.
Assignment
Final deliverables: notebook + technical memo + demo + peer review
Track B: Judge
Lecture
Capstone Presentations
- AI Tool Evaluation Memo (3-4 pages): evidence quality, clinical usability, risk, recommendation
- Trust / Use-with-caution / Reject recommendation + safe-use protocol
- 5-min presentation to simulated hospital committee
Lab
Workshop B10: Demo Day + Peer Review
Present your AI tool evaluation to a simulated hospital committee. Receive peer feedback on evidence assessment and clinical judgment.
Assignment
Final deliverables: evaluation memo + peer review + 5-min presentation
Track C: Deploy
Lecture
Executive AI Strategy Simulation
- Scenario: 500-bed hospital, $2M AI budget, 12 months to deploy 2 use cases
- Executive strategy memo (3-5 pages): use cases, vendor eval, roadmap, governance, ROI
- 10-min board presentation + peer challenge from other teams
Lab
Decision Lab C10: Board Presentation + Peer Challenge
Present your AI strategy to a simulated board. Defend: why these use cases? What if the first one fails? What are competitors doing?
Assignment
Final deliverables: executive strategy memo + board presentation + peer challenge
Anchor Cases
All three tracks revisit these clinical anchors from different angles — building shared language across disciplines.
CXR / Radiology AI
From CNN architecture to reader studies, workflow integration, and procurement evaluation.
EHR / Clinical Note Summarization
From transformer embeddings to hallucination risk, documentation support, and vendor assessment.
Sepsis / Deterioration Prediction
From risk score modeling to threshold-setting, clinical utility, and deployment monitoring.
Assessment
We don’t test who memorizes AI jargon best. We assess who can define problems, match models to tasks, evaluate evidence, and judge clinical safety.
Ready to Choose Your Track?
Spring 2026 cohort now forming. All three tracks welcome — pick the one that matches your background.
Apply Now