10-Week Program

Curriculum

Three tracks, one cohort. Every track shares the same clinical cases and core modules, but with different depth, tools, and deliverables tailored to your background.

Three Tracks, One Learning System

Not three separate courses — a unified medical AI learning system with three entry points. Track A feeds into B, B feeds into C, and C’s governance constraints become A/B’s design boundaries.

Build

Track A — AI Principles

Code-First

  • Who: Pre-med, CS, STEM undergrads, technical learners
  • Focus: Hands-on ML/DL with medical datasets using Colab, PyTorch, and Claude Code
  • Capstone: Build and evaluate a clinical AI model or multi-agent workflow

Tools

Google Colab, PyTorch, scikit-learn, Hugging Face, Claude Code, OpenClaw
Judge

Track B — Clinical Applications

Evaluate & Apply

  • Who: Medical students, residents, nurses, pharmacists, researchers
  • Focus: AI evaluation, paper critique, deployment readiness assessment
  • Capstone: Clinical utility memo, paper critique, or deployment recommendation

Tools

No-code templates, paper critique frameworks, LLM comparison tools, decision dashboards
Deploy

Track C — Executive & Implementation

Decide & Deploy

  • Who: Department heads, innovation teams, CMOs/CIOs, clinical leaders
  • Focus: AI governance, procurement, ROI modeling, organizational adoption
  • Capstone: Board-ready AI strategy deck with vendor evaluation and governance plan

Tools

ROI calculators, vendor evaluation matrix, governance checklists, pilot roadmap templates

Weekly Rhythm

Every week follows a consistent structure — open with a clinical case, ground it in AI principles, back it with recent papers, and close with discussion.

  • 20 min: Clinical case opening
  • 25 min: AI principles deep-dive
  • 25 min: Clinical application & limitations
  • 15 min: Paper spotlight (latest research)
  • 15–30 min: Dual-track breakout / discussion

10-Week Syllabus

Week 01

AI in Medicine: History, Hype & the Agent Era

Track A: Build

Lecture

From Symbolic AI to the Agent Era

  • AI evolution: symbolic -> ML -> DL -> transformer -> LLM -> agent
  • Medical AI milestones: MYCIN -> CheXNet -> AlphaFold -> Med-PaLM -> agentic workflows
  • 2026 SOTA landscape: GPT-5, Claude, Gemini, Llama 4, Qwen 3 -- open vs closed

Lab

Lab A1: AI Timeline + First LLM Interaction

Build an interactive AI/medical AI timeline with Claude. Compare 3 LLMs on the same clinical question.

Assignment

One-page reflection: AI's greatest potential and greatest risk in medicine

Suggested Reading

Topol, Deep Medicine (2019) -- Ch. 1 overview

Track B: Judge

Lecture

AI in Your Clinic -- Hype vs Reality

  • Clinical perspective: MYCIN (1976) -> CDSS -> CheXNet -> Med-PaLM -> 2026 agents
  • AI milestones vs actual clinical adoption gap
  • Hype cycle psychology: overestimate short-term, underestimate long-term

Lab

Workshop B1: LLM Clinical Task Experience

Same clinical vignette across Claude, ChatGPT, Gemini. Compare DDx lists, recommended tests, and dangerous omissions.

Assignment

One-page reflection on first LLM clinical task observation

Suggested Reading

Topol, Deep Medicine (2019) -- clinician perspective on AI

Track C: Deploy

Lecture

The AI Hype Cycle -- Lessons from $62B in Failures

  • IBM Watson Health full case study: $4B investment, promises vs delivery, why it collapsed
  • 2026 landscape: $22B+ market, top funded companies, M&A signals
  • Where is your hospital on the hype cycle?

Lab

Decision Lab C1: IBM Watson Health Post-Mortem

Analyze timeline, investment decisions, and org failures. Produce a 1-page decision error chain analysis.

Assignment

Read IBM Watson case + write: the most likely AI procurement mistake at your hospital

Suggested Reading

Strickland, IBM Watson Health's Rocky Journey (IEEE Spectrum)

Week 02

Healthcare Data: Not a Clean CSV

Track A: Build

Lecture

Healthcare Data Reality

  • EHR structure: FHIR, ICD, CPT, LOINC
  • Four data challenges: missingness, label noise, dataset shift, temporal leakage
  • HIPAA / de-identification basics

Lab

Lab A2: Medical Data EDA

Explore MIMIC-IV demo subset in Colab. Find missing patterns, plot distributions, identify dataset shift signs.
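A minimal sketch of the kind of missingness check this lab practices — the five-row frame below is a made-up stand-in for the MIMIC-IV demo tables, not real patient data:

```python
import pandas as pd
import numpy as np

# Hypothetical mini-cohort standing in for the MIMIC-IV demo subset
df = pd.DataFrame({
    "age":        [67, 54, np.nan, 81, 73],
    "lactate":    [2.1, np.nan, np.nan, 4.4, 1.8],
    "creatinine": [1.0, 1.3, 0.9, np.nan, 1.1],
})

# Fraction missing per column -- the first thing to inspect before modeling,
# since labs like lactate are often missing *not at random* (only ordered
# when the clinician is already worried)
missing_frac = df.isna().mean().sort_values(ascending=False)
print(missing_frac)
```

In the lab you would follow this with distribution plots and per-unit comparisons to look for dataset-shift signs.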

Assignment

Data quality memo: 3 EDA issues found + suggested remediation

Track B: Judge

Lecture

What AI Sees vs What You See

  • EHR pitfalls: note bloat, copy-forward, coding drift, missingness patterns
  • Dataset shift & population mismatch: why AI works at Stanford but breaks at community hospitals
  • Image data traps: scanner variance, label quality, selection bias

Lab

Workshop B2: Paper Data Source Audit

Audit a medical AI paper's data: source, labels, inclusion criteria, external validation, bias risks.

Assignment

Completed data audit table + summary: data quality score (1-10) with justification

Track C: Deploy

Lecture

Data Governance -- The Boring Foundation

  • Data governance maturity model: chaos -> managed -> optimized
  • Legal & commercial barriers: BAA, de-identification, data licensing
  • Case: Cleveland Clinic's 3-year journey to AI-ready data

Lab

Decision Lab C2: Vendor Data Due Diligence

Evaluate a radiology AI vendor's data sheet (designed with gaps). Produce a due diligence report.

Assignment

Data due diligence checklist + pass / flag / reject decision

Week 03

Classical ML & Clinical Prediction

Track A: Build

Lecture

Baselines That Actually Work

  • Logistic regression, decision trees, random forest, gradient boosting
  • Medical prediction tasks: readmission, sepsis risk, triage priority
  • Evaluation metrics in clinical context: sensitivity vs specificity vs PPV vs NPV

Lab

Lab A3: Build a Readmission Predictor

Train 30-day readmission models with scikit-learn. Run ROC, confusion matrix, calibration plot. Choose clinical threshold.
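The lab's core loop looks roughly like this sketch — synthetic data via `make_classification` stands in for a real readmission cohort, and the 0.3 threshold is an illustrative choice, not a recommendation:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, confusion_matrix

# Synthetic stand-in for a 30-day readmission cohort (~15% positive class)
X, y = make_classification(n_samples=2000, n_features=12,
                           weights=[0.85], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
probs = model.predict_proba(X_te)[:, 1]
auc = roc_auc_score(y_te, probs)

# Clinical threshold: below 0.5 to favor sensitivity for a screening use case
threshold = 0.3
preds = (probs >= threshold).astype(int)
tn, fp, fn, tp = confusion_matrix(y_te, preds).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(f"AUC={auc:.2f} sens={sensitivity:.2f} spec={specificity:.2f}")
```

The point of the exercise is the last three lines: the model gives you a probability, but the *threshold* is a clinical decision.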

Assignment

Technical memo: model results + threshold selection rationale + 3 failure modes

Suggested Reading

Roberts et al., Common Pitfalls in ML for Healthcare (Nat Med 2021)

Track B: Judge

Lecture

Metrics That Matter (and Metrics That Lie)

  • Sensitivity/specificity/PPV/NPV shift with prevalence
  • AUC myth: high AUC does not equal clinical utility
  • Net benefit & decision curve analysis: beyond accuracy
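Even in the no-code track, the prevalence effect is easiest to see as a two-line calculation. This sketch applies Bayes' rule with illustrative numbers — the same 90%/90% test yields a very different PPV in a screening clinic than in an ICU:

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Same test (sens=0.90, spec=0.90) at two prevalences
print(round(ppv(0.90, 0.90, 0.01), 3))  # ~0.083 at 1% prevalence
print(round(ppv(0.90, 0.90, 0.30), 3))  # ~0.794 at 30% prevalence
```

At 1% prevalence, more than 9 in 10 positive flags are false alarms — the arithmetic behind alert fatigue.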

Lab

Workshop B3: Threshold Trade-off Exercise

Sepsis model at AUC=0.85: pick thresholds for ICU attending vs ED triage nurse. Analyze CDSS alert fatigue scenario.

Assignment

Threshold decision worksheet + alert fatigue case analysis + written rationale

Suggested Reading

Van Calster et al., Calibration: the Achilles Heel of Predictive Analytics (BMC Med 2019)

Track C: Deploy

Lecture

AI Metrics -- What the Dashboard Should Show You

  • Non-technical explanation of sensitivity, specificity, PPV, NPV
  • Why 95% accuracy can be useless; AUC in 1 minute
  • Metrics that matter: clinical impact, workflow efficiency, error rate reduction

Lab

Decision Lab C3: From High-AUC to Procurement Decision

Sepsis AI with AUC=0.88. Recalculate PPV at your prevalence, assess workflow impact, write procurement recommendation.

Assignment

Procurement decision memo: buy / defer / reject with conditions

Suggested Reading

Van Calster et al., Calibration (BMC Med 2019) -- executive-accessible version

Week 04

Deep Learning & Medical Imaging

Track A: Build

Lecture

From CheXNet to Foundation Models

  • CNN intuition: convolution, pooling, feature maps; transfer learning
  • CheXNet (2017) -> CheXpert -> MIMIC-CXR -> 2026 radiology foundation models
  • Common pitfalls: shortcut learning, label leakage, scanner-specific bias
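The convolution → pooling → feature-map intuition above can be made concrete in a few lines of PyTorch. This is a toy network with made-up sizes (single-channel 64×64 input), not the architecture used in the lab, where you fine-tune a pre-trained model:

```python
import torch
import torch.nn as nn

# Minimal CNN: convolution -> pooling -> feature maps -> binary logit
class TinyCXRNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),   # 8 feature maps
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 64x64 -> 32x32
            nn.Conv2d(8, 16, kernel_size=3, padding=1),  # 16 feature maps
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 32x32 -> 16x16
        )
        self.head = nn.Linear(16 * 16 * 16, 1)           # binary logit

    def forward(self, x):
        f = self.features(x)          # (batch, 16, 16, 16)
        return self.head(f.flatten(1))

x = torch.randn(4, 1, 64, 64)        # fake 4-image batch
logits = TinyCXRNet()(x)
print(logits.shape)
```

Transfer learning swaps `self.features` for a pre-trained backbone and retrains only the head — which is also why shortcut learning sneaks in: the backbone happily latches onto scanner artifacts if they correlate with labels.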

Lab

Lab A4: CXR Classification Notebook

Fine-tune a pre-trained model for CXR binary classification. Run Grad-CAM to visualize model attention.

Assignment

Grad-CAM screenshots + analysis: is the model learning the right features?

Suggested Reading

Rajpurkar et al., CheXNet (2017) + follow-up critique

Track B: Judge

Lecture

Radiology AI -- Promise and Pitfalls

  • Shortcut learning: models read labels, not pathology
  • External validation reality: NIH performance vs Mumbai deployment
  • Deployment evidence: radiologist+AI vs radiologist alone vs AI alone

Lab

Workshop B4: Paper Critique #1 -- CXR AI

Structured critique of a CXR AI paper: problem definition, data quality, method, results, overstated conclusions.

Assignment

Complete paper critique report using provided template

Suggested Reading

Seyyed-Kalantari et al., Underdiagnosis Bias in CXR AI (Nat Med 2021)

Track C: Deploy

Lecture

Deploying Radiology AI -- The First 100 Days

  • 200+ FDA-cleared radiology AI products, but how many are truly deployed?
  • Pilot design 101: population, duration, success/exit criteria
  • Monitoring: model drift, performance degradation, feedback loops

Lab

Decision Lab C4: CXR AI Pilot Charter + CDSS Comparison

Design pilot charters for episodic (CXR triage) vs continuous (CDSS drug interaction) AI. Compare KPIs and rollback plans.

Assignment

Two pilot charters (CXR + CDSS) + comparison analysis: episodic vs continuous AI deployment

Suggested Reading

Mayo Clinic AI Deployment Framework

Week 05

Transformers, LLMs & Clinical NLP

Track A: Build

Lecture

Language Models in Medicine

  • Attention mechanism -> transformer architecture
  • Pre-training -> fine-tuning -> RLHF -> instruction tuning -> tool use
  • Clinical NLP: note summarization, ICD coding, patient education; hallucination danger

Lab

Lab A5: Clinical Text Pipeline + CDSS Prototype

Build clinical note extraction pipeline + LLM-based medication safety checker via Claude API. Compare with rule-based checker.
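The rule-based side of the comparison can be as small as a lookup over drug pairs. The pairs below are illustrative placeholders only — not clinical guidance — and the real lab wires the LLM side through the Claude API:

```python
# Toy rule-based interaction checker to compare against the LLM checker.
# The interaction list is a hypothetical stand-in, not a drug database.
INTERACTIONS = {
    frozenset({"warfarin", "aspirin"}): "increased bleeding risk",
    frozenset({"sildenafil", "nitroglycerin"}): "severe hypotension",
}

def check_med_list(meds):
    """Return (drug_a, drug_b, reason) for every known interacting pair."""
    meds = [m.lower() for m in meds]
    alerts = []
    for i, a in enumerate(meds):
        for b in meds[i + 1:]:
            reason = INTERACTIONS.get(frozenset({a, b}))
            if reason:
                alerts.append((a, b, reason))
    return alerts

alerts = check_med_list(["Warfarin", "Aspirin", "Metformin"])
print(alerts)
```

The contrast is the lesson: the rule-based checker never hallucinates but only knows its list; the LLM covers far more but must be audited — hence the hallucination audit table in the assignment.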

Assignment

Pipeline code + 3-case hallucination audit table + CDSS alert accuracy analysis

Suggested Reading

Singhal et al., Med-PaLM 2 (2023)

Track B: Judge

Lecture

Clinical LLMs -- Capabilities, Failures & Hallucination

  • USMLE scores vs bedside gap: exams != clinical care
  • Hallucination taxonomy: fabricated citations, plausible-but-wrong reasoning
  • Automation bias: why you unconsciously trust AI

Lab

Workshop B5: LLM Head-to-Head Clinical Comparison

5 clinical + 2 patient-facing cases across 3 LLMs. Score DDx completeness, hallucination, safety, and patient-friendliness.

Assignment

7-case LLM comparison table + summary: should clinician-facing vs patient-facing AI have different standards?

Suggested Reading

Singhal et al., Med-PaLM 2 (2023) + benchmark vs bedside critique

Track C: Deploy

Lecture

Ambient Scribe & Documentation AI -- The $18B Question

  • Market: Nuance DAX, Abridge, Nabla, Suki -- real value is workflow redesign
  • Evidence gap: which products have RCTs vs testimonials only
  • Risk: hallucination in clinical notes, liability, patient consent, data residency

Lab

Decision Lab C5: Documentation AI + Patient Chatbot Vendor Evaluation

Score 3 ambient scribe vendors + 1 patient chatbot vendor. Use LLM to find red flags. Calculate TCO.

Assignment

Vendor evaluation scorecard (documentation AI + patient chatbot) + final recommendation memo

Suggested Reading

LLM Chatbot for Care Transitions (Nature Medicine 2026)

Week 06

The Agent Era: Coding Agents & Multi-Agent Systems

Track A: Build

Lecture

The Agent Era -- Beyond Prompting

  • Prompting -> tool use -> coding agents -> multi-agent orchestration
  • Claude Code, Codex CLI, Cursor, Windsurf: positioning & capabilities
  • OpenClaw architecture: agent definition, memory, context engineering, task routing

Lab

Lab A6: Build a Medical AI Project with Claude Code

Use natural language + Claude Code to build from scratch: sepsis warning pipeline OR LLM-CDSS with RAG + review UI.

Assignment

Claude Code session log + final project + 1-page reflection: what the agent helped vs deceived

Suggested Reading

Anthropic, Vibe Physics (2026) -- AI as research collaborator, eager-to-please problem

Track B: Judge

Lecture

The Clinician as AI Director

  • You don't need to code -- you need to describe problems precisely
  • Vibe Physics case: Harvard professor uses Claude Code for theoretical physics
  • Medical scenarios: natural language -> clinical calculator prototype

Lab

Workshop B6: Natural Language -> Clinical Prototype

Direct Claude Code via natural language to build a CHA2DS2-VASc calculator, drug interaction checker, or discharge summary generator.

Assignment

Clinical tool specification (natural language) + agent output review notes: what's right, wrong, and how you fixed it

Suggested Reading

Schwartz, Vibe Physics (Anthropic 2026)

Track C: Deploy

Lecture

The AI Stack -- What Executives Must Understand

  • AI stack: foundation model -> application -> workflow -> governance layer
  • Open-source vs closed-source strategy: cost, control, compliance
  • From chatbot -> coding agent -> multi-agent: Claude Code, Codex, OpenClaw

Lab

Decision Lab C6: Hospital AI Tooling Stack Workshop

Map your hospital's 4-layer AI stack: model, application, workflow, governance. Define build vs buy vs partner decisions.

Assignment

Hospital AI tooling stack diagram + build/buy/partner decision matrix

Week 07

Model Evaluation, Reproducibility & Failure Modes

Track A: Build

Lecture

When Good Metrics Go Bad

  • AUC trap: high AUC != clinically useful; calibration & net benefit
  • Subgroup performance: disparities across age, sex, race
  • Reproducibility crisis: why paper results fail in your hands

Lab

Lab A7: Reproduce & Critique

Subgroup analysis, calibration plot, and failure mode identification on your Week 3/4 models. Compare against published results.
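The subgroup step reduces to computing the same metric per stratum. This sketch uses fully synthetic scores with a made-up age-group flag, just to show the mechanics:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 1000
group = rng.integers(0, 2, n)             # 0 = younger, 1 = older (toy flag)
y = rng.binomial(1, 0.2 + 0.1 * group)    # higher event rate in older group
# Scores with modest separation between positives and negatives
score = y * rng.normal(0.7, 0.2, n) + (1 - y) * rng.normal(0.4, 0.2, n)

# Same metric, computed per subgroup -- the whole-population AUC can hide
# a large gap between strata
aucs = {g: roc_auc_score(y[group == g], score[group == g]) for g in (0, 1)}
print(aucs)
```

In the lab you repeat this across age, sex, and race strata for your Week 3/4 models and add calibration plots per subgroup.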

Assignment

Evaluation report: subgroup results + calibration + failure mode + improvement suggestions

Suggested Reading

Roberts et al., Common Pitfalls in ML for Healthcare (Nat Med 2021)

Track B: Judge

Lecture

Advanced Failure Modes in Medical AI

  • Leakage, shortcut learning, subgroup disparity
  • Reproducibility: why you can't replicate paper results
  • p-hacking in ML: model/metric/dataset selection degrees of freedom

Lab

Workshop B7: Paper Critique #2 -- Find the Flaw

Critique 2 traditional + 2 CDSS/patient chatbot papers (Nature Medicine 2026, NEJM AI 2025). Find hidden flaws.

Assignment

Full critique of 2+ papers (including 1 CDSS/chatbot) + 3 major questions for authors

Suggested Reading

LLM Chatbot for Mental Health Treatment (NEJM AI 2025, RCT)

Track C: Deploy

Lecture

AI ROI -- Beyond the Vendor Slide Deck

  • ROI structure: cost avoidance vs revenue vs quality vs risk reduction
  • Hidden costs: integration, training, workflow redesign, ongoing monitoring
  • Exit strategy: when to shut down an AI tool

Lab

Decision Lab C7: ROI Calculator Workshop

Calculate direct/indirect ROI for a 6-month sepsis AI pilot. Run sensitivity analysis on false positive rate changes.
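The workshop's core arithmetic fits in one function. Every number below is a placeholder assumption for illustration, not a benchmark — the point is that doubling the false-alert volume alone can halve the ROI:

```python
# Back-of-envelope ROI sketch for a sepsis-alert pilot (all inputs hypothetical)
def pilot_roi(true_alerts_per_month, false_alerts_per_month,
              value_per_true_alert, cost_per_false_alert,
              monthly_license_cost, months=6):
    benefit = months * true_alerts_per_month * value_per_true_alert
    cost = months * (false_alerts_per_month * cost_per_false_alert
                     + monthly_license_cost)
    return (benefit - cost) / cost

# Sensitivity analysis: same pilot, false-alert volume doubled
base = pilot_roi(20, 100, 5000, 150, 8000)
stressed = pilot_roi(20, 200, 5000, 150, 8000)
print(round(base, 2), round(stressed, 2))
```

The stressed scenario is why the vendor's false-positive rate claim deserves its own line in the due diligence checklist.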

Assignment

ROI worksheet + sensitivity analysis + go/no-go recommendation

Suggested Reading

Kaiser Permanente AI ROI Framework

Week 08

Multi-Agent Systems & Hospital Automation

Track A: Build

Lecture

Multi-Agent Systems for Healthcare

  • Single agent vs multi-agent: when to decompose tasks
  • OpenClaw deep dive: agent definitions, memory model, context management
  • Medical multi-agent: literature review + analysis + report generation pipelines

Lab

Lab A8: OpenClaw Medical AI Pipeline Demo

Interact with a pre-built 4-agent pipeline: paper intake -> method extraction -> PubMed search -> structured review report.
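The handoff structure of the 4-agent pipeline can be sketched with plain functions standing in for agents. Everything here is a hypothetical stub — real agents would wrap LLM calls and a live PubMed query:

```python
# Skeleton of a linear multi-agent pipeline; each "agent" is a stub function
# so the handoff structure is visible without any LLM calls.
def paper_intake(raw_text):
    return {"title": raw_text.splitlines()[0], "body": raw_text}

def method_extraction(paper):
    paper["methods"] = [ln for ln in paper["body"].splitlines()
                        if "cohort" in ln.lower()]
    return paper

def literature_search(paper):
    # Stand-in for a PubMed query keyed on the paper title
    paper["related"] = [f"stub-result-for:{paper['title']}"]
    return paper

def review_report(paper):
    return {"title": paper["title"],
            "n_methods": len(paper["methods"]),
            "n_related": len(paper["related"])}

PIPELINE = [paper_intake, method_extraction, literature_search, review_report]

state = "Sepsis AI external validation\nRetrospective cohort of 12,000 admissions."
for agent in PIPELINE:
    state = agent(state)
print(state)
```

Each agent owns one narrow task and passes structured state forward — the decomposition question from the lecture is deciding where to cut these seams.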

Assignment

Design your own 3-agent medical workflow: text description + agent definitions + expected I/O

Suggested Reading

Anthropic, Long-Running Claude for Scientific Computing (2026)

Track B: Judge

Lecture

Multi-Agent Systems -- What Clinicians Need to Know

  • Why one AI isn't enough: specialized agents for evidence synthesis
  • OpenClaw: methodology agent + bias agent + clinical agent collaboration
  • Human-in-the-loop: what to automate vs what requires your eyes

Lab

Workshop B8: OpenClaw Evidence Synthesis Demo

Live demo of multi-agent paper review. Modify an agent prompt and observe output changes. Can this replace your journal club?

Assignment

Describe a multi-agent clinical workflow you want + why multiple agents + where human review is mandatory

Track C: Deploy

Lecture

Multi-Agent AI in Hospital Operations

  • Agentic workflow scenarios: QA routing, prior auth, clinical trial matching, bed management
  • CDSS agentic pipeline: order intake -> RAG search -> LLM reasoning -> severity routing
  • Risk & governance: what can be fully automated vs human-approved

Lab

Decision Lab C8: OpenClaw Hospital Automation Demo

Live demo of QA event routing pipeline. Design an agentic workflow for your hospital process. Estimate FTE replacement ROI.

Assignment

Agentic workflow design + cost-benefit sketch

Suggested Reading

Anthropic, Long-Running Claude for Scientific Computing (2026)

Week 09

Regulation, Ethics & Safe Deployment

Track A: Build

Lecture

Building Within Boundaries

  • FDA SaMD classification: 510(k) / De Novo / PMA pathways
  • EU AI Act high-risk classification; WHO 2025 guidance on health LLMs
  • Liability, model cards, datasheets for datasets, algorithmic impact assessments

Lab

Lab A9: Compliance Constraint Checklist

Apply compliance checklist to your Week 6 Claude Code project: de-identification, intended use, FDA level, monitoring plan.

Assignment

Completed compliance checklist + revised project scope

Suggested Reading

FDA SaMD Framework + WHO Guidance on Health LLMs (2025)

Track B: Judge

Lecture

Using AI Safely in Clinical Practice

  • FDA SaMD framework: what level is your AI tool?
  • WHO 6 principles for health LLMs; EU AI Act implications
  • Liability: if AI is wrong, who is responsible -- you or the vendor?

Lab

Workshop B9: Draft a Safe-Use Protocol

Write a safe-use protocol for an AI clinical note summarizer: intended use, human review nodes, incident reporting, exit criteria.

Assignment

Completed safe-use protocol using provided template

Suggested Reading

WHO Guidance on Ethics & Governance of LLMs in Health (2025)

Track C: Deploy

Lecture

Building an AI Governance Program

  • Governance 4 pillars: policy, process, people, technology
  • Committee structure: AI governance board, clinical AI review, IT security
  • Patient chatbot governance: consent, scope limits, emergency escalation, adverse event reporting

Lab

Decision Lab C9: Governance Playbook Workshop

Build a full AI governance playbook: RACI matrix, risk classification, incident response, patient chatbot governance checklist.

Assignment

Governance playbook + RACI matrix + patient chatbot governance checklist

Suggested Reading

FDA SaMD Framework + EU AI Act + WHO LLM Guidance (2025)

Week 10

Capstone: Demo Day

Track A: Build

Lecture

Capstone Presentations

  • 5-min demo: data -> model -> evaluation pipeline + Claude Code / OpenClaw process
  • Technical memo (2-3 pages): problem, data, methods, results, limitations, compliance
  • Peer review: evaluate another student's project using Track B frameworks

Lab

Lab A10: Demo Day + Peer Review

Present your end-to-end medical AI project. Receive structured peer feedback across technical correctness, clinical reasoning, and agent tool usage.

Assignment

Final deliverables: notebook + technical memo + demo + peer review

Track B: Judge

Lecture

Capstone Presentations

  • AI Tool Evaluation Memo (3-4 pages): evidence quality, clinical usability, risk, recommendation
  • Trust / Use-with-caution / Reject recommendation + safe-use protocol
  • 5-min presentation to simulated hospital committee

Lab

Workshop B10: Demo Day + Peer Review

Present your AI tool evaluation to a simulated hospital committee. Receive peer feedback on evidence assessment and clinical judgment.

Assignment

Final deliverables: evaluation memo + peer review + 5-min presentation

Track C: Deploy

Lecture

Executive AI Strategy Simulation

  • Scenario: 500-bed hospital, $2M AI budget, 12 months to deploy 2 use cases
  • Executive strategy memo (3-5 pages): use cases, vendor eval, roadmap, governance, ROI
  • 10-min board presentation + peer challenge from other teams

Lab

Decision Lab C10: Board Presentation + Peer Challenge

Present your AI strategy to a simulated board. Defend: why these use cases? What if the first one fails? What are competitors doing?

Assignment

Final deliverables: executive strategy memo + board presentation + peer challenge

Anchor Cases

All three tracks revisit these clinical anchors from different angles — building shared language across disciplines.

CXR / Radiology AI

From CNN architecture to reader studies, workflow integration, and procurement evaluation.

EHR / Clinical Note Summarization

From transformer embeddings to hallucination risk, documentation support, and vendor assessment.

Sepsis / Deterioration Prediction

From risk score modeling to threshold-setting, clinical utility, and deployment monitoring.

Assessment

We don’t test who memorizes AI jargon best. We assess who can define problems, match models to tasks, evaluate evidence, and judge clinical safety.

  • 20%: Participation & case reflections
  • 20%: Weekly assignments (layered by track)
  • 25%: Paper critique or tool evaluation
  • 35%: Final capstone project

Ready to Choose Your Track?

Spring 2026 cohort now forming. All three tracks welcome — pick the one that matches your background.

Apply Now