
Translating multimodal foundation models into oncology: Toward a future where AI directs diagnosis and therapy

Research Highlights

Park Kyung Chan
Yoo Wonbeak
Genes & Diseases, Volume 13, Issue 4. Print publication: 2026-07-01. Published online: 2025-11-27.

Recent developments in multimodal artificial intelligence (AI) have begun to transform how clinicians approach cancer prognosis and treatment selection. In a recent study, Xiang et al.¹ present MUSK (Multimodal Unified Self-supervised learning for Oncology), a foundation model that integrates more than 50 million whole-slide pathology images and over 1 billion oncology-related clinical text tokens. MUSK uses a unified transformer architecture to capture morphological and semantic features simultaneously, enabling the integrated image–text interpretation essential for oncology (Fig. 1A). The model was pretrained in two stages: the first stage applied masked modeling to unpaired data from each modality, while the second stage used approximately one million paired image–text samples with contrastive learning to align histologic and linguistic representations. This approach enabled robust cross-modal understanding, supporting downstream diagnostic and prognostic applications.
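The second-stage alignment can be illustrated with a minimal sketch of a symmetric contrastive objective of the kind commonly used for image–text pretraining. This is not the MUSK implementation: the text above does not specify the exact loss, so the InfoNCE-style form, the temperature of 0.07, the function names, and the toy two-dimensional embeddings are all assumptions chosen for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax_at(logits, target_index):
    """Log-probability of the target entry under a softmax over `logits`."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[target_index] - log_sum


def contrastive_loss(image_embeddings, text_embeddings, temperature=0.07):
    """Symmetric InfoNCE-style loss; pair i of images matches pair i of texts.

    The temperature value is an assumed default, not taken from the source.
    """
    images = [l2_normalize(v) for v in image_embeddings]
    texts = [l2_normalize(v) for v in text_embeddings]
    n = len(images)
    # sim[i][j] = cosine similarity of image i and text j, scaled by temperature
    sim = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    # Each image should pick out its own text among all texts in the batch ...
    image_to_text = -sum(log_softmax_at(sim[i], i) for i in range(n)) / n
    # ... and each text should pick out its own image.
    text_to_image = -sum(
        log_softmax_at([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (image_to_text + text_to_image)


# Toy embeddings (hypothetical): matched pairs point in the same direction.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
# Same vectors, but each image is now paired with the wrong text.
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy setting the loss is close to zero when each image embedding coincides with its paired text embedding and large when the pairs are swapped. Minimizing a loss of this shape is what pulls histologic and linguistic representations of the same case together.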
