2026, Volume 30, Issue 2, pp. 457-471
Multimodal medical imaging is pivotal for early disease screening; however, its deployment is often constrained by limited resources. While deep learning-based synthesis cannot replace clinical imaging, high-fidelity cross-modal translation can provide actionable prior information for preliminary assessment. To this end, we introduce a chained extension framework that scales model capacity and precision by linking multiple encoder–decoder modules. Starting from a minimal encoder–decoder backbone, we construct a triple-stage generative adversarial network and integrate a brightness-sensitive loss that reweights luminance-dependent errors. This staged design decomposes positron emission tomography (PET)-to-computed tomography (CT) translation into complementary subtasks targeting structural consistency, texture enhancement, and key-region refinement. Comprehensive experiments indicate that the proposed approach generates synthetic CT images that closely match reference CT scans, both visually and quantitatively, achieving a structural similarity index of 0.85, a peak signal-to-noise ratio of 23.31 dB, and a mean absolute error of 6.93×10⁻². These results suggest the framework is feasible as an assistive tool for early screening workflows in resource-limited settings. Moreover, the staged training strategy, coupled with brightness-aware weighting, mitigates common optimization plateaus in cross-modal synthesis, suggesting a principled path toward further gains in fidelity and robustness.
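The abstract describes the chained extension only at a high level. A minimal sketch of one plausible way to compose three encoder–decoder stages so that each refines the previous estimate is shown below; the `Stage` module, channel sizes, and residual chaining are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn


class Stage(nn.Module):
    """Minimal encoder-decoder stage (placeholder for illustration).

    Assumes single-channel images with even spatial dimensions so the
    downsample/upsample pair restores the input shape.
    """

    def __init__(self, channels: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        self.decoder = nn.ConvTranspose2d(
            channels, 1, kernel_size=4, stride=2, padding=1
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))


class ChainedGenerator(nn.Module):
    """Three chained stages, loosely matching the paper's subtasks:
    structure, texture, and key regions. The residual refinement
    scheme here is an assumption for illustration."""

    def __init__(self):
        super().__init__()
        self.stages = nn.ModuleList(Stage() for _ in range(3))

    def forward(self, pet: torch.Tensor) -> torch.Tensor:
        ct = self.stages[0](pet)        # coarse structural estimate
        ct = ct + self.stages[1](ct)    # texture refinement (residual)
        ct = ct + self.stages[2](ct)    # key-region refinement (residual)
        return ct


if __name__ == "__main__":
    gen = ChainedGenerator()
    fake_ct = gen(torch.randn(1, 1, 128, 128))  # PET in, synthetic CT out
    print(fake_ct.shape)  # torch.Size([1, 1, 128, 128])
```

In a setup like this, each stage can be supervised (and adversarially trained) separately, which is one way a staged design could sidestep the optimization plateaus the abstract mentions.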
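The brightness-sensitive loss is likewise only named, not defined. One plausible formulation is a per-pixel weight that grows with the reference CT's luminance, so errors in bright regions (often bone or contrast-enhanced tissue) are penalized more strongly; the function `brightness_weighted_l1` and the linear weighting scheme below are assumptions for illustration, not the paper's exact definition.

```python
import torch


def brightness_weighted_l1(pred_ct: torch.Tensor,
                           ref_ct: torch.Tensor,
                           alpha: float = 1.0) -> torch.Tensor:
    """Hypothetical brightness-sensitive L1 loss.

    `alpha` controls how steeply the per-pixel weight grows with the
    reference luminance; alpha=0 recovers plain L1. This is a sketch
    of the general idea, not the authors' published formulation.
    """
    # Normalize reference intensities to [0, 1] per batch element.
    flat = ref_ct.flatten(1)
    lo = flat.min(dim=1).values.view(-1, 1, 1, 1)
    hi = flat.max(dim=1).values.view(-1, 1, 1, 1)
    luminance = (ref_ct - lo) / (hi - lo + 1e-8)

    # Reweight the per-pixel absolute error by luminance.
    weight = 1.0 + alpha * luminance
    return (weight * (pred_ct - ref_ct).abs()).mean()
```

Under this reading, "reweighting luminance-dependent errors" amounts to making the reconstruction loss spatially non-uniform, which is compatible with the key-region refinement subtask the staged design targets.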