Paper ID: 2025EDL8030
Unlike conventional speech-based depression detection (SDD), cross-elicitation SDD is more challenging because the labeled source (training) and unlabeled target (testing) speech are collected under different elicitation conditions. In such scenarios, a significant feature distribution gap may exist between the source and target speech samples, potentially degrading the detection performance of most existing SDD methods. To address this issue, we propose in this letter a novel deep transfer learning method called the Deep Elicitation-Adapted Neural Network (DEANN). DEANN aims to learn features that are both depression-discriminative and elicitation-invariant directly from speech spectrograms recorded under different elicitation conditions, using two weight-shared Convolutional Neural Networks (CNNs). To achieve this, the CNNs are first endowed with depression-discriminative capability by establishing a relationship between the source speech samples and their depression labels. Subsequently, a well-designed constraint mechanism, termed Bidirectional Sparse Reconstruction, is introduced. This mechanism requires that the source and target speech samples be sparsely reconstructable from each other at the same feature layer of both CNNs, so that the learned features remain adaptive to changes in speech elicitation conditions while preserving their depression-discriminative capability. To evaluate DEANN, we conduct extensive cross-elicitation SDD experiments on the MODMA dataset. The experimental results demonstrate the effectiveness and superiority of DEANN over many existing state-of-the-art transfer learning methods in addressing the challenge of cross-elicitation SDD.
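To make the overall training scheme concrete, below is a minimal PyTorch-style sketch of the idea described above. It is not the authors' implementation: the backbone depth, the feature dimension, the loss weights (`lam`, `sparsity`), and the least-squares-plus-L1 form used here for the Bidirectional Sparse Reconstruction constraint are all illustrative assumptions, since the letter's exact formulation is not reproduced in this abstract.

```python
# Illustrative sketch of a DEANN-like training step (assumed details, not the
# authors' code): one weight-shared CNN encodes source and target spectrograms;
# a source classification loss provides depression discriminability, and a
# bidirectional sparse-reconstruction residual ties the two domains together
# at the same feature layer.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedCNN(nn.Module):
    """Weight-shared CNN applied to both source and target spectrograms."""

    def __init__(self, feat_dim: int = 128, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        f = self.features(x)           # shared feature layer
        return f, self.classifier(f)   # features + depression logits


def sparse_reconstruction_loss(A, B, sparsity=1e-2, steps=50, lr=0.1):
    """Reconstruct each row of A as a sparse combination of the rows of B.

    The coefficient matrix C is fitted on detached features with an L1
    penalty (a simple assumed solver); the final residual is then
    back-propagated into the CNN features A and B.
    """
    C = torch.zeros(A.size(0), B.size(0), device=A.device, requires_grad=True)
    opt = torch.optim.Adam([C], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = (F.mse_loss(C @ B.detach(), A.detach())
                + sparsity * C.abs().mean())
        loss.backward()
        opt.step()
    C = C.detach()
    return F.mse_loss(C @ B, A)  # residual shapes the learned features


def deann_step(model, opt, x_src, y_src, x_tgt, lam=0.1):
    """One training step: source classification + bidirectional reconstruction."""
    f_src, logits = model(x_src)
    f_tgt, _ = model(x_tgt)
    cls_loss = F.cross_entropy(logits, y_src)           # discriminability
    rec_loss = (sparse_reconstruction_loss(f_src, f_tgt)  # source <- target
                + sparse_reconstruction_loss(f_tgt, f_src))  # target <- source
    loss = cls_loss + lam * rec_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()


if __name__ == "__main__":
    model = SharedCNN()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x_src = torch.randn(8, 1, 64, 64)   # toy source spectrogram batch
    y_src = torch.randint(0, 2, (8,))   # depressed / non-depressed labels
    x_tgt = torch.randn(8, 1, 64, 64)   # unlabeled target spectrogram batch
    print(deann_step(model, opt, x_src, y_src, x_tgt))
```

Note the design choice reflected in the sketch: the reconstruction runs in both directions (source from target and target from source), which is what makes the constraint "bidirectional" and discourages features that adapt in only one domain's favor.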