Deep Learning for Cardiac Overload Estimation: Predicting B-Type Natriuretic Peptide (BNP) Levels From Heart Sounds and Electrocardiogram

Shimpei Ogawa; Masanobu Ishii; Shumpei Saito; Hiroshi Seki; Koshiro Ikeda; Yuhei Yasui; Tomohiro Komatsu; Ginga Sato; Noriaki Tabata; Mitsuru Ohishi; Takuro Kubozono; Naritatsu Saito; Eri Toda Kato; Xiaoyang Song; Masahiro Yamada; Shunsuke Natori; Yuki Kunikane; Takafumi Yokomatsu; Masashi Kato; Yasuaki Sagara; Nami Uchiyama; Nobuhiko Atsuchi; Shota Kawahara; Shoji Natsugoe; Kenichi Tsujita

doi:10.1253/circj.CJ-25-0098

Abstract

Background: B-type natriuretic peptide (BNP) and N-terminal pro-BNP (NT-pro-BNP) are key biomarkers used for heart failure (HF) management. Although traditional auscultation lacks objective evaluation, the SSS01-series phonocardiogram enables rapid recording of heart sounds and ECG. We developed a deep-learning model to estimate plasma BNP levels from these non-invasive dynamic physiological signals, with the aim of validating the model’s performance with an external validation dataset and assessing its feasibility for clinical application.

Methods and Results: This multicenter study evaluated the estimated BNP (eBNP) model for predicting plasma BNP levels ≥100 pg/mL using 8 s of heart sound and ECG data. Validation was performed on an external validation dataset of 140 patients, achieving an area under the receiver operating characteristic curve (AUROC) of 0.895, with sensitivity and specificity of 84.3% and 82.9%, respectively. Subgroup analysis of patients with a body mass index of 18.5–25 (n=73 of the 127 patients with available BMI data) showed stronger predictive capability, with an AUROC of 0.959, sensitivity of 92.5%, and specificity of 84.8%.

Conclusions: The eBNP model demonstrated strong potential for non-invasive and rapid HF screening. Its simplicity and objectivity make it well suited for point-of-care testing, offering a promising approach for early HF diagnosis and for detecting and monitoring HF exacerbations. These findings, validated on a dataset independent of training, highlight the model’s robustness across diverse clinical populations.

The 1-year rehospitalization rate for heart failure (HF) patients remains as high as 26–35%, with a mortality rate of 23%, making HF a significant global health concern.¹,² In Japan, the aging of the population is associated with an increase in comorbidities, contributing to a rise in HF cases. By 2030, when the aging rate is projected to exceed 31%, the number of HF patients is expected to surpass 1.3 million.³ Given that early therapeutic intervention before the onset of overt HF signs and symptoms has demonstrated efficacy⁴–⁶ and is globally recommended in clinical guidelines,⁷–⁹ the development of a high-throughput early HF screening approach is crucial.

When HF is suspected in clinical practice, measuring the B-type natriuretic peptide (BNP) or N-terminal pro-BNP (NT-pro-BNP) level is a widely endorsed diagnostic approach recommended by HF guidelines worldwide.⁷–¹² Although these tests are highly informative, they require blood sampling, which poses challenges such as delayed results, particularly in outpatient settings. This highlights the need for non-invasive, rapid, and practical diagnostic methods for HF.

Recent advancements in deep learning and artificial intelligence (AI) have enabled cardiac disease detection through biosignal analysis, particularly using electrocardiographic (ECG) data, as well as imaging modalities such as angiography and tomography.¹³–¹⁷ In the context of HF screening, heart sound data offer unique insights into cardiovascular hemodynamics and valvular function that can help detect subtle pathological changes.¹⁸–²⁰ We hypothesized that incorporating hemodynamic insights from acoustic engineering with electrophysiological approaches could enhance the accuracy of HF detection. In pursuit of this goal, we developed the SSS01-series (AMI Inc., Kagoshima, Japan), a phonocardiogram designed to synchronously acquire heart sounds and ECG data (Figure 1).²¹ This hand-held medical device enables non-invasive assessment of cardiac function in only 8 s, serving as a point-of-care testing (POCT) tool for HF screening and monitoring. Additionally, it features a standardized and simplified operation to minimize interoperator variability, supporting consistent results across different clinical settings, regardless of the operator’s clinical skill or specialty.

Figure 1.

SSS01-series phonocardiogram: acquisition, visualization, and findings. (A) The SSS01-series phonocardiogram, a compact medical device (PMDA Approval No. 30400BZX00218000), is capable of synchronously acquiring heart sounds and ECG data. The device captures dynamic physiological signals in a fixed 8-s recording. (B) Real-time display of recorded signals during data acquisition. (C) Findings for patients with aortic stenosis.

This study aimed to evaluate the performance of the deep learning model that estimates plasma BNP levels from heart sound and ECG data. The successful development of a simple, non-invasive, and rapid method to estimate left cardiac overload could offer a transformative approach for HF screening and exacerbation monitoring in clinical practice.

Methods

Study Design and Patient Selection

This study was a multicenter prospective observational study aimed at validating the performance of a previously developed estimated BNP (eBNP) model using an external validation dataset, independent of the datasets used during the development phase.

In the development phase, data collected from 2 university hospitals and 4 general hospitals were selected as the Training and Internal Validation Cohort. As this phase also aimed to develop models for estimating valvular heart disease and cardiac overload, the inclusion criteria for recruitment comprised patients who had undergone transthoracic echocardiography during hospitalization or outpatient visits. Patients were excluded if they had missing heart sounds or ECG data, unavailable BNP and/or NT-pro-BNP levels, heart sound data collected more than 7 days before or after acquiring BNP levels, or were currently undergoing dialysis (Supplementary Figure 1).

For external validation, which was prespecified before model development, patients who underwent transthoracic echocardiography during hospitalization or outpatient visits at 3 general hospitals separate from those used in the Training and Internal Validation Cohort were recruited between February and December 2021 (n=646, 129, and 43, respectively). Patients meeting any of the following criteria were excluded from this external validation cohort: (a) heart sound data at the 4th left sternal border (4LSB) position and plasma BNP levels were not available; (b) heart sound data collected >7 days before or after acquiring plasma BNP levels; (c) patients currently undergoing dialysis; (d) patients with unavailable serum estimated glomerular filtration rate (eGFR) values or those with renal dysfunction (eGFR <30 mL/min/1.73 m²). To evaluate the eBNP model’s ability to detect BNP ≥100 pg/mL, 70 cases (BNP ≥100 pg/mL) and 70 controls (BNP <100 pg/mL) were selected to create the external validation dataset. To ensure a diverse patient representation, including those with high plasma BNP levels, stratified sampling was performed based on BNP level distribution as follows: <35 pg/mL (44 patients), 35≤BNP<100 pg/mL (26 patients), 100≤BNP<200 pg/mL (23 patients), 200≤BNP<400 pg/mL (23 patients), and BNP ≥400 pg/mL (24 patients) (Figure 2). All clinical protocols adhered to the Declaration of Helsinki and the study protocols were approved by the institutional review boards of each participating facility (Approval Nos. 2030, 932, 21-22, and 507, respectively).

Figure 2.

Study flow chart illustrating the process of selecting 140 patients for the external validation dataset from the external validation cohort of 818 patients across 3 hospitals. Considering that plasma B-type natriuretic peptide (BNP) levels were expected to follow a non-normal distribution, we predefined the number of patients in each BNP range to ensure a balanced representation. Stratified sampling was then performed to achieve this balanced validation sample. 4LSB, 4th left sternal border; eGFR, estimated glomerular filtration rate.

Preprocessing of Waveform Data, ECG, and BNP Value

Heart sounds and ECG data were collected using the SSS01-series phonocardiogram (PMDA Approval No. 30400BZX00218000). When placed on the anterior chest, the device acquires a synchronized bipolar-lead ECG and heart sounds at sampling rates of 500 Hz and 8 kHz, respectively, and records them as digital data.²¹ The mean recording length in the external validation dataset was 16.3±1.78 s per recording. An 8-s segment from 5 to 13 s of the 4LSB recording was extracted and resampled to 2 kHz, and the signal amplitude was normalized (mean=0, variance=1). If the total recording length was <13 s, the last 8 s of the data were used. Because the standard recording length of the SSS01 series is 8 s, the same duration was applied in this validation study. During data acquisition, patients were generally instructed to hold their breath from the 5-s to the 10-s mark. Because the 8-s segment used for model development and validation consistently began at the 5-s mark of each recording, the last 3 s (from the 10-s to the 13-s mark) typically included respiratory sounds due to resumed breathing.
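The segment selection and normalization described above can be sketched as follows. This is a minimal illustration, not the study’s implementation: the resampling method is not stated in the text, so plain block averaging from 8 kHz to 2 kHz is assumed here, and all function names are hypothetical.

```python
from statistics import fmean, pstdev

FS_IN = 8000                # heart-sound sampling rate reported for the device (Hz)
FS_OUT = 2000               # rate after resampling (Hz)
START_S, LENGTH_S = 5, 8    # 8-s window starting at the 5-s mark


def select_segment(samples, fs=FS_IN):
    """Return the 5-13 s window, or the last 8 s if the recording is shorter than 13 s."""
    need = LENGTH_S * fs
    if len(samples) < need:
        raise ValueError("recording shorter than 8 s")
    end = (START_S + LENGTH_S) * fs
    if len(samples) < end:
        return samples[-need:]
    return samples[START_S * fs:end]


def decimate_by_mean(samples, factor):
    """Assumed resampling step: average each consecutive block of `factor` samples."""
    usable = len(samples) - len(samples) % factor
    return [fmean(samples[i:i + factor]) for i in range(0, usable, factor)]


def standardize(samples):
    """Scale the signal to mean 0 and variance 1."""
    mu = fmean(samples)
    sd = pstdev(samples, mu)
    if sd == 0:
        raise ValueError("constant signal cannot be standardized")
    return [(x - mu) / sd for x in samples]


def preprocess(samples):
    """Select the 8-s window, resample to 2 kHz, and standardize."""
    segment = select_segment(samples)
    return standardize(decimate_by_mean(segment, FS_IN // FS_OUT))
```

A production pipeline would normally apply an anti-aliasing filter designed for the target rate; block averaging is used here only to keep the sketch dependency-free.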

The eBNP model was trained on a classification task to predict whether the ground truth plasma BNP level was ≥100 pg/mL or not. This cutoff threshold of ≥100 pg/mL is consistent with the diagnostic and risk assessment criteria jointly defined by the Heart Failure Society of America, Heart Failure Association of the European Society of Cardiology, and the Japanese Heart Failure Society.⁹

Model Development

The training and internal validation dataset was split at the patient level into training and validation sets in an 8:2 ratio (Supplementary Figure 1), using stratified 5-fold cross-validation with a fixed random seed. Patients were categorized into 7 BNP-based strata (<20, 20–35, 35–100, 100–200, 200–400, 400–800, ≥800 pg/mL), and stratified sampling ensured a nearly uniform distribution across the training and validation sets in each fold. Among the 5 partitions, the model with the highest area under the receiver operating characteristic curve (AUROC) in the validation set was selected for further analysis. The training set was used to train the deep learning model, and the validation set was used for tuning and internal evaluation.
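A patient-level stratified 5-fold assignment of this kind might look like the sketch below. It assumes each patient has a single BNP value and that a value on a boundary belongs to the higher stratum (e.g., 100 pg/mL falls in 100–200); the function names, the seed, and the round-robin dealing are illustrative choices, not details taken from the study.

```python
import random
from bisect import bisect_right
from collections import defaultdict

# Edges separating the 7 strata (pg/mL):
# <20, 20-35, 35-100, 100-200, 200-400, 400-800, >=800
STRATUM_EDGES = [20, 35, 100, 200, 400, 800]


def bnp_stratum(bnp):
    """Return the index (0-6) of the stratum containing this BNP value."""
    return bisect_right(STRATUM_EDGES, bnp)


def stratified_folds(patient_bnp, n_folds=5, seed=0):
    """Map each patient ID to a fold so that every stratum is spread evenly across folds."""
    rng = random.Random(seed)  # fixed seed gives a reproducible split
    by_stratum = defaultdict(list)
    for pid in sorted(patient_bnp):
        by_stratum[bnp_stratum(patient_bnp[pid])].append(pid)
    fold_of = {}
    for stratum in sorted(by_stratum):
        ids = by_stratum[stratum]
        rng.shuffle(ids)
        # Deal shuffled patients round-robin so fold sizes differ by at most one per stratum.
        for k, pid in enumerate(ids):
            fold_of[pid] = k % n_folds
    return fold_of
```

Holding out one fold for validation and training on the other four yields the 8:2 ratio; because the assignment is keyed by patient ID, all recordings from one patient stay on the same side of the split.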

Deep Neural Networks and Gradient Boosting Decision Trees were utilized for training, using PyTorch and LightGBM frameworks.²²,²³ To augment the training data, cases with available serum NT-pro-BNP levels were included. NT-pro-BNP ≥300 pg/mL, indicating a high risk of HF, was treated as a binary positive label equivalent to BNP ≥100 pg/mL.
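The labeling rule can be written as a small helper. The two cutoffs come from the text; giving BNP precedence when both markers are available is an assumption made for this sketch, as is the function name.

```python
from typing import Optional

BNP_CUTOFF = 100.0         # pg/mL
NT_PRO_BNP_CUTOFF = 300.0  # pg/mL


def overload_label(bnp: Optional[float] = None,
                   nt_pro_bnp: Optional[float] = None) -> int:
    """Return 1 for a positive training label, 0 otherwise."""
    # Assumption: BNP is preferred when both biomarkers were measured.
    if bnp is not None:
        return int(bnp >= BNP_CUTOFF)
    if nt_pro_bnp is not None:
        return int(nt_pro_bnp >= NT_PRO_BNP_CUTOFF)
    raise ValueError("neither BNP nor NT-pro-BNP is available")
```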

Patients were instructed to breathe in for the first 5 s and hold their breath for the next 5 s during the physiological data collection. Heart sounds and ECG data were collected from 4 locations: the 2nd right sternal border (2RSB), 2nd left sternal border (2LSB), 4LSB, and the 5th left intercostal space at the mid-clavicular line (5LMCL). For each site, heart sounds and ECG data were randomly segmented into 5-s intervals and treated as independent datasets for training. In some models, ECG data underwent bandpass filtering (0.05–200 Hz) before inputting into the models. In other models, test-time augmentation (TTA) was applied, where 16×5.0-s subsequences were randomly extracted from each time series, and their arithmetic mean was calculated.
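One common reading of the TTA step is that the model is evaluated on each of the 16 random 5-s windows and the resulting probabilities are averaged. The sketch below follows that interpretation, which is an assumption; `predict` is a hypothetical callable standing in for the trained model, and the 2 kHz rate is taken from the preprocessing description.

```python
import random
from statistics import fmean


def tta_probability(signal, predict, fs=2000, window_s=5.0, n_views=16, seed=0):
    """Average `predict` over randomly positioned fixed-length windows of `signal`."""
    rng = random.Random(seed)
    win = int(window_s * fs)
    if len(signal) < win:
        raise ValueError("signal shorter than one window")
    last_start = len(signal) - win
    probs = []
    for _ in range(n_views):
        start = rng.randint(0, last_start)  # inclusive on both ends
        probs.append(predict(signal[start:start + win]))
    return fmean(probs)
```

Averaging over several windows reduces the dependence of the score on where in the cardiac and respiratory cycle a single window happens to start.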

The Convolutional Neural Network (CNN) model consisted of 1D convolutional layers and fully connected layers, based on EfficientNet architecture. To improve performance, Squeeze-and-Excitation modules and Skip Connections were incorporated. Additionally, data augmentation techniques such as Gaussian noise, background noise overlay, and speed perturbation were applied to enhance model robustness.

The LightGBM-based model was constructed by applying Short-Time Fourier Transform (STFT) and Wavelet Transform to heart sound signals, extracting features such as spectrum and cepstrum. A modified Continuous Wavelet Transform algorithm was used to detect R-peaks in the ECG data.²⁴ Using these peaks as reference points, sound 1 (S1) and sound 2 (S2) were identified from the heart sound waveform, and the time domain was segmented into R-R intervals, S1, systole, S2, and diastole. Heart sound features, including amplitude, intensity, area, frequency, variance, ratio of peak time, and duration, comprising 119 distinct parameters, were calculated from these segments using methods based on previous studies and novel approaches.²⁵ Because multiple heartbeats were detected from a single recording, these features were calculated for each beat, and the averaged values were used as input parameters for constructing the LightGBM model.
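The per-beat averaging can be illustrated with a simplified sketch. It assumes R-peak positions are already expressed as sample indices of the heart sound signal, and it computes only three illustrative quantities per R-R interval; the 119 parameters used in the study and the S1/S2 segmentation are not reproduced here.

```python
from math import sqrt
from statistics import fmean


def beat_features(pcg, r_peaks, fs=2000):
    """Compute simple features for each R-R interval and average them across beats."""
    beats = []
    for start, end in zip(r_peaks, r_peaks[1:]):
        seg = pcg[start:end]
        if not seg:
            continue
        beats.append({
            "rr_s": len(seg) / fs,                   # beat duration in seconds
            "rms": sqrt(fmean(x * x for x in seg)),  # root-mean-square amplitude
            "peak": max(abs(x) for x in seg),        # peak absolute amplitude
        })
    if not beats:
        raise ValueError("at least two R-peaks are required")
    # One averaged value per feature becomes one input column for the tree model.
    return {name: fmean(b[name] for b in beats) for name in beats[0]}
```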

The 2 models were ensembled to generate a single probability score of estimated BNP for each heart sound recording. The 4LSB site showed the highest AUROC during internal validation, with the optimal probability threshold determined by the Youden Index being 0.3169. Therefore, this site and threshold were adopted for the present study.
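For reference, the Youden Index is J = sensitivity + specificity − 1, and the chosen cutoff is the score that maximizes J. A minimal sketch follows, assuming a recording is called positive when its score is at or above the threshold; the function name is hypothetical.

```python
def youden_threshold(labels, scores):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        # Positive prediction when score >= t.
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        j = tp / pos + tn / neg - 1
        if j > best_j:  # ties keep the lowest threshold
            best_t, best_j = t, j
    return best_t, best_j
```

Fixing the threshold on internal validation data and then applying it unchanged to external data, as described above, avoids tuning the cutoff on the evaluation set.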

Grad-CAM was applied to assess model interpretability by identifying the time segments of the signal that the model focused on during prediction.²⁶

Statistical Analysis and Model Performance Evaluation

The primary outcome was the eBNP model’s sensitivity in classifying plasma BNP levels ≥100 pg/mL. The sample size was calculated based on the width of the 95% confidence interval (CI) of a binomial distribution, with the target sensitivity set to ≥75%. Baseline characteristics were summarized as median with interquartile range (IQR), assuming non-normal distributions. For left ventricular ejection fraction (LVEF) values, the modified Simpson’s method was prioritized, and if unavailable, the Teichholz method was used. The correlation between plasma BNP levels and model probabilities was evaluated using Spearman’s rank correlation coefficient, assuming a non-normal data distribution. The model’s performance in classifying plasma BNP levels ≥100 pg/mL was assessed by calculating the AUROC using the probabilities generated by the model. Sensitivity, specificity, and their 95% CIs were calculated using the cutoff value determined in the internal validation dataset, with Wilson’s method used for CI estimation. Subgroup analysis stratified by body mass index (BMI: <18.5, 18.5–25, ≥25) included forest plots. In Japan, a BMI ≥25 is classified as obesity, corresponding to the high-BMI group in this analysis. To simulate the influence of external noise on the eBNP model performance, background noise commonly used in acoustic engineering, such as speech noise and respiratory noise, was superimposed onto the heart sound data, and the model’s robustness was evaluated based on sensitivity and specificity.²⁷,²⁸ The noise levels were calculated based on a predefined signal-to-noise ratio (SNR) and the energy of the 8-s heart sound segment, and the performance was assessed across SNR values ranging from 50 to 10 dB, with 50 dB representing the lowest noise level. All modeling was performed using Python version 3.9.7, and statistical analyses were conducted using R version 4.4.1. A two-tailed P value <0.05 was considered statistically significant.
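Scaling a noise recording to a target SNR from the signal energy can be sketched as below. This assumes the conventional definition SNR (dB) = 10·log10(E_signal / E_noise) with energy computed as the sum of squared samples; the function name is hypothetical and the study’s exact procedure may differ.

```python
from math import sqrt


def mix_at_snr(signal, noise, snr_db):
    """Add `noise` to `signal` after scaling it so the mixture has the requested SNR in dB."""
    if len(noise) < len(signal):
        raise ValueError("noise must be at least as long as the signal")
    noise = noise[:len(signal)]
    e_signal = sum(x * x for x in signal)
    e_noise = sum(x * x for x in noise)
    if e_noise == 0:
        raise ValueError("noise has zero energy")
    # Solve 10 * log10(e_signal / (gain**2 * e_noise)) = snr_db for gain.
    gain = sqrt(e_signal / (e_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(signal, noise)]
```

Under this definition a higher SNR means quieter noise, so 50 dB is the mildest condition and 10 dB the harshest.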

Results

The external validation dataset included 140 patients selected through stratified sampling based on BNP classes from eligible patients across 3 hospitals. The training and internal validation dataset comprised 1,035 patients from a multicenter dataset. The clinical characteristics of the patients in the external validation dataset were: median age 74 (64–83) years, male 63%, median BMI 23.0 (20.9–25.7) kg/m², atrial fibrillation 15%, ischemic heart disease 39%, valvular heart disease 47%, and median BNP 101 (30–268) pg/mL. In contrast, the training and internal validation dataset had a median age of 72 (63–80) years, male 58%, median BMI 23.9 (21.6–26.6) kg/m², atrial fibrillation 8.4%, ischemic heart disease 59%, valvular heart disease 47%, and median BNP 64 (22–197) pg/mL (Table).

Table.

Patients’ Characteristics

Characteristics	Missing, %	Overall, N=1,175	External validation, N=140	Train and internal validation, N=1,035
Age, years	0	72 (63, 80)	74 (64, 83)	72 (63, 80)
Male, n (%)	0	689 (59)	88 (63)	601 (58)
BMI, kg/m²	6.6	23.8 (21.5, 26.5)	23.0 (20.9, 25.7)	23.9 (21.6, 26.6)
Hypertension, n (%)	0	810 (69)	102 (73)	708 (68)
Dyslipidemia, n (%)	0.1	612 (52)	65 (46)	547 (53)
Diabetes, n (%)	0	743 (63)	30 (21)	713 (69)
CKD, n (%)	0	173 (15)	26 (19)	147 (14)
Arrhythmia, n (%)	21.9
Sinus		721 (79)	90 (66)	631 (81)
AF		87 (9.5)	21 (15)	66 (8.4)
Other arrhythmia		110 (12)	25 (18)	85 (11)
Ischemic heart disease, n (%)	0.1	661 (56)	54 (39)	607 (59)
Valvular heart disease, n (%)	0	551 (47)	66 (47)	485 (47)
Heart failure, n (%)	0	538 (46)	79 (56)	459 (44)
LVEF, %	0	64 (56, 73)	55 (44, 63)	65 (57, 74)
E/e′	17.7	10.2 (8.2, 12.7)	11.2 (9.0, 15.9)	10.1 (8.1, 12.5)
AS severity, n (%)	6
None		1,014 (92)	127 (91)	887 (92)
Trivial		1 (<0.1)	1 (0.7)	0 (0)
Mild		38 (3.4)	3 (2.1)	35 (3.6)
Moderate		29 (2.6)	3 (2.1)	26 (2.7)
Severe		22 (2.0)	5 (3.6)	17 (1.8)
AR severity, n (%)	6.6
None		535 (49)	68 (49)	467 (49)
Trivial		198 (18)	24 (17)	174 (18)
Mild		314 (29)	27 (19)	287 (30)
Moderate		41 (3.7)	16 (11)	25 (2.6)
Severe		10 (0.9)	4 (2.9)	6 (0.6)
MR severity, n (%)	6
None		35 (3.2)	14 (10)	21 (2.2)
Trivial		535 (48)	46 (33)	489 (51)
Mild		433 (39)	45 (32)	388 (40)
Moderate		75 (6.8)	27 (19)	48 (5.0)
Severe		26 (2.4)	8 (5.7)	18 (1.9)
BNP, pg/mL	67	73 (22, 231)	101 (30, 268)	64 (22, 197)
NT-pro-BNP, pg/mL	31.9	151 (56, 480)	341 (232, 779)	149 (56, 479)
eGFR, mL/min/1.73 m²	2	60 (47, 72)	53 (42, 65)	61 (49, 72)

Values are median (interquartile range) or n (%). AF, atrial fibrillation; AR, aortic regurgitation; AS, aortic stenosis; BMI, body mass index; BNP, B-type natriuretic peptide; CKD, chronic kidney disease; eGFR, estimated glomerular filtration rate; LVEF, left ventricular ejection fraction; MR, mitral regurgitation; NT-pro-BNP, N-terminal pro-BNP.

The probability cutoff value of the eBNP model was set at 0.3169 based on the internal validation dataset. The probability of the eBNP model, a continuous variable ranging from 0 to 1, demonstrated a positive correlation with plasma BNP levels (log plasma BNP vs. eBNP model in all patients, r=0.616, P<0.001; log plasma BNP vs. eBNP model excluding both the top and bottom 3% of outliers based on plasma BNP levels, r=0.660, P<0.001, Supplementary Figure 2). The eBNP model’s performance in the external validation dataset for classifying plasma BNP levels ≥100 pg/mL achieved an AUROC of 0.895 (Figure 3). Using the cutoff determined from the internal validation dataset, the sensitivity and specificity in the external validation dataset were 84.3% (95% CI: 74.0–91.0) and 82.9% (95% CI: 72.4–89.9), respectively. Notably, the Youden Index-based cutoff value in the external validation dataset population was 0.315, showing a similar trend to the value determined in the internal validation dataset. Grad-CAM analysis of the eBNP model revealed that the model primarily focused on signals during early and late diastole, as well as the S1 signal, for prediction (Supplementary Figure 3).
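The Wilson score interval used for these CIs can be computed as in the sketch below. The counts in the usage note (59/70 and 58/70) are inferred from the reported percentages and the 70 cases and 70 controls; they are not stated explicitly in the text.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (default z gives a 95% CI)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("require n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


sens_lo, sens_hi = wilson_interval(59, 70)  # assumed count of true positives among 70 cases
spec_lo, spec_hi = wilson_interval(58, 70)  # assumed count of true negatives among 70 controls
```

With these assumed counts, the bounds round to the reported 74.0–91.0% for sensitivity and 72.4–89.9% for specificity.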

Figure 3.

ROC curve and performance of the eBNP model for plasma BNP ≥100 pg/mL. The eBNP model demonstrated an AUROC of 0.895 (95% CI: 0.843–0.948) in the external validation dataset. AUROC, area under the receiver operating characteristic curve; CI, confidence interval; eBNP, estimated B-type natriuretic peptide; PPV, positive predictive value; NPV, negative predictive value.

A stratified analysis was subsequently performed on 127 patients with available BMI data, categorized into 3 BMI classes based on BMI ranges: low (BMI <18.5; 13 patients, 10.2%), middle (18.5≤BMI<25.0; 73 patients, 57.5%), and high (BMI ≥25.0; 41 patients, 32.3%). The proportion of patients with BNP ≥100 pg/mL in the low-, middle-, and high-BMI groups was 69%, 55%, and 34%, respectively. The respective AUROCs for these groups were 0.806 (95% CI: 0.555–1.0), 0.959 (95% CI: 0.921–0.997), and 0.787 (95% CI: 0.647–0.927). The respective sensitivities were 0.778 (95% CI: 0.453–0.937), 0.925 (95% CI: 0.801–0.974), and 0.643 (95% CI: 0.388–0.837), and specificities were 0.750 (95% CI: 0.301–0.954), 0.848 (95% CI: 0.691–0.933), and 0.778 (95% CI: 0.592–0.894) (Figure 4).

Figure 4.

Performance of the eBNP model for predicting plasma BNP levels ≥100 pg/mL, stratified by BMI: BMI <18.5, 18.5≤BMI<25, and BMI ≥25. AUROC, area under the receiver operating characteristic curve; BMI, body mass index; CI, confidence interval; eBNP, estimated B-type natriuretic peptide.

Evaluation of the model’s performance under background noise superimposition demonstrated minimal impact at an SNR of 50 dB, which is comparable to an examiner conversing nearby in a clinical setting (sensitivities on phonocardiogram recordings augmented by speech and breath sounds were 82.4% and 82.6%, and specificities were 82.6% and 82.7%, respectively). However, performance declined at an SNR of 10 dB, corresponding to louder conditions such as a person speaking directly into the microphone (sensitivity: 79.1% and 77.4%; specificity: 72.7% and 80.9%, respectively).

Discussion

This study evaluated the performance of the eBNP model, which combines deep learning and gradient boosting decision tree (GBDT) approaches to predict plasma BNP levels ≥100 pg/mL, using only 8 s of heart sound and ECG data. The eBNP model demonstrated new potential as a non-invasive method of HF screening and exacerbation detection. In the BMI 18.5–25 group, representing the majority of Asian individuals, the AUROC reached 0.959, with sensitivity and specificity of 92.5% and 84.8%, respectively. In this study, the eBNP model’s detection performance was evaluated using sensitivity and specificity, metrics less influenced by disease prevalence, demonstrating its potential utility for screening a broad range of individuals, including those suspected of having HF. This novel approach to estimating BNP using a compact phonocardiogram device offers simplicity, rapidity, and standardization, making it well suited for POCT in HF detection and monitoring.

Early detection of HF is critical,⁴,⁶ but in Japan, where 28.4% of the population are elderly, routine blood tests or echocardiography for all at-risk individuals is impractical. Although BNP is an essential HF biomarker, blood testing often has a delay in returning results, especially in outpatient settings reliant on external laboratories, posing challenges for rapid clinical decision-making. Traditional auscultation requires expertise and is subjective, lacking objective recordings for comparison. In this study we proposed a novel method for estimating BNP levels non-invasively, validated on diverse datasets from multiple hospitals, highlighting its generalizability. The similarity in Youden Index thresholds between the internal and external validation datasets supports its potential for clinical application. Furthermore, although Grad-CAM-based interpretability analysis of the eBNP model has certain limitations, it confirmed that the model primarily relied on the S1 and diastolic regions for prediction, suggesting physiologically plausible learning. Additionally, external noise evaluation demonstrated minimal impact on performance with background conversation, but excessive noise exposure reduced accuracy, emphasizing the importance of accounting for ambient noise in real-world settings.

Recent studies have explored techniques for predicting parameters such as LVEF and LV early diastolic relaxation velocity from ECG, phonocardiograms, and X-rays.¹³,¹⁴,²⁹,³⁰ Those models offer morphological cardiac assessments, but structural abnormalities alone may not suffice for clinical evaluation as they may not immediately reflect cardiac dysfunction. In contrast, this eBNP model estimates BNP levels from the heart sounds and ECG data, enabling functional evaluation of left cardiac overload. BNP levels rise early in response to cardiac stress and correlate well with HF severity and cardiac dysfunction.³¹,³² This simple approach provides hemodynamic insights, aiding subsequent diagnostic and therapeutic decisions.

The subgroup analysis by BMI revealed differences in AUROC, sensitivity, and specificity among the 3 BMI categories (low, middle, and high). The middle-BMI group showed the highest AUROC (0.959; 95% CI: 0.921–0.997), suggesting high performance for the majority of the Japanese population, approximately 70%.³³ In contrast, the low- and high-BMI groups showed reduced accuracy, indicating a need for further validation with larger sample sizes. In this study, the order of performance for AUROC and sensitivity was: middle>low>high, and for specificity, the order was: middle>high>low. The reduced performance in the low-BMI group may be attributed to insufficient contact between the device microphone and the anterior chest, potentially reducing the transmission of cardiac vibrations to the sensor and increasing susceptibility to external noise. Clinical settings frequently encounter such challenges, especially in patients with severe weight loss who have irregular anterior chest contours. In the high-BMI group, reduced accuracy may be explained by the inherent characteristics of plasma BNP and the effects of adipose tissue on vibration transmission. Plasma BNP levels in obese HF patients tend to rise less significantly, potentially underestimating the true state of cardiac overload.³⁴–³⁷ This was supported by the observation that only 34% of the high-BMI group had BNP ≥100 pg/mL, along with the reduced sensitivity rather than specificity in this group. Because the eBNP model predicts BNP indirectly by estimating cardiac overload rather than measuring plasma BNP directly, the BNP values in the training dataset may have been learned as slightly lower than the actual cardiac overload, introducing a “noisy label” issue that likely affected the model’s sensitivity.
Additionally, adipose tissue’s high Poisson’s ratio and large absorption coefficient may have caused significant energy loss of sound waves, altering the frequency characteristics of the acquired heart sounds and affecting the eBNP model’s predictive accuracy.³⁸ This is consistent with findings that low-frequency elastic waves in adipose tissue propagate more slowly and exhibit higher attenuation due to their physical properties.³⁸ Taken together, the findings suggest that adding clinically accessible parameters such as BMI to the model could enhance robustness and classification accuracy, offering clinicians a user-friendly and reliable diagnostic tool.

Overall, the findings demonstrated the clinical potential of analyzing heart sounds and ECG data for HF screening and exacerbation monitoring, particularly where invasive tests are impractical or rapid results are necessary. It may help identify undiagnosed HF patients, facilitating timely diagnostic and therapeutic interventions.

Study Limitations and Future Directions

Several limitations of this study should be acknowledged. First, the model output was a binary classification probability rather than a quantitative BNP value. The strong correlation observed between the eBNP model’s output probabilities and the actual BNP values suggests the potential for further development of models for continuous variable prediction, such as regression models. Additionally, this study included only a limited number of patients with NT-pro-BNP measurements, making it difficult to evaluate the model’s accuracy using NT-pro-BNP as a reference. Further validation using NT-pro-BNP data is warranted. Second, this study included outpatients or inpatients who only underwent echocardiography at general hospitals. Additionally, stratified sampling was performed to include patients with a wide range of BNP levels, intentionally incorporating high-BNP value patients to evaluate the model’s classification performance across broad patient characteristics. These approaches may have resulted in a BNP distribution that differs from that encountered in general screening populations. Third, the results of the subgroup analysis indicated that the model’s accuracy may vary depending on BMI, which suggests that the model may not be fully optimized for specific BMI categories. Further improvements that incorporate BMI into the model are necessary and are expected to enhance predictive accuracy. Additionally, this study included a limited number of patients with atrial fibrillation, preventing an adequate assessment of its impact on the eBNP model. Similarly, the influence of other clinical factors has not been thoroughly evaluated, highlighting the need for further investigation. Finally, this study did not assess the model’s prognostic value using follow-up data. The extent to which the proposed model contributes to prognosis prediction remains unclear. 
Despite these limitations, the results of this study demonstrate new potential for non-invasive HF screening and cardiac overload estimation, paving the way for further refinements through future research.

Conclusions

This study evaluated the eBNP model, which combines deep learning and GBDT approaches, for predicting plasma BNP levels ≥100 pg/mL using heart sounds and ECG data. In an independent external validation dataset, the model demonstrated robust performance, achieving an AUROC of 0.895, with sensitivity and specificity of 84.3% and 82.9%, respectively. Notably, in the population with a BMI 18.5–25, the eBNP model exhibited superior accuracy, with an AUROC of 0.959, and sensitivity and specificity of 92.5% and 84.8%, respectively. These findings suggest that the novel eBNP model provides robust accuracy while enabling a non-invasive approach, supporting its clinical utility in scenarios such as the early diagnosis of HF and monitoring of HF exacerbations.

Acknowledgments

We express our sincere gratitude to the hospitals and volunteers who collaborated to provide beneficial clinical data. Grammarly was used to review English grammar.

Funding

This article is based on results obtained from a project, JPNP18004, subsidized by New Energy and Industrial Technology Development Organization (NEDO).

Disclosures

K.T. and M.O. are members of Circulation Journal’s Editorial Team. K.T., M.O., N.S., M.Y., S.N., N.A., and S.N. received joint research funding or commissioned research funding from AMI Inc. H.S., K.I., Y.Y., and T. Komatsu are employees of AMI Inc., where S.O., S.S., G.S. are board members.

IRB Information

This study was approved by the Institutional Review Boards (IRBs) of the Kumamoto University Hospital (Approval No. 2030), Saiseikai Kumamoto Hospital (Approval No. 932), Mitsubishi Kyoto Hospital (Approval No. 21-22), and Kajiki Onsen Hospital (Approval No. 507). Written informed consent was given by all patients.

Data Availability

The deidentified participant data will not be shared.

Supplementary Files

Supplementary files are available at https://doi.org/10.1253/circj.CJ-25-0098
