Article ID: CR-25-0180
Background: Reliable assessment of pitting edema remains a challenge, especially in remote care, because it is inherently subjective. We developed a video-based deep learning (DL) model to objectively classify the severity of pitting edema.
Methods and Results: A total of 247 videos from 34 consecutive hemodialysis patients were analyzed. A convolutional neural network (EfficientNet-B0) was trained using pre- and post-pressing pretibial images graded on a 0–4 scale. The model achieved 81.5% accuracy, 81.2% sensitivity, and 81.9% specificity in distinguishing grades 3–4 edema from grades 0–1. For extreme cases (grade 0 vs. 4), accuracy improved to 85.8%.
Conclusions: This pilot study demonstrated the feasibility of video-based DL for edema detection. Larger, more diverse datasets and clinical validation are needed before the approach can be generalized.

Study outline. (A) Left: top and side views of the imaging setup. Top right: the edema labeling process. Bottom right: the EfficientNet-B0 architecture (Architecture2). (B) Accuracy of Architecture2 over increasing epochs.
Leg edema is a fundamental clinical indicator of fluid overload,1 but visual inspection and manual palpation are inherently subjective. Deep learning (DL) is enabling automated image processing, even in the medical field,2,3 so we developed a video-based DL model to objectively classify the severity of pitting edema.
This study followed the principles of the Declaration of Helsinki. We enrolled consecutive adults undergoing hemodialysis at Juntendo University Hospital between July and September 2021. Patients were instructed to press their own lower leg (over the tibial crest) for 3 s during video recording before and after the dialysis sessions. From each video, the pre- and post-pressing images were extracted and trained personnel labeled them using a standardized pitting edema scale (grades 0–4). Grade 2 was excluded from training because of ambiguity and poor inter-rater agreement. The resulting binary classification task compared “no edema” (grades 0–1) with “definite edema” (grades 3–4).
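The label mapping described above can be illustrated with a minimal Python sketch. The function names and the example records are hypothetical and are not taken from the study's code; only the mapping itself (grades 0–1 as "no edema", grades 3–4 as "definite edema", grade 2 excluded) comes from the text.

```python
from typing import Optional


def binarize_grade(grade: int) -> Optional[int]:
    """Map a 0-4 pitting edema grade to a binary label.

    Returns 0 for "no edema" (grades 0-1), 1 for "definite edema"
    (grades 3-4), and None for grade 2, which is excluded from training.
    """
    if grade not in (0, 1, 2, 3, 4):
        raise ValueError(f"grade must be an integer from 0 to 4, got {grade!r}")
    if grade == 2:
        return None
    return 0 if grade <= 1 else 1


def build_binary_dataset(samples):
    """Return (sample_id, label) pairs, dropping grade-2 samples."""
    dataset = []
    for sample_id, grade in samples:
        label = binarize_grade(grade)
        if label is not None:
            dataset.append((sample_id, label))
    return dataset


# Hypothetical records: (sample identifier, rater-assigned grade).
example = [("v001", 0), ("v002", 2), ("v003", 4), ("v004", 1), ("v005", 3)]
print(build_binary_dataset(example))
```

In this sketch the grade-2 record ("v002") is dropped and the remaining four records receive binary labels.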
We applied transfer learning using an EfficientNet-B0 architecture pretrained on more than 1,000,000 images from ImageNet. Two models were trained using 3-fold cross-validation: Architecture1, which used only post-pressing images, and Architecture2, which used both pre- and post-pressing images (Central Figure). The models were trained with the Adam optimizer, and data augmentation (random 30-degree rotations and random horizontal/vertical flips) was applied to mitigate overfitting.
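As an illustration of 3-fold cross-validation, the following Python sketch partitions sample indices into three shuffled folds, each serving once as the validation set. It is a generic example under stated assumptions, not the study's implementation: the function name, the seed, and the sample count of 10 are hypothetical, and the text does not specify whether folds were formed at the image, video, or patient level. Where several samples come from the same patient, splitting at the patient level is generally needed to keep one patient's images out of both the training and validation sets.

```python
import random


def k_fold_indices(n_samples, k=3, seed=0):
    """Split range(n_samples) into k shuffled folds of near-equal size.

    Returns a list of (train_indices, val_indices) pairs, one per fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]  # every k-th shuffled index
    splits = []
    for i in range(k):
        val = sorted(folds[i])
        train = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        splits.append((train, val))
    return splits


# Hypothetical example with 10 samples.
for fold_no, (train, val) in enumerate(k_fold_indices(10, k=3), start=1):
    print(f"fold {fold_no}: {len(train)} train, {len(val)} validation")
```

Each sample appears in exactly one validation fold, so the validation metrics of the three folds can be averaged into a single estimate.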
In total, 247 videos were recorded from 34 patients (median age, 68 [interquartile range 58–74] years, 35% female, median B-type natriuretic peptide 132 [91–213] pg/mL and body weight 64.5 [54.1–76.1] kg before dialysis).
Architecture1, which used only post-pressing images to distinguish grades 3–4 from grades 0–1, achieved an accuracy of 69.0%. When both pre- and post-pressing images were used (Architecture2), performance improved markedly, reaching 81.5% accuracy, 81.2% sensitivity, and 81.9% specificity. In the most polarized subset (grade 0 vs. 4), accuracy, sensitivity, and specificity reached 85.7%, 77.1%, and 100.0%, respectively, when both types of images were used.
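For reference, accuracy, sensitivity, and specificity follow from the four cells of a binary confusion matrix, with "definite edema" as the positive class. The Python sketch below uses the standard definitions; the function name is hypothetical and the counts are invented round numbers for illustration, not the study's data.

```python
def binary_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity, and specificity from confusion counts.

    tp: edema correctly classified as edema
    fn: edema misclassified as no edema
    tn: no edema correctly classified as no edema
    fp: no edema misclassified as edema
    """
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must contain at least one sample")
    return {
        "accuracy": (tp + tn) / (positives + negatives),
        "sensitivity": tp / positives,
        "specificity": tn / negatives,
    }


# Hypothetical counts, not taken from the study.
metrics = binary_metrics(tp=40, fn=10, tn=45, fp=5)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these invented counts, accuracy is 85/100, sensitivity is 40/50, and specificity is 45/50.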
Other methods of automating edema assessment use technologies such as bioimpedance or ultrasound, which require specialized equipment and lack scalability. In contrast, we used accessible video data and DL, incorporating dynamic changes to enable more practical and refined edema assessment. Furthermore, in the post-COVID-19 era, when demand for remote monitoring is increasing,4 such a system allows patients to monitor their condition at home and share data with their healthcare providers.
We acknowledge several limitations. First, the dataset would generally be considered too small to train DL models. Although we used several techniques to mitigate the risk of overfitting, further research with larger datasets is necessary. Second, as the study population consisted exclusively of hemodialysis patients, the findings may not fully generalize to other clinical populations. Third, our study was limited by the exclusion of intermediate edema (grade 2). This was a deliberate choice to reduce labeling ambiguity at this proof-of-concept stage, but we acknowledge that it limits clinical applicability. Fourth, the ground truth for labeling was based on visual assessment from videos. This approach was chosen to align with the study’s goal of developing a remote-monitoring tool, but it means that future validation against a clinical gold standard is necessary to ensure real-world accuracy.
This pilot study presents a low-cost and scalable method to quantify leg edema using DL. Further studies with larger, more diverse datasets and clinical validation are needed to advance this approach.
N.K. and T. Kasai were affiliated with a department endowed by Paramount-Bed. N.K. receives honoraria from Bristol-Myers-Squibb, Novartis, Otsuka-Pharma, Eli-Lilly, and Nippon-Boehringer-Ingelheim, and grants from Bristol-Myers-Squibb, AMI, EchoNous, and AstraZeneca. H.D. accepted remuneration from Daiichi-Sankyo, Kowa, MSD, Novartis-Pharma, Bayer-Yakuhin, Sanofi-KK, Taisho-Pharmaceutical, Abbott-Medical, and Otsuka; grants from Fukuda-Denshi, Philips-Japan, Toho-Holdings, Asahi-Kasei, Inter-Reha, KYOCERA, and Glory; and scholarships from Daiichi-Sankyo and Philips-Japan. H.D. is a member of the Circulation Reports Editorial Board.
This study was sponsored by GLORY LTD.
Name of the ethics committee: Institutional-Review-Board of Juntendo University. Reference number: H21-0069.