2025 Volume 74 Issue 2 Pages 293-303
Deep learning in artificial intelligence is a method of algorithmically detecting hidden features in data by training on a large amount of data. This method can generate an accurate decision model in the form of a multi-layered neural network inspired by the neural circuits of the brain. Although automated morphology classification requires high accuracy in differentiating various cell types, it has been reported that some conventional systems using machine learning cannot achieve high accuracy for reactive or neoplastic cells. In this study, we developed models for normal–reactive–abnormal lymphocyte differentiation to demonstrate the usefulness of artificial intelligence-assisted technology in blood morphology testing. Five models using residual neural networks were applied to deep learning, and their performance in automated morphological differentiation was evaluated. The original image set for training consisted of 6,402 typical nucleated blood cell images. A data augmentation process was applied to the original images, and transfer learning and fine-tuning were performed on each model. The subjects for clinical assessment were 25 healthy persons, 25 cases of reactive lymphocytosis, and 15 cases of acute lymphoblastic leukemia. The results of clinical assessments showed that the total accuracy ranges were 0.9433–0.9791 for healthy subjects, 0.8108–0.8425 for reactive lymphocytosis, 0.8248–0.8545 for acute lymphoblastic leukemia, and 0.8645–0.8875 overall. Our proposed artificial intelligence model of lymphocyte morphology differentiation using deep learning achieved a high recognition accuracy. We expect that this approach will be beneficial in developing morphological differentiation assistance technology for blood smear screening.
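Deep learning, one of the technologies of artificial intelligence, is a method that detects hidden features in data and generates an accurate decision model by training a multi-layered neural network, modeled on the neural circuits of the human brain, on a large amount of data. Automated blood cell image classifiers require recognition technology that can differentiate cells accurately, but it has been reported that conventional machine learning sometimes fails to achieve high accuracy for reactive or neoplastic cells. To clarify the usefulness of artificial intelligence-assisted technology in blood morphology testing, we created artificial intelligence models for normal–reactive–abnormal lymphocyte differentiation and evaluated their usefulness. Five residual neural networks were applied to deep learning. The training data consisted of 6,402 nucleated blood cell images. Data augmentation was applied to the training data, and transfer learning and fine-tuning were performed. The subjects for clinical assessment were 25 healthy persons, 25 cases of reactive lymphocytosis, and 15 cases of acute lymphoblastic leukemia. The minimum–maximum total accuracy was 0.9433–0.9791 for healthy subjects, 0.8108–0.8425 for reactive lymphocytosis, 0.8248–0.8545 for acute lymphoblastic leukemia, and 0.8645–0.8875 across all cases. The artificial intelligence models for lymphocyte morphology differentiation showed high recognition performance, and this approach is considered a useful morphological differentiation assistance technology for blood smear screening.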
In recent years, artificial intelligence (AI) has been rapidly developing as a technology capable of making decisions that reflect human thought processes. While machine learning (ML) and deep learning (DL) are both major approaches in AI, there are notable differences between them. Traditional ML models usually rely on feature engineering, where humans manually extract relevant features from small or medium datasets. In contrast, DL models, specifically neural networks, can automatically learn hierarchical representations from big datasets. They can learn intricate features directly from raw input data, reducing the need for extensive manual feature engineering.1),2) In the field of medicine, diagnostic or prognostic systems incorporating AI technology are being developed, with a particular focus on diagnostic imaging. Several AI systems have progressed from the research stage to the clinical stage, and their analysis accuracy has reportedly outperformed that of conventional technologies.3),4) In the field of hematology, many efforts have been initiated to apply DL to differentiate blood cells that are difficult to recognize by ML algorithms (e.g., k-nearest neighbor, support vector machines, Naïve Bayes, and decision trees) in peripheral blood or bone marrow smears, and their high-accuracy performance in morphology recognition has been reported.5),6) Nevertheless, AI research on blood morphology differentiation has been limited to classifying mature leukocytes, distinguishing neoplastic cells from normal cells, or differentiating among specific diseases.
Very few DL approaches have been reported for differentiating the normal–reactive–neoplastic changes of lymphocytes in lymphocytosis, a differentiation that is required at the initial screening.7),8) Even with advances in automated technology, most lymphocyte differentiation in lymphocytosis with polymorphic changes requires double-checking by a hematologist, and innovative breakthroughs using next-generation image recognition technology are essential to improve the accuracy of blood cell differentiation. The establishment of image recognition technology capable of differentiating with high accuracy between normal, reactive, and neoplastic series of morphological changes in lymphocytosis will contribute to the efficiency of clinical testing and the early detection of related diseases. In this study, we developed AI models for normal–reactive–abnormal lymphocyte differentiation. Furthermore, the generated AI models for morphology differentiation were clinically assessed in peripheral blood smear screening to clarify the usefulness of AI-assisted technology in lymphocytosis.
Peripheral venous blood from patients undergoing medical examinations at Hirosaki University Hospital was used to create the deep learning model and for clinical assessment of the AI models.
1. Subjects for Supervised Training
The study subjects for the supervised training were 100 healthy cases, 50 cases with the appearance of erythroblasts, 50 cases of reactive lymphocytosis (RL), and 25 cases of acute lymphoblastic leukemia (ALL). RL or ALL cases were defined as having a cutoff value of 3% or higher for the appearance of reactive lymphocytes in peripheral blood. Thin-layer blood smears were prepared from peripheral venous blood supplemented with ethylenediaminetetraacetic acid dipotassium salt dihydrate (EDTA-2K).
2. Subjects for Clinical Assessment
The subjects for the clinical assessment were 25 healthy cases, 25 RL cases, and 15 ALL cases. Thin-layer blood smears were prepared from peripheral venous blood supplemented with EDTA-2K.
3. Hardware and Software for DL
The hardware used in this study consisted of a system equipped with an Intel Core i7-12700 3.6 GHz CPU and an NVIDIA GeForce RTX 3090 Ti GPU with 24 GB of VRAM. Neural Network Libraries v1.35.0 (Sony Network Communications), Anaconda 3.0, and Python 3.5 were used for the AI modeling. The residual neural network (ResNet) model was applied for DL.9)
In this study, we proceeded from the creation of the AI model to the evaluation of the AI model in the following steps: (1) preparation of thin blood smears, (2) capturing microscopic images, (3) assignment of correct labels to the microscopic images, (4) creation of deep learning models, and (5) validation of the accuracy of the AI models with images for clinical assessment. The outline of this study is shown in Figure 1.
Thin-layer blood smears underwent May–Grünwald–Giemsa (MGG) staining. The May–Grünwald and Giemsa solutions were manufactured by Merck & Co., Inc. (Rahway, NJ, USA). The validity of MGG staining was determined by three experts’ pass/fail decisions on healthy control specimens.
2. Microscopic Imaging
The MGG-stained smears were observed under a microscope (Scope A1; Carl Zeiss) using an objective oil immersion lens (100×).
1) Supervised Training Images
Images of nucleated blood cells (leukocytes or erythroblasts) were captured (200–300 images per smear slide) using a microscope color camera (Axiocam ERc5s; Carl Zeiss) and saved in PNG format (1920 × 2560 pixels). These images were then trimmed to 750 × 750 pixels so that each image contained one cell.
2) Clinical Assessment Images
Images of nucleated blood cells (leukocytes or erythroblasts) were captured (100–200 and 200–300 images per smear slide for healthy cases and RL or ALL cases, respectively) using a microscope color camera and saved in PNG format (1920 × 2560 pixels). These images were then trimmed to 750 × 750 pixels so that each image contained one nucleated blood cell.
3. Labeling of Blood Cell Images
All microscopic images were classified into the following eight categories by three experts (hematologists or clinical laboratory technologists): neutrophil (Neut), eosinophil (Eo), basophil (Baso), monocyte (Mono), normal lymphocyte (Lymph), reactive lymphocyte (R-lymph), lymphoblast (L-blast), or erythroblast (orthochromatic or polychromatic erythroblasts; EB). Cell categories on which all three experts agreed were assigned as the correct answer labels for the blood cell images. The leukocyte classification criteria followed the guidelines “Shared standard range for leukocyte visual morphology classification” designated by the Japanese Society for Laboratory Hematology Committee for Standardization.10)
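The consensus rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `consensus_label` is hypothetical, and the text does not say what happened to images on which the experts disagreed, so the sketch simply returns `None` for them.

```python
from typing import Optional, Sequence

# The eight categories named in the text.
CATEGORIES = {"Neut", "Eo", "Baso", "Mono", "Lymph", "R-lymph", "L-blast", "EB"}


def consensus_label(expert_labels: Sequence[str]) -> Optional[str]:
    """Return the shared label if every expert agrees, otherwise None.

    Assumption: an image without unanimous agreement gets no label;
    the source does not state how such images were handled.
    """
    if not expert_labels:
        return None
    unknown = set(expert_labels) - CATEGORIES
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    first = expert_labels[0]
    return first if all(label == first for label in expert_labels) else None


if __name__ == "__main__":
    print(consensus_label(["Lymph", "Lymph", "Lymph"]))    # unanimous
    print(consensus_label(["Lymph", "R-lymph", "Lymph"]))  # disagreement
```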
4. Preparation of Datasets for Supervised Training
In total, 50,000 nucleated blood cell images were captured using a microscope camera for supervised training. From these, 6,402 images exhibiting a typical morphology in each cell category were randomly selected to balance the number of cell images for each category in the datasets. The blood cell configurations in the dataset are shown in Table 1A. Next, 80% of these randomly extracted cell images were used for the training; these images were further reduced from a resolution of 750 × 750 pixels to a resolution of 480 × 480 pixels. The remaining 20% were used for hold-out validation; these images were further reduced to a resolution of 320 × 320 pixels.
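As a rough illustration of the 80%/20% hold-out split, the following sketch shuffles a list of image identifiers and cuts it at the training fraction. The file names, the seed, and the rounding rule are assumptions made for the example; the text does not state whether the split was stratified by cell category, and this sketch does not stratify.

```python
import random
from typing import List, Sequence, Tuple


def holdout_split(
    items: Sequence[str], train_fraction: float = 0.8, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Shuffle items reproducibly and split them into training and validation lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG, global state untouched
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical identifiers standing in for the 6,402 selected images.
image_ids = [f"cell_{i:05d}.png" for i in range(6402)]
train_ids, val_ids = holdout_split(image_ids)

# Target resolutions stated in the text (resizing itself is not shown here).
TRAIN_SIZE = (480, 480)
VALIDATION_SIZE = (320, 320)
```

The resizing step would normally be done with an image library; it is left out so that the sketch depends only on the standard library.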
5. Generation of AI Models for the Differentiation of Lymphocytosis Cells
This study used five ResNet models with 18, 34, 50, 101, and 152 layers to determine the best combination of the number of layers and optimizer for deep learning models in lymphocyte differentiation. The convolutional neural network (CNN) structure used for DL is shown in Table 1B. Data augmentation11) was performed to increase the number of training images in all ResNet models. Rotation, inversion, shift, or slice processing was applied to a randomly selected original image. Transfer learning and fine-tuning were performed at 500 epochs using the training dataset and six types of optimizers (AdaBound, AdaGrad, Adadelta, AdaBelief, AMSBound, and AMSGrad). The hyperparameters are shown in Table 1C. We performed hold-out validation experiments with all AI models and calculated the total accuracy, recall, precision, and F1-score. For each number of layers, the model with the highest total accuracy in the hold-out validation experiments was selected as the best model for clinical assessment. Total accuracy, recall, precision, and F1-score were calculated as follows:
Total accuracy = number of correct answers ÷ number of data
Recall = true positives ÷ (true positives + false negatives)
Precision = true positives ÷ (true positives + false positives)
F1-score = 2 × recall × precision ÷ (recall + precision).
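The four formulas above can be checked against a confusion matrix. The sketch below is a minimal example, not the authors' code: it uses the counts of Table 4C (ALL cases, 152-layer model; rows are expert labels, columns are AI labels), and the function names are chosen for illustration only. Metrics whose denominator is zero are returned as `None`.

```python
from typing import List, Optional, Tuple

LABELS = ["Neut", "Eo", "Baso", "Mono", "Lymph", "R-lymph", "L-blast", "EB"]

# Counts from Table 4C: rows = expert label, columns = AI label.
TABLE_4C: List[List[int]] = [
    [389, 0, 8, 1, 0, 0, 0, 1],
    [0, 3, 1, 0, 0, 0, 0, 0],
    [0, 0, 3, 0, 0, 0, 0, 0],
    [0, 0, 0, 130, 31, 1, 4, 0],
    [2, 1, 3, 40, 794, 21, 166, 8],
    [0, 0, 0, 1, 14, 33, 18, 0],
    [3, 0, 3, 1, 113, 9, 1276, 0],
    [0, 0, 0, 0, 0, 0, 1, 21],
]


def total_accuracy(matrix: List[List[int]]) -> float:
    """Number of correct answers divided by number of data."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def class_metrics(
    matrix: List[List[int]], index: int
) -> Tuple[Optional[float], Optional[float], Optional[float]]:
    """Return (recall, precision, F1-score) for one category."""
    tp = matrix[index][index]
    fn = sum(matrix[index]) - tp                 # expert says class, AI says other
    fp = sum(row[index] for row in matrix) - tp  # AI says class, expert says other
    recall = tp / (tp + fn) if tp + fn else None
    precision = tp / (tp + fp) if tp + fp else None
    if recall is None or precision is None or recall + precision == 0:
        return recall, precision, None
    return recall, precision, 2 * recall * precision / (recall + precision)


if __name__ == "__main__":
    recall, precision, f1 = class_metrics(TABLE_4C, LABELS.index("L-blast"))
    print(f"total accuracy = {total_accuracy(TABLE_4C):.4f}")
    print(f"L-blast recall = {recall:.4f}, precision = {precision:.4f}, F1 = {f1:.4f}")
```

Run on these counts, the sketch reproduces the values reported for the 152-layer model in ALL cases (total accuracy 0.8545; L-blast recall 0.9082, precision 0.8710, F1-score 0.8892).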
6. Clinical Assessment of AI Models for the Differentiation of Lymphocytosis Cells
A total of 11,403 images of nucleated blood cells were captured with a microscope camera for clinical assessment. The blood cell configurations used for the clinical assessment are shown in Table 1A. The resolution of all images was reduced from 750 × 750 pixels to 320 × 320 pixels. We performed a clinical assessment using the best model in each layer and calculated the total accuracy, recall, precision, and F1-score. The probability values in the classification inference of the nucleated blood cell images were calculated using each AI model, and the classification inference value was defined as the cell category with the highest probability value. Statistical analysis was performed using IBM SPSS Statistics 29.
7. Visualization of Lymphocyte Recognition Factors in Leukocyte Classification Using Explainable AI
Visualization analysis using the local interpretable model-agnostic explanation (LIME) method was performed on images that did or did not match the classification in the clinical assessment.12)
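The inference rule in the clinical assessment (take the category with the highest probability) amounts to an argmax over the model's output. A minimal sketch follows; the function name `infer_category` is an assumption, and the probability values in the usage example are made up purely for illustration and are not results from the study.

```python
from typing import Dict, Tuple


def infer_category(probabilities: Dict[str, float]) -> Tuple[str, float]:
    """Return the category with the highest probability and that probability."""
    if not probabilities:
        raise ValueError("probabilities must not be empty")
    category = max(probabilities, key=probabilities.get)
    return category, probabilities[category]


if __name__ == "__main__":
    # Illustrative values only (not taken from the paper).
    example = {
        "Neut": 0.01, "Eo": 0.00, "Baso": 0.01, "Mono": 0.04,
        "Lymph": 0.27, "R-lymph": 0.05, "L-blast": 0.62, "EB": 0.00,
    }
    print(infer_category(example))
```

Ties are resolved in favor of the first category encountered, which is a property of Python's `max` and not something the text specifies.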
The total accuracy distribution for each ResNet model in the holdout validation is shown in Figure 2. The highest value for each layer model ranged from 0.9352 to 0.9438. The AMSBound method showed the highest total accuracy for all models except the 34-layer model. All models except the 101-layer model with AdaBelief showed a total accuracy of 0.920 or better.
AI analysis indices for each layer model in the clinical evaluation are shown in Table 2. The total accuracy ranges were 0.9433–0.9791 for healthy subjects, 0.8108–0.8425 for RL cases, 0.8248–0.8545 for ALL cases, and 0.8645–0.8875 for the mean. The mean total accuracy tended to increase with an increasing number of layers (the 34-layer model being the exception), with the 152-layer model showing the highest accuracy. The ranges of average F1-scores were 0.8648–0.9293 for healthy subjects, 0.7670–0.8455 for RL cases, 0.7543–0.7920 for ALL cases, and 0.7954–0.8447 for the mean. The mean F1-score likewise tended to increase with an increasing number of layers, with the 152-layer model showing the highest score.
Layer | Index | Healthy | Reactive lymphocytosis | Acute lymphoblastic leukemia | Average |
---|---|---|---|---|---|
18 | Total accuracy | 0.9645 | 0.8108 | 0.8306 | 0.8687 |
18 | Ave-Recall | 0.9766 | 0.8580 | 0.7704 | 0.8683 |
18 | Ave-Precision | 0.9019 | 0.7778 | 0.8032 | 0.8277 |
18 | Ave-F1-score | 0.9293 | 0.7764 | 0.7675 | 0.8244 |
34 | Total accuracy | 0.9433 | 0.8254 | 0.8248 | 0.8645 |
34 | Ave-Recall | 0.9668 | 0.8912 | 0.8298 | 0.8959 |
34 | Ave-Precision | 0.8461 | 0.7164 | 0.7277 | 0.7634 |
34 | Ave-F1-score | 0.8648 | 0.7670 | 0.7543 | 0.7954 |
50 | Total accuracy | 0.9770 | 0.8264 | 0.8323 | 0.8785 |
50 | Ave-Recall | 0.9881 | 0.8844 | 0.8176 | 0.8967 |
50 | Ave-Precision | 0.8860 | 0.7975 | 0.7539 | 0.8125 |
50 | Ave-F1-score | 0.9239 | 0.8211 | 0.7765 | 0.8405 |
101 | Total accuracy | 0.9785 | 0.8425 | 0.8348 | 0.8853 |
101 | Ave-Recall | 0.9837 | 0.8903 | 0.8011 | 0.8917 |
101 | Ave-Precision | 0.8876 | 0.8319 | 0.7488 | 0.8228 |
101 | Ave-F1-score | 0.9179 | 0.8455 | 0.7640 | 0.8425 |
152 | Total accuracy | 0.9791 | 0.8287 | 0.8545 | 0.8875 |
152 | Ave-Recall | 0.9868 | 0.8863 | 0.8147 | 0.8959 |
152 | Ave-Precision | 0.8857 | 0.7961 | 0.7766 | 0.8194 |
152 | Ave-F1-score | 0.9211 | 0.8211 | 0.7920 | 0.8447 |
The differentiation accuracy for the Lymph, R-lymph, and L-blast categories is given in Table 3. The recall ranges were 0.9121–0.9723 for Lymph in healthy subjects; 0.7883–0.8874 for Lymph and 0.5894–0.7209 for R-lymph in RL cases; 0.6831–0.7671 for Lymph, 0.4242–0.6970 for R-lymph, and 0.8605–0.9238 for L-blast in ALL cases; and 0.7399–0.7880 averaged over all lymphocyte categories. The F1-score ranges were 0.9536–0.9860 for Lymph in healthy subjects; 0.8196–0.8584 for Lymph and 0.7000–0.7415 for R-lymph in RL cases; 0.7668–0.7992 for Lymph, 0.4035–0.5385 for R-lymph, and 0.8626–0.8892 for L-blast in ALL cases; and 0.7658–0.7923 averaged over all lymphocyte categories.
Layer | Index | Healthy (Lymph) | Reactive lymphocytosis (Lymph) | Reactive lymphocytosis (R-lymph) | Acute lymphoblastic leukemia (Lymph) | Acute lymphoblastic leukemia (R-lymph) | Acute lymphoblastic leukemia (L-blast) | Average |
---|---|---|---|---|---|---|---|---|
18 | Recall | 0.9121 | 0.8637 | 0.6199 | 0.6957 | 0.4242 | 0.9238 | 0.7399 |
18 | Precision | 0.9991 | 0.7798 | 0.8260 | 0.8856 | 0.7368 | 0.8273 | 0.8424 |
18 | F1-score | 0.9536 | 0.8196 | 0.7082 | 0.7792 | 0.5385 | 0.8729 | 0.7787 |
34 | Recall | 0.9210 | 0.7883 | 0.7209 | 0.7401 | 0.6970 | 0.8605 | 0.7880 |
34 | Precision | 0.9991 | 0.8809 | 0.7633 | 0.8626 | 0.2840 | 0.8648 | 0.7758 |
34 | F1-score | 0.9585 | 0.8320 | 0.7415 | 0.7967 | 0.4035 | 0.8626 | 0.7658 |
50 | Recall | 0.9617 | 0.8436 | 0.6136 | 0.6831 | 0.5606 | 0.9117 | 0.7624 |
50 | Precision | 0.9992 | 0.8500 | 0.8147 | 0.8739 | 0.4933 | 0.8517 | 0.8138 |
50 | F1-score | 0.9801 | 0.8468 | 0.7000 | 0.7668 | 0.5248 | 0.8807 | 0.7832 |
101 | Recall | 0.9707 | 0.8874 | 0.5894 | 0.7217 | 0.4394 | 0.8940 | 0.7504 |
101 | Precision | 0.9983 | 0.8313 | 0.8717 | 0.8768 | 0.5000 | 0.8413 | 0.8199 |
101 | F1-score | 0.9843 | 0.8584 | 0.7033 | 0.7917 | 0.4677 | 0.8668 | 0.7787 |
152 | Recall | 0.9723 | 0.8222 | 0.6628 | 0.7671 | 0.5000 | 0.9082 | 0.7721 |
152 | Precision | 1.0000 | 0.8692 | 0.8037 | 0.8340 | 0.5156 | 0.8710 | 0.8156 |
152 | F1-score | 0.9860 | 0.8451 | 0.7265 | 0.7992 | 0.5077 | 0.8892 | 0.7923 |
Table 4 shows the confusion matrices for the 152-layer ResNet model, which had the highest accuracy. In healthy subjects, 1.2% (15/1,228 cells) were misclassified between Lymph and Mono (Table 4A). In RL cases, 20.5% (669/3,267 cells) were misclassified among Mono, Lymph, and R-lymph (Table 4B). In ALL cases, 11.1% (279/2,506 cells) were misclassified between Lymph and L-blast (Table 4C).
Experts/AI | Neut | Eo | Baso | Mono | Lymph | R-lymph | L-blast | EB | Recall |
---|---|---|---|---|---|---|---|---|---|
Neut | 1,720 | 0 | 30 | 0 | 0 | 0 | 0 | 0 | 0.9829 |
Eo | 0 | 48 | 0 | 0 | 0 | 0 | 0 | 0 | 1.0000 |
Baso | 0 | 0 | 41 | 0 | 0 | 0 | 0 | 0 | 1.0000 |
Mono | 0 | 0 | 0 | 140 | 0 | 3 | 0 | 0 | 0.9790 |
Lymph | 1 | 0 | 7 | 15 | 1,194 | 1 | 5 | 5 | 0.9723 |
R-lymph | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | — |
L-blast | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | — |
EB | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | — |
Precision | 0.9994 | 1.0000 | 0.5256 | 0.9032 | 1.0000 | — | — | — | |
F1-score | 0.9911 | 1.0000 | 0.6891 | 0.9396 | 0.9860 | — | — | — |
Experts/AI | Neut | Eo | Baso | Mono | Lymph | R-lymph | L-blast | EB | Recall |
---|---|---|---|---|---|---|---|---|---|
Neut | 1,010 | 0 | 18 | 2 | 1 | 1 | 0 | 0 | 0.9787 |
Eo | 0 | 43 | 0 | 0 | 0 | 1 | 0 | 0 | 0.9773 |
Baso | 0 | 0 | 18 | 0 | 0 | 0 | 0 | 0 | 1.0000 |
Mono | 0 | 0 | 0 | 640 | 37 | 53 | 0 | 0 | 0.8767 |
Lymph | 3 | 1 | 3 | 190 | 1,767 | 126 | 48 | 11 | 0.8222 |
R-lymph | 1 | 0 | 0 | 125 | 228 | 741 | 23 | 0 | 0.6628 |
L-blast | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | — |
EB | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | — |
Precision | 0.9961 | 0.9773 | 0.4615 | 0.6688 | 0.8692 | 0.8037 | — | — | |
F1-score | 0.9873 | 0.9773 | 0.6316 | 0.7587 | 0.8451 | 0.7265 | — | — |
Experts/AI | Neut | Eo | Baso | Mono | Lymph | R-lymph | L-blast | EB | Recall |
---|---|---|---|---|---|---|---|---|---|
Neut | 389 | 0 | 8 | 1 | 0 | 0 | 0 | 1 | 0.9749 |
Eo | 0 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | — |
Baso | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | — |
Mono | 0 | 0 | 0 | 130 | 31 | 1 | 4 | 0 | 0.7831 |
Lymph | 2 | 1 | 3 | 40 | 794 | 21 | 166 | 8 | 0.7671 |
R-lymph | 0 | 0 | 0 | 1 | 14 | 33 | 18 | 0 | 0.5000 |
L-blast | 3 | 0 | 3 | 1 | 113 | 9 | 1,276 | 0 | 0.9082 |
EB | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 21 | 0.9545 |
Precision | 0.9873 | — | — | 0.7514 | 0.8340 | 0.5156 | 0.8710 | 0.7000 | |
F1-score | 0.9811 | — | — | 0.7670 | 0.7992 | 0.5077 | 0.8892 | 0.8077 |
A: Healthy subject, B: RL case, C: ALL case
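The misclassification rate quoted for ALL cases can be traced back to the third matrix (ALL cases). The text does not define the denominator, so the sketch below rests on an assumption: counting the cells swapped between Lymph and L-blast, and dividing by the sum of the three expert-labeled lymphoid rows (Lymph, R-lymph, L-blast), is one reading that reproduces the reported 279/2,506. The variable and function names are illustrative.

```python
from typing import Dict, List

LABELS = ["Neut", "Eo", "Baso", "Mono", "Lymph", "R-lymph", "L-blast", "EB"]

# Expert-labeled lymphoid rows of the ALL-case matrix (columns follow LABELS).
LYMPHOID_ROWS_ALL: Dict[str, List[int]] = {
    "Lymph":   [2, 1, 3, 40, 794, 21, 166, 8],
    "R-lymph": [0, 0, 0, 1, 14, 33, 18, 0],
    "L-blast": [3, 0, 3, 1, 113, 9, 1276, 0],
}


def swapped_between(rows: Dict[str, List[int]], first: str, second: str) -> int:
    """Count cells of `first` called `second` by the AI, plus the reverse."""
    column = {name: i for i, name in enumerate(LABELS)}
    return rows[first][column[second]] + rows[second][column[first]]


swapped = swapped_between(LYMPHOID_ROWS_ALL, "Lymph", "L-blast")
# Assumed denominator: all cells the experts labeled as lymphoid.
lymphoid_total = sum(sum(row) for row in LYMPHOID_ROWS_ALL.values())
rate_percent = 100 * swapped / lymphoid_total

if __name__ == "__main__":
    print(f"{swapped}/{lymphoid_total} = {rate_percent:.1f}%")
```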
The results of the image analysis of Lymph, R-lymph, and L-blast by the LIME method are shown in Figure 3. Figures 3A, B, and C show representative lymphocyte images in which the experts and AI models reached the same differentiation result; in these cases, the AI model recognized the nucleus and cytoplasm as feature regions of interest. In contrast, Figures 3D and E show representative lymphocyte images in which the experts and AI models disagreed; here, the AI model recognized the nucleus as the feature region of interest for the Lymph category but did not recognize the cytoplasm as a feature region of interest. For R-lymph and L-blast cells that were misclassified into the Lymph category, the AI model was found to recognize only the nucleus or background erythrocytes as regions of interest.
A, B, C: Leukocytes with matched decisions between experts and AI models
D, E: Leukocytes with mismatched decisions between experts and AI models
A: Lymph, A-1: Original image, A-2: LIME analysis image (Lymph interest feature region)
B: R-lymph, B-1: Original image, B-2: LIME analysis image (R-lymph interest feature region), B-3: LIME analysis image (Lymph interest feature region)
C: L-blast, C-1: Original image, C-2: LIME analysis image (L-blast interest feature region), C-3: LIME analysis image (Lymph interest feature region)
D: R-lymph, D-1: Original image, D-2: LIME analysis image (R-lymph interest feature region), D-3: LIME analysis image (Lymph interest feature region)
E: L-blast, E-1: Original image, E-2: LIME analysis image (L-blast interest feature region), E-3: LIME analysis image (Lymph interest feature region)
Our AI modeling framework for lymphocyte differentiation yielded a maximum total accuracy of 0.9438 in the holdout validation. In the clinical assessment, by contrast, the maximum total accuracy was 0.9791 for healthy subjects, 0.8425 for RL cases, and 0.8545 for ALL cases, about 0.13 points lower for both RL and ALL cases than for healthy subjects. The recall for the R-lymph category in the RL-case group was 0.6628, and the cell categories inferred by the AI for the mismatched cells were mostly Mono (11.2%, 125/1,118 cells) and Lymph (20.4%, 228/1,118 cells). Morphologically, these mismatched cells corresponded to Type I (monocyte-like) cells in the Downey classification, the classic morphological classification of reactive lymphocytes. The recall of the L-blast category in the ALL-case group was 0.9082, and the cell category inferred by AI for the mismatched cells was mostly Lymph (8.0%, 113/1,405 cells). These results indicate that the AI model differentiates mature normal lymphocytes with high accuracy, even though they vary in morphology, ranging in size from small to large and differing in the presence or absence of intracytoplasmic granules. Similarly, the AI model has high differentiation accuracy for lymphoblasts, which show monomorphic changes. In contrast, the AI model has poor differentiation accuracy for reactive lymphocytes, which show polymorphic changes. Kawakami et al.13) studied a method for differentiating R-lymph by combining automated cell imaging and DL methods and reported that R-lymph contains a morphologically diverse group of cells, ranging from cells with typical features to cells with features similar to Lymph or Mono. Brereton et al.14) studied virtual microscopy to analyze approaches and decision-making in hematology morphology and reported that neoplastic misclassification of reactive specimens occurred in 10% to 26% of cases.
They concluded that in cases with little to no morphological abnormalities, accurate cell identification and classification were the primary requirements for success, whereas in reactive and abnormal cells with more complex forms, feature recognition and prioritization were the most important. The ResNet applied in this study identifies hidden features through convolution processing and achieves highly accurate image recognition based on fine features through its very deep layer structure. Nevertheless, R-lymph is a cell group that includes polymorphic changes (monocyte-like cells, plasmacytoid cells, and lymphoblast-like cells), as described in the classical Downey classification. This suggests that a single linear model has limited potential to differentiate R-lymph with high accuracy, even when it is made much deeper. Because the weighting in DL is automatically adjusted by error backpropagation, it is not possible to prioritize certain features as in conventional machine learning. In this regard, Nozaka et al.15) reported that the accuracy of blood cell classification can be improved by connecting multiple DL models that progressively reduce the number of blood cell categories to be differentiated. This report suggests the possibility of improving the accuracy of AI models by connecting DL models with conventional machine learning models. The combination of different learning methods should be considered for differentiating morphologically similar cells. In ALL cases, L-blasts were commonly misclassified into the Lymph category, and the L-blast cells misclassified by AI tended to have a high nuclear/cytoplasm (N/C) ratio, distinct nucleoli, and a uniform distribution of chromatin in the nucleus (Figure 3E). This suggests that, for cells with a high N/C ratio, re-examination of the nuclear structure may be useful in differentiating L-blast cells.
For misclassified cells of this type, Abir et al.16) proposed the implementation of explainable AI (XAI) in ALL case differentiation, and the usefulness of visualizing the causes of leukocyte misclassification by XAI has been reported in similar blood cell differentiation studies.17)–19) The results of LIME analysis in the classification-matched and -mismatched cells between experts and AI in this study indicate that the differentiation of lymphocyte groups by AI can be categorized into three patterns: (1) recognition of both the nucleus and cytoplasm as regions of interest, (2) recognition of only the nucleus and not the cytoplasm as a region of interest, and (3) recognition of background cells as regions of interest. The recognition of the nucleus and cytoplasm as regions of interest was confirmed to be an essential element in the differential match between experts and AI. Therefore, misclassification by AI could potentially be avoided by flagging cells whose visualization pattern matches (2) or (3) as cells to be reconfirmed and having experts double-check them.
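The reconfirmation rule suggested above can be written as a small triage function. This is a hypothetical sketch, not part of the study: it assumes that some earlier step (not described in the text) has already mapped the LIME-highlighted area of each image onto the region names `"nucleus"`, `"cytoplasm"`, and `"background"`, and the function names are invented for the example.

```python
from typing import Iterable


def lime_pattern(highlighted_regions: Iterable[str]) -> int:
    """Map highlighted regions to the three patterns in the text (0 = none of them).

    1: nucleus and cytoplasm both highlighted
    2: nucleus highlighted, cytoplasm not
    3: background cells highlighted
    """
    regions = set(highlighted_regions)
    if "background" in regions:
        return 3
    if {"nucleus", "cytoplasm"} <= regions:
        return 1
    if "nucleus" in regions:
        return 2
    return 0


def needs_expert_review(highlighted_regions: Iterable[str]) -> bool:
    """Flag every cell whose explanation is not pattern (1) for reconfirmation."""
    return lime_pattern(highlighted_regions) != 1


if __name__ == "__main__":
    for regions in (["nucleus", "cytoplasm"], ["nucleus"], ["nucleus", "background"]):
        print(regions, "->", "review" if needs_expert_review(regions) else "accept")
```

Treating explanations that fit none of the three patterns as needing review is a conservative design choice made here, not something the text prescribes.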
In the present study, our ResNet-based AI model for lymphocyte morphology differentiation achieved a total accuracy of 0.9791 for healthy subjects, 0.8425 for RL, and 0.8545 for ALL in the clinical assessment. The models showed high recognition accuracy in differentiating normal and neoplastic blood cells with monomorphic changes, though there remains room for improvement in differentiating reactive blood cells with polymorphic changes. Nevertheless, the ResNet-based AI model for lymphocytosis achieves high performance with a cellular differentiation accuracy of more than 85% in lymphoid diseases and is expected to contribute to highly accurate automated classification in peripheral blood morphology screening.
This work was presented at the 36th International Symposium on Technical Innovations in Laboratory Hematology on May 11–13, 2023 in New Orleans, USA.
Institutional Review Board Statement: This study was conducted in accordance with the Declaration of Helsinki. It was approved by the Committee of Medical Ethics of Hirosaki University Graduate School of Medicine (Approval No. 2021-044) and carried out according to the ethical guidelines for medical and biological research involving human subjects.
Informed Consent Statement: Informed consent was obtained from all subjects in the form of an online opt-out (https://www.med.hirosaki-u.ac.jp/hospital/outline/resarch.html).
Acknowledgments: The authors thank Sayaka Souma, Shizuku Hirano, Suzuka Kaga, Niina Sakaiya, and Shou Kimura for their technical assistance with the experiments. We are also grateful to the referees for their helpful comments.
Funding: This study was supported by Grants-in-Aid for Scientific Research (JSPS KAKENHI; Grant Nos. 19K21737, 21H00894, 22K18573, and 22K02799).
There is no potential conflict of interest to disclose.