The Tohoku Journal of Experimental Medicine
Online ISSN : 1349-3329
Print ISSN : 0040-8727
ISSN-L : 0040-8727
Regular Contributions
Deep Learning-Based Nuclear Lobe Count Method for Differential Count of Neutrophils
Mayu Yabuta, Iori Nakamura, Haruhi Ida, Hiromi Masauzi, Kazunori Okada, Sanae Kaga, Keiko Miwa, Nobuo Masauzi

2021 Volume 254 Issue 3 Pages 199-206

Abstract

Differentiating neutrophils based on the count of nuclear lobulation is useful for diagnosing various hematological disorders, including megaloblastic anemia, myelodysplastic syndrome, and sepsis. It has been reported that approximately one-fifth of sepsis patients worldwide died between 1990 and 2017. Notably, neutrophils with fewer nuclear lobes and stab (band) forms increase in the peripheral blood during sepsis, and this abnormality can serve as an early diagnostic criterion. However, testing this feature is a complex and time-consuming task that is prone to human error. For this reason, we apply deep learning to automatically differentiate neutrophils by nuclear lobulation count and report the world's first small-scale pilot study. Blood films are prepared using venous peripheral blood taken from four healthy volunteers and are stained with May–Grünwald Giemsa stain. Six hundred 360 × 363-pixel images of neutrophils are automatically captured for each of five nuclear lobulation classes by CellaVision DM96, an automatic digital microscope camera. The images are input to an original architecture with five convolutional layers built on Neural Network Console, Sony's deep learning development platform. The deep learning system distinguishes the four groups (i.e., band-formed, two-, three-, and four- and five-segmented) of neutrophils with up to 99% accuracy, suggesting that neutrophils can be automatically differentiated based on their count of segmented nuclei using deep learning.

Introduction

Visual observation of blood-cell morphology is a basic routine test performed at clinical laboratories and is of the highest clinical significance for blood-cell diagnoses. The information obtained from such observations makes it possible to distinguish among various pathological problems. In our laboratory, six types of leukocytes, namely band-formed neutrophils, segmented neutrophils, lymphocytes, monocytes, eosinophils, and basophils, are identified using texture analyses and gray-level co-occurrence matrices (Kono et al. 2018). However, it is extremely difficult to differentiate neutrophils based on the number of nuclear lobulations. Nevertheless, such a method is useful for the diagnosis of various blood diseases, such as megaloblastic anemia and myelodysplastic syndrome. Moreover, it is useful for identifying sepsis (Chan et al. 2010), which has been recognized as one of the most serious menaces to mankind: it has been reported that approximately one-fifth of sepsis patients worldwide died between 1990 and 2017 (Rudd et al. 2020). The criteria for diagnosing sepsis include a left shift of neutrophils (Ishimine et al. 2013), because the count of band-formed and/or less-lobulated nuclear neutrophils increases in the peripheral blood during sepsis; this criterion can be used for early diagnosis (Bernstein and Rucinski 2011; Mare et al. 2015; Farkas 2020). Automatic differentiation of neutrophils based on nuclear lobulation would therefore be extremely helpful, because visual examination under a microscope is a complex and time-consuming process for clinical staff. Accurate discrimination requires tremendous experience and skill (Kikuchi et al. 1995), and the objectivity and reproducibility of such testing suffer from human factors such as examiner fatigue and bias. To resolve these problems, studies on the automated analysis of neutrophil images using various image-analysis techniques have been reported (CellaVision 2019; Medica Corporation 2020; West Medica 2020). Standard machine learning image-analysis techniques for blood cells have thus far relied on image segmentation, feature extraction, and automatic classification (Puigvi et al. 2017; Rodellar et al. 2018; Merino et al. 2018). These processes require smart analysis design and significant coding effort to handle the large calculations required for implementation. Therefore, we focused on deep learning, a class of machine learning methods modeled on human neurological systems (Chollet 2017). Deep learning algorithms automatically adjust the parameters of their own networks to minimize the differences between estimations and ground truth, and thus arrive at optimal solutions (Saito 2016; Sony Network Communications Inc. 2017; Wakui and Wakui 2017).

Several deep learning studies on the discrimination of peripheral blood leukocytes have been conducted. Shahin et al. (2019) reported high accuracies for five types of peripheral blood leukocytes using two convolutional neural-network (CNN) architectures, achieving 91.2% and 84.9%, respectively. Acevedo et al. (2019) distinguished eight classes of normal peripheral blood cells, including segmented and band-formed nuclear neutrophils, using two different CNN architectures and achieved accuracies of 96% and 95%. Nevertheless, neither of these reports identified the number of segmented nuclear lobes or differentiated segmented neutrophils by that count.

In this study, we developed a new technology for automatically distinguishing the nuclear lobulation of neutrophil images using deep learning and report the world's first results of a small-scale pilot study. Moreover, we compared the improvements obtained with various methods of training-data augmentation.

Materials and Methods

Blood films from four healthy volunteers were prepared using venous peripheral blood stained with May-Grünwald Giemsa stain. Six hundred images (360 × 363 pixels) of neutrophils were captured for each of five nuclear lobulation classes [i.e., band-formed (band), two-segmented (2-seg), three-segmented (3-seg), four-segmented (4-seg), and five-segmented (5-seg) nuclei] using CellaVision® DM96 equipment (CellaVision Japan, Tokyo, Japan). This equipment can automatically capture hundreds of single-neutrophil images, centered in each picture, from a considerable number of peripheral blood films within several hours. The outputs of the DM96 classifier were not used to label the training or testing images in this study. Instead, a rating team comprising three medical technologist students and a board-certified hematologist classified the leukocyte images into the five groups.

Prior to combining the 4- and 5-seg groups, 500 images per group were randomly selected to compile the pre-training data (Pre-TR; 2,500 images); the remaining 100 images per group were used for pre-testing (Pre-TE; 500 images) (Table 1). However, we found that, when considered separately, the features of the 4-seg and 5-seg nuclei were too sparse to draw a border line between them. To combine these two groups, we captured 300 additional images for each of the two cell types; the process that led to this decision is explained in the subsequent section. From the resulting total of 2,400 images of the final four groups, 500 images per group were randomly selected to compile the A-training dataset (ATR; 2,000 images). The remaining 400 images constituted the A-testing dataset (ATE).
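As a minimal sketch (not the authors' code), the per-group random split described above can be reproduced as follows, assuming the images are stored in one directory per group; the directory names and file format are hypothetical.

```python
# Minimal sketch of the per-group random split described above.
# Assumptions: one directory per group, PNG files; paths are hypothetical.
import random
from pathlib import Path

def split_group(group_dir: Path, n_train: int = 500, seed: int = 0):
    """Randomly select n_train images from a group; the rest become test images."""
    images = sorted(group_dir.glob("*.png"))
    random.Random(seed).shuffle(images)
    return images[:n_train], images[n_train:]

# Build the A-training (ATR) and A-testing (ATE) datasets:
# 4 groups x 500 = 2,000 training images; 4 x 100 = 400 testing images.
atr, ate = [], []
for group in ["band", "2seg", "3seg", "4and5seg"]:
    train, test = split_group(Path("images") / group)
    atr += train
    ate += test
```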

The criterion for determining a neutrophil's nuclear lobulation, especially the distinction between band- and segmented-nuclear neutrophils, was published by the Japanese Society for Laboratory Hematology and the Japanese Association of Medical Technologists (Takami et al. 2021); this criterion has been widely accepted as the standard among observers working at clinical laboratories in Japan. Initially, we adopted this criterion to determine the correct label for each neutrophil image in the Pre-TR and Pre-TE (Table 1). However, we then employed a clearer criterion for nuclear lobulation and rearranged the ATR and ATE such that a lobulated nucleus was defined as one whose lobes are connected by a thin thread of chromatin (Palmer et al. 2015), enabling the selection of only typical images for each cell type.

Each copy of the 2,000 images in the ATR (Fig. 1A) was inverted vertically (Fig. 1B), horizontally (Fig. 1C), and both vertically and horizontally (Fig. 1D). These 6,000 images were added to the ATR, and the resulting total of 8,000 images served as the B-training (BTR) set. Separately, each image in the ATR was randomly rotated right or left by 90° (Fig. 1E), distorted (Fig. 1F), and changed in aspect ratio (Fig. 1G); these 6,000 images were added to the ATR, and the resulting 8,000 images served as the C-training (CTR) set. Finally, both sets of 6,000 augmented BTR and CTR images were added to the ATR, and the new total of 14,000 images served as the D-training (DTR) set; a sketch of these transformations is given below.
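The following is an illustrative sketch of the seven image variants in Fig. 1, under the assumption that Pillow is used; the paper does not specify the tooling, and the distortion and aspect-ratio parameters are hypothetical.

```python
# Illustrative augmentation sketch (Pillow assumed; parameters hypothetical).
import random
from PIL import Image

def augment(img: Image.Image) -> dict:
    """Produce the six augmented variants shown in Fig. 1B-G from one original."""
    w, h = img.size
    return {
        "vflip": img.transpose(Image.Transpose.FLIP_TOP_BOTTOM),   # Fig. 1B
        "hflip": img.transpose(Image.Transpose.FLIP_LEFT_RIGHT),   # Fig. 1C
        "vhflip": img.transpose(Image.Transpose.ROTATE_180),       # Fig. 1D (both flips)
        # Random 90-degree rotation to the right or left (Fig. 1E)
        "rot90": img.transpose(random.choice(
            [Image.Transpose.ROTATE_90, Image.Transpose.ROTATE_270])),
        # Affine shear as one possible distortion (Fig. 1F); the exact
        # distortion used in the study is not specified
        "distort": img.transform((w, h), Image.Transform.AFFINE,
                                 (1, 0.2, 0, 0.1, 1, 0)),
        # Aspect-ratio change (Fig. 1G); the 1.2 factor is an assumption
        "aspect": img.resize((int(w * 1.2), h)),
    }

# Usage: variants = augment(Image.open("neutrophil.png"))
```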

The correct cell type for each image in the ATR, BTR, CTR, and DTR sets was labeled by the rating team as input for the Neural Network Console (NNC) (Version 1.00, Sony Network Communications Inc., Tokyo, Japan). NNC is software developed by Sony Network Communications Inc. that supports the creation of neural network models for deep learning. Users build an architecture suited to their problem through the graphical user interface by clicking and dragging icons with a mouse, which is easier than working with conventional Python libraries such as TensorFlow (https://www.tensorflow.org) or PyTorch (https://pytorch.org), where the system has to be coded in the relevant programming language. Furthermore, users can repeatedly build, evaluate, and rebuild a deep learning system using training and test data prepared for their problem. All images were automatically resized to 224 × 224 pixels by the NNC to match the input layer of the architecture used in this study (Krizhevsky et al. 2017). A total of 300 epochs of learning was performed for each round of training. We did not use an n-fold cross-validation scheme to evaluate the performance of the architecture because this method is not supported by the Neural Network Console. After each round, we saved all final weights and biases as the learned architecture of that round of training. After convergence, the cell type of each image in the Pre-TE and ATE was estimated. We intended to evaluate which image augmentation method resulted in the best estimation accuracy; therefore, five rounds of training and evaluation were performed using each of the four training datasets augmented by the different methods. We had no formal grounds for choosing five iterations of the machine learning algorithm; however, merely three rounds would have been insufficient for a statistical evaluation of performance. The accuracies were compared among the four sets of training data. Accuracy, precision, recall, and F-measure can be described using the following equations:

\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]

\[ \mathrm{Precision} = \frac{TP}{TP + FP} \]

\[ \mathrm{Recall} = \frac{TP}{TP + FN} \]

\[ \mathrm{F\text{-}measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \]

where TP, TN, FP, and FN are the numbers of true-positive, true-negative, false-positive, and false-negative classifications, respectively.
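As a small sketch (not the authors' code), these one-vs-rest metrics can be computed directly from a confusion matrix; the example matrix below is hypothetical.

```python
# Sketch: accuracy and per-class precision/recall/F-measure from a confusion
# matrix (rows = true class, columns = predicted class). The example data is
# hypothetical, not taken from the paper.
import numpy as np

def metrics(cm: np.ndarray):
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp   # predicted as class k but actually another class
    fn = cm.sum(axis=1) - tp   # actually class k but predicted as another class
    accuracy = tp.sum() / cm.sum()
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure

# Hypothetical 4x4 matrix for the band, 2-seg, 3-seg, and 4- and 5-seg groups
cm = np.array([[98, 2, 0, 0],
               [1, 97, 2, 0],
               [0, 2, 96, 2],
               [0, 0, 2, 98]])
acc, prec, rec, f1 = metrics(cm)
print(acc, prec.mean(), rec.mean(), f1.mean())  # averages as in Table 3
```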

The architecture used in this study (Fig. 2) was customized for a machine with one graphical processing unit (GPU); it is half of the originally published architecture designed for two GPUs (Krizhevsky et al. 2017) and consisted of input (I), convolution (C), batch normalization (B), rectified linear unit (R, ReLU), max-pooling (M), dropout (D), affine (A), sigmoid (S), and softmax cross-entropy (S) layers (Sony Network Communications Inc. 2020). We set the hyperparameters of the CNN as follows: learning rate = 0.001, optimizer = Adam, stride for max-pooling layers = 2 × 2, and batch size = 128. The device used for this study was a custom DELL ALIENWARE AURORA (Dell Inc., Kawasaki, Japan) with an Intel® Core™ i7-7700 processor, 32 GB of main memory, and an NVIDIA® GeForce® GTX 1080 Ti GPU (11 GB GDDR5X), which served as the vector processor. This study was approved (2018-101-01) by the Ethics Committee of the Faculty of Health Sciences, Hokkaido University.
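For illustration only, a comparable half-width, five-convolution-layer network can be sketched in PyTorch as follows; the authors built their network in Neural Network Console, and the channel counts, kernel sizes, and layer order here are assumptions rather than the published configuration.

```python
# Hedged sketch of an AlexNet-like CNN at half the original width, using the
# layer types listed above (convolution, batch normalization, ReLU, max
# pooling, dropout, affine, softmax cross-entropy). Not the published network.
import torch
import torch.nn as nn

def conv_block(c_in, c_out, k, stride=1, pad=0, pool=False):
    layers = [nn.Conv2d(c_in, c_out, k, stride, pad),
              nn.BatchNorm2d(c_out), nn.ReLU(inplace=True)]
    if pool:
        layers.append(nn.MaxPool2d(kernel_size=3, stride=2))  # stride 2, as in the paper
    return layers

class HalfAlexNet(nn.Module):
    def __init__(self, n_classes: int = 4):  # four neutrophil groups
        super().__init__()
        self.features = nn.Sequential(
            *conv_block(3, 48, 11, stride=4, pad=2, pool=True),   # conv 1
            *conv_block(48, 128, 5, pad=2, pool=True),            # conv 2 (5x5 kernel)
            *conv_block(128, 192, 3, pad=1),                      # conv 3
            *conv_block(192, 192, 3, pad=1),                      # conv 4
            *conv_block(192, 128, 3, pad=1, pool=True),           # conv 5
        )
        self.classifier = nn.Sequential(                          # affine layers
            nn.Flatten(), nn.Dropout(0.5),
            nn.Linear(128 * 6 * 6, 2048), nn.ReLU(inplace=True),
            nn.Dropout(0.5), nn.Linear(2048, n_classes),
        )

    def forward(self, x):  # x: (batch, 3, 224, 224)
        return self.classifier(self.features(x))

model = HalfAlexNet()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # as in the paper
criterion = nn.CrossEntropyLoss()  # softmax cross-entropy loss
```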

Table 1.

Number of training, testing, and augmented images.

The numbers of training, testing, and augmented images used in this study are listed.

Fig. 1.

Examples of augmentation of neutrophil images in training data.

(A) Original image, (B) vertically inverted image, (C) horizontally inverted image, (D) vertically and horizontally inverted image, (E) rotated image, (F) distorted image, and (G) aspect ratio changed image.

Fig. 2.

Multilayer neural network with the five convolution layers used in this study.

Square boxes indicate the function of each layer. The figures on the right side of each box indicate the specifications of that layer. For example, on the right side of the first input layer, the three figures indicate the number of color channels (red, green, and blue) and the size (height and width) of the input image, respectively. The same format is used in the second convolution layer to indicate the number and the size (height and width) of the output feature maps, respectively. "ReLU" refers to rectified linear unit. "Kernel shape: 5.5" indicates the pixel size of each filter used to convolve the input. The other specifiers in this figure are described in the reference (Sony Network Communications Inc. 2020). We use "affine" in this report instead of the more common term "fully connected layer" because "affine" is the term used in the Neural Network Console.

Results

Prior to combining the 4- and 5-seg groups, a total of 300 learning epochs were run five times on all five groups using the 2,500 images of the Pre-TR. The confusion matrix for the five evaluations of the Pre-TE is shown in Fig. 4: the total accuracy achieved was 0.576, the recall of 4-seg neutrophils was extremely low at 0.198, and the rates of erroneous categorization of 4-seg neutrophils as 2-, 3-, or 5-seg were 0.012, 0.216, and 0.574, respectively; the rate of misclassification as 5-seg was the highest, more than twice the rate of correct estimation. Therefore, we concluded that the architecture adopted for this study could not distinguish between 4- and 5-seg neutrophils, and we united them into one group consisting of 600 newly captured images (300 each for 4-seg and 5-seg). Of all 2,400 images of the four groups, 2,000 constituted the ATR and 400 the ATE. All images in the ATR were augmented as described. Using the ATR, BTR, CTR, and DTR, 300 epochs of learning were run for five iterations each. The cost, training error, and validation error decreased as shown in Fig. 3, indicating that convergence was achieved and neural-network optimization was finished. The maximum accuracies evaluated with the ATE were 0.700, 0.983, 0.978, and 0.990, respectively (Table 2). Detailed confusion matrices for the evaluations with the maximum accuracy shown in Table 2 are presented in Fig. 5A-D. The other performance indicators of each deep learning system at the round with the maximum accuracy in Table 2 are listed in Table 3. The times required for the 300 epochs of machine learning are listed in Table 4.

Fig. 3.

Example of learning curve in this study.

This is an example of the learning curve output by Neural Network Console during training with the D-training data. The horizontal axis represents the epoch, i.e., the number of repeated generations of optimization. The left and right vertical axes represent the cost and the error: the output of the loss function at the optimization stage and the outputs of the loss function for the training and testing data at the end of each epoch, respectively. The blue and red solid lines indicate the cost and the training error, respectively. The red dotted line indicates the testing error.

Fig. 4.

Confusion matrix for the pre-testing data.

Each number in the matrix is the total count over five estimation runs on the pre-testing data (Pre-TE) by the neural network trained with the pre-training data (Pre-TR).

Table 2.

Accuracy of each round of learning conducted using testing data.

The accuracies of the neural networks trained using the ATR, BTR, CTR, and DTR datasets are indicated in the corresponding columns. The accuracies for each round of learning are listed in each row. An asterisk (*) marks the best accuracy achieved among the five rounds of learning with each training dataset.

Fig. 5.

Confusion matrices of the neural network learned using 4 different training data.

Estimation results of the neural network at the round of learning with the best accuracy in Table 2. Matrix 5A shows the estimation results at the 5th round of learning with the A-training data; Matrix 5B, those at the 4th round with the B-training data; Matrix 5C, those at the 1st round with the C-training data; and Matrix 5D, those at the 4th round with the D-training data.

Table 3.

Indicators of performance for each architecture along with the best accuracy.

All indicators in each column are calculated from the values in the corresponding confusion matrix (Fig. 5A-D). "Average precision", "average recall", and "average F-measure" are the averages of the four per-group values of precision, recall, and F-measure, respectively (i.e., over band, 2-seg, 3-seg, and 4- and 5-seg).

Table 4.

Time required for 300 epochs of the machine learning algorithm.

“0:17:05” means 0 h, 17 min, and 5 s are needed to achieve 300 epochs of the machine learning algorithm.

Discussion

This study confirmed that the proposed deep learning system distinguished the four groups of band-formed, 2-seg, 3-seg, and 4- and 5-seg neutrophils with an accuracy of up to 99%. From Table 4, it is evident that larger quantities of training data required longer learning times. From Fig. 4, it is evident that the architecture adopted in this study could not classify neutrophils into five groups by nuclear lobulation; however, it is not always easy even for experienced examiners to distinguish four lobes from five. After combining the 4- and 5-seg groups, the accuracy improved from the extremely low value of 0.576 (Fig. 4) to 0.700 (Table 2).

A total accuracy of 0.700 is insufficient for clinical practice. To improve this accuracy, we examined the effect of augmenting the training data, an approach whose benefits have been reported in several studies. In this study, the best accuracy of the deep learning algorithm trained using the BTR, in which the images were inverted vertically and horizontally, was 0.983 (Table 2). The best accuracies using the CTR and DTR sets were 0.978 and 0.990, respectively (Table 2). An accuracy of 99% is sufficient to warrant examining the proposed method with clinical specimens. However, in this study, it was not possible to discriminate between 4- and 5-seg nuclei. For successful discrimination, increasing the number of images in the original training data might be effective. Moreover, the accuracy might be further improved by adjusting the number and size of the filters in the convolution layers, or by changing the stride and other hyperparameters of the architecture. Although several architectures have been developed, further study is required to find an optimal architecture suitable for discriminating leukocyte images.

Our study has the following limitations. First, it was based on a small number of individuals and images. Second, the testing images were selected from the same smears as the training images, which might bias the evaluation conducted in this study. Third, all the images were derived from the blood of healthy volunteers and showed typical morphological features for each cell type, whereas the segmentation of nuclear lobes appears more ambiguous in the clinical laboratory. Thus, the presented method should be tested on real clinical peripheral blood cell images, including patient samples.

In conclusion, the proposed deep learning system distinguishes four groups (i.e., band-formed, two-, three-, and four- and five- segmented) of neutrophils with up to 99% accuracy, suggesting that neutrophils can be automatically differentiated based on the count of segmented nuclei using deep learning.

Conflict of Interest

The authors declare no conflict of interest.

References
 
© 2021 Tohoku University Medical Press

This article is licensed under a Creative Commons [Attribution-NonCommercial-NoDerivatives 4.0 International] license.
This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC-BY-NC-ND 4.0). Anyone may download, reuse, copy, reprint, or distribute the article without modifications or adaptations for non-profit purposes if they cite the original authors and source properly.
https://creativecommons.org/licenses/by-nc-nd/4.0/