Virtual reality (VR) has been applied in several fields, such as entertainment, education, and medicine, in recent years. VR is characterized by a high sense of immersion, which can be represented by the allocation of attention from the real world to the virtual space. Although a high degree of attention allocation is important in VR technology, most existing evaluation methods for VR applications are based on subjective questionnaires. Thus, quantitative and objective evaluation methods are needed to realize advanced VR applications. In this study, we adopted a probe stimulus method to evaluate attention allocation in VR quantitatively and objectively. Ten young adult participants performed an auditory oddball task while experiencing VR content. The amount of attention directed to the VR content can be quantified from the decrease in the amplitude of the event-related P300 response during the oddball task. The participants watched two-dimensional and three-dimensional VR content on a liquid crystal display and a head-mounted display, respectively, while brain activity was recorded as electroencephalographic signals. A total of 230 probe stimuli at 1800 Hz (standard stimulus), 2000 Hz (target stimulus), and 500 Hz (deviant stimulus) were presented randomly via an earphone for 70 ms at 1000-ms intervals, in proportions of 70%, 15%, and 15%, respectively. Additionally, the reaction time and false reaction rate during the oddball task were measured as behavioral measures, and a questionnaire was administered after the task for subjective evaluation. Based on a comparison of the subjective measure, the behavioral measures, and the P300 amplitudes measured at Pz for the target stimulus and at Cz for the deviant stimulus, we found that attention allocation to VR content can be quantitatively estimated from the P300 amplitude for the deviant stimulus.
These results suggest that the proposed method involving event-related potentials can be used as an indicator for attention allocation while watching VR content.
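A probe schedule of this kind can be generated by shuffling a fixed pool of tones. The following is a minimal sketch, not the authors' code; the exact integer split of the 15% fractions (34 vs. 35) and the function name are assumptions.

```python
import random

def make_oddball_sequence(seed=0):
    """Sketch: build a shuffled 230-stimulus probe schedule for the
    auditory oddball task.

    Tones (Hz): 1800 standard (~70%), 2000 target (~15%), 500 deviant
    (~15%), as in the experiment; the integer split is an assumption.
    """
    counts = {1800: 161, 2000: 34, 500: 35}   # 161 + 34 + 35 = 230
    seq = [tone for tone, c in counts.items() for _ in range(c)]
    random.Random(seed).shuffle(seq)          # randomize presentation order
    return seq
```

Each entry of the returned list would then be presented for 70 ms at 1000-ms intervals.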
Currently, the shortage of care workers for the elderly has become a serious problem, and more streamlined care operations are needed. In care facilities, care workers must rely on their subjective experience to detect anomalies in the physical condition of care receivers, including serious or minor deterioration and behavioral and psychological symptoms of dementia, which can decrease work efficiency. Therefore, we aim to create a model that uses objective data to detect anomalies in physical condition. In this study, data from 13 subjects in a care facility were collected, and isolation forest models were constructed for each subject. Each subject's anomalies in physical condition were documented in a care record by a nurse and used as the reference for model evaluation. Recall and specificity, expressed as the percentage of successful detections of abnormal or normal conditions, were used to evaluate the models. Data collected over 1 to 60 days were used to train the isolation forest models, and the relationship between the amount of training data and model performance was evaluated. Heart rate, respiratory rate, and time of getting out of bed were collected from a sensor placed on the subject's bed and used as model features. In addition, dietary intake information was collected from the care record. The evaluation results showed recall and specificity of 45.6 ± 46.7% and 83.88 ± 6.06%, respectively, for the model constructed using 60 days of training data. In future studies, we will continue to collect data and increase the number of participants to improve the robustness and accuracy of the proposed anomaly detection system.
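The two evaluation metrics can be computed directly from per-period predictions against the nurse's care record. A minimal sketch follows; the predictions themselves would come from the per-subject model (e.g., scikit-learn's IsolationForest), which is outside this snippet.

```python
def recall_specificity(y_true, y_pred):
    """Recall and specificity for anomaly detection.

    y_true / y_pred: sequences of 0 (normal) or 1 (anomaly); y_true comes
    from the nurse's care record, y_pred from the per-subject model.
    """
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    recall = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return recall, specificity
```

The large standard deviation of recall reported above (45.6 ± 46.7%) reflects per-subject variation: this pair of numbers is computed once per subject's model and then averaged.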
Designing a deep neural network model that integrates clinical images with other electronic medical records entails various preprocessing operations. Preprocessing of clinical images often requires trimming of the lesion regions shown in the images, whereas preprocessing of other electronic medical records requires vectorization of those records; for example, patient age is often converted into a categorical vector of 10-year intervals. Although these preprocessing operations are critical to the performance of the classification model, there is no guarantee that the chosen preprocessing is appropriate for model training. The ability to integrate these preprocessing operations into a deep neural network model and to train the model, including the preprocessing operations, can help in designing a multi-modal medical classification model. This study proposes preprocessing integration layers, for both clinical images and electronic medical records, in deep neural network models. Preprocessing of clinical images is realized by a vision transformer layer that selectively attends to the parts of the images requiring attention. Preprocessing of other electronic medical records is performed using fully connected layers followed by normalization. The proposed preprocessing integration layers were verified using a posttreatment visual acuity prediction task in ophthalmology as a case study. This prediction task requires clinical images as well as patient profile data corresponding to each patient's posttreatment logMAR visual acuity. The performance of a heuristically designed prediction model was compared with that of a prediction model that includes the proposed preprocessing integration layers. The mean squared errors between predicted and correct results were 0.051 for the heuristic model and 0.054 for the proposed model.
Experimental results showed that the proposed model utilizing preprocessing integration layers achieved nearly the same performance as the heuristically designed model.
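As a concrete example of the fixed heuristic preprocessing that the proposed layers replace, encoding age as a categorical vector of 10-year intervals might look like the following sketch; the function name and the handling of ages beyond the last interval are assumptions.

```python
def age_to_onehot(age, n_bins=10, bin_width=10):
    """Heuristic preprocessing: encode age as a one-hot vector of
    10-year intervals. Ages beyond the last interval share the final
    bin (an assumption; the paper does not specify edge handling)."""
    vec = [0.0] * n_bins
    vec[min(age // bin_width, n_bins - 1)] = 1.0
    return vec
```

In the proposed approach, such a fixed encoding is replaced by fully connected and normalization layers whose parameters are trained jointly with the rest of the model.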
Computer-aided diagnostic methods that provide semantic segmentation of texture patterns of diffuse lung diseases (DLDs) on chest computed tomography (CT) are extremely useful for detecting, identifying, and quantifying lung pathologies. While a fully annotated dataset is desirable for building a semantic segmentation model, building such a dataset for DLDs is costly due to the requirements of manual segmentation and certified experts for annotation. Partially supervised learning (PSL) has recently been proposed to take advantage of partially annotated datasets and reduce the full annotation burden. Creating a partially annotated dataset is much less expensive than creating a fully annotated one, so PSL has great potential for building a semantic segmentation model that requires only a feasible amount of annotation. In this study, we propose a PSL method employing a loss function that uses both the annotated and unannotated pixels of a partially annotated dataset. The proposed loss function is based on the cross-entropy loss, and it uses the unannotated pixels to penalize leakage of the segmentation. A parameter that controls the balance between the two types of supervision is introduced into the loss function to allow tuning and study of the proposed PSL. The effectiveness and characteristics of PSL for the segmentation of DLD classes (consolidation, ground-glass opacity, honeycombing, emphysema, and normal) were investigated in experiments using chest CT images of 372 patients. The experimental results show that the proposed PSL improved the mean Dice score from 0.76 to 0.79, and that a higher value of the balancing parameter increased the precision of the segmentation. Using the proposed PSL, which takes full advantage of the partially annotated dataset, we improved the accuracy of DLD segmentation. Furthermore, the experimental results clarified that the proposed PSL improved the precision of the models by using unannotated pixels.
Our implementation of the proposed PSL is available at https://github.com/yk-szk/psl-dld.
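One way such a loss could be structured is sketched below. This is an assumed form for illustration, not the paper's exact equation (which is in the repository above): annotated pixels receive standard cross-entropy supervision, unannotated pixels are pushed toward background to penalize leakage, and `alpha` balances the two terms.

```python
import math

def partial_ce_loss(probs, labels, alpha=0.5, eps=1e-8):
    """Sketch of a partially supervised cross-entropy loss (assumed form).

    probs:  per-pixel class-probability lists, probs[i][c]
    labels: per-pixel class index for annotated pixels, or None if the
            pixel is unannotated; class 0 is assumed to be background
    alpha:  balance between supervision on annotated pixels and the
            leakage penalty on unannotated pixels
    """
    # standard cross-entropy on annotated pixels
    ann = [-math.log(p[y] + eps) for p, y in zip(probs, labels) if y is not None]
    # unannotated pixels: penalize foreground probability "leaking" out of
    # the annotated regions by pushing them toward background
    un = [-math.log(p[0] + eps) for p, y in zip(probs, labels) if y is None]
    ce = sum(ann) / len(ann) if ann else 0.0
    leak = sum(un) / len(un) if un else 0.0
    return alpha * ce + (1 - alpha) * leak
```

Consistent with the experiments above, a larger `alpha`-style balance toward the leakage term would make the model more conservative about predicting foreground, increasing precision.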
Appropriate evaluation of the intraoperative state of a surgical team is essential for improving teamwork and hence maintaining a safe surgical environment. Traditional methods of evaluating intraoperative team states, such as interviews and self-check questionnaires for each surgical team member, require human effort, are time-consuming, and can be biased by individual recall. One effective solution is to analyze surgical video and track important team activities, such as whether the members are complying with the surgical procedure or being distracted by unexpected events. However, due to the complexity of the situations in an operating room, identifying team activities without any human effort remains challenging. In this work, we propose a novel approach that automatically recognizes and quantifies intraoperative activities from surgical videos. As a first step, we focus on recognizing two activities that particularly involve multiple individuals: (a) passing of clean-packaged surgical instruments, a representative interaction between surgical technologists such as the circulating nurse and the scrub nurse, and (b) group attention that may be attracted by unexpected events. We record surgical videos as input and apply pose estimation and particle filters to extract each individual's face orientation, body orientation, and arm raising. These results, coupled with individual IDs, are then sent to an estimation model that provides the probability of each target activity. Simultaneously, a person model is generated and bound to each individual, describing all of that individual's activities along the timeline. We tested our method using videos of simulated activities. The results showed that the system was able to recognize instrument passing and group attention with F1 = 0.95 and F1 = 0.66, respectively.
We also implemented a system with an interface that automatically annotated intraoperative activities along the video timeline, and invited feedback from surgical technologists. The results suggest that the quantified and visualized activities can help improve understanding of the intraoperative state of the surgical team.
In addition to traditional clinical research, advances in information and communication technologies facilitate new medical research using Internet of Things devices and other cutting-edge technologies. Such technologies also simplify the international collection of data on research subjects in their daily lives. In this context, medical research is increasingly required to comply with rules protecting patients' personal data. This study proposes a model that enables researchers and other stakeholders in such international medical research, including ethics committees, to easily verify whether the planned processing of patient data complies with the relevant legal and ethical rules. The proposed model consists of (1) a description of how patient information is processed, (2) the rules relevant to that processing, and (3) an analysis of whether the processing complies with the rules. This study suggests that the model should describe the aspects of data processing that are subject to many rules, such as the location of the processing, the categories of data, the purposes of the processing, and the storage period. Using the information described in the model as a guide, stakeholders can determine which national and international legal and ethical rules apply to the planned processing. They can then use the model to verify and document whether the processing complies with those specific regulatory rules. The model thus enables stakeholders in medical research to comply with rules related to patient data more effectively than would be possible without it.
Sudden deterioration, such as cardiopulmonary arrest, in patients with various diseases may result in poor outcomes even after resuscitation. Early detection of deterioration is important in medical and long-term care settings, regardless of whether the disease is in the acute or chronic phase, and early detection and appropriate intervention are essential before resuscitative measures become necessary. Among the vital signs that indicate a patient's general condition, respiratory rate has a greater ability than heart rate or blood pressure to predict serious events such as thromboembolism and sepsis, even in their early stages. Despite its importance, however, respiratory rate is frequently overlooked and left unmeasured, making it a neglected vital sign. To facilitate the measurement of respiratory rate, we developed a non-invasive method of detecting respiratory sounds based on deep learning technology, using the built-in microphone of a smartphone. Smartphones attached to the bed headboards of 20 participants undergoing polysomnography (PSG) at Kyoto University Hospital recorded respiratory sounds. The sound data were synchronized with overnight respiratory information. After excluding periods of abnormal breathing on the PSG report, the sound data were processed in 1-minute segments. The expiration sound was identified using the PSG pressure-flow sensor signal. Finally, a model to identify the expiratory sections from the sound information was created using a convolutional long short-term memory (ConvLSTM) network, a deep learning algorithm. The accuracy of the learning model in identifying the expiratory sections was 0.791, indicating that respiratory rate can be determined using the microphone of a smartphone. By collecting data from more patients and improving the accuracy of this method, respiratory rates could be monitored more easily in all situations, both inside and outside the hospital.
A conventional electrolarynx (EL), which is used by laryngectomees, produces a monotonous sound and occupies one of the user's hands; hence, we developed a hands-free wearable device that improves voice quality. The proposed device estimates individual vocal tract features using linear predictive coding (LPC) and generates sound vibrations using an LPC inverse filter. Additionally, we reproduced the vibration sound using a transducer and amplified the first and second harmonic frequencies. We conducted an objective experiment to compare the spectra of the natural voice, a conventional EL, and the proposed device. We also conducted a subjective experiment in which healthy subjects listened to and evaluated the conventional EL and the proposed device. The results of the objective experiment demonstrated that our model was characterized by two formant peaks similar to those of the conventional EL and the natural voice. The results of the subjective experiment demonstrated that our model was more powerful and clearer than the conventional EL. These findings indicate that the voice of our device is spectrally close to the human voice and gives the audience a more powerful and clearer sound.
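The LPC analysis at the heart of the device can be sketched with the standard autocorrelation method (Levinson-Durbin recursion). This is a generic textbook implementation, not the authors' code; in practice the analysis would run on short windowed speech frames.

```python
def lpc_coefficients(x, order):
    """Linear predictive coding by the autocorrelation method
    (Levinson-Durbin recursion).

    Returns coefficients a with a[0] = 1 such that the prediction
    residual is e[n] = sum_k a[k] * x[n - k]; 1/A(z) models the vocal
    tract, and A(z) itself is the LPC inverse filter.
    """
    n = len(x)
    # autocorrelation r[0..order]
    r = [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                    # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                # updated prediction error
    return a
```

Filtering the speech frame with A(z) removes the vocal tract envelope, leaving a residual that can drive the transducer; the device then re-applies the tract features to the generated vibration.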
Objective: The objective of the current study was to develop a novel artificial intelligence (AI)-based system to diagnose coronavirus disease (COVID-19) using computed tomography (CT) slice images. Prior research has demonstrated that, unless focused on the lungs, AI may diagnose COVID-19 using information outside the lungs. The inclusion of CT training data from multiple facilities and CT models may also cause AI to diagnose COVID-19 using features that are irrelevant to the disease. Thus, the objective of the current study was to evaluate a combination of lung mask images and CT slice images from a single facility, using a single CT model, and to use AI to differentiate COVID-19 from other types of pneumonia based solely on information related to the lungs.
Method: By superimposing lung mask images on the image feature output of an existing AI structure, it was possible to exclude image features other than those around the lungs. The results of this model were also compared with those obtained from slice images from which only the lung region was extracted. The system adopted an ensemble approach: the outputs of multiple AIs were averaged to differentiate COVID-19 cases from other types of pneumonia based on CT slice images.
Results: The system was evaluated on 132 scans of COVID-19 cases and 62 scans of non-COVID-19 cases taken at a single facility using a single CT model. The sensitivity, specificity, and accuracy of our system at the initial threshold value of 0.50 were 95%, 53%, and 81%, respectively. Setting the threshold value to 0.84 adjusted the sensitivity and specificity to clinically usable values of 76% and 84%, respectively.
Conclusion: The system developed in the current study was able to differentiate between pneumonia due to COVID-19 and other types of pneumonia with sufficient accuracy for use in clinical practice. This was accomplished without the inclusion of images of clinically meaningless regions and despite the application of more stringent conditions, compared to prior studies.
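Moving the operating point from 0.50 to 0.84 is a standard threshold adjustment on the classifier's averaged output scores. A minimal sketch of computing the trade-off (the scores in the test are hypothetical, not the study's data):

```python
def sens_spec_at(scores, labels, thr):
    """Sensitivity and specificity of a binary classifier at threshold thr.

    labels: 1 = COVID-19, 0 = other pneumonia (convention used here);
    scores: model outputs in [0, 1], higher meaning more COVID-like.
    """
    tp = sum(s >= thr for s, y in zip(scores, labels) if y == 1)
    pos = sum(1 for y in labels if y == 1)
    tn = sum(s < thr for s, y in zip(scores, labels) if y == 0)
    neg = len(labels) - pos
    return tp / pos, tn / neg
```

Raising the threshold trades sensitivity for specificity, which is exactly the adjustment from (95%, 53%) at 0.50 to (76%, 84%) at 0.84 reported above.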
Laparoscopic surgery holds great promise in medicine but remains challenging for surgeons because it is difficult to perceive depth while suturing. In addition to binocular parallax (i.e., three-dimensional vision), shadows are essential for depth perception. This paper presents an augmented reality system that draws virtual shadows to aid depth perception. On the visual display, the system generates shadows that mimic actual shadows by estimating shadow positions using image processing. The distance and angle between the forceps tip and the surface were estimated to evaluate the accuracy of the system. To validate the usefulness of this system in surgical applications, novices performed suturing tasks with and without the augmented reality system. The system error and delay were sufficiently small, and the generated shadows were similar to actual shadows. Furthermore, the suturing error decreased significantly when the augmented reality system was used. The shadow-drawing system developed in this study may help surgeons perceive depth during laparoscopic surgery.
Deterioration of the skin barrier function causes symptoms such as allergies because various chemical substances may enter the human body. The stratum corneum is responsible for most of the skin barrier function, and two of its properties, thickness and water content, are thus important; their quantitative evaluation is useful as a measure of the skin barrier function in domains such as dermatology, nursing science, and cosmetics development. In this paper, the stratum corneum is modeled as a parallel circuit of resistance and capacitance, and we propose a new model that simultaneously estimates, from measurements of the electrical impedance of the skin, the thickness and water content of the stratum corneum, which are conventionally measured with a confocal laser scanning microscope and a confocal Raman spectrometer, respectively. The electrical impedance of the skin was measured using a device that we developed. The measurement began 3 seconds after the electrodes on the measurement head of the device came into contact with the skin, and parameters including the impedance, obtained by applying an alternating current signal at two frequencies, were recorded. We measured the thickness and water content of the stratum corneum using confocal laser scanning microscopy and confocal Raman spectroscopy, respectively; investigated their relationship with the electrical impedance of the skin; and established a new candidate model for estimating the thickness and water content of the stratum corneum from the parallel resistance and capacitance. The correlation coefficients on the verification data were 0.931 for thickness and 0.776 for water content, and the root-mean-squared errors were 2.3 µm for the thickness of the stratum corneum and 5.4 points for the water content at its surface.
These findings indicate the feasibility of quantitative evaluation of the thickness and water content of the stratum corneum by measuring skin electrical impedance.
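Under the parallel RC model, two impedance magnitudes at different frequencies are sufficient to recover both circuit elements. The following is a minimal sketch using magnitudes only (an assumption for illustration; the actual device also records other parameters such as phase):

```python
import math

def parallel_rc_from_impedance(z1, f1, z2, f2):
    """Recover R and C of a parallel RC model from impedance magnitudes
    |Z| measured at two frequencies f1 != f2.

    Parallel RC: |Z(w)|^2 = R^2 / (1 + (w*R*C)^2), with w = 2*pi*f.
    Subtracting 1/|Z|^2 at the two frequencies isolates C; R then
    follows from either measurement.
    """
    w1, w2 = 2 * math.pi * f1, 2 * math.pi * f2
    c = math.sqrt((1 / z1**2 - 1 / z2**2) / (w1**2 - w2**2))
    r = 1 / math.sqrt(1 / z1**2 - (w1 * c) ** 2)
    return r, c
```

The recovered parallel resistance and capacitance are then the inputs to the proposed estimation model for thickness and water content.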
Numerous studies have suggested that sleep spindle waves may play a role in the hippocampal-cortical transmission of information associated with memory enhancement. In previous research, the clustering coefficient increased significantly from wakefulness to sleep, indicating that graph theory may be able to characterize brain network activity during sleep. However, previous studies have not investigated in detail the characteristics of the brain network in individual sleep stages; the brain network activity in the EEG at each sleep stage has not yet been clarified. In this study, we compared the characteristics of network activity across sleep stages by determining the functional connectivity from the EEG in each stage, constructing the corresponding networks, and comparing their clustering coefficients and characteristic path lengths. We found a significant decrease in the characteristic path length in the low-beta band (13–15 Hz) from Stage 1 to later stages, but no significant difference in the clustering coefficient. Our results are consistent with the concept that sleep spindles are related to memory consolidation and suggest that the networks generated by the brain are more efficient in middle and deep sleep.
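The two graph measures compared in this study are standard. A minimal pure-Python sketch for an unweighted, undirected network follows; in practice the adjacency would come from thresholded EEG functional connectivity, and a library such as NetworkX provides equivalent functions.

```python
from collections import deque

def clustering_coefficient(adj):
    """Average clustering coefficient of an undirected graph.
    adj: dict mapping node -> set of neighbour nodes."""
    total, n = 0.0, 0
    for v, nbrs in adj.items():
        k = len(nbrs)
        n += 1
        if k < 2:
            continue                      # degree < 2 contributes 0
        # count edges among the neighbours of v (each pair once)
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / n

def characteristic_path_length(adj):
    """Mean shortest-path length over all node pairs (graph assumed
    connected), via breadth-first search from every node."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        for v, d in dist.items():
            if v != src:
                total += d
                pairs += 1
    return total / pairs
```

A shorter characteristic path length at an unchanged clustering coefficient, as found from Stage 1 to later stages, is what motivates the interpretation of a more efficient network.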