The Keio Journal of Medicine


Pupil Dynamics-derived Sleep Stage Classification of a Head-fixed Mouse Using a Recurrent Neural Network
Goh Kobayashi, Kenji F. Tanaka, Norio Takata

Article ID: 2022-0020-OA

Abstract

The standard method for sleep state classification is thresholding the amplitudes of electroencephalography (EEG) and electromyography (EMG) data, followed by manual correction by an expert. Although popular, this method has some shortcomings: (1) the time-consuming manual correction by human experts is sometimes a bottleneck hindering sleep studies, (2) EEG electrodes on the skull interfere with wide-field imaging of the cortical activity of a head-fixed mouse under a microscope, (3) invasive surgery to fix the electrodes on the thin mouse skull risks brain tissue injury, and (4) metal electrodes for EEG and EMG recording are difficult to apply to some experimental apparatus such as that for functional magnetic resonance imaging. To overcome these shortcomings, we propose a pupil dynamics-based vigilance state classification method for a head-fixed mouse using a long short-term memory (LSTM) model, a variant of a recurrent neural network, for multi-class labeling of NREM, REM, and WAKE states. For supervisory hypnography, EEG and EMG recordings were performed on head-fixed mice. This setup was combined with left-eye pupillometry using a USB camera and a markerless tracking toolbox, DeepLabCut. Our open-source LSTM model with feature inputs of pupil diameter, pupil location, pupil velocity, and eyelid opening for 10 s at a 10-Hz sampling rate achieved vigilance state estimation with a higher classification performance (macro F1 score, 0.77; accuracy, 86%) than a feed-forward neural network. Findings from a diverse range of pupillary dynamics implied a possible subdivision of the vigilance states defined by EEG and EMG. Pupil dynamics-based hypnography can expand the scope of alternatives for sleep stage scoring of head-fixed mice.

Introduction

Sleep stage classification is indispensable for investigating the function, mechanism, and pathology of sleep.1,2,3,4 Vigilance states of rodents are commonly divided into three: awake (WAKE), rapid eye movement (REM) sleep, and non-REM (NREM) sleep.5,6 The standard procedure for vigilance state classification is thresholding the amplitudes of electroencephalogram (EEG) and electromyogram (EMG) data, followed by manual correction by an expert.7 Although accepted widely and used routinely,8 EEG-based and EMG-based sleep stage classifications suffer from several important shortcomings: (1) manual inspection of hypnograms by human experts is time-consuming and sometimes poses a bottleneck that impedes sleep studies.9,10,11 Manual scoring is nevertheless necessary because of individual differences in EEG and EMG thresholds among mice, and because the thresholds fluctuate over time, even in the same mouse. (2) EEG electrodes on the skull interfere with the wide-field imaging of cortical activity,12,13,14,15,16 preventing investigation of bilateral cortical activity during sleep.17 (3) The invasive surgery necessary to fix EEG electrodes on the thin mouse skull risks brain tissue injury and infection that might modulate the structure and function of the cortex.18,19 (4) EEG and EMG recording is difficult to apply during functional magnetic resonance imaging (fMRI) in mice because artifacts from the metal electrodes on the thin mouse skull adversely affect cortical fMRI images,20,21 thereby allowing fMRI only in anesthetized or awake mice, but not in sleeping mice.22 Earlier investigations of vigilance state classification without EEG and EMG recording have addressed some, but not all, of the shortcomings outlined above.23,24,25

Because the eyes are closely associated with sleep, pupil dynamics is a promising candidate for hypnogram construction without the need to record EEGs and EMGs.26,27,28 This association has been investigated intensively in human subjects.29 During wake and alert periods, pupils are large.30 Immediately before falling asleep, eyelids begin to droop, eyes move upward, pupils contract strongly, and the pupil diameter begins to drift erratically.31 During sleep, eye movements are slow and pendular. The pupil diameter shows spontaneous waves of constriction and dilation; such waves have been observed in human subjects using a lid crutch.30 Rapid and jerky eye movements appear during REM sleep.32 Rodent studies have revealed tight coupling between pupil dynamics and neuronal activities.33,34,35 Whole-cell patch-clamp recordings in awake mice have demonstrated that pupil constriction and dilation, respectively, accompanied synchronization and desynchronization of membrane potentials of cortical neurons.34 Rapid and longer-lasting pupil dilations, respectively, coincide with phasic and sustained activities in noradrenergic and cholinergic axons.36 Pupillary constriction was also demonstrated to play an active role in stabilizing deep sleep.33 In that study, the vigilance states of head-fixed mice were estimated based on pupil diameter using a feed-forward neural network (FFNN) with a hidden layer of 30 nodes; this method achieved estimation accuracies of 96% for NREM, 58% for REM, and 95% for WAKE. However, estimations were limited to vigilance states lasting more than 100 s, as confirmed from EEG and EMG recordings. Therefore, it remains unknown whether pupil dynamics can adequately represent the moment-by-moment dynamics of vigilance states.

Neural network models have been applied to sleep stage scoring, predominantly using EEG and EMG data.37 Deep learning models have become popular in recent years, typical of which are FFNNs and recurrent neural networks (RNNs). A convolutional neural network (CNN), a variant of an FFNN combined with a hidden Markov model, was trained on 4-s bins of EEG and EMG data from only two mice and achieved automatic sleep stage scoring with an accuracy of 93–99%, with generalization capabilities.38 The long short-term memory (LSTM) model, a variant of an RNN, is regarded as an effective model for sequence classification because the “forget gate” in its architecture facilitates the efficient usage of past information in a bin.39,40 An LSTM model combined with a CNN was trained using 20-s bins of EEG and EMG data from 4200 mice and demonstrated sleep stage classification with an accuracy of 96.6%.11 Although these state-of-the-art architectures based on EEG and EMG data resolved the time-consuming visual inspection of a hypnogram (shortcoming 1) and achieved sleep scoring performance that was comparable with or exceeded the inter-rater agreement rate of human experts (88–95%),11,41 these models did not address shortcomings 2–4 associated with EEG and EMG recording. A recent attempt to apply machine learning to a 10-s bin video of a freely moving mouse for sleep staging instead of EEG and EMG data achieved an accuracy of 92 ± 5% (mean ±SD).10 This machine vision method resolved the time-consuming manual correction of the hypnogram (shortcoming 1) and invasive surgery (shortcoming 3), but applying the method to a head-fixed mouse under a microscope (shortcoming 2) or fMRI (shortcoming 4) might be unfeasible because the visual classification of sleep relied on the movement and posture of the mouse.

As described in this article, we propose a pupil dynamics-derived hypnogram of a head-fixed mouse using a simple one-layer LSTM model. A supervisory hypnogram with 10-s bins was prepared using EEG and EMG recordings from four male C57BL/6J mice. Simultaneous monitoring of the left eye of a head-fixed mouse using a USB camera in combination with the markerless tracking toolbox DeepLabCut42 provided the following input features for LSTM models: pupil diameter (PD), pupil location (PL), pupil velocity (PV), and eyelid opening (EO) at a sampling rate of 10 Hz. LSTM models with distinct hyperparameters, including the set of input features, were compared with a five-fold cross-validation technique using data from three mice. The performance of the best LSTM model was then tested using data from the remaining, held-out mouse; the best model achieved an accuracy of 86% and a macro F1 score of 0.77, both higher than those of an FFNN model. Additionally, we noticed characteristic pupillary dynamics that indicated substates within the vigilance states of an EEG and EMG-based hypnogram. Consequently, a pupil dynamics-derived hypnogram could provide an attractive choice for sleep staging of a head-fixed mouse.

Materials and Methods

Ethics statement

All animal experiments were conducted in accordance with the National Institutes of Health Guide for Care and Use of Laboratory Animals (NIH Publications No. 8023) and were approved by the Animal Ethics Committee of Keio University (approval number: 18076-(2)).

Surgery and habituation

Four male C57BL/6 mice, 8 weeks postnatal and weighing 22–27 g at the start of the experiment, were purchased from CLEA Japan (Tokyo, Japan). The animals were deeply anesthetized with a mixture of ketamine and xylazine (100 and 10 mg/kg, respectively, i.p.) and fixed to a stereotaxic apparatus (SM-15; Narishige Scientific Instrument, Tokyo, Japan). After shaving the hair on the head and neck, a longitudinal incision was made in the scalp to expose the skull surface. The periosteum and blood on the skull were removed thoroughly. Three non-penetrating holes (ϕ1 mm) for EEG recording were prepared in the skull using a dental drill (ϕ0.8 mm, 23017; Nakanishi, Tochigi, Japan). These holes were made to accommodate the following: (1) a reference electrode 1.0 mm anterior to the Bregma and 1.5 mm lateral from the midline to the right (ML +1.5 mm) on the frontal skull; (2) a signal electrode 1.0 mm anterior to the Lambda and ML +2.5 mm on the parietal skull; and (3) a ground electrode at the center of the interparietal skull (ML 0 mm). EEG electrodes, each consisting of a silver wire (#786500; A-M Systems, Sequim, WA, USA) and a screw (ϕ1 mm, length 2 mm; #0–1 nabe; Yamazaki, Shiga, Japan), were screwed into the holes after attaching a metal chamber frame (CF-10; Narishige Scientific Instrument, Tokyo, Japan) to the skull using dental acrylic (Super-Bond C&B; Sun Medical, Shiga, Japan). EMG electrodes, consisting of a flexible stainless-steel wire (AS633; Cooner Wire, Chatsworth, CA, USA), were inserted into the trapezius muscles bilaterally. EEG and EMG wires were fixed to the skull using dental acrylic.

For recovery after surgery, animals were single-housed with water and food pellets provided ad libitum at ambient temperature (22–24°C) on a 12-h light:dark cycle (light on at 8:00 h and off at 20:00 h) for at least a week. Mice were then habituated to sleep in the head-restrained condition in a darkroom.22 The duration of the habituation increased gradually from 2 h/day to a maximum of 6 h/day. The habituation was performed for at least a week. At the beginning of habituation, mice were mostly awake under the head-fixed condition. During the latter part of the habituation, the mice showed characteristic polyphasic sleep under the head-fixed condition (Fig. 1C). Six-hour habituation seemed critical: repeated 2-h habituation sessions failed to induce head-fixed mice to sleep under these experimental conditions.

Fig. 1

Preparation of a supervisory hypnogram using EEG and EMG as ground truth labels for deep learning.

(A) Schematic of EEG and EMG recordings and pupillometry from a head-fixed mouse. A head frame (gray rectangle) was attached to the skull for fixation. (B) Representative 10-s EEG and EMG traces during periods of NREM (blue), REM (red), and WAKE (black). (C) Representative EEG and EMG traces for a single recording session with a supervisory hypnogram. Blue columns in the EMG spectrogram represent a marked decrease of EMG power during REM sleep. (D) Number of manual corrections required for the threshold-based hypnogram (Threshold) to prepare a supervisory hypnogram. Corrections from sleep states (NREM or REM) to WAKE states were significantly more numerous than vice versa (two-sided Wilcoxon signed-ranks test, n = 39 sessions from four mice; counts of episode corrections per 6 h: 21 ± 24 episodes from NREM to WAKE vs. 3 ± 10 episodes from WAKE to NREM, and 4 ± 5 episodes from REM to WAKE vs. 0.1 ± 0.8 episodes from WAKE to REM; data are mean ±SD, ***p < 0.0001). (E) Time spent in each vigilance state in a single recording session (left) and duration of an individual episode in all recording sessions (right) based on a supervisory hypnogram. These values corresponded to those obtained for freely moving mice (see Results). The mean proportions of total recording for NREM (48.9 ± 3.0%) and WAKE (47.6 ± 3.1%) were significantly greater than that for REM (3.5 ± 0.3%; n = 39 sessions from four mice, Steel-Dwass test, ***p < 0.0001). Mean episode durations for NREM, REM, and WAKE were distinct (NREM, 8.2 ± 11.3 min; REM, 1.7 ± 0.9 min; WAKE, 9.1 ± 19.4 min [mean ±SD]; Steel-Dwass test, ***p < 0.0001).

Electrophysiology and pupillometry

During each recording session, EEG, EMG, and left-eye pupillometry were performed on head-fixed mice; sessions were conducted once a day for 3–6 h during the light phase, within Zeitgeber time 2–9 (10 AM to 6 PM). Specifically, EEG and EMG were amplified 1000 times with filter configurations of 1–300 Hz and 10–300 Hz, respectively (Model 3000; A-M Systems), analog-to-digital converted (16-bit depth; NI-9215; National Instruments, Austin, TX, USA), and recorded at a sampling rate of 1000 Hz using a custom-written program (LabVIEW 2016; National Instruments) on a personal computer (Windows 10 Pro; Microsoft, Redmond, WA, USA). Images of the mouse left eye were captured using a custom-written program (LabVIEW 2016; National Instruments) on the same PC at 30.3 ± 16.5 Hz (mean ±SD) with an infrared (IR) LED ring light (940 nm, FRS5 JS; OptoSupply, Hong Kong) and a USB camera from which the IR cut-off filter had been removed beforehand (BSW200MBK; Buffalo, Aichi, Japan).

Pupil dynamics feature extraction

DeepLabCut (ver. 2.2rc3), a deep learning-based animal pose estimation program, was used to acquire the following locations in the left-eye images: pupil rostral (PR), pupil caudal (PC), upper eyelid margin (UEM), and lower eyelid margin (LEM) (Fig. 2A).42,43 We first manually extracted 600 left-eye images from 17 recording sessions of four mice (150 ± 167 images/mouse [mean ±SD]). Manual selection of eye images improved the positioning performance of DeepLabCut because rare or difficult images, e.g., pupil images with overlapping whiskers or IR-light reflections, could be selected intentionally to train DeepLabCut; this would have been difficult or impossible to accomplish using automatic image extraction by DeepLabCut through k-means clustering or random temporal sampling. We manually labeled the positions of PR, PC, UEM, and LEM in the extracted images using the graphical user interface of DeepLabCut. Specifically, we first defined a line connecting the inner and outer corners of the eye (I-O line; white dotted line in Fig. 2A). Parallel to the I-O line, a line segment connecting the rostral and caudal edges of the pupil at their largest separation was obtained (orange dashed line in Fig. 2A). These edges were defined respectively as the locations of PR (purple dot in Fig. 2A) and PC (light blue dot in Fig. 2A). The intersections of the vertical bisector of the I-O line with the upper and lower eyelids were defined as the UEM (yellow dot in Fig. 2A) and the LEM (red dot in Fig. 2A). If whiskers or IR-light reflections overlapped with any position, then a slightly displaced location was assigned to avoid the overlap. Next, 95% of the manually obtained dataset (570 of the 600 left-eye images) was used to train DeepLabCut (left panel in Fig. 2B). We used a ResNet-50-based neural network with default parameters for 1,030,000 training iterations.44 After the learning curve reached a plateau, the generalization performance of DeepLabCut was evaluated using the root mean square error (RMSE) on the remaining test dataset (30 of the 600 left-eye images; right bar graph in Fig. 2B). After confirmation of successful positioning, PR, PC, UEM, and LEM were obtained automatically by DeepLabCut for 22,589,940 left-eye images from 39 sessions with four mice. DeepLabCut was run on a local computer (MacOS 10.15 Catalina; Apple) for manual extraction and labeling of eye images, and on a Google Cloud server using Google Colaboratory Pro (Alphabet) for training and automatic positioning.
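For orientation, this workflow maps onto a handful of DeepLabCut API calls. The sketch below is illustrative only: the project path and video file name are hypothetical placeholders, and argument details should be checked against the DeepLabCut version used here (ver. 2.2rc3).

```python
import deeplabcut

# Hypothetical project created from the 600 manually labeled eye images
config = "/path/to/EyePupil-DLC/config.yaml"

# Train the ResNet-50-based network (the study used 1,030,000 iterations)
deeplabcut.train_network(config, maxiters=1030000)

# Report RMSE on the training/test split of the labeled images
deeplabcut.evaluate_network(config)

# Automatically position PR, PC, UEM, and LEM in a session movie
deeplabcut.analyze_videos(config, ["/data/session01_eye.avi"], save_as_csv=True)
```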

Fig. 2

Preparation of pupil dynamics time series as input feature variables for deep learning.

(A) Definitions for manual positioning of pupil rostral (PR, purple dot), pupil caudal (PC, light blue dot), upper eyelid margin (UEM, yellow dot), and lower eyelid margin (LEM, red dot). Arrows on the orange and grey dashed lines represent the positive directions. I, O: inner and outer corners of the eye. (B) Training and assessment of DeepLabCut for automatic positioning of PR, PC, UEM, and LEM in images of the left eye. (Left) The learning curve of the cross-entropy loss (Loss) showed convergence of DeepLabCut training over a million training iterations. Different learning rates (distinct line colors) were used for optimal convergence. (Right) Generalization of the DeepLabCut estimation was confirmed by similar values of root mean square error (RMSE) for training data (open circles; 95% of the manually labeled 600 images) and test data (open triangles; 5% of the manually labeled images). Performance of the DeepLabCut estimation was deemed satisfactory because RMSEs were smaller than the average minimum diameter of mouse pupils (n = 4 mice, 0.25 ± 0.02 mm). (C) Video frame images of mouse eyes and time series of pupillary dynamics from one recording session. (Top panels) Left-eye images of a head-fixed mouse during WAKE, NREM, and REM episodes with colored dots denoting locations of the rostral (purple) and caudal (light blue) edges of the pupil and the upper (yellow) and lower (red) margins of the eyelid, which were obtained post hoc using DeepLabCut (Supplementary Video). (Bottom traces) Time series of pupil diameter, horizontal and vertical pupil locations, pupil velocity, and eyelid opening, which were calculated from the locations of the four colored dots. In the pupil location panel, upward indicates the rostral and dorsal directions, respectively, for horizontal and vertical pupil location. Fluctuation in the pupil diameter occurred during NREM periods. We also noted that fluctuations appeared over a period of approximately 10 min (green arrows) during WAKE episodes that preceded NREM. Eyelid opening sometimes showed a short deflection to zero in the time series (black arrowhead in the bottom trace) that corresponded to an eye blink. Traces were color-coded according to the supervisory hypnogram: WAKE, black; NREM, blue; REM, red. The temporal resolution of the traces was 0.1 s. (D) Histograms of pupil-related variables during WAKE (black), NREM (blue), and REM (red) periods. A single pupil-related variable at a given time point was not sufficient to reliably determine sleep–wake states because of its overlap among vigilance states. For example, a normalized pupil diameter of 0.9 appeared in all vigilance states.

Based on the time series of the labeled positions for PR, PC, UEM, and LEM, five pupil dynamics features [pupil diameter (PD), pupil horizontal and vertical locations (HL and VL), pupil center velocity (PV), and eyelid opening (EO)] were calculated using a custom-written Matlab program (2020a; The MathWorks, Natick, MA, USA). Specifically, values in the pupil dynamics time series with estimated likelihoods less than 0.99 by DeepLabCut were replaced with values obtained by linear interpolation of the adjacent values: this was done in 244,432 frames for PR (1.1%), 272,421 frames for PC (1.2%), 5248 frames for UEM (0.023%), and 5097 frames for LEM (0.023%) of the 22,589,862 images from all 39 sessions. After replacement, down-sampling to 10 Hz was applied. Then, PD was calculated as the distance between PR and PC and scaled to the range [0, 1] for each session. HL and VL were obtained as distances in millimeters from the average center position between PR and PC. PV was calculated in millimeters per second from the difference of the pupil position every 0.1 s. Finally, EO was calculated as the distance between UEM and LEM and scaled to the range [0, 1] for each session.
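The following is a minimal Python sketch of this feature extraction (the published extraction code is in Matlab); it assumes the four tracked points have already been likelihood-filtered, interpolated, and down-sampled to 10 Hz, each supplied as a (T, 2) array of x–y coordinates in millimeters.

```python
import numpy as np

def pupil_features(pr, pc, uem, lem, fps=10.0):
    """Compute PD, HL, VL, PV, and EO from the four tracked points.
    pr, pc, uem, lem: arrays of shape (T, 2), coordinates in mm at 10 Hz."""
    center = (pr + pc) / 2.0                         # pupil center per frame
    pd = np.linalg.norm(pr - pc, axis=1)             # pupil diameter
    pd = (pd - pd.min()) / (pd.max() - pd.min())     # scale to [0, 1] per session
    hl = center[:, 0] - center[:, 0].mean()          # horizontal location (mm)
    vl = center[:, 1] - center[:, 1].mean()          # vertical location (mm)
    pv = np.zeros(len(center))                       # pupil velocity (mm/s)
    pv[1:] = np.linalg.norm(np.diff(center, axis=0), axis=1) * fps
    eo = np.linalg.norm(uem - lem, axis=1)           # eyelid opening
    eo = (eo - eo.min()) / (eo.max() - eo.min())     # scale to [0, 1] per session
    return np.stack([pd, hl, vl, pv, eo], axis=1)    # (T, 5) feature matrix
```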

Hypnograms

An EEG-based and EMG-based hypnogram was obtained using the conventional thresholding method for EEG and EMG amplitudes and a theta/delta power ratio, with automatic correction.7 Specifically, the average powers of the delta (1–4 Hz) and theta (6–9 Hz) waves of the EEG, and of the EMG (30–300 Hz), were calculated over an entire session. A 10-s bin was assigned as a WAKE period if the mean EMG power during the bin was greater than its session average. Otherwise, following an established criterion,45 the bin was judged as NREM sleep if the delta wave power was larger than its session average and the theta/delta power ratio was less than 2. Alternatively, the bin was classified as REM sleep if the delta wave power was smaller than its session average and the theta/delta power ratio was greater than 2. The remaining bins, which matched none of these criteria, were assigned the vigilance state of the preceding bin. Subsequently, automatic correction was applied to prohibit a single-bin flip-flop transition, such as an NREM state to a WAKE state then immediately back to an NREM state, which might represent simple twitch behaviors during sleep.46,47 Hereinafter, we refer to hypnograms obtained using the automatic procedure described above as threshold-based hypnograms (Threshold in Fig. 3E). Each threshold-based hypnogram was then visually inspected and manually corrected to obtain the supervisory hypnogram for machine learning. Custom-written Matlab codes were used for this vigilance state classification.
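In sketch form (Python, assuming per-bin band powers are precomputed), the rules read as follows; the tie handling and the placement of the flip-flop correction are assumptions of this sketch rather than details given in the text.

```python
import numpy as np

def threshold_hypnogram(delta, theta, emg):
    """Assign WAKE/NREM/REM to each 10-s bin using session-mean thresholds.
    delta, theta: mean EEG band powers per bin; emg: mean EMG power per bin."""
    states = []
    for d, t, m in zip(delta, theta, emg):
        if m > emg.mean():                       # high muscle tone -> WAKE
            s = "WAKE"
        elif d > delta.mean() and t / d < 2:     # high delta, low theta/delta -> NREM
            s = "NREM"
        elif d < delta.mean() and t / d > 2:     # low delta, high theta/delta -> REM
            s = "REM"
        else:                                    # unmatched bin: carry preceding state
            s = states[-1] if states else "WAKE"
        states.append(s)
    for i in range(1, len(states) - 1):          # prohibit single-bin flip-flops
        if states[i - 1] == states[i + 1] != states[i]:
            states[i] = states[i - 1]
    return states
```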

Fig. 3

Pupil dynamics-based sleep-stage labeling using a deep neural network.

(A) Process flow of a long short-term memory (LSTM) network for pupil-based sleep stage classification. This example LSTM model had five input features: PD, HL, VL, PV, and EO. The feature vectors were sequentially processed in the LSTM cell to obtain a three-dimensional output vector, y, consisting of log likelihood values for NREM, REM, and WAKE periods, the largest of which was used to estimate the vigilance state at time t. PD, pupil diameter; HL and VL, pupillary horizontal and vertical locations; PV, pupil velocity; EO, eyelid opening. (B) Assignment of datasets for training (blue), validation (orange), and testing (purple) of the LSTM networks to avoid data leakage. A stratified, five-fold cross-validation technique was used for hyperparameter tuning that compared the performances of various LSTM architectures. The lengths of the horizontal bars correspond to the session durations. (C) Representative learning curves of an LSTM model in a single fold for hyperparameter optimization in five-fold cross-validation. The training loss (blue) approached a plateau, implying convergence of the model. Validation-based early stopping was used to prevent overfitting of the network. In this example, training of the LSTM model was halted in the middle of learning at epoch 238 (vertical dotted line). (D) The vigilance state classification performance of a variety of LSTM models was evaluated using the macro F1-score (worst 0, best 1) on validation data (orange bars in Fig. 3B) for hyperparameter optimization. We selected the LSTM model with the highest performance score as the hyperparameter-optimized LSTM model (LSTMoptim; the leftmost thick bar), which exploited all input features (PD, PL, PV, and EO). The performance of LSTMoptim was significantly higher than that of LSTM models with a single feature input (Dunnett's test, n = 34 sessions, ***P < 0.001 vs. LSTMoptim as control). Pupil location (PL) includes the features for pupillary horizontal and vertical locations (HL and VL). (E) Comparison of the classification performance on test data (purple bars in Fig. 3B) of LSTMoptim with a minor modification (LSTMm), the feed-forward neural network (FFNN), and the conventional thresholding method without manual correction (Threshold). LSTMm and FFNN used pupil dynamics for sleep staging, whereas Threshold used EEG and EMG data. The overall classification performance was assessed based on the macro F1 score and accuracy. LSTMm showed higher overall performance than either FFNN or Threshold. In particular, LSTMm had a significantly higher macro F1-score than FFNN (purple bars, two-sided paired t-test: LSTMm 0.77 ± 0.02 vs. FFNN 0.70 ± 0.04, *p = 0.027, n = 4 mice). The classification performance for each vigilance state was assessed using recall and precision (blue, red, and black bars). LSTMm had significantly higher recall in REM than FFNN (red bars for recall, two-sided paired t-test: LSTMm 0.64 ± 0.07 vs. FFNN 0.40 ± 0.09, *p = 0.0134, n = 4 mice). Note that the higher REM recall for Threshold was achieved at the expense of lower WAKE recall. P values from multiple statistical tests were adjusted using the Benjamini-Hochberg procedure with a false discovery rate of 0.05. (F) Comparison of the supervisory hypnogram (top) and the LSTMm hypnogram (bottom), demonstrating their overall mutual congruence. A characteristic difference was that NREM states in the supervisory hypnogram were occasionally estimated as REM by LSTMm. Retrospective reexamination of EEG data during such periods indicated a transient theta power surge, which implies that pupil dynamics might be a sensitive indicator of REM episodes.48

Pupil dynamics-based vigilance state classification of head-fixed mice was performed with a single-layer long short-term memory (LSTM) neural network model (Fig. 3A). We compared the performances of LSTM models with different hyperparameters, including the selected input features (PD, HL, VL, PV, EO) for 10 s at 10 Hz, batch sizes (512, 1024, 2048), hidden state vector dimensions (32, 128, 512), and dropout rates (0, 0.5) (Fig. 3D, Table 1). The feature vector input was processed sequentially for 100 time steps, corresponding to a 10-s bin. The first feature vector, x_{t−99}, at time t−99 was processed in the first LSTM cell to obtain an m-dimensional hidden state vector, h_{t−99}. This hidden state vector h_{t−99} and the second feature vector x_{t−98} were fed into the second LSTM cell to obtain the next hidden state vector, h_{t−98}. These processes were repeated 100 times to obtain the last hidden state vector, h_t, which was then affine-transformed to produce a three-dimensional output vector, y, consisting of log likelihood values for NREM, REM, and WAKE episodes. The largest likelihood in the output vector was assigned as the estimated vigilance state at time t. We also prepared an FFNN model as reported earlier, but with a minor modification.33 The FFNN model consisted of an input layer for all five input features over a 10-s duration, one hidden layer with 30 nodes, and a three-dimensional output layer for WAKE, NREM, and REM. The training procedure used for the FFNN model was the same as that used for the LSTM models.
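The released implementation is linked in the Data and code availability section; the following is only a minimal PyTorch sketch consistent with the architecture described above (five features × 100 time steps, one LSTM layer, an affine map to three classes). The class ordering of the output vector is an assumption of the sketch.

```python
import torch
import torch.nn as nn

class PupilLSTM(nn.Module):
    """Single-layer LSTM classifier: 100 time steps (10 s at 10 Hz) of
    pupil features -> class scores for NREM, REM, and WAKE."""
    def __init__(self, n_features=5, hidden=32, dropout=0.5, n_states=3):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.drop = nn.Dropout(dropout)        # applied to the last hidden state
        self.fc = nn.Linear(hidden, n_states)  # affine transform to three classes

    def forward(self, x):                      # x: (batch, 100, n_features)
        _, (h_n, _) = self.lstm(x)             # h_n: (1, batch, hidden)
        return self.fc(self.drop(h_n[-1]))     # (batch, 3) class scores

logits = PupilLSTM()(torch.randn(8, 100, 5))   # e.g., a mini-batch of eight 10-s bins
state = logits.argmax(dim=1)                   # 0 = NREM, 1 = REM, 2 = WAKE (assumed)
```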

Table 1. Evaluation results of hyperparameter tuning with five-fold cross-validation of LSTM models
Input  Batch size  hdim  dr  Macro F1  NREM recall  NREM precision  REM recall  REM precision  WAKE recall  WAKE precision
PDPLPVEO 2048 512 0.5 0.78 ± 0.01 0.83 ± 0.03 0.89 ± 0.01 0.64 ± 0.02 0.59 ± 0.03 0.9 ± 0.02 0.86 ± 0.03
PDPLPVEO 2048 512 0 0.78 ± 0.01 0.84 ± 0.04 0.88 ± 0.01 0.62 ± 0.02 0.61 ± 0.02 0.89 ± 0.02 0.86 ± 0.03
PDPLPVEO 2048 128 0.5 0.78 ± 0.01 0.83 ± 0.04 0.88 ± 0.01 0.65 ± 0.02 0.59 ± 0.02 0.89 ± 0.02 0.86 ± 0.03
PDPLPVEO 2048 128 0 0.78 ± 0.01 0.83 ± 0.04 0.88 ± 0.01 0.65 ± 0.03 0.57 ± 0.03 0.89 ± 0.02 0.86 ± 0.03
*PDPLPVEO 2048 32 0.5 0.78 ± 0.01 0.84 ± 0.04 0.88 ± 0.01 0.63 ± 0.02 0.6 ± 0.03 0.89 ± 0.02 0.87 ± 0.03
PDPLPVEO 2048 32 0 0.77 ± 0.01 0.83 ± 0.04 0.87 ± 0.02 0.63 ± 0.04 0.58 ± 0.03 0.89 ± 0.02 0.85 ± 0.03
PDPLPVEO 1024 512 0.5 0.78 ± 0.01 0.83 ± 0.03 0.88 ± 0.01 0.64 ± 0.02 0.59 ± 0.03 0.9 ± 0.02 0.86 ± 0.03
PDPLPVEO 1024 512 0 0.78 ± 0.01 0.83 ± 0.03 0.88 ± 0.01 0.62 ± 0.02 0.61 ± 0.03 0.9 ± 0.01 0.85 ± 0.03
PDPLPVEO 1024 128 0.5 0.78 ± 0.01 0.84 ± 0.03 0.88 ± 0.01 0.64 ± 0.03 0.59 ± 0.03 0.89 ± 0.02 0.86 ± 0.03
PDPLPVEO 1024 128 0 0.78 ± 0.01 0.83 ± 0.03 0.88 ± 0.01 0.64 ± 0.02 0.58 ± 0.02 0.9 ± 0.02 0.85 ± 0.03
PDPLPVEO 1024 32 0.5 0.78 ± 0.01 0.83 ± 0.03 0.88 ± 0.01 0.64 ± 0.03 0.59 ± 0.03 0.9 ± 0.01 0.86 ± 0.03
PDPLPVEO 1024 32 0 0.78 ± 0.01 0.83 ± 0.03 0.88 ± 0.01 0.64 ± 0.02 0.58 ± 0.02 0.9 ± 0.02 0.86 ± 0.03
PDPLPVEO 512 512 0.5 0.78 ± 0.01 0.84 ± 0.04 0.88 ± 0.01 0.64 ± 0.03 0.6 ± 0.03 0.89 ± 0.02 0.87 ± 0.04
PDPLPVEO 512 512 0 0.78 ± 0.01 0.83 ± 0.04 0.88 ± 0.01 0.65 ± 0.03 0.58 ± 0.03 0.89 ± 0.02 0.86 ± 0.03
PDPLPVEO 512 128 0.5 0.78 ± 0.01 0.83 ± 0.03 0.88 ± 0.01 0.63 ± 0.03 0.58 ± 0.02 0.9 ± 0.01 0.86 ± 0.03
PDPLPVEO 512 128 0 0.78 ± 0.01 0.83 ± 0.03 0.88 ± 0.01 0.63 ± 0.02 0.59 ± 0.03 0.9 ± 0.01 0.86 ± 0.03
PDPLPVEO 512 32 0.5 0.78 ± 0.01 0.83 ± 0.03 0.89 ± 0.01 0.64 ± 0.03 0.57 ± 0.02 0.91 ± 0.01 0.86 ± 0.03
PDPLPVEO 512 32 0 0.78 ± 0.01 0.83 ± 0.04 0.88 ± 0.01 0.64 ± 0.02 0.57 ± 0.02 0.9 ± 0.02 0.86 ± 0.03
PD 1024 512 0.5 0.67 ± 0.04 0.82 ± 0.04 0.82 ± 0.02 0.34 ± 0.11 0.36 ± 0.1 0.86 ± 0.01 0.84 ± 0.03
PD 1024 512 0 0.66 ± 0.04 0.82 ± 0.02 0.81 ± 0.02 0.32 ± 0.11 0.38 ± 0.1 0.85 ± 0.02 0.83 ± 0.03
PD 1024 128 0.5 0.66 ± 0.02 0.79 ± 0.03 0.81 ± 0.02 0.34 ± 0.02 0.39 ± 0.05 0.84 ± 0.04 0.82 ± 0.03
PD 1024 128 0 0.7 ± 0.03 0.81 ± 0.04 0.81 ± 0.03 0.46 ± 0.04 0.45 ± 0.06 0.84 ± 0.03 0.84 ± 0.04
PD 1024 32 0.5 0.55 ± 0.01 0.84 ± 0.03 0.79 ± 0.02 0.0 ± 0.0 0.0 ± 0.0 0.83 ± 0.04 0.83 ± 0.03
PD 1024 32 0 0.56 ± 0.01 0.83 ± 0.04 0.79 ± 0.02 0.03 ± 0.03 0.15 ± 0.09 0.83 ± 0.03 0.82 ± 0.03
PD 2048 512 0.5 0.63 ± 0.03 0.81 ± 0.03 0.81 ± 0.01 0.23 ± 0.1 0.31 ± 0.09 0.85 ± 0.02 0.83 ± 0.03
PD 2048 512 0 0.71 ± 0.03 0.8 ± 0.03 0.84 ± 0.01 0.44 ± 0.04 0.47 ± 0.06 0.87 ± 0.02 0.83 ± 0.03
PD 2048 128 0.5 0.65 ± 0.01 0.8 ± 0.03 0.79 ± 0.02 0.32 ± 0.01 0.37 ± 0.04 0.82 ± 0.03 0.82 ± 0.03
PD 2048 128 0 0.62 ± 0.02 0.81 ± 0.03 0.79 ± 0.02 0.22 ± 0.08 0.29 ± 0.08 0.83 ± 0.03 0.83 ± 0.03
PD 2048 32 0.5 0.56 ± 0.02 0.83 ± 0.03 0.79 ± 0.02 0.04 ± 0.02 0.13 ± 0.09 0.83 ± 0.04 0.82 ± 0.03
PD 2048 32 0 0.58 ± 0.03 0.84 ± 0.03 0.78 ± 0.02 0.08 ± 0.06 0.17 ± 0.1 0.81 ± 0.03 0.83 ± 0.03
PD 512 512 0.5 0.68 ± 0.04 0.79 ± 0.04 0.83 ± 0.02 0.38 ± 0.1 0.49 ± 0.06 0.87 ± 0.02 0.82 ± 0.03
PD 512 512 0 0.72 ± 0.01 0.81 ± 0.03 0.85 ± 0.01 0.49 ± 0.05 0.49 ± 0.03 0.89 ± 0.01 0.84 ± 0.03
PD 512 128 0.5 0.71 ± 0.02 0.8 ± 0.03 0.84 ± 0.01 0.45 ± 0.05 0.48 ± 0.05 0.88 ± 0.01 0.83 ± 0.03
PD 512 128 0 0.73 ± 0.03 0.81 ± 0.04 0.85 ± 0.01 0.5 ± 0.05 0.5 ± 0.05 0.87 ± 0.02 0.85 ± 0.04
PD 512 32 0.5 0.55 ± 0.01 0.85 ± 0.03 0.78 ± 0.02 0.0 ± 0.0 0.0 ± 0.0 0.82 ± 0.04 0.83 ± 0.03
PD 512 32 0 0.61 ± 0.02 0.82 ± 0.04 0.79 ± 0.02 0.22 ± 0.07 0.28 ± 0.08 0.81 ± 0.04 0.83 ± 0.03
PL 2048 32 0.5 0.72 ± 0.01 0.8 ± 0.03 0.83 ± 0.02 0.52 ± 0.04 0.53 ± 0.04 0.84 ± 0.03 0.81 ± 0.03
PV 2048 32 0.5 0.61 ± 0.01 0.8 ± 0.03 0.81 ± 0.02 0.19 ± 0.01 0.32 ± 0.04 0.79 ± 0.04 0.77 ± 0.03
EO 2048 32 0.5 0.51 ± 0.02 0.74 ± 0.04 0.76 ± 0.02 0.0 ± 0.0 0.1 ± 0.1 0.84 ± 0.01 0.75 ± 0.04
PDPL 2048 32 0.5 0.73 ± 0.02 0.81 ± 0.03 0.84 ± 0.01 0.51 ± 0.06 0.52 ± 0.02 0.87 ± 0.02 0.84 ± 0.03
PDPV 2048 32 0.5 0.77 ± 0.01 0.83 ± 0.03 0.87 ± 0.01 0.62 ± 0.01 0.58 ± 0.02 0.88 ± 0.02 0.86 ± 0.03
PDEO 2048 32 0.5 0.65 ± 0.03 0.79 ± 0.03 0.83 ± 0.01 0.28 ± 0.09 0.4 ± 0.06 0.88 ± 0.01 0.83 ± 0.03
PLPV 2048 32 0.5 0.74 ± 0.02 0.81 ± 0.03 0.85 ± 0.02 0.59 ± 0.04 0.54 ± 0.03 0.86 ± 0.02 0.83 ± 0.03
PLEO 2048 32 0.5 0.73 ± 0.01 0.8 ± 0.03 0.84 ± 0.01 0.56 ± 0.04 0.52 ± 0.02 0.87 ± 0.01 0.82 ± 0.03
PVEO 2048 32 0.5 0.72 ± 0.02 0.79 ± 0.04 0.86 ± 0.01 0.52 ± 0.03 0.49 ± 0.04 0.88 ± 0.02 0.82 ± 0.04
PDPLPV 2048 32 0.5 0.78 ± 0.01 0.83 ± 0.04 0.88 ± 0.02 0.63 ± 0.03 0.59 ± 0.04 0.89 ± 0.03 0.86 ± 0.03
PDPLEO 2048 32 0.5 0.74 ± 0.01 0.81 ± 0.03 0.86 ± 0.01 0.55 ± 0.04 0.54 ± 0.04 0.89 ± 0.01 0.84 ± 0.03
PDPVEO 2048 32 0.5 0.77 ± 0.01 0.82 ± 0.03 0.88 ± 0.01 0.63 ± 0.01 0.58 ± 0.02 0.90 ± 0.01 0.85 ± 0.03
PLPVEO 2048 32 0.5 0.75 ± 0.02 0.81 ± 0.04 0.87 ± 0.02 0.59 ± 0.03 0.55 ± 0.03 0.88 ± 0.02 0.83 ± 0.03

The LSTM architecture with the best performance (LSTMoptim) is shown in bold typeface with an asterisk: PD, pupil diameter; PL, pupil location; PV, pupil velocity; EO, eyelid opening; hdim, hidden state vector dimension; dr, dropout rate.

The entire dataset (n = 39 sessions, four mice) was assigned to training, validation, and testing of our neural network models (Fig. 3B). Data from mouse SNT286 (four sessions) were used only as test data (purple bars in Fig. 3B) to prevent leakage of the test data into the training data. The remaining data (35 sessions from three mice: SNT267, SNT287, and SNT288) were used for training and evaluation of our neural network models with distinct hyperparameters using a stratified five-fold cross-validation method. Specifically, the remaining 35 sessions were divided randomly into five groups, each including seven sessions. This session-by-session grouping was used to prevent leakage of information within a session. In each fold, four groups were assigned to training data (blue bars in Fig. 3B) and the remaining group to validation data (orange bars in Fig. 3B). These group assignments were repeated five times (five folds) to cover all combinations of the five groups.
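In sketch form, the session-level split can be expressed as below; the session identifiers are hypothetical, and the stratification over vigilance-state proportions used by the authors is not reproduced in this minimal version.

```python
import numpy as np
from sklearn.model_selection import KFold

sessions = np.array([f"session_{i:02d}" for i in range(35)])  # 35 sessions, 3 mice
kfold = KFold(n_splits=5, shuffle=True, random_state=0)       # 5 groups of 7 sessions
for fold, (train_idx, val_idx) in enumerate(kfold.split(sessions)):
    train_sessions = sessions[train_idx]  # 28 sessions for training in this fold
    val_sessions = sessions[val_idx]      # 7 sessions for validation in this fold
    # every 10-s bin of a session stays on one side, preventing within-session leakage
```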

The training of the neural networks was performed with the error backpropagation algorithm using the adaptive moment estimation (Adam) optimizer with the following hyperparameters: learning rate α = 7.5 × 10⁻⁴, gradient decay factor β1 = 0.9, and squared gradient decay factor β2 = 0.999. The learning curve of each LSTM network for each training step (epoch in Fig. 3C) was evaluated by an average weighted cross-entropy loss on the training dataset (blue trace in Fig. 3C) and on the validation dataset (orange trace in Fig. 3C). The weights for WAKE, REM, and NREM were set inversely proportional to the bin counts for each vigilance state, thereby avoiding underestimation of the minor class (REM) by assigning greater weight to its entropy loss. To avoid overfitting and an increase of the validation loss, we used validation-based early stopping: training was halted when the validation loss did not decrease for more than 100 consecutive epochs (orange shaded rectangle in Fig. 3C) after at least 250 epochs of training, or when training reached the maximum of 500 epochs.
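A condensed sketch of this training scheme follows, reusing the PupilLSTM sketch above; `train_one_epoch` and `validation_loss` are hypothetical placeholder helpers (standard mini-batch loops, not shown), and the bin counts are illustrative.

```python
import torch
import torch.nn as nn

model = PupilLSTM()                                   # sketch defined earlier
bin_counts = torch.tensor([4890.0, 350.0, 4760.0])    # NREM, REM, WAKE (illustrative)
weights = bin_counts.sum() / bin_counts               # inverse-frequency class weights
criterion = nn.CrossEntropyLoss(weight=weights)       # weighted cross-entropy loss
optimizer = torch.optim.Adam(model.parameters(), lr=7.5e-4, betas=(0.9, 0.999))

best_val, stall = float("inf"), 0
for epoch in range(500):                              # hard cap: 500 epochs
    train_one_epoch(model, criterion, optimizer)      # placeholder mini-batch loop
    val = validation_loss(model, criterion)           # placeholder evaluation
    stall = 0 if val < best_val else stall + 1
    best_val = min(best_val, val)
    if epoch >= 250 and stall > 100:                  # validation-based early stopping
        break
```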

For hyperparameter optimization (Fig. 3D), the macro F1-score on the validation data (orange bars in Fig. 3B) was used to evaluate the vigilance state classification performance of the various LSTM architectures. The macro F1-score, the unweighted average over classes of the per-class F1-score (the harmonic mean of recall and precision), is a representative machine-learning metric for multiclass classification with imbalanced data. The present dataset for vigilance state classification was moderately imbalanced because the minority class (REM) accounted for only a small proportion of the total recording (3.5 ± 0.3%; Fig. 1E, left panel), which might engender underestimation of REM episodes when using the common metric of accuracy. We designated the LSTM architecture with the highest macro F1-score as LSTMoptim (the leftmost thick bar in Fig. 3D). To achieve better performance, the hypnogram obtained using LSTMoptim was slightly modified [designated LSTMm (Fig. 3E and 3F)]: if an estimated episode did not persist for at least three consecutive bins (i.e., ≥30 s), the estimated state was changed to that of the preceding episode. After hyperparameter optimization, the generalization performances for vigilance state classification by LSTMm, FFNN, and Threshold were evaluated using test data (n = 4 sessions of SNT286; Fig. 3E). We used the metrics macro F1-score and accuracy to evaluate the vigilance state classification as a whole, and the metrics recall and precision to evaluate the classification performance for each vigilance state. Neural network training was performed on Google Colaboratory Pro using the Python library PyTorch (ver. 1.12.0). Data are expressed as mean ±SEM, except where noted otherwise.
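The LSTMm post-processing rule can be written compactly as below; this is a sketch of the rule as described in the text, and the handling of short runs at the very start of a session is an assumption.

```python
def enforce_persistence(states, min_bins=3):
    """Revert any vigilance-state run shorter than min_bins consecutive
    10-s bins (i.e., <30 s) to the immediately preceding state."""
    out = list(states)
    i = 0
    while i < len(out):
        j = i
        while j < len(out) and out[j] == out[i]:
            j += 1                                # [i, j) is one run of equal states
        if i > 0 and j - i < min_bins:
            out[i:j] = [out[i - 1]] * (j - i)     # revert the short run
        i = j
    return out

# Example: a 1-bin WAKE flip-flop inside NREM is reverted to NREM
print(enforce_persistence(["NREM", "NREM", "NREM", "WAKE", "NREM", "NREM", "NREM"]))
```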

Data and code availability

Sample data for inputs (an eye movie and its timestamps), intermediate files (weights of LSTMoptim and DeepLabCut), and an output file (a pupil dynamics-derived hypnogram) are deposited in Mendeley Data (https://doi.org/10.17632/rr4gc6mybg.1). Python codes for LSTMoptim and DeepLabCut, and a Matlab code for pupil feature extraction are available online (https://github.com/gk-hazard/Pupil-to-Hypnogram).

Results

Preparation of a supervisory hypnogram based on EEG and EMG data

EEG and EMG recordings and pupillometry were performed on head-fixed mice (Fig. 1A). A single recording session lasted 3–6 h/day in a darkroom during the light phase. The amplitudes of the EEG and EMG waves from head-fixed mice during each vigilance state corresponded to those of freely behaving mice (Fig. 1B): the EEG amplitudes were larger during NREM than during either WAKE or REM periods, whereas the EMG amplitudes were larger during WAKE than during either NREM or REM periods. We first obtained a hypnogram automatically using a conventional thresholding method for EEG and EMG data combined with automatic corrections that prohibit single-bin flip-flop transitions such as NREM–WAKE–NREM. We designated this hypnogram without manual correction as the threshold-based hypnogram (Threshold). Then, a supervisory hypnogram was obtained by manual inspection and correction of the threshold-based hypnogram (Fig. 1C). The supervisory hypnograms for head-fixed mice reproduced the typical polyphasic nature of mouse sleep. Manual corrections of the vigilance states in the Threshold hypnograms were necessary for 27 of the 39 sessions from four mice (Fig. 1D).

The time spent in each vigilance state in a single recording session and the duration for an individual episode in all recording sessions were calculated using the supervisory hypnogram from all data (Fig. 1E; total recording/session, 323 ± 63 min [mean ±SD]; 1672 episodes, 39 sessions, four mice; central line, median; bottom and top of box, 25th and 75th percentiles; whiskers, adjacent values). NREM sleep accounted for most of the total sleep time in a recording session (NREM, 48.9 ± 3.0%; REM, 3.5 ± 0.3%; NREM/[NREM + REM], 92.7 ± 0.5%), which was consistent with an earlier report on freely moving mice during the light phase (NREM/[NREM + REM], ≈88 ± 5%).2 The median episode durations (Fig. 1E, right panel) were 5.5, 1.7, and 3.0 min, respectively, for NREM, REM, and WAKE, which were also comparable with an earlier study for freely moving mice in the light phase (≈2.3, 1.5, and 1.1 min for NREM, REM, and WAKE).33

Obtaining pupil dynamics time series as feature inputs for a deep neural network

Pupillometry through wakefulness and sleep states was possible because head-fixed mice sleep with incomplete closure of their eyelids (nocturnal lagophthalmos), as reported earlier.33,35 The reasons and mechanisms for open eyelids during sleep are unclear. Our mice slept with closed eyelids in their home cage, suggesting that the surgery did not prohibit closing of the eyelids.

We acquired the four relevant locations, i.e., pupil rostral (PR), pupil caudal (PC), upper eyelid margin (UEM), and lower eyelid margin (LEM), in left-eye movies using DeepLabCut (Fig. 2A). We prepared supervisory data to train DeepLabCut by labeling the four positions manually in 600 left-eye images. The training of DeepLabCut was assessed using the cross-entropy loss (Loss) with 95% of the manually prepared dataset (570 from the 600 left eye images). Loss reached a plateau after a million epochs, indicating convergence of the training (Fig. 2B left). Generalization of the DeepLabCut position estimation was evaluated using the root mean square error (RMSE) of the distance from the manually defined supervisory positions to the DeepLabCut-estimated positions. Acceptable generalization was achieved, as inferred from the similar RMSE values found for the training data and test data (5% of the manually obtained dataset, i.e., 30 from the 600 left eye images), as shown in Fig. 2B (right, n = 4 mice, RMSE for training and test data: PR, 0.11 ± 0.01 mm, 0.11 ± 0.02 mm; PC: 0.10 ± 0.00 mm, 0.10 ± 0.02 mm; UEM, 0.10 ± 0.01 mm, 0.11 ± 0.01 mm; LEM, 0.094 ± 0.004 mm, 0.089 ± 0.023 mm). Indeed, no significant differences in RMSE values were found between features (PR, PC, UEM, and LEM: F(3, 24)=2.74, P = 0.07, two-way ANOVA), between datasets (training vs. test: F(1, 24)=0.17, P = 0.69), or for their interaction (F(3, 24)=0.28, P = 0.84). We concluded that the performance of our DeepLabCut estimation method was satisfactory because the overall RMSEs for training data and test data (Fig. 2B right; n = 4 mice, 0.10 ± 0.01 mm, and 0.10 ± 0.01 mm) were smaller than the minimum diameter of the pupil (n = 4 mice, 0.25 ± 0.02 mm).

The trained DeepLabCut was used to estimate automatically the four positions in the 22,589,940 left-eye images in all movies from 39 sessions with four mice (Fig. 2C, upper panels; Supplementary Video). Five features of eye dynamics were derived from the four positions (Fig. 2C, traces). The pupil diameter (PD) is the normalized distance between PR and PC (top trace in Fig. 2C). Although PD was large in most cases during WAKE periods, we found a decreasing trend and fluctuations of PD from 5–10 min before the transition to NREM sleep (green arrows in Fig. 2C, top trace). During NREM states, low-frequency fluctuation of PD was evident, as reported previously.33 The PD fluctuation amplitude seemed to increase with the NREM state duration, implying that larger PD fluctuations accompanied deeper NREM sleep. During REM sleep, PD was small, but rapid dilation of PD sometimes occurred immediately before awakening. Horizontal and vertical pupil locations (HL, VL) were calculated as distances from the mean intermediate position between PR and PC (second row from the top in Fig. 2C traces). Antiparallel movements of HL and VL were observed frequently during REM sleep, indicating anterior downward movements of the pupil. The pupil velocity (PV) was calculated for the intermediate position between PR and PC (third row from the top in Fig. 2C traces). PV was greater during WAKE and REM than during NREM periods. Although they constituted only a small fraction, the highest pupil velocities were found during REM periods (Fig. 2D, pupil velocity). Periodic increases of PV during NREM periods might coincide with PD fluctuations. The normalized distance between UEM and LEM represents the eyelid opening (EO; bottom row of Fig. 2C traces). On average, EO was greatest during WAKE, followed by NREM, and smallest during REM, a pattern similar to that of the pupil diameter but with distinct distributions during NREM periods; i.e., the distributions for PD and EO during NREM periods were, respectively, uniform and peaked (Fig. 2D, eyelid opening). EO exhibited a gradual decrease during NREM and reached minimum values during REM. Occasionally, EO showed a surge immediately before the transition from REM to WAKE episodes, similar to the PD dynamics. The rapid deflections of EO to the minimum during WAKE and NREM periods corresponded to eye blinks (black arrowhead in the bottom row of Fig. 2C traces). These five features (PD, HL, VL, PV, and EO) individually cannot determine the vigilance state because of their overlaps among WAKE, REM, and NREM episodes (Fig. 2D); moreover, PD showed an almost uniform distribution during NREM sleep.

Sleep stage estimation from pupil dynamics using a deep neural network

We developed LSTM models that estimate vigilance states from pupil dynamics (Fig. 3A). Inputs for a model consisted of one to five features, such as PD and EO. The inputs were 10 s long at a 10-Hz sampling rate, resulting in 100 time steps that were processed sequentially by 100 LSTM cells. The hidden layer vector of the last LSTM cell, h_t, was affine-transformed into a three-dimensional output vector, y, consisting of log likelihoods for NREM, REM, and WAKE states at time t. The state with the largest log likelihood was taken as the model's estimated vigilance state. To train the LSTM model, this estimated vigilance state was compared with the state in the supervisory hypnogram based on EEG and EMG data. To avoid data leakage that might engender overestimation of the model performance, we (1) set aside one mouse's data as test data (purple bars in Fig. 3B; four sessions of mouse SNT286) from the entire dataset (39 sessions of four mice), and (2) performed hyperparameter optimization using a stratified five-fold cross-validation method on a session-by-session basis (blue and orange bars in Fig. 3B). Consequently, we avoided data leakage in testing and validation because (1) the generalization performance of a model was evaluated with test data that were used neither for training nor validation, and (2) session data that were used for training were not used for validation during hyperparameter optimization.

Supervised training of the LSTM models was performed using validation-based early stopping to avoid overfitting (orange rectangle in Fig. 3C). After convergence in training, we compared the classification performances of LSTM models with distinct hyperparameters: the input features (distinct combinations of PD, PL, PV, and EO), the size of the hidden layer vector of each LSTM cell (h_t: 32, 128, or 512), the dropout rate (0 or 0.5) for the final hidden state h_t, and the mini-batch size for gradient descent (512, 1024, or 2048). The comparison was based on the macro F1-scores averaged over the five folds of validation data (Fig. 3D). The largest macro F1-score was achieved by the LSTM model with all input features, a hidden layer vector size of 32, a dropout rate of 0.5, and a mini-batch size of 2048 (the leftmost thick bar in Fig. 3D; 0.78 ± 0.01 [mean ± SEM of the five folds]). The model with the highest macro F1-score was designated the hyperparameter-optimized LSTM model (LSTMoptim). The macro F1-score of LSTMoptim was significantly higher than those of LSTM models with a single input feature (right four bars in Fig. 3D; Dunnett's test, n = 34 sessions, ***P < 0.001 vs. LSTMoptim as a control); however, the performance score of LSTMoptim was not significantly different from those of other LSTM models with all input features (PD, PL, PV, and EO), irrespective of the other hyperparameter values for hidden layer size, dropout rate, or mini-batch size (left four bars in Fig. 3D). It is noteworthy that the LSTM model using the single feature EO (rightmost bar in Fig. 3D) showed performance comparable to that of the model using the single feature PD (fourth bar from the right in Fig. 3D); this finding implies that EO was as informative as PD, a commonly used feature for sleep–wake discrimination.30 Table 1 presents the performances of all the LSTM models with distinct hyperparameters. Macro F1-scores remained stable across hidden layer vector sizes of 32, 128, and 512, whereas the combination of input features strongly influenced the scores during hyperparameter optimization.

The generalization performance for vigilance state estimation was evaluated using the test data (Fig. 3E). To achieve better performance, the original hypnogram produced using LSTMoptim was modified slightly and automatically by introducing a criterion for vigilance state persistence: if a vigilance state estimated by LSTMoptim did not persist for at least three consecutive bins (≥30 s), the estimated state was changed to the state immediately preceding it. We designated this modified hypnogram LSTMm. The rationale for this modification was that we noticed short-duration flip-flop transitions between WAKE and NREM states in the original LSTMoptim hypnogram during WAKE periods preceding NREM periods; these transitions were apparently attributable to PD fluctuations immediately before the WAKE-to-NREM transition that resembled PD fluctuations during NREM periods (green arrows in Fig. 2C). This modification changed 4.6% of the vigilance states determined by LSTMoptim (379 of 8736 bins in the test data) and significantly improved the macro F1-score and the precision of REM episodes in LSTMm; the recall of REM episodes also increased, although not significantly (n = 4 sessions, paired t-tests adjusted for multiple comparisons using the Benjamini-Hochberg procedure with a false discovery rate of 0.05, LSTMm vs. LSTMoptim: macro F1-score, 0.77 ± 0.02 vs. 0.73 ± 0.03, *P = 0.0093; REM recall, 0.64 ± 0.07 vs. 0.57 ± 0.07, P = 0.11; REM precision, 0.55 ± 0.05 vs. 0.45 ± 0.05, *P = 0.0019).

We compared the performances of LSTMm, the feed-forward neural network (FFNN), and the threshold-based method (Threshold) (Fig. 3E). These methods require no manual correction of the hypnogram: for hypnogram estimation, LSTMm and FFNN used pupil dynamics, whereas Threshold used EEG and EMG. Overall performance for vigilance state estimation was assessed using the macro F1 score and accuracy (purple and yellow bars in Fig. 3E). LSTMm performed better than either FFNN or Threshold. In particular, the macro F1-score of LSTMm was significantly higher than that of FFNN (purple bars in Fig. 3E, two-sided paired t-test: LSTMm 0.77 ± 0.02 vs. FFNN 0.70 ± 0.04, *P = 0.027, n = 4 mice). The classification performance for each vigilance state was assessed using recall and precision (blue, red, and black bars in Fig. 3E). LSTMm performed better than FFNN in all six comparisons. In particular, LSTMm achieved a significantly higher recall for REM than FFNN (red bars in Fig. 3E, two-sided paired t-test: LSTMm 0.64 ± 0.07 vs. FFNN 0.40 ± 0.09, *P = 0.0134, n = 4 mice). The precision for REM was also higher for LSTMm than for FFNN, although the difference was not significant (red bars in Fig. 3E, two-sided paired t-test: LSTMm 0.55 ± 0.05 vs. FFNN 0.45 ± 0.07, P = 0.144, n = 4 mice). Notably, Threshold showed the highest recall for NREM and REM (0.97 ± 0.01 and 0.94 ± 0.02; blue and red bars for Threshold in Fig. 3E); however, these scores were achieved at the expense of lower NREM and REM precision (0.71 ± 0.10 and 0.34 ± 0.04; blue and red bars for Threshold in Fig. 3E) and lower WAKE recall (0.56 ± 0.04; black bar for Threshold in Fig. 3E). Indeed, misidentification of WAKE states as NREM or REM states is apparent in Fig. 1D, consistent with the higher recall and lower precision for NREM and REM (blue and red bars in Fig. 3E) and the lower recall and higher precision for WAKE (black bars in Fig. 3E).

Time series of vigilance states labeled by LSTMm were compared with a supervisory hypnogram (Fig. 3F) and demonstrated congruent estimation. We noted additional REM states in the LSTMm hypnogram compared with the supervisory hypnogram (green arrow in Fig. 3F). Retrospective inspection of EEG data during these REM episodes identified by LSTMm revealed a transient theta power increase in EEG during the 10-s bin that was labeled as an NREM state in the supervisory hypnogram, suggesting that pupil dynamics is a sensitive indicator of REM; this suggestion was in accordance with an earlier report of a theta–pupil interaction.48

Discussion

We demonstrated continual vigilance state estimation of head-fixed mice based on 10-s pupil dynamics using a single-layer LSTM model. The overall estimation performance was considered satisfactory because the accuracy of sleep staging by our pupil dynamics-based estimator (86.4 ± 4.3%, mean ±SD) was comparable to the inter-rater agreement rate achieved by human experts (90–96%).11,41,49 Moreover, the performance of our pupil dynamics-based estimator approached that of recently reported sophisticated EEG-based and EMG-based neural networks, such as a convolutional neural network (CNN) combined with a hidden Markov model (93 ± 3.4% to 99 ± 0.4% for mouse cohorts)38 and a CNN combined with a bidirectional LSTM model (95.0–96.7%).11 The classification accuracy of these EEG-based and EMG-based estimators developed for freely moving mice might be lower when applied to head-fixed mice because the EMG activity of head-fixed mice during WAKE periods seemed not to be consistently large enough to be clearly distinguishable from that during sleep periods. Indeed, the conventional Threshold method misassigned many WAKE periods as REM or NREM episodes (Fig. 1D), resulting in lower recall for WAKE episodes (Fig. 3E). One advantage of pupil dynamics-based sleep staging over EEG-based and EMG-based staging is expected to be its compatibility with wide-field imaging of the cortex, because pupil imaging requires no invasive surgery for electrode positioning on the thin mouse skull that would interfere with the imaging field.50

There is great scope for improvement in the performance of pupil dynamics-based vigilance state estimation. The performances of FFNNs and RNNs are known to be similar when there is no strong temporal dependence in the data.51,52 Consequently, the higher performance of our LSTM model relative to the FFNN model implies that the temporal dependence of pupil dynamics was important for sleep staging, even within 10-s time series. Therefore, incorporating pupil dynamics over a longer duration is expected to improve performance, because PD fluctuation preceded NREM sleep by approximately 10 min (green arrows in Fig. 2C). Moreover, the amplitude of PD fluctuation seemed to increase along with the duration of NREM episodes beyond approximately 10 min. Long-term pupil dynamics of such lengths are impossible to capture in the present LSTM model with 10-s feature vector inputs. Indeed, characteristic mislabeling by our naïve LSTM model (LSTMoptim) did occur. For example, WAKE periods of approximately 10 min preceding NREM periods were occasionally estimated incorrectly as a repetition of WAKE and NREM states. The preceding PD fluctuations during WAKE periods might have been misrecognized by LSTMoptim as PD fluctuations during NREM sleep. This mislabeling was corrected in the proposed LSTM model (LSTMm) by introducing the simple rule that a change of vigilance state should persist for at least three consecutive bins. Longer-duration feature vector inputs to an LSTM model might obviate the need for this arbitrary rule and improve the scoring performance by constraining sleep staging from a prolonged perspective. Furthermore, improving the quality of the feature vector inputs themselves is expected to improve the sleep staging performance of an LSTM model. For example, using a shorter bin duration, e.g., 4 s instead of the current 10 s, for the supervisory hypnogram can be expected to improve sleep scoring performance: we noticed that a 10-s bin mislabeled as a REM period by LSTMm during an NREM period in a supervisory hypnogram was accompanied by a transient increase of theta power in the EEG data. This observation is consistent with an earlier report stating that a longer bin duration (20 s) for sleep–wake scoring of mice tended to miss very short events. Moreover, a higher frame rate video (>60 Hz) can be expected to improve REM state detection because the frame rate of the current USB camera (approximately 33 Hz) was too slow to fully capture the rapid eye movements of mice, which were recorded in only 2–3 frames per movement.33

A few modifications would be necessary to convert the current offline sleep-staging method to an online method. Although the processing of LSTMm was completed rapidly and with only negligible resources, e.g., only a few seconds to process 6-h feature vector inputs, preprocessing of the pupil video images was time-consuming. The pre-trained DeepLabCut took 2.5 h to process a 6-h pupil movie that contained >700,000 frames. The bottleneck was uploading the movie to the Google Colab website and performing DeepLabCut processing online. These findings indicate that a local computer for DeepLabCut processing would likely be necessary for real-time sleep staging. Another issue for online sleep staging is that the input features PD and EO were relative values in this study, requiring whole-session data to calculate them. Using absolute, instead of relative, values might resolve this issue. Applying the current method to a freely moving rodent would be a challenge because mice usually sleep with their eyes closed. One possible solution for freely moving rodents might be pupillometry through the eyelids by combining head-mounted infrared back-illumination pupillometry,33 a miniaturized ocular videography system,53,54 and scatter-resistant single-photon imaging.55 Another limitation of the present method concerns its application to albino mice: we were unable to detect their pupils using a USB camera under infrared illumination.

Pupillary dynamics can be expected to provide fruitful opportunities to elucidate the brain states of mice, rather than just providing an alternative to EEG-based and EMG-based hypnography.56,57 While training our LSTM model to classify vigilance states into three stages in the current study, we noted distinct pupillary dynamics even within a vigilance state that was assigned as a single state based on EEG and EMG data. During WAKE periods, we noticed that pupil diameter fluctuation preceded the onset of NREM sleep, which was difficult to capture with EEG and EMG. Extracellular spike recording has shown that mesopontine cholinergic neurons decreased their discharge frequencies 10–20 s before the onset of NREM sleep.58 Therefore, the preceding oscillations of pupil diameter could imply the presence of neurons whose activity changes over longer periods (several minutes) ahead of the onset of NREM sleep.59,60 In a human study, spontaneous waves of constriction and dilation of the pupils were attributed to decreased levels of wakefulness.30 Considering a recent finding that the hypnagogic period, a twilight zone between wakefulness and sleep, ignited creative sparks in human subjects,61,62 it is intriguing to speculate that mouse pupillary oscillations preceding NREM sleep might be a useful measure for investigating sleepiness and creativity in mice. During NREM sleep, PD fluctuates with an amplitude that increases with NREM duration,33 indicating that mouse NREM sleep is not a monotonic state but is instead composed of different sleep depths, similar to human sleep.63 A recent report noted that the oscillatory population-level activity of dorsal raphe serotonin neurons gradually increased in amplitude during NREM sleep,64 which implies a relation between deeper NREM sleep and larger PD fluctuations. During REM sleep, two microstates can be defined using eye movement frequencies65,66: phasic REM sleep accompanies bursts of eye movements, whereas tonic REM sleep has fewer eye movements. This is another example of a subdivision of a brain state suggested by pupil dynamics. Additionally, we noted rapid dilation of PD (mydriasis) at the end of REM sleep, when a transition from REM to WAKE states occurred. This pupillary dilation might indicate neuronal activation of the noradrenergic locus coeruleus and related circuits in the brain.67 In conclusion, our simple and inexpensive approach to measuring pupil dynamics using a conventional USB camera could expand the repertoire of sleep stage scoring of head-fixed mice.

Acknowledgments

This work was supported by JSPS KAKENHI Grants (Numbers 18H04952, 19KK0387 and 22H03033 to N.T.) and the Keio University Academic Development Fund for Joint Research 2022 (N.T.).

Conflicts of Interest

The authors declare that no conflicts of interest exist.

Supplementary Video. Pupil dynamics of a head-fixed mouse in a sleep–wake cycle

Left eye movies of a head-fixed mouse during WAKE, NREM, and REM periods for 10 s each. Positions for pupil rostral (PR, purple), pupil caudal (PC, light blue), upper eyelid margin (UEM, yellow), and lower eyelid margin (LEM, red) were obtained using DeepLabCut.

References
 
© 2023 by The Keio Journal of Medicine