A distribution-dependent analysis of open-field test movies

Although the open-field test has been widely used, its reliability and compatibility are frequently questioned. Many indicating parameters were introduced for this test; however, they did not take data distributions into consideration. This oversight may have caused the problems mentioned above. Here, an exploratory approach for the analysis of video records of tests of elderly mice was taken that described the distributions using the least number of parameters. The locomotor activity of the animals was separated into two clusters: dash and search. The accelerations found in each of the clusters were distributed normally. The speed and the duration of the clusters exhibited an exponential distribution. Although the exponential model includes a single parameter, an additional parameter that indicated instability of the behaviour was required in many cases for fitting to the data. As this instability parameter exhibited an inverse correlation with speed, the function of the brain that maintained stability would be required for a better performance. According to the distributions, the travel distance, which has been regarded as an important indicator, was not a robust estimator of the animals’ condition.


Introduction
The open-field test has been used to investigate the locomotor activity of animals. The animals' behaviour is summarized using sets of indicating parameters [1][2][3][4][5][6][7]. However, questions repeatedly arise regarding the reliability and compatibility of the open-field test given the variety of test apparatuses and procedures [1][5][6]. In recent years, video-recording systems have become popular [3][4][8] because they can accurately capture animal behaviour; however, the movies comprise a large volume of complex information. It may be possible to summarize such information systematically using exploratory data analysis to identify the patterns of data distribution, which has been disregarded in the past [9].
A distribution model describes the characteristics of the data using the least number of parameters, and this approach has several advantages. The distribution may provide clues about the hidden mechanism that generates the data and determines their nature. Because the distribution pattern would be a common character among experiments, it may provide a standard unit that can be shared across experiments, as is the case for z-scores. Moreover, the character of the distribution will indicate which parameter is sufficiently robust to overcome the experimental noise. Thus, the distribution model would improve the reliability and compatibility of the analysis.

Materials and methods
To observe how the animals' abilities changed with age, the tests were repeated over the long term. For this purpose, four retired female C57BL/6NJcl mice were obtained from CLEA Japan, Inc. (Tokyo, Japan). They were housed together in a room with controlled temperature (22°C ± 2°C) and a 12 h light/dark cycle. The open-field tests were performed at 60-110 weeks of age in a white polypropylene container (30.9 × 43.9 cm; height, 30 cm) in the daytime without altering the daily light conditions (400 lux), to reduce the stress on the mice. The tests were performed once or twice a week, and each test lasted 30 min. The activity was recorded as a movie using a digital camera (IXY 430F; Canon, Tokyo, Japan). The present study was approved by the Review Board of the Animal Ethics Committee of the Akita Prefectural University (H19-04) and followed institutional guidelines for the care and use of laboratory animals.
The condition of the test animals at a given moment was estimated as follows. The recorded movie was read as a stack of images using the ImageJ program [10]. Each of the images was binarized to determine the shape of the animal; subsequently, the area and position of the centre of mass were determined. The table of areas and positions was transferred and analysed using R software [11]. The detailed protocols and scripts used for the movie analysis are included in the Supplementary Materials. A detailed protocol and R-code are available at protocols.io, dx.doi.org/10.17504/protocols.io.bhc8j2zw and github, https://github.com/TomokazuKonishi/A-distribution-dependent-analysis-of-open-field-test-movies

Movement of the centre of mass in the body shape
The animals engaged in activity constantly without rest during the 30 min of each test, which was in contrast with their ordinary diurnal behaviour; i.e., sleeping. As is well known, the animals rarely entered the central region of the field; rather, they moved in the peripheral space (Fig. S1A). The speed of their movement was not constant; e.g., when an animal went around the field for 10 s (Fig. S1B and Supplementary Materials movie 1), the recorded speed exhibited five sharp peaks (Fig. S1C, solid line). The accelerations and decelerations also exhibited peaks (dotted line), which appeared at half of the peak speed. Between movements, the animals searched around; i.e., they moved slowly and engaged in undirected sniffing, raised their forelimbs when rearing, or investigated the walls (to escape from the container).

Shape of the area of animal movement
Viewed from above, the shape of the areas of the animal movements provided information about their habitual movements. The area records showed a tendency to be smaller toward the end of the test (Fig. S2A), which was an artefact caused by drift of the exposure of the camera. Thus, a corrected data set was used for further analysis (Fig. S2B).
When the animals ran fast, their stretched bodies showed larger areas. During rearing, the areas became smaller. Hence, the area and speed of the animals exhibited a weak correlation (Fig. S1D). The plot exhibited two clusters; i.e., "faster and larger" and "slower and smaller" clusters. Although those clusters should show some overlap, for simplicity, the clusters were separated using a line. The slope and intercept of the separating line were first placed manually by the analyst and were then adjusted during the fitting process of the speed distributions, as described below.
For descriptive purposes, we called the two clusters dash and search, respectively. Both clusters appeared sequentially as a series array in the tracking records (Fig. S1B, S1C).
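The separation by a line can be sketched as follows. This is a minimal illustration in Python, not the authors' R script: the function name `classify`, the orientation of the line (area as a function of speed, with dash lying above it), and all numeric values are assumptions made for the example.

```python
def classify(speed, area, slope, intercept):
    """Label a frame 'dash' if its (speed, area) point lies above the
    separating line area = slope * speed + intercept, else 'search'.
    The orientation of the line is an assumption of this sketch."""
    return ["dash" if a > slope * s + intercept else "search"
            for s, a in zip(speed, area)]

# Hypothetical per-frame values, for illustration only
speed = [1.0, 2.0, 30.0, 40.0, 3.0]
area = [8.0, 9.0, 14.0, 15.0, 8.5]

labels = classify(speed, area, slope=-0.1, intercept=12.0)
print(labels)
```

In this sketch the slope and intercept are plain arguments, so they can be set by hand first and then adjusted by whatever fitting procedure is used for the speed distributions.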

Noise
The original record of the centre of mass contains noise. Any movement changes the recorded shape of the animal, which alters the estimate of its centre of mass, even if the animal stays at the same location. Such alteration produces noise, which can be reduced by taking moving averages of the animal positions over 1 s (R script, data supplement). The speed and acceleration data shown in Fig. S1A and S1C underwent such noise reduction; otherwise, it would have been difficult to analyse the raw record (Fig. S2C). The effect of noise reduction was larger at slower speeds (Fig. S2D).
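The effect of a moving average on this kind of noise can be illustrated with a short Python sketch. It is not the authors' R script: the frame rate `FPS`, the helper names, and the synthetic jitter are assumptions, chosen only to show why a stationary animal appears to move in the raw record.

```python
import math

def moving_average(values, window):
    """Centred moving average; near the edges only the available samples are used."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def speeds(xs, ys, fps):
    """Frame-to-frame speed in position units per second."""
    return [math.hypot(xs[i + 1] - xs[i], ys[i + 1] - ys[i]) * fps
            for i in range(len(xs) - 1)]

FPS = 30  # assumed frame rate; the text does not state it

# A stationary animal whose estimated centre of mass jitters by +/-0.5 units
xs = [10.0 + (0.5 if i % 2 else -0.5) for i in range(300)]
ys = [5.0] * 300

raw = speeds(xs, ys, FPS)
smooth = speeds(moving_average(xs, FPS), moving_average(ys, FPS), FPS)
print(max(raw), max(smooth))
```

The window of `FPS` frames corresponds to about 1 s. The raw record reports a large apparent speed although the animal does not move, whereas the smoothed positions give a speed close to zero, in line with the larger effect of noise reduction at slower speeds.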

Data distribution
If the data distribution obeys certain rules, it would show a pattern that could be described using a limited number of parameters, which represent the characteristics of the data. Here, such patterns and parameters were introduced regarding the locomotor activity of the animals (Fig. 1). As the noise reduction would alter the distribution, the patterns were extracted directly from the raw data.

Acceleration
The accelerations were approximated using three normal distributions. The normal distribution has two parameters: location (µ) and scale (σ). In this case, the three normal distributions had their own mixing ratios of the whole. All distributions had µ = 0 (Fig. 1A); therefore, the three accelerations could be practically described using six parameters. The largest σ was found for accelerations of the dash cluster. The quantile-quantile (Q-Q) plot (Fig. 1B) compares the sorted data against the normal distribution; i.e., if the data are normally distributed, the plot will produce a straight line. The minor distribution with the smallest σ might reflect the noise of the raw record. An intermediate σ was found by separating the noise fraction from the search cluster. The normality of the distribution indicated that the accelerations and decelerations had the same magnitude.
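The Q-Q comparison described above can be sketched in Python as follows. The data are synthetic and the helper names are assumptions; the sketch checks a single normal distribution only and does not reproduce the three-component mixture fitted in the study.

```python
import math
import random
from statistics import NormalDist

def qq_points(data):
    """Pair standard-normal quantiles at positions (i + 0.5) / n with the sorted data."""
    n = len(data)
    std = NormalDist()
    theo = [std.inv_cdf((i + 0.5) / n) for i in range(n)]
    return theo, sorted(data)

def pearson(x, y):
    """Pearson correlation; close to 1 when the Q-Q plot is a straight line."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)
normal_like = [rng.gauss(0.0, 2.0) for _ in range(2000)]  # synthetic "accelerations"
skewed = [rng.expovariate(1.0) for _ in range(2000)]      # synthetic counter-example

r_normal = pearson(*qq_points(normal_like))
r_skewed = pearson(*qq_points(skewed))
print(round(r_normal, 3), round(r_skewed, 3))
```

For normally distributed data the points fall on a straight line whose slope estimates σ and whose intercept estimates µ; a skewed sample bends away from the line and gives a lower correlation.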

Speed
The logarithms of the speed were approximated using a mix of four normal distributions: two major and two minor ones (Fig. 1D). However, a better fit was achieved using an exponential distribution (Fig. 1C). The fine-tuning process was performed by determining the line that separates the dash and search clusters (Fig. S1D). The other parameters were automatically found in the linear relationships with the speed records of the dash or search clusters in a log scale (Fig. S3A, S3B).
The exponential distribution has only one parameter, λ, which determines the rate of occurrence (Supplementary Materials, see Appendix); its probability density function is monotonically decreasing, and the density was concentrated near zero (Fig. S3C). Such heavy skewing could reduce the accuracy of parameter estimations; however, the skewing is eased by taking the logarithms of the speed (Fig. 1C). Here, the data distribution of log(Speed) was approximated by a linear transformation of the logarithms of two sets of exponential distributions (λ = 1) (Fig. 1C, 1F). Hence, the number of parameters used was three for both the dash and search clusters: mixing rates, slopes a, and intercepts b (Supplementary Materials, see Appendix). For estimations of the parameters a and b, random sampling gave errors that could be approximated by normal distributions (Fig. S3D).
The exponential distribution represents intervals of events that occur randomly with a constant rate, λ. The new parameter a changes the shape of the distribution (Appendix). When a<1, the distribution is concentrated around its mode. When the animal did not keep the rate λ, the distribution exhibited a wider scale and a became >1 (Fig. 1E). In short, parameter a is indicative of the instability of the behaviour.
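One way to estimate the slope a and intercept b is a linear fit on the Q-Q relation between the sorted log10 data and the log10 quantiles of an exponential distribution with λ = 1. The Python sketch below assumes that reading of the model and uses base-10 logarithms; the function names, the plotting positions, and the synthetic sample are illustrative and are not taken from the authors' R code.

```python
import math
import random

def exp_log_quantiles(n):
    """log10 of Exp(1) quantiles at plotting positions (i + 0.5) / n."""
    return [math.log10(-math.log(1.0 - (i + 0.5) / n)) for i in range(n)]

def fit_line(x, y):
    """Ordinary least squares; returns (slope, intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def fit_a_b(data, upper_fraction=1.0):
    """Estimate a and b in log10(data) = a * log10(E) + b, with E ~ Exp(1).
    upper_fraction < 1 restricts the fit to the upper part of the Q-Q plot."""
    logs = sorted(math.log10(v) for v in data if v > 0)
    theo = exp_log_quantiles(len(logs))
    start = int(len(logs) * (1.0 - upper_fraction))
    return fit_line(theo[start:], logs[start:])

# Synthetic sample generated with known a and b (hypothetical values)
rng = random.Random(0)
a_true, b_true = 1.3, 0.5
sample = [10 ** b_true * rng.expovariate(1.0) ** a_true for _ in range(20000)]

a_hat, b_hat = fit_a_b(sample)
a_up, b_up = fit_a_b(sample, upper_fraction=0.5)
print(a_hat, b_hat, a_up, b_up)
```

With a = 1 the sample is an ordinary exponential distribution scaled by 10^b; a value of a above 1 stretches the log-scale distribution, which corresponds to the instability described above. The `upper_fraction` argument is a convenience of this sketch for fitting only the upper part of the plot.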

Area
The body areas found in the dash and search clusters did not show a rigid pattern, suggesting that the estimation of areas may not be accurate and may contain a larger level of noise than that of speed. However, the bell shapes could be approximated by normal distributions (Fig. S3E). The video showed that a value 0.6 times σ lower than µ seemed to be indicative of rearing (Supplementary Materials, movie 2 shows examples of the observed rearing).

Distribution of the duration of the forms of behaviour
After separating the forms of behaviours into dash and search (or rearing and grounding: keeping the four paws on the ground), their duration became clear (Figs 1F, S3F). Their logarithms were approximated by logarithms of the exponential distribution, as was the case for speed (Fig. S3G-J). The distribution of the model exhibited a slight skew toward a negative value and gave a better fit than those that were obtained using a lognormal distribution without skewing (Figs 1C, 1D, 1F).
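Once every frame carries a label, the durations follow from the lengths of consecutive runs of the same label. A minimal Python sketch, assuming per-frame labels and a known frame rate (the function name and the example values are hypothetical):

```python
from itertools import groupby

def bout_durations(labels, fps):
    """Group consecutive identical labels and return their durations in seconds."""
    out = {}
    for label, run in groupby(labels):
        out.setdefault(label, []).append(len(list(run)) / fps)
    return out

# Hypothetical label sequence at an assumed 3 frames per second
labels = ["search"] * 3 + ["dash"] * 6 + ["search"] * 3
durations = bout_durations(labels, fps=3)
print(durations)
```

The resulting lists of durations are the quantities whose logarithms are compared with the logarithms of the exponential distribution. A misclassified frame splits one bout into two shorter ones, so errors in the labels mainly affect the short end of the distribution.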

Relationships among the parameters
Some parameters showed correlations with each other. As expected, the modes of acceleration and speed of the dash set were correlated (Fig. 2A). Animals with better stability were faster (Fig. 2B). The instability parameters a showed correlation among the various forms of behaviour (Fig. S4A-C).
Because the speed and frequency of the dash cluster showed a weak inverse correlation (Fig. S4D), the faster animals did not always travel farther (Fig. 2C). One of the tested mice showed a decrease in dash speed; this back and forth between the two phases may be an indication of aging (Fig. 2D). This was accompanied by an increase in the a parameter of speed (Fig. S4E). This phenomenon was not apparent for the travel distance (Fig. S4F).
In addition, young BALB/c mice at 3 months of age were tested and showed the same patterns of data distribution as those presented here (not shown).

Advantage of video monitoring and parametric analysis
A movie contains a large volume of information that guarantees accurate and objective analyses. Finding the specific distributions and their parameters is beneficial for summarizing the data and extracting the characteristics of the test animals.
One of the benefits of knowing the data distribution becomes obvious during the investigation of rearing. The data obeyed the exponentiation of the exponential distribution. Data handling without this knowledge is quite difficult, for the following reasons. First, the distribution is required for identifying the parameters. Most of the durations of behaviours are concentrated within very short periods. This hampers observations; thus, compensations for counting omissions or false positives are indispensable, even just for assessing frequency. Here, the compensation was performed by comparing the data quantiles with the exponential distribution (Fig. S3I, J). Second, the identified parameters indicate the state of the animals. The heavily skewed distribution renders the arithmetic mean useless because it is not robust and does not show the mode. Therefore, measuring the duration repeatedly using a manual approach and taking the mean does not generate useful information. In contrast, the parameters a and b can describe the behaviour of the animal accurately and can be treated parametrically (Fig. S3D). Those parameters provide information on the frequency, length, and constancy of the recorded behaviours. Because of the distribution model, those parameters can be extracted from movie records automatically. In contrast, the naked eye cannot follow the quick movement of mice.
Although separating the dash from the search cluster (Fig. S1D) worked well for estimating the parameters, it may have produced a sort of error. As the two clusters are closely located, they must have some overlap; the same problem also exists in the separation of rearing and grounding (Fig. S3E). Therefore, classifying a moment into the two forms has a basic difficulty. The error will cut several continuous actions into smaller pieces, or overlook shorter fragments. This would be the cause of the departure at the lower part of Q-Q plots (Fig. S3A, B, G-J). Hence, the slope and intercept parameters should be extracted from the upper part of the plots.

The distribution models
The mode of dash speed obeyed the exponentiation of the exponential distribution (Figs 1C and S3A). This reflected the distribution of the duration of the dash cluster (Fig. 1F). As the acceleration was normally distributed (Figs 1A and 1B), the speed should also be normally distributed if it was just a sum of randomly selected accelerations. However, the accelerations were coherently coordinated with the speed (Fig. 2B). In many cases, the instability parameter a was nearly or slightly less than 1 (Figs S4A-C) and exhibited an ordinary exponential distribution, which may indicate intervals of randomly occurring events. However, in some cases, compensation by using the instability parameter a was required to approximate the data distribution. The parameter changes the distribution to either concentrating on the mode or instability of λ.
Unlike the case of the exponentiated exponential distribution, which is used to analyse lifetime data of machinery [12], the exponentiation of the exponential distribution may not have been studied so far because these are different distributions. The latter has two identical parameters, a and b, both of which relate to λ = 10^(-b/a). As was presented here, the distribution shows the interval time of random events, with an instability of the randomness that is presented by parameter a. It is possible that this also could approximate lifetime data. Moreover, the same pattern would appear in the other locomotive behaviour of animals.
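The relation λ = 10^(-b/a) can be sketched as follows, assuming that the model reads log10(X) as a linear transformation of log10(E), where E follows an exponential distribution with λ = 1. The symbols X, E, and Y are introduced here for illustration only; the Appendix remains the authoritative definition.

```latex
\begin{align}
Y = \log_{10} X &= a \log_{10} E + b, \qquad E \sim \mathrm{Exp}(1), \quad a > 0 \\
X &= 10^{b} E^{a} \\
X^{1/a} &= 10^{b/a} E \sim \mathrm{Exp}(\lambda), \qquad \lambda = 10^{-b/a} \\
P(X \le x) &= 1 - \exp\!\left(-\lambda x^{1/a}\right), \qquad x \ge 0
\end{align}
```

Under this reading, a = 1 recovers the ordinary exponential distribution with rate 10^(-b). Because the density of ln E, exp(u - e^u), has its maximum at u = 0, the mode of Y is b for any a, so a changes the spread around the mode but not its position.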
A smaller a value may indicate a better condition of the brain. Although the smaller a observed for the duration in the dash cluster does not mathematically guarantee a faster speed (e.g., a longer dash duration could produce a faster speed), animals with more stable behaviour were faster in reality (Fig. 2B). Moreover, faster animals showed larger accelerations, which require the coincidental activation of a larger number of muscle cells (Fig. 2A). As the activations are under the control of the brain, these observations imply that stricter control by the brain is required for achieving a faster speed. In the case of a smaller a, the distribution would be concentrated around its mode. Therefore, the peak speed and duration values in a test become constant. As the faster speed guarantees a shorter arrival time, the duration and frequency of dash were decreased (Fig. S4A, B, D), which favours safer movement. Hence, the constant pattern of behaviour does not imply a stereotypical behaviour; rather, repeating a suitable movement within a box would generate such a steady pattern. In contrast, the loss of this ability increased the instability parameter a; eventually, the occurrence rate of behaviours, λ, would become unstable (Fig. 1E). As the a values found in different forms were correlated (Fig. S4A-C), the same or related parts of the brain seem to maintain the stability of behaviours (Fig. 1E). In short, a smaller a shows constancy of behaviour and correlates with greater acceleration and dash speed, and these would represent a better condition of the brain.
The exponential distribution and the lognormal distribution are similar; thus, either of them can be used to approximate the distributions of speed or duration. However, the former reproduced the mode with better fidelity (Figs 1C, D, F, and S3F); this characteristic will improve the accuracy of parameter estimations that use fitting processes. Moreover, the distribution model may become a hint of the hidden structure of the data [9,13], which generates it, such as that shown for the mechanism of transcriptome production [14]. Here, it was difficult to explain the lognormal distributions, but a tentative explanation of the exponential distribution is available.
As the sampling noise was practically distributed normally (Fig. S3D), the parameters have potential compatibility with parametric models, such as ANOVA; this is beneficial for statistical tests. However, it should be noted that we must pay attention to the individual differences among animals; these would be especially apparent for older animals. In such cases, it is not adequate to expect a normal distribution of the experimental noise; rather, an experimental design that considers this issue will be required, such as pairwise comparisons.

The Parameters
The distributions and the parameters can be easily extracted from video records. A parametric analysis performed by checking the distributions will improve the accuracy of the analysis. According to this point of view, the traditional methods of estimation of moving distance, such as those that use grid beam breaking, may have a high level of noise, and the distance itself is not a robust indicator of the animal's state. Such inappropriate choices of indicating parameters may have been at the origin of the concerns raised regarding the open-field tests [1][5][6].
The modes of the steady distributions, such as those observed for speed, should be robust indicators of the animals' locomotor activities (Fig. 2D). Conversely, it has become clear that the distance of movement is not a robust indicator, as the distance is the sum total of the product of speed and duration, the distributions of which are heavily skewed (Fig. 1C, D). In a mathematical sense, it is certain that the product is unstable and can be easily altered by any tiny effects that occur by chance. In turn, the distance did not indicate the vitality of the animal in a linear way. In fact, the fastest animal did not always travel farther (Fig. 2C) and the effect of aging was not observed in the case of distance (Fig. S4F).
The contraction of muscles may explain the distribution pattern of accelerations. According to Newton's equation of motion, acceleration is directly proportional to the force applied. The strength of the force given by the muscle of the test animal is proportional to the number of muscle cells that are activated at that moment. At a specific moment, the selection of activated muscle cells, which were stimulated by the excited nerves, would exhibit a certain randomness. This is only a hypothesis; however, if it is true, the number of activated cells would be normally distributed, and so would the accelerations, as observed here.
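The statistical step in this hypothesis, that a count of independently activated cells is approximately normally distributed, can be illustrated with a small Python simulation. The numbers of cells and the activation probability are arbitrary assumptions made for the sketch and carry no physiological meaning.

```python
import random
import statistics

def activated_cells(n_cells, p_active, rng):
    """Number of cells that fire when each fires independently with probability p_active."""
    return sum(1 for _ in range(n_cells) if rng.random() < p_active)

rng = random.Random(1)
N_CELLS, P_ACTIVE = 1000, 0.3  # arbitrary illustrative values
counts = [activated_cells(N_CELLS, P_ACTIVE, rng) for _ in range(1000)]

mean = statistics.fmean(counts)
sd = statistics.stdev(counts)
# For a normal distribution, about 68% of values lie within one sd of the mean
within_1sd = sum(abs(c - mean) <= sd for c in counts) / len(counts)
print(round(mean, 1), round(sd, 1), round(within_1sd, 2))
```

The counts cluster symmetrically around N_CELLS × P_ACTIVE with the spread expected for a binomial count, which is the normal-like shape that the hypothesis relies on. The simulation says nothing about whether muscle cells are in fact recruited independently.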
It should be noted that accelerations and decelerations showed the same magnitudes (Figs 1A and 1B). This finding shows that, although they depend on different sets of muscles, the same amount of force was consumed. The deceleration was not the follow-through of a running event, but rather, it was active speed braking. This clearly shows that activity is controlled to satisfy the animals' purposes.
The open-field test has frequently been used to measure not only locomotor activity but also anxiety-related behaviour. For this purpose, the immobile time and the stay time in the centre partition have been used as parameters. Because this study was designed to detect the effects of aging, the animals were repeatedly tested, and this could have caused habituation to the test by reducing the anxiety of the animals through acclimatisation. In fact, the stay time in the centre partition did not change with age. However, it is clear that immobile times are included in the distribution of rearing time (Fig. S3I); i.e., the measured immobile times are extreme values of the distribution. As these values are not robust estimators and can be easily altered by noise, using the parameter is not recommended.

Conclusion
The distributions of data on animal behaviour were reported. The exponentiation of the exponential distribution was frequently found; this is an extension of the exponential distribution in which the frequency is not stable. Because the model is easy to apply, it can be used for parametric analysis to achieve a more accurate analysis. In contrast, several traditionally used parameters, such as immobile time or distance, are not appropriate indicators because they would be sensitive to experimental noise.