Screening experiments narrow down the factors that affect experimental results. To cope with the growing number of experimental runs caused by the increase in factors that accompanies product diversification, screening experiments have been applied to two-level supersaturated designs.
When the design size becomes large, however, the computational cost becomes enormous and the analysis infeasible. We therefore propose a method that reduces the computational burden of the analysis so that it can be performed even for large experimental designs.
The Box–Meyer method calculates the posterior probabilities of all candidate models and extracts the active factors all at once. The proposed method, in contrast, extracts factors sequentially: it extracts a factor, adds it to the model, recalculates the posterior probabilities, and then extracts the next factor.
We compare the Box–Meyer method with the proposed method in simulations at seven design sizes that the Box–Meyer method can analyze. Furthermore, we evaluate the proposed method on its own at four design sizes that cannot be analyzed with the Box–Meyer method.
A simulation evaluation based on accuracy and running time revealed that the proposed approach outperforms the Box–Meyer method for large experimental designs.
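The sequential idea described above can be sketched in a few lines. The sketch below is not the full Box–Meyer posterior calculation: it stands in for the Bayesian model evidence with a BIC-based approximation, and the function names and stopping rule are illustrative assumptions.

```python
import numpy as np

def bic_score(X, y, subset):
    """Approximate log model evidence of an OLS fit by -BIC/2
    (a stand-in for the Box-Meyer posterior probability)."""
    n = len(y)
    Xs = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    rss = np.sum((y - Xs @ beta) ** 2)
    return -0.5 * (n * np.log(rss / n) + Xs.shape[1] * np.log(n))

def sequential_select(X, y, max_factors=5):
    """Greedy sequential extraction: add one factor at a time, keep the
    one that most improves the approximate posterior, recompute, repeat;
    stop when no remaining factor improves the score."""
    active, remaining = [], list(range(X.shape[1]))
    current = bic_score(X, y, active)
    for _ in range(max_factors):
        scores = {j: bic_score(X, y, active + [j]) for j in remaining}
        best = max(scores, key=scores.get)
        if scores[best] <= current:
            break
        active.append(best)
        remaining.remove(best)
        current = scores[best]
    return active
```

Because each step evaluates only the remaining candidate factors against the current model, the cost grows linearly with the number of factors per step, rather than exponentially over all models as in the all-at-once approach.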
The Recognition-Taguchi (RT) method has been proposed for evaluating binary data such as image data. The RT-PC method extends it to continuous data whose variables have different units. The RT-PC method uses principal component analysis (PCA); however, applying PCA to high-dimensional data degrades the estimation accuracy of the eigenvalues and eigenvectors. A method based on sparse PCA has therefore been proposed in this context in place of conventional PCA. However, previous studies have not investigated the anomaly-detection performance of the RT-PC method under the assumptions of typical high-dimensional PCA. In this study, we introduce the noise-reduction and cross-data (CD) matrix methods into the RT-PC method and evaluate their performance by Monte Carlo simulation. The RT-CD method uses the calculation process of the CD matrix method instead of PCA; the simulation results show that it performs better than the RT-PC method for the inner-anomaly pattern. The RT-NR method uses the calculation process of the noise-reduction method instead of PCA; it is observed to exhibit the same anomaly-detection performance as the RT-PC method.
Collaborative filtering is a recommendation model that evaluates and recommends items that match each user's preferences and have a high purchase probability. However, some items are purchased regularly and do not necessarily require recommendation; identifying items with a high recommendation effect is therefore one of the challenges in using recommendation models. The recommendation effect can be regarded as an intervention effect in causal inference by treating the recommendation of an item as an intervention. The intervention effect for individual users can be estimated with models such as counterfactual regression (CFR). However, because CFR can handle only one type of intervention, it cannot be directly applied to recommender systems, where many recommended items act as interventions. In this study, we extend the model to estimate the effect of each of multiple interventions with a single CFR by combining user and item features into covariates of user–item pairs, thereby allowing the recommendation effect to be estimated for every user–item pair. We demonstrate the effectiveness of the proposed method through experiments on artificial data.
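The covariate construction for user–item pairs can be illustrated without a neural CFR. The sketch below concatenates user and item features into a single pair-level covariate vector and estimates per-pair effects with a simple two-model (T-learner) stand-in for CFR; all function names and the linear outcome models are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def pair_covariates(user_feats, item_feats):
    """Form one covariate vector per (user, item) pair by concatenating
    user and item features, so one effect model covers every item."""
    rows = [np.concatenate([u, i]) for u in user_feats for i in item_feats]
    return np.array(rows)

def fit_linear(X, y):
    """Ordinary least squares with an intercept column."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return beta

def estimate_effects(X, t, y, X_new):
    """T-learner stand-in: fit separate outcome models for recommended
    (t=1) and not-recommended (t=0) pairs; the estimated effect is the
    difference of the two predictions for each pair."""
    b1 = fit_linear(X[t == 1], y[t == 1])
    b0 = fit_linear(X[t == 0], y[t == 0])
    Xb = np.column_stack([np.ones(len(X_new)), X_new])
    return Xb @ b1 - Xb @ b0
```

The key point is only the covariate design: once every pair is a row, a single effect model can score all user–item combinations at once.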
Extreme events such as floods, heavy rainfall, high waves, and strong winds often cause serious damage. Observed data can be used to detect signs of abnormal phenomena and thus help prevent such damage. Anomaly-detection techniques, which distinguish abnormal cases from normal ones in data, have been among the most widely used methods for detecting suspicious events in collected data. Although many anomaly-detection methods have appeared in the literature, almost all are based on multivariate statistical process control or machine learning techniques. Recently, on the premise that information about suspicious and critical events is more likely to reside in the largest or smallest values than in values around the mean or median, anomaly-detection methods based on extreme value theory (EVT), the statistical theory of the largest or smallest observations, have attracted attention in a variety of areas. However, because these methods rely on univariate EVT, they can be applied only to univariate data, and there has been little work on multivariate anomaly detection based on EVT. Multivariate methods are important in applications because almost all data targeted by anomaly detection are multivariate, and applying univariate methods to each variable independently can be misleading. In this paper, we propose a new anomaly-detection method based on multivariate EVT. The performance of the proposed method is evaluated by Monte Carlo simulation. To illustrate the proposed methods, we apply them to new real data on precipitation events in 2021. The numerical results show that the proposed method outperforms existing methods.
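As background for the EVT-based scoring discussed above, here is a minimal univariate peaks-over-threshold (POT) sketch: exceedances over a high empirical quantile are modeled with a tail distribution, and observations with a very small estimated tail probability are flagged. For simplicity the tail is taken to be exponential (a generalized Pareto with shape parameter 0) rather than a fully fitted GPD, and the multivariate extension that is the paper's contribution is not reproduced; all names and thresholds are illustrative.

```python
import numpy as np

def pot_anomaly_scores(x, threshold_q=0.95):
    """Peaks-over-threshold: model exceedances over a high empirical
    quantile with an exponential tail (GPD with shape xi = 0) and
    return each point's estimated tail probability P(X > x).
    Smaller scores mean more anomalous."""
    u = np.quantile(x, threshold_q)
    exc = x[x > u] - u
    sigma = exc.mean()                      # MLE of the exponential scale
    scores = np.ones_like(x, dtype=float)   # below the threshold: not extreme
    above = x > u
    scores[above] = (1.0 - threshold_q) * np.exp(-(x[above] - u) / sigma)
    return scores

def detect_anomalies(x, alpha=1e-4, threshold_q=0.95):
    """Flag observations whose estimated tail probability is below alpha."""
    return pot_anomaly_scores(x, threshold_q) < alpha
```

The point the abstract makes is that running such a univariate detector on each coordinate of multivariate data independently ignores tail dependence between variables, which is what motivates the multivariate EVT approach.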
Since discriminative models are usually constructed by learning from a given set of training data, their predictive performance cannot be guaranteed for data not generated from the same distribution as the training data. Such data are known as out-of-distribution (OOD) data, while data that follow the same distribution as the training data are referred to as in-distribution data. For practical applications, it is important to detect OOD data before they are input into high-performing classifiers. Recently, a likelihood-ratio-based OOD detection method has been proposed, in which the likelihood ratio calculated from two generative models trained under different noise conditions serves as a detection index that evaluates only semantic information, ignoring the background information shared by all classes of data. The generative model used for OOD detection should estimate the true distribution of the in-distribution data accurately; in the conventional method, all classes are estimated together. However, the in-distribution data may follow simpler distributions if each class is estimated separately, and such simple distribution structures are easier to learn. Estimation accuracy can therefore be improved by models that estimate the distribution structure of each class independently. In this study, we propose an OOD detection method that uses generative models trained independently for each class, and we conduct evaluation experiments on image datasets to demonstrate the effectiveness of the proposed method.
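The per-class likelihood-ratio idea can be sketched with toy density models. The sketch below uses a diagonal Gaussian per class in place of the paper's deep generative models, and a background Gaussian fitted to noise-perturbed data in place of the noise-conditioned model; a low score (best per-class log-likelihood minus background log-likelihood) suggests an OOD input. All names and the Gaussian assumption are illustrative.

```python
import numpy as np

def fit_gaussian(X):
    """Diagonal Gaussian density estimate: per-feature mean and variance."""
    return X.mean(axis=0), X.var(axis=0) + 1e-6

def log_density(x, mu, var):
    """Log density of a diagonal Gaussian, summed over features."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var, axis=-1)

def ood_score(x, class_models, bg_model):
    """Likelihood ratio with independently trained class models:
    take the best per-class log-likelihood, subtract the background
    model's log-likelihood. Low scores indicate likely OOD inputs."""
    per_class = np.max([log_density(x, *m) for m in class_models], axis=0)
    return per_class - log_density(x, *bg_model)
```

Training one simple model per class mirrors the abstract's argument: each class-conditional distribution is simpler than the pooled distribution, so it is easier to estimate well.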
This study proposes an analytical model based on the robust variational autoencoder (RVAE) to analyze store characteristics and formulate store-level item strategies aimed at reducing the loss from unsold items. The display-item vector of each store characterizes its sales strategy. However, this vector is sparse, because the items displayed in any one store account for only about 10% of the item brands sold across all stores. A simple autoencoder is ill-suited to this sparsity because of the large variation in the input data, which results from stores selling unique products. This study therefore employs the RVAE to analyze the store characteristics reflected in the displayed items. The latent representation of an RVAE is usually an output from an estimated probability distribution in the middle layer of the trained network, and the similarity of the latent representations of two inputs is generally evaluated from samples drawn from those distributions. This study instead proposes calculating the distance between the probability distributions without sampling, yielding valid estimates of the similarity between latent representations. It also uses the reconstruction error obtained from the RVAE to detect stores whose trends differ significantly from those of other stores, and the proposed method can further detect groups of stores with similar trends. The proposed model was applied to an actual dataset, and its effectiveness was verified.
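A sampling-free distance between latent distributions is possible because a VAE encoder outputs the parameters of a diagonal Gaussian, for which divergences have closed forms. The sketch below uses a symmetrized Kullback–Leibler divergence as the store-to-store dissimilarity; this is a generic illustration of the idea, not necessarily the exact measure used in the paper.

```python
import numpy as np

def kl_diag_gauss(mu1, var1, mu2, var2):
    """Closed-form KL divergence KL(N(mu1, var1) || N(mu2, var2)) for
    diagonal Gaussians; no sampling from the latent space is required."""
    return 0.5 * np.sum(np.log(var2 / var1)
                        + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def latent_distance(mu1, var1, mu2, var2):
    """Symmetrized KL used as a dissimilarity between the latent
    distributions of two stores."""
    return 0.5 * (kl_diag_gauss(mu1, var1, mu2, var2)
                  + kl_diag_gauss(mu2, var2, mu1, var1))
```

Because the distance is computed directly from the encoder's mean and variance outputs, it avoids the Monte Carlo noise that sampling-based similarity estimates introduce.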