In this paper, a Happiness Cups (H-cups) system is proposed to bi-directionally convey the holding-cup motions of paired cups between two remote users. To achieve this goal, the H-cups system uses three important components. First, the paired cups are embedded with accelerometers and gyro sensors to transmit three-dimensional acceleration and angular-velocity signals to a motion-recognizer application running on an Android phone; each cup also receives the remotely recognized motion via Bluetooth and flashes a corresponding color on its RGB LED. Second, the application recognizes the holding-cup motion from the cup's signals using long short-term memory (LSTM) and sends the locally recognized motion through an intermediate server to the remote paired cup via the internet. Finally, an intermediate server is established to exchange and forward the recognized holding-cup motions between the two paired cups; five holding-cup motions, namely drinking, horizontal shaking, vertical shaking, swaying, and raising a toast, are defined and recognized by the LSTM. The experimental results indicate that the recognition accuracy of the holding-cup motion reaches 97.3% with our method.
The proliferation of Massive Open Online Courses has made it a challenge for users to select a proper course. We assume a situation in which the user is targeting the knowledge defined by some knowledge categories. Knowing how much of the knowledge in a category is covered by the courses is then helpful in course selection. In this study, we define a concept of knowledge category coverage and aim to estimate it in a semi-automatic manner. We first model the knowledge category and the course as sets of concepts, and then utilize a taxonomy and the idea of centrality to differentiate the importance of concepts. Finally, we obtain the coverage value by calculating how much of the concepts required in a knowledge category are also taught in a course. Compared with treating the concepts as uniformly important, our proposed method generates coverage values closer to the ground truth assigned by domain experts.
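As a rough illustration of the coverage idea described above (not the authors' implementation), the sketch below weights concepts by a toy degree centrality over a taxonomy edge list and computes the fraction of a category's weighted concepts that a course teaches; all names and values are hypothetical.

```python
def degree_centrality(edges):
    """Toy importance weights: degree centrality over an undirected taxonomy edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

def coverage(category_weights, course_concepts):
    """Fraction of the category's total concept importance that the course also teaches."""
    total = sum(category_weights.values())
    if total == 0:
        return 0.0
    covered = sum(w for c, w in category_weights.items() if c in course_concepts)
    return covered / total

# Hypothetical taxonomy and course.
taxonomy = [("ml", "svm"), ("ml", "nn"), ("nn", "cnn")]
weights = degree_centrality(taxonomy)    # {"ml": 2, "svm": 1, "nn": 2, "cnn": 1}
value = coverage(weights, {"ml", "nn"})  # 4 of 6 importance units are covered
```

With uniform weights the same course would score 2/4; the centrality weighting pulls the value toward the more central concepts, which is the effect the method relies on.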
Peer assessments, in which people review the works of peers and have their own works reviewed by peers, are useful for assessing homework. In conventional peer assessment systems, works are usually allocated to people before the assessment begins; therefore, if people drop out (abandoning reviews) during an assessment period, an imbalance occurs between the number of works a person reviews and the number of peers who have reviewed that person's work. When the total imbalance increases, some people who diligently complete reviews may suffer from a lack of reviews and be discouraged from participating in future peer assessments. Therefore, in this study, we adopt a new adaptive allocation approach in which people are allocated works to review only when requested, and we propose an algorithm for allocating works to people that reduces the total imbalance. To show the effectiveness of the proposed algorithm, we provide an upper bound on the total imbalance that the proposed algorithm yields. In addition, we extend the above algorithm to consider reviewing ability. The extended algorithm avoids the problem that only unskilled (or skilled) reviewers are allocated to a given work. We show the effectiveness of the two proposed algorithms compared with existing algorithms through experiments using simulation data.
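A minimal sketch of the adaptive-allocation idea (not the paper's exact algorithm): on each review request, hand out the work, other than the requester's own, that currently has the fewest reviews, which keeps the per-work review counts balanced. All names below are hypothetical.

```python
def allocate(requester, authors, review_counts):
    """On request, pick the work (not the requester's own) with the fewest
    reviews so far; ties are broken by work id for determinism."""
    candidates = [w for w, a in authors.items() if a != requester]
    return min(candidates, key=lambda w: (review_counts[w], w))

# Hypothetical works and their current review counts.
authors = {"w1": "alice", "w2": "bob", "w3": "carol"}
counts = {"w1": 2, "w2": 0, "w3": 1}
chosen = allocate("alice", authors, counts)  # "w2": fewest reviews, not alice's own
```

Because a work is only assigned at request time, a dropout simply never requests, and the imbalance it would have caused under up-front allocation never materializes.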
In recent years, awareness and use of manga have spread, and the number of people who use manga for various purposes, such as entertainment, study, and marketing, is increasing. However, when people who do not specialize in manga create it for these purposes, they can write plots expressing what they want to convey, but the technique of composition, which arranges elements of manga such as characters and balloons corresponding to the plot, becomes an obstacle: without it, they cannot exploit manga's merit of comprehensibility based on the high flexibility of its expression. Therefore, we consider that support for this composition technique is necessary for amateurs to use manga while taking advantage of its benefits. We propose a method of generating composition proposals to support manga creation by amateurs. For the method, we also define a new manga metadata model that summarizes and extends metadata models from earlier studies; it represents the composition and the plot of manga. We apply a neural machine translation mechanism to learn the relation between the composition and the plot: it treats the plot annotation as the source and the composition annotation as the target, and learns from an annotation dataset based on the metadata model. We conducted experiments to evaluate how the composition proposals generated by our method help amateur manga creation, and demonstrated that they are useful.
Knowledge graph (KG) embedding aims to embed the entities and relations of multi-relational data in low-dimensional vector spaces. Knowledge graphs are useful for numerous artificial intelligence (AI) applications. However, KGs are far from complete, and hence KG embedding models have quickly gained massive attention. Nevertheless, state-of-the-art KG embedding models ignore the category-specific projection of entities and the impact of entity types in the relational context. For example, the entity “Washington” could belong to the person or location category depending on the relation in which it appears. In a KG, an entity usually holds many type properties, which leads to a very interesting question: are all the type properties of an entity meaningful for a specific relation? In this paper, we propose a KG embedding model, TPRC, that leverages entity-type properties in the relational context. To show the effectiveness of our model, we apply our idea to TransE, TransR, and TransD. Our approach outperforms state-of-the-art approaches such as TransE, TransD, DistMult, and ComplEx. Another important observation is that introducing entity-type properties in the relational context can improve the performance of the original translation-distance-based models.
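For reference, the translation-distance principle that TransE (and, by extension, models built on it such as the one above) uses to score triples can be sketched as follows; the vectors below are toy values, not learned embeddings.

```python
def transe_score(h, r, t):
    """TransE dissimilarity ||h + r - t|| (L2): lower means the triple
    (head, relation, tail) is considered more plausible."""
    return sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)) ** 0.5

# A "perfect" triple: head + relation lands exactly on tail.
good = transe_score([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])  # 0.0
bad = transe_score([1.0, 0.0], [0.0, 1.0], [0.0, 0.0])
```

TransR and TransD apply the same distance after projecting entities into a relation-specific space, which is exactly the step where type information of the kind TPRC exploits can enter.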
For amateur creators, it has become popular to create new content based on existing original work; such new content is called derivative work. We know that derivative creation is popular, but why are individual derivative works created? Although there are several factors that inspire the creation of derivative works, such factors cannot usually be observed on the Web. In this paper, we propose a model for inferring latent factors from sequences of derivative-work posting events. We assume a sequence to be a stochastic process incorporating the following three factors: (1) the original work's attractiveness, (2) the original work's popularity, and (3) the derivative work's popularity. To characterize content popularity, we use content ranking data and incorporate rank-biased popularity based on the creators' browsing behaviors. Our main contributions are three-fold. First, to the best of our knowledge, this is the first study modeling derivative creation activity. Second, using real-world datasets of music-related derivative-work creation, we conducted quantitative experiments and showed the effectiveness of adopting all three factors to model derivative creation activity and of considering creators' browsing behaviors, in terms of the negative log-likelihood for test data. Third, we carried out qualitative experiments and showed that our model is useful for analyzing the following aspects: (1) derivative creation activity in terms of category characteristics, (2) the temporal development of factors that trigger derivative-work posting events, (3) creator characteristics, (4) the N-th-order derivative creation process, and (5) original-work ranking.
As smartphones and IoT devices become widespread, probabilistic event streams, which are continuous analysis results of sensing data, have received a lot of attention. One application of probabilistic event streams is the monitoring of time-series events based on regular expressions. That is, we describe a monitoring query such as “Has the tracked object moved from RoomA to RoomB in the past 30 minutes?” using a regular expression, and then check whether the corresponding events occur in a probabilistic event stream with a sliding window. Although we proposed a fundamental monitoring method for time-series events in our previous work, three problems remain: 1) it is based on an unusual assumption about the slide size of the sliding window, 2) the grammar of pattern queries does not include negation, and 3) it is not optimized for multiple monitoring queries. In this paper, we propose several techniques to solve these problems. First, we remove the assumption on slide size and propose adaptive slicing of sliding windows for efficient probability calculation. Second, we calculate the occurrence probability of a negation pattern by using an inverted DFA. Finally, we propose merging multiple DFAs based on disjunction to process multiple queries efficiently. Experimental results using real and synthetic datasets demonstrate the effectiveness of our approach.
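The inverted-DFA idea for negation can be illustrated in a few lines: for a complete DFA, swapping accepting and non-accepting states yields a DFA for the complement language, so a negated pattern matches exactly when the original does not. This is a generic sketch, not the paper's implementation.

```python
def run_dfa(delta, start, accept, events):
    """Run a complete DFA over an event sequence; return acceptance."""
    state = start
    for e in events:
        state = delta[(state, e)]
    return state in accept

# DFA over {"a", "b"} accepting sequences that end with "a".
delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 0}
accept = {1}
inverted = {0, 1} - accept  # complement: sequences NOT ending with "a"
```

In the probabilistic setting, the occurrence probability of the negated pattern is then obtained by accumulating probability mass over the inverted accepting set instead of the original one.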
We present a retrieval method for 3D CAD assemblies consisting of multiple components. The proposed method distinguishes not only the shapes of 3D CAD assemblies but also the layouts of their components. The similarity between two assemblies is computed from feature quantities of the components constituting the assemblies. To make the similarity robust to translation and rotation of an assembly in 3D space, we use the 3D Radon transform and the spherical harmonic transform. Experimental evaluation shows that this method achieves better retrieval precision and efficiency than the comparison targets.
The goal of cross-lingual entity alignment is to match entities from knowledge graphs in different languages that represent the same real-world object. Knowledge graphs in different languages can share the same ontology, which we hypothesize may be useful for entity alignment. To verify this idea, we propose a novel embedding model based on TransC. The model first adopts TransC and a parameter-sharing model to map all the entities and relations in the knowledge graphs to a shared low-dimensional semantic space based on a set of aligned entities. Then, the model iteratively uses reinitialization and a soft-alignment strategy to perform entity alignment. The experimental results show that, compared with the benchmark algorithms, the proposed model can effectively fuse ontology information and achieve relatively better results.
A novel compress-and-forward (CF) system based on a multi-relay network is proposed. In this system, two networks are linked: one is a sensor network connecting the analog source and the relays, and the other is a communication network between the relays and the destination. At several parallel relay nodes, the analog signals are transformed into digital signals through quantization and encoding, and the digital signals are then transmitted to the destination. Based on the Chief Executive Officer (CEO) problem, we calculate the minimum transmission rate of every source-relay link, and we propose a system model combining the sensor network with the communication network according to Shannon's channel capacity theorem. Furthermore, we obtain the best possible system performance under a system power constraint, measured by signal-to-noise ratio (SNR) rather than bit error rate (BER). Numerical simulation results show that the proposed CF system outperforms the traditional amplify-and-forward (AF) system in SNR performance.
Side-channel attacks, such as simple power analysis and differential power analysis (DPA), are efficient methods for extracting the key, and they challenge the security of crypto chips. A side-channel attack logs the power trace of the crypto chip and infers the key by statistical analysis. To reduce the threat of power analysis attacks, an innovative method based on random execution and register randomization is proposed in this paper. To enhance resistance against DPA, the method breaks the correspondence between the power trace and the operands by scrambling the data execution sequence randomly and dynamically, and randomizes the data operation path so that the registers storing intermediate data are also randomized. Experiments and verification were done on the SAKURA-G FPGA platform. The results show that with the proposed method the key is not revealed even after 2 million power traces, while only 7.23% slice overhead and a 3.4% throughput cost are introduced. Compared with an unprotected chip, the measurements to disclosure increase by more than 4000×.
Linear feed-forward/feedback shift registers are used as an effective tool for testing circuits in various fields, including built-in self-test and secure scan design. In this paper, we consider the issue of testing linear feed-forward/feedback shift registers themselves. To test such registers, it is necessary to generate a test sequence for each register. We first present an experimental result showing that a commercial ATPG (automatic test pattern generator) cannot always generate a test sequence with high fault coverage, even for 64-stage linear feed-forward/feedback shift registers. We then show that there exists a universal test sequence with 100% fault coverage for the class of linear feed-forward/feedback shift registers, so that no test generation is required, i.e., the cost of test generation is zero. We prove the existence theorem of universal test sequences for this class of registers.
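To make the register model concrete, here is a toy Fibonacci LFSR simulator (a generic sketch; the feed-forward/feedback structures considered in the paper are more general). With taps taken from a primitive polynomial, a 3-stage register cycles through all seven non-zero states.

```python
def lfsr_step(state, taps):
    """One clock of a Fibonacci LFSR: the feedback bit is the XOR of the
    tapped positions; bits shift one place toward the output."""
    fb = 0
    for t in taps:
        fb ^= state[t]
    return [fb] + state[:-1]

def lfsr_sequence(state, taps, n):
    """Output n bits (the last stage) while clocking the register."""
    out = []
    for _ in range(n):
        out.append(state[-1])
        state = lfsr_step(state, taps)
    return out

# Taps {0, 2} correspond to the primitive polynomial x^3 + x + 1,
# so the output sequence has the maximal period 2^3 - 1 = 7.
seq = lfsr_sequence([1, 0, 0], [0, 2], 14)
```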
In this paper, we present the effectiveness of image compression based on a convolutional autoencoder (CAE) with a region of interest (ROI) for quality control. We propose a method that adapts image quality between prioritized and non-prioritized parts for CAE-based compression. The proposed method uses annotation information for the distortion weights of an MS-SSIM-based loss function. We show experimental results using a road-damage image dataset used to check damaged parts and an image dataset with segmentation data (ADE20K). The experimental results reveal that the proposed weighted loss function with the CAE-based compression of F. Mentzer et al. learns characteristics and preferred bit allocations of the prioritized parts through end-to-end training. For the road-damage image dataset, our method reduces bpp by 31% compared with the original method while meeting the quality requirements that the average weighted MS-SSIM for the damaged road parts be larger than 0.97 and the average weighted MS-SSIM for the other parts be larger than 0.95.
Recent studies suggest that learning “how to learn” is important because learners must be self-regulated to take more responsibility for their own learning processes, meta-cognitive control, and other generative learning thoughts and behaviors. The mechanism that enables a learner to self-regulate his/her learning strategies has been actively studied in classroom settings, but has seldom been studied in the area of real-world learning in out-of-school settings (e.g., environmental learning in nature). A feature of real-world learning is that a learner's cognition of the world is updated by his/her behavior to investigate the world, and vice versa. This paper models the mechanism of real-world learning for executing and self-regulating a learner's cognitive and behavioral strategies to self-organize his/her internal knowledge space. Furthermore, this paper proposes multimodal analytics to integrate heterogeneous data resources of the cognitive and behavioral features of real-world learning, to structure and archive the time series of strategies occurring through learner-environment interactions, and to assess how learning should be self-regulated for better understanding of the world. Our analysis showed that (1) intellectual achievements are built by self-regulating learning to chain the execution of cognitive and behavioral strategies, and (2) a clue to predict learning outcomes in the world is analyzing the quantity and frequency of strategies that a learner uses and self-regulates. Assessment based on these findings can encourage a learner to reflect and improve his/her way of learning in the world.
This manuscript discusses a new indoor positioning method and proposes multi-distance-function trilateration over k-NN fingerprinting using radio signals. Generally, the strength of a radio signal, referred to as the received signal strength indicator (RSSI), decreases as the signal travels through space. Our method employs a list of fingerprints comprised of RSSIs to absorb the interference between radio signals that occurs around the transmitters. It also employs multiple distance functions to convert the distance between fingerprints into physical distance, in order to absorb the interference that occurs around the receiver, and then performs trilateration among the top three closest fingerprints to locate the receiver's current position. A positioning experiment conducted in our laboratory shows that our method is viable as a position-level indoor positioning method and improves positioning performance by 12.7%, reducing the positioning error to 0.406 m compared with traditional methods.
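The overall flow can be pictured with a rough sketch using made-up numbers (and a single distance function, unlike the multi-distance variant proposed above): find the three fingerprints whose RSSI vectors best match the observation, then combine their known positions with inverse-distance weights.

```python
def locate(fingerprints, observed):
    """fingerprints: list of ((x, y), rssi_vector). Returns the weighted
    centroid of the three fingerprints closest in RSSI space."""
    def rssi_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
    top3 = sorted(fingerprints, key=lambda fp: rssi_dist(fp[1], observed))[:3]
    ws = [1.0 / (rssi_dist(rssi, observed) + 1e-9) for _, rssi in top3]
    total = sum(ws)
    x = sum(w * pos[0] for w, (pos, _) in zip(ws, top3)) / total
    y = sum(w * pos[1] for w, (pos, _) in zip(ws, top3)) / total
    return x, y

# Hypothetical survey: position -> RSSIs from two transmitters.
fps = [((0, 0), (-40, -70)), ((1, 0), (-50, -60)),
       ((0, 1), (-60, -50)), ((2, 2), (-90, -30))]
est = locate(fps, (-50, -60))  # matches the (1, 0) fingerprint exactly
```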
In this paper, we design and develop a sensor-embedded office chair that can continuously measure the posture of an office worker without disturbing their job. In our system, eight accelerometers attached to the back side of the fabric surface of the chair are used for recognizing the posture. We propose three sitting-posture recognition algorithms that consider the initial position of the chair and differences in physique. Through an experiment with 28 participants, we confirm that the proposed chair recognizes the sitting posture with accuracies of 75.4% (algorithm 1), 83.7% (algorithm 2), and 85.6% (algorithm 3).
A massive open online course (MOOC) is an online course aimed at unlimited participation and open access via the web. Although there are many MOOC providers, they typically focus on providing online courses and do not link with traditional education or business-sector requirements. This paper presents a MOOC service framework that adopts MOOCs to provide additional services, supporting students in traditional education and providing a credit bank of student academic credentials for business-sector demand. In particular, it extends a typical MOOC to support academic/credential records and transcript issuance. The MOOC service framework consists of five layers: the authentication, resources, learning, assessment, and credential layers. We discuss the adoption of the framework in Thai MOOC, the national MOOC system for Thai universities. Several main issues related to the framework's adoption are discussed, including the service strategy and model as well as the infrastructure design for a large-scale MOOC service.
When people learn a handicraft from instructional contents such as books, videos, and web pages, many of them give up halfway because the contents do not always make clear how to perform each operation. This study aims to provide origami learners, especially beginners, with feedback on their folding operations. An approach is proposed for recognizing the state of the learner using a single top-view camera and pointing out mistakes made during origami folding. First, an instruction model that stores easy-to-follow folding operations is defined. Second, a method for recognizing the state of the learner's origami sheet is proposed. Third, a method for detecting the learner's mistakes by anomaly detection with a one-class support vector machine (one-class SVM) classifier, using the folding progress and the difference between the learner's origami shape and the correct shape, is proposed. Because noise exists in the camera images due to shadows and occlusions caused by the learner's hands, the shapes of the origami sheet are not always extracted accurately. To train the one-class SVM classifier with high accuracy, a data-cleansing method that automatically sifts out noisy video frames is proposed. Moreover, using statistics of the features extracted from the frames in a sliding window reduces the influence of the noise. The proposed method was experimentally demonstrated to be sufficiently accurate and robust against noise, and its false alarm rate (false positive rate) can be reduced to zero. Requiring only a single camera and common origami paper, the proposed method makes it possible to monitor mistakes made by origami learners and support their self-learning.
In this article, we propose a method called “continuous noise masking (cNM)” that eliminates residual buzziness in a continuous vocoder, i.e., one in which all parameters are continuous, offering a simple and flexible speech analysis and synthesis system. Traditional parametric vocoders generally show a perceptible deterioration in the quality of the synthesized speech due to different processing algorithms. Furthermore, inaccurate noise resynthesis (e.g., of breathiness or hoarseness) is also considered to be one of the main underlying causes of performance degradation, leading to noisy transients and temporal discontinuity in the synthesized speech. To overcome these issues, the new cNM is developed based on phase distortion deviation in order to reduce the perceptual effect of the residual noise, allowing a proper reconstruction of noise characteristics and better modeling of the creaky voice segments that may occur in natural speech. To this end, cNM is designed to keep only voiced components under the cNM threshold while discarding others. We evaluate the proposed approach and compare it with state-of-the-art vocoders using objective measures and subjective listening tests. Experimental results show that the proposed method reduces the effect of residual noise and can reach the quality of more sophisticated approaches such as STRAIGHT and the log-domain pulse model (PML).
Lombard speech is produced in noisy environments due to the Lombard effect and remains intelligible in adverse environments. To adaptively control the intelligibility of transmitted speech for public announcement systems, in this study we focus on perceptually mimicking Lombard speech under backgrounds with varying noise levels. Other approaches map neutral speech features to the corresponding Lombard speech features, but as this can only be applied to one noise level at a time, it is unsuitable for varying noise levels, because the characteristics of Lombard speech vary with the noise level. Instead, we utilize a rule-based method that automatically generates rules and flexibly controls features with any change of noise level. Specifically, we conduct a feature-tendency analysis and propose a continuous rule-generation model to estimate the effect of varying noise levels on the features. The proposed techniques, which are based on a coarticulation model, MRTD, and a spectral GMM, can easily modify neutral speech features by following the generated rules. Voices having these features are then synthesized by STRAIGHT to obtain Lombard speech fitted to noise at varying levels. To validate the proposed method, the quality of the mimicking speech is evaluated in subjective listening experiments on similarity, intelligibility, and naturalness. Under varying noise levels, the results show similarity to Lombard speech equal to that of a state-of-the-art method. Intelligibility and naturalness are comparable with some feature modifications.
The air quality index (AQI) is a non-dimensional index describing air quality and is widely used in air quality management schemes. A novel method for AQI forecasting based on deep dictionary learning (AQIF-DDL) and machine vision is proposed in this paper. A sky image is used as the input of the method, and the output is the forecasted AQI value. Deep dictionary learning is employed to automatically extract sky-image features and achieve AQI forecasting. The idea of learning deeper dictionary levels, stemming from deep learning, is also included to increase forecasting accuracy and stability. The proposed AQIF-DDL is compared with other deep-learning-based methods, such as the deep belief network, stacked autoencoder, and convolutional neural network. The experimental results indicate that the proposed method achieves good performance on AQI forecasting.
Figure-ground (FG) segregation has been considered a fundamental step toward object recognition. We explored plausible mechanisms that estimate global figure-ground segregation from local image features by investigating the human visual system. Physiological studies have reported border-ownership (BO) selective neurons in V2 that signal the local direction of figure (DOF) along a border; however, how local BO signals contribute to global FG segregation has not been clarified. BO and FG processing could be independent, dependent on each other, or inseparable. Investigating the differences and similarities between BO and FG judgments is important for exploring plausible mechanisms that enable global FG estimation from local cues. We performed psychophysical experiments that included two different tasks, each focusing on the judgment of either BO or FG. The perceptual judgments showed consistency between BO and FG determination, while a longer gaze-movement distance was observed in FG segregation than in BO discrimination. These results suggest the involvement of distinct neural mechanisms for local BO determination and global FG segregation.
This paper presents a method for reducing the redundancy in both the fully connected layers and the convolutional layers of trained neural network models. The proposed method consists of two steps: 1) Neuro-Coding, which encodes the behavior of each neuron by a vector composed of its outputs for actual inputs, and 2) Neuro-Unification, which unifies neurons having similar behavioral vectors. Instead of simply pruning one of two similar neurons, the proposed method lets the remaining neuron emulate the behavior of the pruned one. Therefore, the proposed method can reduce the number of neurons with a small sacrifice of accuracy and without retraining. Our method can also be applied to compressing convolutional layers: the behavior of each channel is encoded by its output feature maps, channels whose behaviors can be well emulated by other channels are pruned, and the remaining weights are updated. Through several experiments, we confirmed that the proposed method performs better than existing methods.
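A minimal, hypothetical rendering of the unification step (an illustrative sketch, not the authors' code): if two neurons' behavior vectors are nearly parallel, the pruned neuron's contribution can be absorbed by adding its outgoing weight, scaled by the norm ratio, onto the kept neuron.

```python
def norm(v):
    return sum(x * x for x in v) ** 0.5

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))

def unify(behaviors, out_weights, thresh=0.999):
    """Greedily merge neurons whose behavior vectors are almost parallel;
    the kept neuron's outgoing weight absorbs the pruned one's, scaled so
    the kept neuron emulates the pruned neuron's contribution."""
    kept, weights = [], []
    for vec, w in zip(behaviors, out_weights):
        for i, kvec in enumerate(kept):
            if cosine(vec, kvec) >= thresh:
                weights[i] += w * norm(vec) / norm(kvec)
                break
        else:
            kept.append(vec)
            weights.append(w)
    return kept, weights

# Neurons 0 and 1 behave identically up to a factor of 2, so they merge.
kept, ws = unify([[1.0, 2.0], [2.0, 4.0], [1.0, 0.0]], [1.0, 1.0, 1.0])
```

The emulation is exact only for truly parallel behavior vectors; for merely similar ones, the absorbed weight approximates the pruned neuron's output, which is the source of the "small sacrifice of accuracy" mentioned above.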
Biometric template protection techniques have been proposed to address the security and privacy issues inherent to biometric-based authentication systems. However, it has been shown that the robustness of most such techniques against reversibility and linkability attacks is overestimated. Thus, a thorough security analysis of recently proposed template protection schemes has to be carried out. Negative iris recognition is an interesting iris template protection scheme based on the concept of negative databases. In this paper, we present a comprehensive security analysis of this scheme in order to validate its practical usefulness. Although the authors of negative iris recognition claim that their scheme possesses both irreversibility and unlinkability, we demonstrate that more than 75% of the original iris-code bits can be recovered using a single protected template. Moreover, we show that the negative iris recognition scheme is vulnerable to attacks via record multiplicity, where an adversary can combine several transformed templates to recover a larger proportion of the original iris-code. Finally, we demonstrate that the scheme does not possess unlinkability. The experimental results, on the CASIA-IrisV3 Interval public database, support our theory and confirm that the negative iris recognition scheme is susceptible to reversibility, linkability, and record multiplicity attacks.
This paper presents an automated patient-specific ECG classification algorithm, which integrates long short-term memory (LSTM) and convolutional neural networks (CNN). While the LSTM extracts temporal features, such as heart rate variability (HRV) and beat-to-beat correlation, from sequential heartbeats, the CNN captures detailed morphological characteristics of the current heartbeat. To further improve the classification performance, adaptive segmentation and re-sampling are applied to align the heartbeats of different patients with various heart rates. In addition, a novel clustering method is proposed to identify the most representative patterns from the common training data. Evaluated on the MIT-BIH arrhythmia database, our algorithm shows superior accuracy for both ventricular ectopic beat (VEB) and supraventricular ectopic beat (SVEB) recognition. In particular, the sensitivity and positive predictive rate for SVEB increase by more than 8.2% and 8.8%, respectively, compared with prior works. Since our patient-specific classification does not require manual feature extraction, it is potentially applicable to embedded devices for automatic and accurate arrhythmia monitoring.
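The alignment step can be pictured with a tiny linear-interpolation resampler (an illustrative stand-in for the adaptive segmentation and re-sampling described above): every heartbeat segment is stretched or compressed to a common length before being fed to the network.

```python
def resample(beat, n):
    """Linearly resample a sampled heartbeat to exactly n points so beats
    recorded at different heart rates line up sample-for-sample."""
    if n == 1:
        return [float(beat[0])]
    step = (len(beat) - 1) / (n - 1)
    out = []
    for i in range(n):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(beat) - 1)
        frac = pos - lo
        out.append(beat[lo] * (1 - frac) + beat[hi] * frac)
    return out

stretched = resample([0.0, 1.0, 2.0, 3.0], 7)  # upsample a short (fast) beat
```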
The aim of this paper is to show an upper bound for finding defective samples in a group testing framework. To this end, we exploit minimization of Hamming weights in coding theory and define probability of error for our decoding scheme. We derive a new upper bound on the probability of error. We show that both upper and lower bounds coincide with each other at an optimal density ratio of a group matrix. We conclude that as defective rate increases, a group matrix should be sparser to find defective samples with only a small number of tests.
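For intuition, a noiseless group test can be decoded with the classic COMP rule, a simpler relative of the Hamming-weight-based decoding analyzed above: every sample that appears in a negative test is cleared, and whatever remains is declared defective. This sketch is generic, not the paper's decoder.

```python
def comp_decode(matrix, results):
    """matrix[i][j] == 1 if sample j is pooled into test i;
    results[i] is True when test i comes back positive.
    Returns the samples that no negative test rules out."""
    n = len(matrix[0])
    suspects = set(range(n))
    for row, positive in zip(matrix, results):
        if not positive:
            suspects -= {j for j in range(n) if row[j]}
    return sorted(suspects)

# 4 tests over 4 samples; only sample 0 is defective, so exactly the
# tests that contain sample 0 come back positive.
pools = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 1]]
outcome = [True, True, False, False]
decoded = comp_decode(pools, outcome)
```

The density trade-off in the abstract shows up here directly: a sparser pooling matrix yields more negative tests when the defective rate rises, which is what lets the decoder clear non-defective samples.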
In this prompt report, we present a basic performance evaluation of the Intel Optane Data Center Persistent Memory Module (Optane DCPMM), the first commercially available byte-addressable non-volatile memory module, released in April 2019. Since only a few reports on its performance had been published at the time of writing, this letter is intended to complement other performance studies. Through experiments using our own measurement tools, we found that the latency of random read-only access was approximately 374 ns and that of random writeback-involving access was 391 ns. The bandwidths of read-only and writeback-involving access for interleaved memory modules were approximately 38 GB/s and 3 GB/s, respectively.
Embedded SQL inserts SQL statements into a host programming language and executes them at program run time. SQL injection is a well-known attack technique; however, detection techniques for embedded SQL have not been established. This paper introduces a technique based on candidate code generation that can detect SQL injection vulnerabilities in the C/C++ host programming language.
Software defect prediction (SDP) plays a vital role in allocating testing resources reasonably and ensuring software quality. For cases where there are not enough labeled historical modules, a considerable number of semi-supervised SDP methods have been proposed; these methods utilize limited labeled modules and abundant unlabeled modules simultaneously. Nevertheless, most of them make use of traditional features rather than powerful deep feature representations. Besides, the cost of misclassifying defective modules is higher than that of misclassifying defect-free ones, and the number of defective modules available for training is small. Taking the above issues into account, we propose a cost-sensitive and sparse ladder network (CSLN) for SDP. We first introduce the semi-supervised ladder network to extract deep feature representations. We then introduce cost-sensitive learning, which sets different misclassification costs for defective-prone and defect-free-prone instances to alleviate the class imbalance problem. A sparse constraint is added to the hidden nodes of the ladder network when the number of hidden nodes is large, which enables the model to find robust structures in the data. Extensive experiments on the AEEEM dataset show that the CSLN outperforms several state-of-the-art semi-supervised SDP methods.
This letter investigates a secure transmission improvement scheme for indoor visible light communications (VLC) using a protected zone. First, the system model is established; for the input signal, both the non-negativity and the dimmable average optical intensity constraint are considered. Based on this model, the secrecy capacity for VLC without the protected zone is obtained. After that, the protected zone is determined, and its construction is provided. Finally, the secrecy capacity for VLC with the protected zone is derived. Numerical results show that the secrecy performance of VLC improves dramatically when the protected zone is employed.
This letter proposes a gradient-enhanced softmax supervisor for face recognition (FR) based on a deep convolutional neural network (DCNN). The proposed supervisor computes a constant-normalized cosine score for each class and uses a combination of the intra-class score and the soft maximum of the inter-class scores as the objective function, which mitigates the vanishing gradient problem of the conventional softmax classifier. Experiments on the public Labeled Faces in the Wild (LFW) database show that the proposed supervisor achieves better results than current state-of-the-art softmax-based approaches for FR.
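The exact objective is defined in the letter; as a hypothetical illustration of the "soft maximum of the inter-class scores" idea, log-sum-exp gives a smooth, differentiable stand-in for the hard maximum (the function names and loss form below are assumptions for illustration only):

```python
import math

def soft_max(scores):
    """Smooth (log-sum-exp) approximation of max(scores); unlike the hard
    maximum, it propagates gradients to every inter-class score."""
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))

def hypothetical_loss(intra_score, inter_scores):
    # Loss shrinks as the intra-class cosine score rises above the
    # smooth maximum of the inter-class cosine scores.
    return soft_max(inter_scores) - intra_score

# Better separation from the wrong classes gives a lower loss.
good = hypothetical_loss(0.9, [0.1, 0.2])
bad = hypothetical_loss(0.3, [0.1, 0.2])
assert good < bad
```

Because log-sum-exp upper-bounds the hard maximum yet stays differentiable everywhere, every inter-class score receives a nonzero gradient, which is one common way such objectives avoid vanishing gradients.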
The increasing computation cost and storage requirements of convolutional neural networks (CNNs) have severely hindered their application on resource-limited devices in recent years. As a result, there is a pressing need to accelerate these networks. In this paper, we propose a loss-driven method to prune redundant channels of CNNs. It identifies unimportant channels by applying a Taylor expansion with respect to the scaling and shifting factors, and prunes those channels using a fixed percentile threshold. By doing so, we obtain a compact network with fewer parameters and lower FLOPs consumption. In the experiments, we evaluate the proposed method on the CIFAR datasets with several popular networks, including VGG-19, DenseNet-40 and ResNet-164; the results demonstrate that the proposed method is able to prune over 70% of channels and parameters with no performance loss. Moreover, iterative pruning can be applied to obtain an even more compact network.
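As a rough sketch of the pruning criterion described above (a first-order Taylor importance over the per-channel scaling/shifting factors plus a fixed percentile cut-off; the exact formulation in the paper may differ, and the toy numbers are invented):

```python
import numpy as np

def taylor_importance(gamma, grad_gamma, beta, grad_beta):
    """First-order Taylor estimate of the loss change caused by zeroing a
    channel's scaling (gamma) and shifting (beta) factors."""
    return np.abs(gamma * grad_gamma) + np.abs(beta * grad_beta)

def keep_mask(importance, percentile=70.0):
    """Keep only channels whose importance exceeds the fixed percentile
    threshold; the remaining channels are pruned."""
    return importance > np.percentile(importance, percentile)

# Toy example: 5 channels with illustrative factors and gradients.
gamma = np.array([0.9, 0.05, 0.4, 0.01, 0.7])
grad_gamma = np.array([0.2, 0.1, 0.3, 0.05, 0.1])
beta = np.zeros(5)
grad_beta = np.zeros(5)

imp = taylor_importance(gamma, grad_gamma, beta, grad_beta)
mask = keep_mask(imp, percentile=70.0)  # boolean mask of surviving channels
```

A percentile threshold prunes a fixed fraction of channels regardless of the absolute scale of the importance scores, which makes the compression ratio predictable across layers and networks.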
Performance in Automatic Speech Recognition (ASR) degrades dramatically in noisy environments. To alleviate this problem, a variety of deep networks based on convolutional neural networks and recurrent neural networks have been proposed, trained with L1 or L2 loss. In this letter, we propose a new orthogonal gradient penalty (OGP) method for Wasserstein Generative Adversarial Networks (WGAN) applied to denoising and despeeching models. The WGAN integrates a multi-task autoencoder that estimates not only speech features but also noise features from noisy speech. While achieving a 14.1% improvement in the Wasserstein distance convergence rate, the proposed OGP-enhanced features, when tested in ASR, achieve 9.7%, 8.6%, 6.2%, and 4.8% WER improvements over the DDAE, MTAE, R-CED(CNN) and RNN models, respectively.
To improve the likability of speech, we propose a voice conversion algorithm that controls the fundamental frequency (F0) and the spectral envelope, and carry out a subjective evaluation in which subjects can manipulate these two speech parameters. The results show that subjects preferred speech with parameter settings associated with higher brightness.
We measured eye movements at gaze points while subjects performed calculation tasks and examined the relationship between these eye movements and the subjects' task-induced fatigue and/or internal state. The results suggest that fatigue and/or internal state affect eye movements at gaze points, and that both could be estimated from eye movements at gaze points in real time.