To meet increasing computational demands, heterogeneous multicore architectures are considered a promising way to satisfy the computational requirements at the edge. In FPGAs, a heterogeneous multicore is realized as multiple soft processor cores with custom processing elements. Since an FPGA is a resource-constrained device, sharing hardware resources among the soft processor cores can be advantageous. A few research works have focused on resource sharing between soft processors, but they do not study how much FPGA logic is saved for processors with different pipeline depths. This paper proposes microarchitectures for four- and five-stage pipeline processors that enable the sharing of functional units for execution among multiple cores, as well as the sharing of BRAM ports. We then investigate the performance and hardware resource utilization of a four-core processor. We find that sharing different functional units can reduce LUT usage by up to 31.7% and DSP usage by up to 75%. We analyze the performance impact of sharing by simulating the Embench benchmark programs. Our simulation results indicate that in some cases sharing improves performance, while for other configurations the worst-case performance drop is 16.7%.
Running IoT applications on edge computing infrastructures has the benefits of low response times and efficient bandwidth usage. System verification on a testbed is required before deploying IoT applications in production environments. In a testbed, Docker containers are preferable because they allow a smooth transition of tested application programs to production environments. In addition, the round-trip times (RTTs) between Docker containers and clients must be ensured according to the target application's response time requirements. However, existing testbed systems do not ensure the RTTs between Docker containers and clients. Consequently, testbed users must prepare a large amount of configuration data, including the RTTs between all pairs of wireless base station nodes and servers, to set up a testbed environment. In this paper, we present an edge computing testbed system that ensures RTTs between Docker containers and clients and offers simple application programming interfaces (APIs) to testbed users. The proposed system automatically determines the servers on which to place Docker containers according to the virtual regions and RTTs that testbed users specify through the APIs. The virtual regions provide reduced-size information about the RTTs in a network. In the proposed system, the configuration data size is reduced by a factor equal to the number of servers and the length of the command arguments is reduced to approximately one-third or less, whereas the system running time increases by 4.3 s.
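The placement step described above can be illustrated with a minimal sketch. The abstract does not give the system's actual algorithm or API, so the function name `place_container`, the table layout, and the rule "pick the lowest-RTT server that meets the bound" are all assumptions made for illustration.

```python
from typing import Dict, Optional


def place_container(
    server_rtt_ms: Dict[str, Dict[str, float]],
    region: str,
    max_rtt_ms: float,
) -> Optional[str]:
    """Return the server with the lowest RTT to `region` that meets the bound.

    Hypothetical helper: returns None when no server satisfies `max_rtt_ms`.
    """
    feasible = {
        server: rtts[region]
        for server, rtts in server_rtt_ms.items()
        if region in rtts and rtts[region] <= max_rtt_ms
    }
    if not feasible:
        return None
    return min(feasible, key=feasible.get)


# Hypothetical RTT table: server -> virtual region -> RTT in milliseconds.
RTT_TABLE = {
    "server-a": {"region-1": 4.0, "region-2": 18.0},
    "server-b": {"region-1": 9.0, "region-2": 6.0},
}
```

Keeping one RTT value per virtual region, rather than per base station, is what keeps a table like this small.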
This study considers an extension of sparse regularization with scaling, focusing on thresholding methods, which are simple and typical examples of sparse modeling. In the setting of a non-parametric orthogonal regression problem, we developed and analyzed a thresholding method in which soft thresholding estimators are independently expanded by empirical scaling values. The scaling values share a common hyper-parameter that is the order of expansion of an ideal scaling value that achieves hard thresholding. We refer to this estimator simply as a scaled soft thresholding estimator. The scaled soft thresholding method is a bridge between the soft and hard thresholding methods. The new estimator coincides with an adaptive LASSO estimator in the orthogonal case; i.e., it provides another derivation of the adaptive LASSO estimator. It is a general method that includes soft thresholding and the non-negative garrote as special cases. We subsequently derived the degrees of freedom of scaled soft thresholding for calculating Stein's unbiased risk estimate. We found that the degrees of freedom decompose into the degrees of freedom of soft thresholding and a remainder term connected to hard thresholding. As the degrees of freedom reflect the degree of over-fitting, this implies that scaled soft thresholding has another source of over-fitting in addition to the number of un-removed components. The theoretical result was verified by a simple numerical example. In this process, we also focused on the non-monotonicity of the above remainder term and found that, in a sparse and large-sample setting, it is mainly caused by useless components that are not related to the target function.
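The bridge between soft and hard thresholding can be sketched numerically. The abstract does not state the estimator's formula, so the parameterization below, b * (1 - (lam / |b|)^(gamma + 1)) for |b| > lam, is an assumption: it is one common way of writing the orthogonal-case adaptive LASSO so that gamma = 0 gives soft thresholding, gamma = 1 gives the non-negative garrote, and large gamma approaches hard thresholding.

```python
import numpy as np


def soft_threshold(b, lam):
    """Soft thresholding: shrink every coefficient toward zero by lam."""
    b = np.asarray(b, dtype=float)
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)


def hard_threshold(b, lam):
    """Hard thresholding: keep coefficients above lam unchanged, zero the rest."""
    b = np.asarray(b, dtype=float)
    return np.where(np.abs(b) > lam, b, 0.0)


def bridge_threshold(b, lam, gamma):
    """Assumed bridge family: b * (1 - (lam/|b|)**(gamma + 1)) when |b| > lam."""
    b = np.asarray(b, dtype=float)
    out = np.zeros_like(b)
    keep = np.abs(b) > lam
    out[keep] = b[keep] * (1.0 - (lam / np.abs(b[keep])) ** (gamma + 1.0))
    return out


coeffs = np.array([-3.0, -0.5, 0.2, 1.5, 4.0])
soft = bridge_threshold(coeffs, lam=1.0, gamma=0.0)      # soft thresholding
garrote = bridge_threshold(coeffs, lam=1.0, gamma=1.0)   # non-negative garrote
near_hard = bridge_threshold(coeffs, lam=1.0, gamma=200.0)
```

All members of the family remove the same components (those with |b| <= lam); they differ only in how strongly the surviving components are shrunk.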
Generative Adversarial Networks (GANs) are one of the most successful learning principles for generative models and have been widely applied to many generation tasks. Initially, the gradient penalty (GP) was applied in Wasserstein GAN to force the discriminator to satisfy Lipschitz continuity. Although the vanilla gradient penalty has since been modified for different purposes, seeking a better equilibrium and higher generation quality in adversarial learning remains challenging. Recently, DRAGAN was proposed to achieve local linearity around the data manifold by applying a noised gradient penalty that promotes local convexity in model optimization. However, we show that this approach imposes a burden on satisfying Lipschitz continuity for the discriminator. This conflict between Lipschitz continuity and local linearity in DRAGAN results in a poor equilibrium, and thus the generation quality is far from ideal. To this end, we propose a novel approach that benefits both local linearity and Lipschitz continuity, reaching a better equilibrium without conflict. In detail, we apply our synchronized activation function in the discriminator to obtain a particular form of noised gradient penalty that achieves local linearity without losing the Lipschitz continuity of the discriminator. Experimental results show that our method achieves superior image quality and outperforms WGAN-GP, DiracGAN, and DRAGAN in terms of Inception Score and Fréchet Inception Distance on real-world datasets.
Self-review is essential to improving presentations, particularly for novice or unskilled researchers. In general, they can record a video of their presentation and then watch it for self-review. However, they often feel quite uncomfortable with their appearance and voice in the video. They also struggle with in-depth self-review. To address these issues, we designed a presentation avatar that reproduces the presentation made by a researcher. The presentation avatar is intended to increase self-awareness during self-review. We also designed a checklist of points to be reviewed to aid a detailed self-review. This paper also demonstrates presentation avatar systems that use a virtual character and a robot to allow novice or unskilled researchers, as learners, to self-review their own presentations using the checklist. The results of case studies with the systems indicate that the presentation avatar systems have the potential to promote self-review. In particular, we found that the robot avatar promoted engagement in self-reviewing presentations.
We consider network security exercises where students construct virtual networks with User-mode Linux (UML) virtual machines and then execute attack and defense activities on these networks. In an older version of the exercise system, the students accessed the desktop screens of the remote servers running UMLs with Windows applications and then built networks by executing UML commands. However, performing the exercises remotely (e.g., due to the COVID-19 pandemic) resulted in difficulties due to factors such as the dependency of the work environment on specific operating systems, narrow-band networks, as well as issues in providing support for configuring UMLs. In this paper, a novel web-based hands-on system with intuitive and seamless operability and lightweight responsiveness is proposed in order to allow performing the considered exercises while avoiding the mentioned shortcomings. The system provides web pages for editing device layouts and cable connections by mouse operations intuitively, web pages connecting to UML terminals, and web pages for operating X clients running on UMLs. We carried out experiments for evaluating the proposed system on the usability, system performance, and quality of experience. The subjects offered positive assessments on the operability and no negative assessments on the responsiveness. As for command inputs in terminals, the response time was shorter and the traffic was much smaller in comparison with the older system. Furthermore, the exercises using nano required at least 16 kbps bandwidth and ones using wireshark required at least 2048 kbps bandwidth.
Laser Doppler Vibrometers (LDVs) enable the acquisition of remote speech signals by measuring small-scale vibrations around a target. They are now widely used in the fields of information acquisition and national security. However, in remote speech detection, the coherent measurement signal is subject to environmental noise, making detecting and reconstructing speech signals challenging. To improve the detection distance and speech quality, this paper proposes a highly accurate real-time speech measurement method that can reconstruct speech from noisy coherent signals. First, the I/Q demodulation and arctangent phase discrimination are used to extract the phase transformation caused by the acoustic vibration from coherent signals. Then, an innovative smoothness criterion and a novel phase difference-based dynamic bilateral compensation phase unwrapping algorithm are used to remove any ambiguity caused by the arctangent phase discrimination in the previous step. This important innovation results in the highly accurate detection of phase jumps. After this, a further innovation is used to enhance the reconstructed speech by applying an improved waveform-based linear prediction coding method, together with adaptive spectral subtraction. This removes any impulsive or background noise. The accuracy and performance of the proposed method were validated by conducting extensive simulations and comparisons with existing techniques. The results show that the proposed algorithm can significantly improve the measurement of speech and the quality of reconstructed speech signals. The viability of the method was further assessed by undertaking a physical experiment, where LDV equipment was used to measure speech at a distance of 310 m in an outdoor environment. The intelligibility rate for the reconstructed speech exceeded 95%, confirming the effectiveness and superiority of the method for long-distance laser speech measurement.
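The first two steps above (arctangent phase discrimination on I/Q signals, followed by phase unwrapping) can be sketched on a noise-free synthetic signal. This sketch uses NumPy's standard `np.unwrap`, not the paper's dynamic bilateral compensation algorithm, and the sampling rate, vibration frequency, and phase amplitude are illustrative assumptions.

```python
import numpy as np


def demodulate_phase(i_signal, q_signal):
    """Arctangent phase discrimination followed by standard phase unwrapping.

    Returns the wrapped phase in (-pi, pi] and the unwrapped phase.
    """
    wrapped = np.arctan2(q_signal, i_signal)
    return wrapped, np.unwrap(wrapped)


# Synthetic, noise-free example (all values are assumptions for illustration).
fs = 1000.0                                   # sampling rate in Hz
t = np.arange(0.0, 1.0, 1.0 / fs)
true_phase = 6.0 * np.sin(2.0 * np.pi * 5.0 * t)   # exceeds pi, so it wraps
i_signal = np.cos(true_phase)
q_signal = np.sin(true_phase)

wrapped, recovered = demodulate_phase(i_signal, q_signal)
```

Standard unwrapping only works while the phase changes by less than pi between samples; with noise, spurious jumps violate that condition, which is the kind of ambiguity a more robust unwrapping method has to resolve.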
In this paper, we propose a new algorithm to generate Speech-like Emotional Sound (SES). Emotional expressions may be the most important factor in human communication, and speech is one of the most useful means of expressing emotions. Although speech generally conveys both emotional and linguistic information, we have undertaken the challenge of generating sounds that convey emotional information alone. We call the generated sounds “speech-like,” because the sounds do not contain any linguistic information. SES can provide another way to generate emotional response in human-computer interaction systems. To generate “speech-like” sound, we propose employing WaveNet as a sound generator conditioned only by emotional IDs. This concept is quite different from the WaveNet Vocoder, which synthesizes speech using spectrum information as an auxiliary feature. The biggest advantage of our approach is that it reduces the amount of emotional speech data necessary for training by focusing on non-linguistic information. The proposed algorithm consists of two steps. In the first step, to generate a variety of spectrum patterns that resemble human speech as closely as possible, WaveNet is trained with auxiliary mel-spectrum parameters and Emotion ID using a large amount of neutral speech. In the second step, to generate emotional expressions, WaveNet is retrained with auxiliary Emotion ID only using a small amount of emotional speech. Experimental results reveal the following: (1) the two-step training is necessary to generate the SES with high quality, and (2) it is important that the training use a large neutral speech database and spectrum information in the first step to improve the emotional expression and naturalness of SES.
Selecting visually overlapping image pairs without any prior information is an essential task of large-scale structure from motion (SfM) pipelines. To address this problem, many state-of-the-art image retrieval systems adopt the idea of bag of visual words (BoVW) for computing image-pair similarity. In this paper, we present a method for improving the image pair selection using BoVW. Our method combines a conventional vector-based approach and a set-based approach. For the set similarity, we introduce a modified version of the Simpson (m-Simpson) coefficient. We show the advantage of this measure over three typical set similarity measures and demonstrate that the combination of vector similarity and the m-Simpson coefficient effectively reduces false positives and increases accuracy. To discuss the choice of vocabulary construction, we prepared both a sampled vocabulary on an evaluation dataset and a basic pre-trained vocabulary on a training dataset. In addition, we tested our method on vocabularies of different sizes. Our experimental results show that the proposed method dramatically improves precision scores especially on the sampled vocabulary and performs better than the state-of-the-art methods that use pre-trained vocabularies. We further introduce a method to determine the k value of top-k relevant searches for each image and show that it obtains higher precision at the same recall.
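For reference, typical set similarity measures over the sets of visual words in two images can be sketched as follows. The abstract does not define the m-Simpson modification or name the three baseline measures, so only the standard Jaccard, Dice, and Simpson (overlap) coefficients are shown; the example word IDs are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dice(a: set, b: set) -> float:
    """2 |A ∩ B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2.0 * len(a & b) / total if total else 0.0


def simpson(a: set, b: set) -> float:
    """|A ∩ B| / min(|A|, |B|), also called the overlap coefficient."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


# Hypothetical visual-word IDs observed in two images.
words_a = {1, 2, 3, 4}
words_b = {3, 4, 5}
```

The Simpson coefficient normalizes by the smaller set, so it stays high when one image has far fewer visual words than the other, a case in which Jaccard and Dice drop.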
Predicting the grasping point accurately and quickly is crucial for successful robotic manipulation. However, to commercially deploy a robot, such as a dishwasher robot in a commercial kitchen, we also need to consider the constraints of limited usable resources. We present a deep learning method to predict the grasp position when using a single suction gripper for picking up objects. The proposed method is based on a shallow network to enable lower training costs and efficient inference on limited resources. Costs are further reduced by collecting data in a custom-built synthetic environment. For evaluating the proposed method, we developed a system that models a commercial kitchen for a dishwasher robot to manipulate symmetric objects. We tested our method against a model-fitting method and an algorithm-based method in our developed commercial kitchen environment and found that a shallow network trained with only the synthetic data achieves high accuracy. We also demonstrate the practicality of using a shallow network in sequence with an object detector for ease of training, prediction speed, low computation cost, and easier debugging.
Graph layouts reveal global or local structures of graph data. However, there are few studies on assisting readers in better reconstructing a graph from a layout. This paper attempts to generate a layout whose edges can be reestablished. We reformulate the graph layout problem as an edge classification problem. The inputs are the vertex pairs, and the outputs are the edge existences. The trainable parameters are the laid-out coordinates of the vertices. We propose a binary classification-based graph layout (BCGL) framework in this paper. This layout aims to preserve the local structure of the graph and does not require the total similarity relationships of the vertices. We implement two concrete algorithms under the BCGL framework, evaluate our approach on a wide variety of datasets, and draw comparisons with several other methods. The evaluations verify the ability of the BCGL in local neighborhood preservation and its visual quality with some classic metrics.
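The reformulation above (vertex pairs as inputs, edge existence as outputs, vertex coordinates as the trainable parameters) can be sketched with a logistic model and plain gradient descent. The abstract does not specify the classifier or loss, so the choice p(edge) = sigmoid(bias - squared distance), the binary cross-entropy loss, and all hyper-parameters are assumptions for illustration.

```python
import numpy as np


def edge_labels(n, edges):
    """Symmetric 0/1 matrix marking which vertex pairs are edges."""
    y = np.zeros((n, n))
    for i, j in edges:
        y[i, j] = y[j, i] = 1.0
    return y


def bce_loss(pos, y, bias=1.0):
    """Binary cross-entropy over unordered vertex pairs (numerically stable)."""
    iu = np.triu_indices(len(pos), k=1)
    d2 = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)
    z = bias - d2[iu]                       # logit of "this pair is an edge"
    return float(np.sum(np.logaddexp(0.0, z) - y[iu] * z))


def fit_layout(n, edges, dim=2, bias=1.0, lr=0.05, steps=500, seed=0):
    """Gradient descent on the vertex coordinates; returns (initial, final)."""
    rng = np.random.default_rng(seed)
    pos = rng.normal(scale=0.5, size=(n, dim))
    initial = pos.copy()
    y = edge_labels(n, edges)
    for _ in range(steps):
        diff = pos[:, None, :] - pos[None, :, :]
        d2 = (diff ** 2).sum(-1)
        p = 1.0 / (1.0 + np.exp(d2 - bias))  # sigmoid(bias - d2)
        err = p - y                          # d(loss)/d(logit) per pair
        np.fill_diagonal(err, 0.0)
        grad = (-2.0 * err[:, :, None] * diff).sum(axis=1)
        pos = pos - lr * grad
    return initial, pos


# Hypothetical graph: two triangles joined by one bridge edge.
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
initial_pos, final_pos = fit_layout(6, EDGES)
```

In this sketch a reader would reconstruct edges by thresholding distance: a pair is predicted as an edge when its squared distance is below `bias`.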
A register pushdown system (RPDS) is an extension of a pushdown system (PDS) that has registers for dealing with data values. An LTL model checking method for RPDS with regular valuations has been proposed in previous work; however, the method requires the register automata (RA) used for defining a regular valuation to be backward-deterministic. This paper proposes another approach to the same problem, in which the model checking problem for RPDS is reduced to that for PDS by constructing a PDS that is bisimulation equivalent to a given RPDS. This construction is simpler than the previous model checking method and does not require RAs to be deterministic or backward-deterministic, and the bisimulation equivalence clearly guarantees the correctness of the reduction. On the other hand, the proposed method requires every RPDS (and RA) to have the freshness property: whenever the RPDS updates a register with a data value not stored in any register or at the stack top, the value must be fresh. This paper also shows that the model checking problem with regular valuations defined by general RA is undecidable, and thus the freshness constraint is essential to the proposed method.
A method to predict lightning by machine learning analysis of atmospheric electric fields is proposed for the first time. In this study, we calculated an anomaly score with long short-term memory (LSTM), a recurrent neural network analysis method, using electric field data recorded every second on the ground. A threshold on the anomaly score was defined, and a lightning alarm at the observation point was issued or canceled accordingly. Using this method, it was confirmed that 88.9% of lightning events occurred while the alarm was active. These results suggest that a lightning prediction system combining an electric field sensor and machine learning can be developed in the future.
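The alarm logic described above can be sketched independently of the LSTM that produces the anomaly score. The abstract does not state the cancellation rule, so the `hold` parameter (cancel after that many consecutive below-threshold samples) is an assumption, and the scores and strike times are made-up illustrative values.

```python
def alarm_states(scores, threshold, hold=3):
    """Return one boolean per sample: True while the alarm is active.

    The alarm is issued when a score reaches `threshold` and canceled after
    `hold` consecutive samples below it (an assumed cancellation rule).
    """
    states = []
    active = False
    quiet = 0
    for score in scores:
        if score >= threshold:
            active = True
            quiet = 0
        elif active:
            quiet += 1
            if quiet >= hold:
                active = False
                quiet = 0
        states.append(active)
    return states


def covered_fraction(states, strike_times):
    """Fraction of lightning strikes that occurred while the alarm was active."""
    if not strike_times:
        return 0.0
    hits = sum(1 for t in strike_times if states[t])
    return hits / len(strike_times)


# Hypothetical per-second anomaly scores and strike times (sample indices).
SCORES = [0.1, 0.2, 0.9, 0.3, 0.2, 0.1, 0.1, 0.8, 0.1]
STATES = alarm_states(SCORES, threshold=0.5, hold=3)
```

A figure such as the reported 88.9% corresponds to `covered_fraction` evaluated over the recorded lightning events.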
Accurately describing user behaviors with appropriate sensors is important when developing computationally cost-effective systems. This paper employs datasets recorded for fine-grained reading detection using the J!NS MEME, an eyewear device with electrooculography (EOG), accelerometer, and gyroscope sensors. We generate models for all possible combinations of the three sensors and employ both self-supervised and supervised learning to understand the optimal sensor settings. The results show that the EOG sensor alone performs roughly as well as the best-performing combination of sensors. This gives insight into selecting the appropriate sensors for fine-grained reading detection, enabling cost-effective computation.
The recurrent neural network (RNN) has been used in audio and speech processing tasks such as language translation and speech recognition. Although RNN-based architectures can be applied to speech synthesis, the long computing time remains the primary concern. This research proposes a fast gated recurrent neural network, a fast RNN-based architecture for speech synthesis based on the minimal gated unit (MGU). Our architecture removes the unit state history from some equations in the MGU. Our MGU-based architecture is about twice as fast as other MGU-based architectures, with equally good sound quality.
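For context, one step of a standard MGU cell, as it is commonly defined with a single forget gate, can be sketched as below. The abstract does not say which equations drop the state history, so the proposed faster variant is not reproduced here; the parameter names and the zero-weight example are illustrative assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def mgu_step(x_t, h_prev, params):
    """One step of a standard minimal gated unit (single forget gate).

    f_t   = sigmoid(W_f x_t + U_f h_prev + b_f)
    h_hat = tanh(W_h x_t + U_h (f_t * h_prev) + b_h)
    h_t   = (1 - f_t) * h_prev + f_t * h_hat
    """
    f_t = sigmoid(params["W_f"] @ x_t + params["U_f"] @ h_prev + params["b_f"])
    h_hat = np.tanh(
        params["W_h"] @ x_t + params["U_h"] @ (f_t * h_prev) + params["b_h"]
    )
    return (1.0 - f_t) * h_prev + f_t * h_hat


def zero_params(input_dim, hidden_dim):
    """All-zero parameters, used only to make the example easy to check."""
    return {
        "W_f": np.zeros((hidden_dim, input_dim)),
        "U_f": np.zeros((hidden_dim, hidden_dim)),
        "b_f": np.zeros(hidden_dim),
        "W_h": np.zeros((hidden_dim, input_dim)),
        "U_h": np.zeros((hidden_dim, hidden_dim)),
        "b_h": np.zeros(hidden_dim),
    }


h_next = mgu_step(np.ones(3), np.ones(4), zero_params(3, 4))
```

Both the gate and the candidate state depend on `h_prev` through the `U_f` and `U_h` products, which is the sequential dependency that limits speed in recurrent synthesis models.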
Although end-to-end speech recognition for Mandarin-English code-switching has attracted increasing interest, it remains challenging due to data scarcity. The meta-learning approach is popular for low-resource modeling using high-resource data, but it does not make full use of low-resource code-switching data. Therefore, we propose a two-fold cross-validation training framework combined with the meta-learning approach. Experiments on the SEAME corpus demonstrate the effectiveness of our method.
In this letter, we present an adaptive weighted transfer subspace learning (AWTSL) method for cross-database speech emotion recognition (SER), which can efficiently eliminate the discrepancy between source and target databases. Specifically, on one hand, a subspace projection matrix is first learned to project the cross-database features into a common subspace. At the same time, each target sample can be represented by the source samples by using a sparse reconstruction matrix. On the other hand, we design an adaptive weighted matrix learning strategy, which can improve the reconstruction contribution of important features and eliminate the negative influence of redundant features. Finally, we conduct extensive experiments on four benchmark databases, and the experimental results demonstrate the efficacy of the proposed method.
Altered fingerprints help criminals evade the police and cause great harm to society. In this letter, an altered fingerprint detection method is proposed. The method is built from two deep convolutional neural networks that learn time-domain and frequency-domain features. A spectral attention module is added to connect the two networks. After the extraction networks, a feature fusion module is used to exploit the relationship between the features of the two networks. We conduct ablation experiments and add the proposed module to several popular architectures. The results show that the proposed method improves the performance of altered fingerprint detection compared with recent neural networks.
This paper presents an improved YOLOv3 network, named MSFF-YOLOv3, for precisely detecting variable surface defects of aluminum profiles in practice. First, we introduce a larger prediction scale to provide detailed information for small defect detection; second, we design an efficient attention-guided block to extract more defect features with less overhead; third, we design a bottom-up pyramid and integrate it with the existing feature pyramid network to construct a twin-tower structure that improves the circulation and fusion of features from different layers. In addition, we employ the K-median algorithm for anchor clustering to speed up network inference. Experimental results show that the mean average precision of the proposed MSFF-YOLOv3 is higher than that of the conventional networks compared for surface defect detection of aluminum profiles. Moreover, the number of frames processed per second by MSFF-YOLOv3 meets real-time requirements.
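The anchor clustering step can be sketched as a K-median procedure over box widths and heights. The abstract does not give the distance measure or initialization, so the use of 1 - IoU as the distance (common practice for YOLO anchors), the area-sorted initialization, and the sample boxes are assumptions for illustration.

```python
import numpy as np


def iou_wh(boxes, anchors):
    """IoU between (w, h) boxes and anchors that share the same top-left corner."""
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * np.minimum(
        boxes[:, None, 1], anchors[None, :, 1]
    )
    area_b = (boxes[:, 0] * boxes[:, 1])[:, None]
    area_a = (anchors[:, 0] * anchors[:, 1])[None, :]
    return inter / (area_b + area_a - inter)


def kmedian_anchors(boxes, k, iters=50):
    """Cluster (w, h) pairs; each anchor is the per-coordinate median of its cluster."""
    boxes = np.asarray(boxes, dtype=float)
    boxes = boxes[np.argsort(boxes[:, 0] * boxes[:, 1])]
    # Deterministic initialization: evenly spaced boxes after sorting by area.
    anchors = boxes[np.linspace(0, len(boxes) - 1, k).astype(int)].copy()
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)  # highest IoU wins
        updated = anchors.copy()
        for c in range(k):
            members = boxes[assign == c]
            if len(members):                 # keep the old anchor if empty
                updated[c] = np.median(members, axis=0)
        if np.allclose(updated, anchors):
            break
        anchors = updated
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]


# Hypothetical defect box sizes in pixels: three small and three large.
BOXES = [(10, 12), (11, 10), (12, 11), (100, 120), (110, 100), (120, 110)]
ANCHORS = kmedian_anchors(BOXES, k=2)
```

Compared with a mean-based update, the median is less sensitive to a few unusually large or elongated boxes in a cluster.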