Transistor count continues to increase for silicon devices following Moore's Law, but the failure of Dennard scaling has brought the computing community to a crossroads where power has become the major limiting factor. Future chips can therefore have many cores, but only a fraction of them can be switched on at any point in time. This dark silicon era, in which a significant fraction of the chip real estate remains dark, has necessitated a fundamental rethinking of architectural design. In this context, heterogeneous multi-core architectures combining a mix of processing cores that diverge in functionality and performance (CPUs, GPUs, special-purpose accelerators, and reconfigurable computing) offer a promising option. Heterogeneous multi-cores can potentially provide energy-efficient computation, as only the cores most suitable for the current computation need to be switched on. This article presents an overview of the state of the art in the heterogeneous multi-core landscape.
In recent printed circuit board (PCB) designs, the high density of integration has made signal propagation delay, or skew, an important factor in circuit performance. As routing delay is proportional to wire length, design effort usually focuses on controlling wire length. This research proposes a heuristic algorithm for equal-length routing of disordered pins in PCB design. The approach first computes the longest common subsequence of the source and target pin sequences to assign pins to layers. A single-commodity flow is then solved to generate the base routes. Finally, considering the target length requirement and the available routing region, R-flip and C-flip operations are applied to adjust the wire lengths. The experimental results show that the proposed method obtains routes with better wire-length balance and smaller worst-case length error in reasonable CPU time.
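As an illustration of the first step only, the sketch below computes a longest common subsequence of two pin orderings by standard dynamic programming and keeps the order-preserving pins together; the pin names and the simple two-layer split are illustrative assumptions, not the paper's exact layer-assignment rule.

```python
# Minimal sketch of the layer-assignment idea: pins whose relative order is
# preserved between the source and target sides (the longest common subsequence)
# can stay on one layer, while the remaining "disordered" pins go to another.
# Pin names and the two-layer split below are illustrative assumptions.

def longest_common_subsequence(src, tgt):
    """Standard O(len(src)*len(tgt)) dynamic program returning one LCS."""
    n, m = len(src), len(tgt)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            if src[i] == tgt[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    # Backtrack to recover one longest common subsequence.
    lcs, i, j = [], n, m
    while i > 0 and j > 0:
        if src[i - 1] == tgt[j - 1]:
            lcs.append(src[i - 1])
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return lcs[::-1]

source_order = ["p1", "p2", "p3", "p4", "p5"]   # pin order on the source component
target_order = ["p2", "p1", "p3", "p5", "p4"]   # pin order on the target component

same_layer = longest_common_subsequence(source_order, target_order)
other_layer = [p for p in source_order if p not in same_layer]
print("layer 1 (order preserved):", same_layer)
print("layer 2 (disordered pins): ", other_layer)
```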
With the rapid progress of semiconductor technology, the Network-on-Chip (NoC) has become an attractive solution for future systems on chip (SoC). Network performance depends critically on the performance of packet routing: router delay and packet contention can significantly affect network latency and throughput. As the network becomes more congested, packets are blocked more frequently, which degrades network performance. In this article, we propose an innovative dual-switch allocation (DSA) design. DSA makes full use of idle output ports to reduce packet contention delay without increasing router delay. Experimental results show that our design achieves significant improvements in throughput and latency at the cost of very little power and area overhead.
This paper introduces an automatic synthesis technique and tool to implement inter-heterogeneous-processor communication for programmable system-on-chips (PSoCs). PSoCs have an ARM-based hard processor system connected to an FPGA fabric; by implementing soft processors in the FPGA fabric, a PSoC realizes a heterogeneous multiprocessor. Since the number and types of soft processors are configurable, PSoCs can realize a variety of heterogeneous multiprocessors. However, single-binary operating systems do not support communication between heterogeneous processors. The proposed method automatically synthesizes inter-heterogeneous-processor communication at the application layer from a general model description. A case study shows that the automatically generated inter-heterogeneous-processor communication correctly runs the system on heterogeneous multiprocessors.
In this paper, we propose the use of a memory system with a partially reliable scratch-pad memory (SPM). The reliable region of the SPM employs ECC and is therefore more tolerant of soft errors, but consumes more energy than the normal region. We propose an allocation method that optimizes energy consumption while ensuring the required reliability. The allocation of instructions and data to the proposed memory system is formulated as an integer linear program whose solution achieves optimal energy consumption under the reliability requirement. Evaluation results show that the proposed method is effective when the overhead of error correction is large.
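As a rough illustration of such a formulation (not the paper's actual ILP), the sketch below enumerates 0/1 assignments of memory objects to the ECC-protected and normal SPM regions, minimizing total energy subject to a capacity limit and a minimum-reliability requirement; all object names, sizes, energies, and reliability weights are hypothetical.

```python
# Brute-force illustration of the 0/1 allocation problem that the paper solves
# with integer linear programming: assign each object (instruction/data block)
# either to the ECC-protected region (reliable, higher energy) or to the normal
# region, minimizing energy while meeting a reliability requirement.
# All sizes, energies, and criticality weights below are hypothetical.
from itertools import product

objects = [
    # (name, size, energy_ecc, energy_normal, criticality)
    ("isr_code", 2, 5.0, 3.0, 0.9),
    ("stack",    4, 9.0, 6.0, 0.7),
    ("lut_data", 3, 7.0, 4.5, 0.2),
    ("scratch",  2, 5.0, 3.0, 0.1),
]
ECC_CAPACITY = 6          # size budget of the ECC-protected region
MIN_RELIABILITY = 1.5     # required sum of criticality placed in the ECC region

best = None
for x in product([0, 1], repeat=len(objects)):   # x[i] = 1 -> ECC region
    size_ecc = sum(o[1] for o, xi in zip(objects, x) if xi)
    reliability = sum(o[4] for o, xi in zip(objects, x) if xi)
    if size_ecc > ECC_CAPACITY or reliability < MIN_RELIABILITY:
        continue                                  # infeasible assignment
    energy = sum(o[2] if xi else o[3] for o, xi in zip(objects, x))
    if best is None or energy < best[0]:
        best = (energy, x)

energy, assignment = best
for (name, *_), xi in zip(objects, assignment):
    print(f"{name}: {'ECC region' if xi else 'normal region'}")
print("total energy:", energy)
```

A real ILP solver replaces the exhaustive enumeration, but the objective and constraints have the same shape.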
To balance cost and performance, and to explore 3D field-programmable gate arrays (FPGAs) built with realistic 3D integration processes, we propose spatially distributed and functionally distributed 3D FPGA architectures. The functionally distributed architecture consists of two wafers, a logic layer and a routing layer, stacked by a face-down process technology; since the vertical wires pass through microbumps, no TSVs are needed. In contrast, the spatially distributed architecture is divided into multiple layers with an identical structure and can be expanded to more than two layers by stacking multiple copies of the same die. The goal of this paper is to elucidate the advantages and disadvantages of these two types of 3D FPGAs. According to our evaluation, the functionally distributed architecture is more effective when only two layers are used, whereas the spatially distributed architecture achieves better performance when more than two layers are stacked for higher performance.
An application of stochastic A/D conversion to multi-bit delta-sigma modulators is considered, and a novel correction technique for D/A converter (DAC) errors is proposed. Stochastic A/D conversion can reduce the area of the quantizer and tolerates large mismatches. The proposed calibration technique corrects DAC errors using a programmable quantizer whose non-linear characteristic cancels the DAC errors. With this technique, the influence of DAC errors can be reduced without conventional dynamic element matching. Because this A/D converter has a non-linear quantization characteristic, the output digital code must be corrected using a programmable encoder. This code correction and the setting of the quantization levels are carried out based on calibration data obtained using a genetic algorithm.
Dichromats are color-blind persons missing one of the three cone systems. We consider a computer simulation of color confusion for dichromats, applicable to any color on any video device, which transforms the color of each pixel into a representative color from its set of confusion colors. As the guiding principle of the simulation we adopt a proportionality law between the pre-transformed and post-transformed colors, which ensures that identical colors are not transformed into two or more different colors apart from intensity. We show that such a simulation algorithm satisfying the proportionality law is unique for video displays whose gamut, projected onto the plane perpendicular to the color confusion axis in LMS space, is a hexagon. Almost all video displays, including sRGB, satisfy this condition, and we demonstrate this unique simulation on an sRGB display. As a corollary, we show that it is impossible to build an appropriate algorithm if we instead demand the additivity law, which is mathematically stronger than the proportionality law and would also preserve additive mixtures among the post-transformed colors for dichromats.
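For reference, the proportionality law used as the guiding principle can be written as below; the symbol T for the simulation map on LMS colors is our notation for this summary, not the paper's.

```latex
% Proportionality law: scaling a color's intensity by a factor alpha >= 0
% scales its simulated (representative) color by the same factor, so identical
% colors map to the same output up to intensity.
T(\alpha\,\mathbf{c}) = \alpha\, T(\mathbf{c}), \qquad \alpha \ge 0,\ \mathbf{c} \in \text{LMS gamut}.
% The stronger additivity law, T(\mathbf{c}_1 + \mathbf{c}_2) = T(\mathbf{c}_1) + T(\mathbf{c}_2),
% is shown in the paper to be impossible to satisfy on such displays.
```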
Mobile phones and video game controllers using gesture recognition technologies enable easy and intuitive operations, such as drawing objects. Gesture recognition systems generally require several samples of training data before recognition takes place. However, recognition accuracy deteriorates over time because the trajectories of the gestures change due to fatigue or forgetfulness. We investigated this change in gestures and found that the first several gesture samples were not suitable as training data. We therefore propose two methods for selecting training data appropriate for long-term use. We confirmed that the proposed methods found better training data than the conventional method in terms of both the amount of data collected and recognition accuracy.
We propose a new image denoising method based on shrinkage. In the proposed method, small blocks of the input image are projected onto a space in which the projection coefficients are sparse, and the explicitly evaluated degree of sparsity is used to control the shrinkage threshold. On average, the proposed method obtained higher quantitative evaluation values (PSNR and SSIM) than one of the state-of-the-art methods in the field of image denoising. The proposed method removes random noise effectively from natural images while preserving intricate textures.
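A minimal sketch of the general idea, not the proposed method itself: transform a block, measure how sparse its coefficients are, scale a soft-threshold accordingly, and invert the transform. The orthonormal DCT basis and the specific sparsity measure (a normalized l1/l2 ratio) are illustrative assumptions.

```python
# Sketch of sparsity-controlled shrinkage on one image block (illustrative only):
# project the block onto a transform basis (2-D DCT here), estimate how sparse
# the coefficients are, derive the soft-threshold from that sparsity degree,
# and reconstruct. The DCT basis and the l1/l2 sparsity measure are assumptions.
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] *= 1.0 / np.sqrt(n)
    C[1:, :] *= np.sqrt(2.0 / n)
    return C

def denoise_block(block, base_threshold=20.0):
    n = block.shape[0]
    C = dct_matrix(n)
    coeffs = C @ block @ C.T                       # 2-D DCT coefficients
    flat = coeffs.ravel()
    # Hoyer-style sparsity degree in [0, 1]: 1 when energy sits in one coefficient.
    l1, l2 = np.abs(flat).sum(), np.linalg.norm(flat)
    sparsity = (np.sqrt(flat.size) - l1 / (l2 + 1e-12)) / (np.sqrt(flat.size) - 1)
    threshold = base_threshold * (1.0 - sparsity)  # sparser block -> gentler shrinkage
    shrunk = np.sign(coeffs) * np.maximum(np.abs(coeffs) - threshold, 0.0)
    return C.T @ shrunk @ C                        # inverse 2-D DCT

rng = np.random.default_rng(0)
clean = np.outer(np.linspace(0, 255, 8), np.ones(8))       # smooth 8x8 block
noisy = clean + rng.normal(0, 15, clean.shape)
print("noise std before:", np.std(noisy - clean).round(2))
print("noise std after: ", np.std(denoise_block(noisy) - clean).round(2))
```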
In this paper, we propose a fast and accurate object detection algorithm based on binary co-occurrence features. In our method, the co-occurrences of all possible pairs of binary elements in a block of binarized HOG are enumerated by logical operations, e.g., circular shift and XOR, which makes co-occurrence extraction extremely fast. Our experiments show that our method can process a VGA-size image at 64.6 fps, more than twice the camera frame rate (30 fps), on a single CPU core (Intel Core i7-3820, 3.60 GHz), while at the same time achieving higher classification accuracy than the original (real-valued) HOG on a pedestrian detection task.
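The sketch below illustrates the general shift-based enumeration of binary pair co-occurrences; the paper mentions circular shift and XOR, whereas this sketch combines circular shifts with AND to mark positions where both bits are set, and the block size and bit layout are assumptions.

```python
# Illustrative sketch of enumerating pairwise co-occurrences of binary feature
# elements with shift-based bitwise operations. This version ANDs the block with
# its circular shifts to count positions where both bits are 1; the paper's exact
# combination of shift and XOR may differ. Block size and layout are assumed.
N = 8  # number of binary elements in one block (assumed)
MASK = (1 << N) - 1

def circular_shift(bits, d, n=N):
    return ((bits << d) | (bits >> (n - d))) & MASK

def cooccurrence_counts(binary_block):
    """For each shift distance d, count element pairs (i, i+d mod N) that are both 1."""
    counts = {}
    for d in range(1, N):
        both_set = binary_block & circular_shift(binary_block, d)
        counts[d] = bin(both_set).count("1")
    return counts

block = 0b10110010  # binarized HOG block packed into one integer (assumed layout)
print(cooccurrence_counts(block))
```

Because each distance is handled with one shift, one bitwise operation, and one popcount, the cost grows with the number of distances rather than the number of element pairs.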
The design of image representations is one of the most crucial factors in the performance of visual categorization. A common pipeline in recent research obtains an image representation in two steps: encoding and pooling. In this paper, we introduce the Mahalanobis metric into two popular image patch encoding modules, Histogram Encoding and Fisher Encoding, which are used in the Bag-of-Visual-Words and Fisher Vector methods, respectively. Moreover, for the proposed Fisher Vector method, a closed-form approximation of the Fisher Vector can be derived under the same assumption used in the original Fisher Vector, and the codebook can be built without resorting to time-consuming Expectation-Maximization (EM) steps. Experimental evaluation on multi-class classification demonstrates the effectiveness of the proposed encoding methods.
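As a small illustration of the histogram-encoding side only, the sketch below assigns each local descriptor to its nearest codeword under a Mahalanobis distance instead of the usual Euclidean distance; the codebook, the shared covariance estimate, and the random descriptors are all hypothetical and not the paper's formulation.

```python
# Sketch of histogram (Bag-of-Visual-Words) encoding with a Mahalanobis metric:
# each local descriptor is assigned to the nearest codeword under
# d(x, c)^2 = (x - c)^T S^{-1} (x - c). Codebook, covariance S, and descriptors
# below are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
descriptors = rng.normal(size=(100, 4))        # local patch descriptors (assumed)
codebook = rng.normal(size=(8, 4))             # visual words (assumed)
S = np.cov(descriptors, rowvar=False)          # shared covariance estimate
S_inv = np.linalg.inv(S + 1e-6 * np.eye(4))    # regularized inverse

def mahalanobis_histogram(X, C, S_inv):
    hist = np.zeros(len(C))
    for x in X:
        diff = C - x                                      # differences to all codewords
        d2 = np.einsum("kd,de,ke->k", diff, S_inv, diff)  # squared Mahalanobis distances
        hist[np.argmin(d2)] += 1
    return hist / hist.sum()                              # normalized BoVW histogram

print(mahalanobis_histogram(descriptors, codebook, S_inv))
```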
This paper presents automatic Martian dust storm detection from multiple-wavelength data based on decision-level fusion. In our proposed method, visual features are first extracted from the multiple-wavelength data, and optimal features are selected for Martian dust storm detection with the minimal-Redundancy-Maximal-Relevance algorithm. Second, the selected visual features are used to train Support Vector Machine classifiers constructed for each data source. Furthermore, as the main contribution of this paper, the proposed method integrates the multiple detection results obtained from the heterogeneous data by decision-level fusion, while taking each classifier's detection performance into account to obtain accurate final detection results. Consequently, the proposed method achieves successful Martian dust storm detection.
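The sketch below shows one simple form of performance-aware decision-level fusion: per-band classifier scores are combined with weights derived from each classifier's validation accuracy before thresholding. The band names, scores, accuracies, and the linear weighting rule are illustrative assumptions, not the paper's exact fusion rule.

```python
# Sketch of decision-level fusion across wavelength bands: each band's SVM
# produces a decision score, and the scores are combined with weights derived
# from each classifier's validation performance before the final threshold.
# Scores, accuracies, and the weighting rule below are illustrative assumptions.
import numpy as np

# Decision scores from per-band classifiers for 5 candidate regions (assumed).
scores = {
    "band_A": np.array([ 1.2, -0.4,  0.3, -1.1,  0.8]),
    "band_B": np.array([ 0.9,  0.2, -0.2, -0.7,  0.5]),
    "band_C": np.array([ 0.4, -0.6,  0.6, -0.9,  0.1]),
}
# Validation accuracy of each classifier (assumed), used as fusion weights.
accuracy = {"band_A": 0.82, "band_B": 0.74, "band_C": 0.66}

weights = {k: v / sum(accuracy.values()) for k, v in accuracy.items()}
fused = sum(weights[k] * scores[k] for k in scores)   # weighted sum of decisions
detections = fused > 0.0                              # final dust-storm decision
print("fused scores:", np.round(fused, 2))
print("detections:  ", detections)
```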
In this work, we propose a simple yet effective method for improving the performance of local feature matching among equirectangular cylindrical images, which yields more stable and complete 3D reconstruction with incremental structure from motion (SfM). The key idea is to explicitly generate synthesized images by rotating the spherical panoramic images, and to detect and describe features only in the less distorted areas of the rectified panoramic images. We demonstrate that the proposed method is advantageous for both rotational and translational camera motions compared with standard methods on synthetic data. We also demonstrate that the proposed feature matching is beneficial for incremental SfM through experiments on the Pittsburgh Research dataset.
This paper investigates the performance of silhouette-based and depth-based gait authentication under practical sensor settings, where sensors are installed in an environment afterwards and usually have to be placed quite close to people. To enable a fair comparison between different sensors and methods, we construct full-body volumes of walking people using a multi-camera environment and reconstruct virtual silhouette and depth images at arbitrary sensor positions. In addition, we investigate performance when authentication must be performed between frontal and rear views. Experimental results confirm that the depth-based methods outperform the silhouette-based ones in realistic situations. We also confirm that, by introducing the Depth-based Gait Feature, we can authenticate between frontal and rear views.
Facial part labeling, which parses a face into its semantic components, enables high-level facial image analysis and contributes greatly to face recognition, expression recognition, animation, and synthesis. In this paper, we propose a cost-alleviative learning method that uses a weighted cost function to improve the performance on certain classes during facial part labeling. Because the conventional cost function treats the error in all classes equally, the error in a class with a slightly biased prior probability tends not to be propagated. The weighted cost function allows the training coefficient for each class to be adjusted. In addition, the boundaries of each class may be recognized after fewer iterations, which improves performance. In facial part labeling, the recognition performance of the eye class can be significantly improved using cost-alleviative learning.
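As a small illustration of the idea of a class-weighted cost, the sketch below scales a per-pixel cross-entropy by a class-dependent coefficient so that errors on a rare class (such as the eye class) contribute more to training. The class set, weight values, and toy predictions are illustrative assumptions, not the paper's network or exact cost function.

```python
# Sketch of a class-weighted cost function for part labeling: the per-pixel
# cross-entropy is scaled by a class-dependent coefficient so that errors on a
# rare class (e.g., "eye") are emphasized. Classes, weights, and the toy
# predictions below are illustrative assumptions.
import numpy as np

classes = ["skin", "eye", "mouth", "background"]
class_weights = np.array([1.0, 4.0, 2.0, 0.5])   # larger weight for the rare "eye" class

def weighted_cross_entropy(probs, labels, weights):
    """probs: (N, C) predicted class probabilities; labels: (N,) true class indices."""
    per_pixel = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    return np.mean(weights[labels] * per_pixel)

probs = np.array([[0.7, 0.1, 0.1, 0.1],    # pixel predicted mostly "skin"
                  [0.4, 0.3, 0.2, 0.1],    # uncertain pixel whose true label is "eye"
                  [0.1, 0.1, 0.7, 0.1]])   # pixel predicted mostly "mouth"
labels = np.array([0, 1, 2])

print("unweighted loss:", round(weighted_cross_entropy(probs, labels, np.ones(4)), 3))
print("weighted loss:  ", round(weighted_cross_entropy(probs, labels, class_weights), 3))
```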
This paper provides a survey of techniques for measuring the semantic similarity and relatedness of word pairs. It covers both knowledge-based approaches, which exploit taxonomies such as WordNet, and corpus-based approaches, which rely on distributional statistics. We introduce these techniques, evaluate their performance, and discuss their merits and shortcomings. A special focus is on word embeddings, a technique that has recently become popular in the AI community. While word embeddings are not yet fully understood, they show promising results on similarity tasks and may also be suitable for capturing significantly more complex features such as relational similarity.
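For concreteness, embedding-based similarity is typically scored as the cosine of the angle between two word vectors, as in the toy example below; the 4-dimensional vectors are made up for illustration, whereas real embeddings (e.g., word2vec or GloVe) have hundreds of dimensions.

```python
# Minimal example of embedding-based word similarity: the similarity of a word
# pair is the cosine of the angle between their embedding vectors.
# The tiny toy vectors below are illustrative only.
import numpy as np

embeddings = {
    "car":    np.array([0.8, 0.1, 0.3, 0.0]),
    "truck":  np.array([0.7, 0.2, 0.4, 0.1]),
    "banana": np.array([0.0, 0.9, 0.1, 0.6]),
}

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print("car ~ truck: ", round(cosine_similarity(embeddings["car"], embeddings["truck"]), 3))
print("car ~ banana:", round(cosine_similarity(embeddings["car"], embeddings["banana"]), 3))
```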