The ongoing scaling of CMOS technology facilitates the design of systems with continuously increasing functionality, but it also raises the susceptibility of these systems to reliability issues, caused for example by high power densities and temperatures. At the moment, these challenges can still be handled in an affordable manner, but in the future a combination of design-time and run-time measures will become necessary to guarantee that reliability guidelines are met. Owing to the growing design complexity, the Electronic System Level (ESL) is gaining importance as the starting point of design. Design alternatives are evaluated at the ESL with respect to several design objectives, lately also including reliability. In this paper, the most important phenomena threatening reliability are introduced and the current status of related research work and tools is presented. After that, a high-level design space exploration considering performance, energy, and reliability trade-offs in multi-core systems is introduced. Finally, it is shown how reliability can be further improved at run-time by the application of a machine learning system.
This paper introduces the concept and technology of the Application-domain Specific Instruction-set Processor (ASIP). First, VLSI design trends over the decades are reviewed, and processors are shown to be expected to become one of the main components in system-level design. Then, the advantages of the ASIP over the General Purpose Processor (GPP) and the Application Specific Integrated Circuit (ASIC) are illustrated. Next, processor hardware description synthesis technology, application program development tool-set generation technology, and processor architecture optimization technology are outlined. Then, as an example of an ASIP development environment, ASIP Meister is explained. Next, an application of ASIPs to medical and healthcare fields is introduced. Finally, the potential of the ASIP as an important component of the Multi-Processor SoC (MPSoC) is discussed.
As the complexity of embedded systems grows, design space exploration at the system level plays a more important role than before. In system-level design, system designers start by describing the functionalities of the system as processes and channels, and then decide how to map them onto various Processing Elements (PEs), including processors and dedicated hardware modules. A mapping decision is evaluated by simulation or FPGA-based prototyping, and designers iterate mapping and evaluation until all design requirements are met. We have developed two profilers, a process profiler and a memory profiler, for FPGA-based performance analysis of design candidates. The process profiler records a trace of process activations, while the memory profiler records a trace of channel accesses. According to the mapping of processes to PEs, the profilers are automatically configured and instrumented into FPGA-based system prototypes by a system-level design tool that we have developed. Designers therefore need not manually modify either the system description or the profilers upon each change of process mapping. To demonstrate the effectiveness of our profilers, two case studies are conducted in which the profiles are used for design space exploration of AES encryption and MPEG-4 decoding systems.
With the growing complexity of consumer embedded products and the improvements in process technology, multiprocessor system-on-chip (MPSoC) architectures have become widespread. These MPSoCs include not only multiple processors but also multiple dedicated hardware accelerators that can be designed from software programs, written in high-level languages like ‘C’, using high-level synthesis (HLS) tools. Traditional techniques of HW/SW co-simulation are very slow and time-consuming when used for exploring HW/SW partitioning strategies. There is a strong need for methodologies that quickly and accurately estimate the performance of such complex systems. In this paper, we present a system-level performance estimation method for exploring the trade-off between hardware and software implementations in such “hybrid” MPSoC architectures. The key feature of our performance estimation is a unified timing model, in the form of a program trace graph (PTG), for both software executions on processors and the hardware blocks (finite state machines) synthesized by an HLS tool. The RTL code from the HLS tool is analyzed and its state transition graph is transformed into the PTG, which was originally developed for software timing annotations. These PTGs represent the workload of the computation, driven by program execution traces in the form of ‘Branch Bitstreams’. Our methodology allows highly accurate performance estimation in the presence of data-dependent behavior of software and hardware components.
This paper proposes a low-power ASIP generation method that automatically extracts the minimum execution conditions of pipeline registers for clock gating. For highly effective power reduction by clock gating, it is important to derive minimum execution conditions, which can shut off redundant clock supplies to registers. To extract these conditions automatically, the proposed method employs micro-operation descriptions (MODs) that specify the ASIP architecture. Utilizing MODs throughout the ASIP generation process, the proposed method automatically extracts the minimum execution conditions. Experimental results show that the power consumption of the pipeline registers in ASIPs generated with the proposed method is reduced by about 80% compared to ASIPs without clock gating, and by about 60% compared to ASIPs clock-gated by Power Compiler, with negligible delay and area overhead.
This paper presents a semi-automated way to generate control units for complex VLSI hardware designs based on a massively parallel micro-controller. This micro-controller can execute as many instructions in parallel as the hardware design needs and supports an unlimited number of input and output ports. Two versions of this control unit are presented in this paper: a generic one, which is generated from a set of parameters given by the designer, and an optimized version, which parses the control program that will run on the control unit in order to generate an optimized micro-controller. Results show that area savings of up to 60% can be achieved using the optimized control unit instead of the generic one. The presented controller was validated using a previously developed SoC design with an FSM-based control unit, showing that the functionality can be completely replicated at the expense of incurring 7.2% and 15.4% area overheads, respectively.
To reduce the computational complexity of model checking, we can use a subset of first-order logic called EUF, but the model checking problem using EUF is in general undecidable. In our previous work, we proposed a technique for checking invariant properties over an over-approximate set of states that includes all reachable states. In this paper, we extend this technique to handle not only invariants but also temporal properties written in computation tree logic with an EUF extension. We show that model checking becomes possible for designs that are hard to handle without the proposed technique.
This paper proposes a method to compute delay values in 3-valued fault simulation for test cubes, which are test patterns with unspecified values (Xs). Because the detectable delay size of each fault under a test cube is not fixed before logic values are assigned to the Xs in the test cube, the proposed method computes a range of the detectable delay values over the test patterns covered by the test cubes. Using the proposed method, we derive the lowest and highest test quality of the test patterns covered by the test cubes. Furthermore, we also propose a GA (genetic algorithm)-based method to generate fully specified test patterns with high test quality from test cubes. Experimental results for benchmark circuits show the effectiveness of the proposed methods.
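The GA-based filling of X values can be illustrated with a minimal, hypothetical sketch. The fitness function used below (simply counting 1s) is a stand-in for the paper's delay-test-quality metric, and all parameters (population size, mutation rate, generation count) are illustrative assumptions, not the paper's settings:

```python
import random

def ga_fill(cube, fitness, pop=20, gens=40, seed=0):
    """Fill the X (don't-care) positions of a test cube with a simple GA.
    cube: string over {'0','1','X'}; fitness scores a fully specified pattern.
    All GA parameters here are illustrative, not from the paper."""
    rng = random.Random(seed)
    xs = [i for i, c in enumerate(cube) if c == 'X']

    def materialize(bits):
        out = list(cube)
        for i, b in zip(xs, bits):
            out[i] = b
        return ''.join(out)

    popn = [[rng.choice('01') for _ in xs] for _ in range(pop)]
    for _ in range(gens):
        # elitist selection: keep the better half
        popn.sort(key=lambda b: fitness(materialize(b)), reverse=True)
        elite = popn[:pop // 2]
        children = []
        while len(children) < pop - len(elite):
            a, b = rng.sample(elite, 2) if len(elite) >= 2 else (elite[0], elite[0])
            cut = rng.randrange(len(xs) + 1)      # one-point crossover
            child = a[:cut] + b[cut:]
            if xs and rng.random() < 0.2:         # point mutation
                j = rng.randrange(len(xs))
                child[j] = '1' if child[j] == '0' else '0'
            children.append(child)
        popn = elite + children
    best = max(popn, key=lambda b: fitness(materialize(b)))
    return materialize(best)
```

The specified bits of the cube are never touched; only X positions evolve.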
In this paper we propose a synthesizable LDPC decoder IP core for WiMAX and WiFi applications. Two new techniques are applied in the proposed decoder to improve the decoding performance. First, a highly parallel permutation network (PN) is proposed to perform the circulant shift according to the parity check matrix (PCM) defined in the WiMAX and WiFi standards. Using the proposed PN, up to four independent code frames with small code lengths can be decoded concurrently, which largely improves the decoding throughput (by 2-4 times). Second, a fast early stopping criterion specialized for WiMAX and WiFi LDPC codes is proposed to reduce the average number of iterations. Unlike earlier works, with our proposed stopping criterion the decoding is stopped when all the information bits of a code frame are corrected, even if there are still some errors in the redundant part. Experimental results show that it can reduce the number of iterations by up to 20% compared to the popularly used stopping criterion.
This paper presents a new architecture for high-profile intra prediction in the H.264/AVC video coding standard. Our goal is to design an intra prediction engine for a 4Kx2K@60fps Ultra High Definition (UHD) decoder. The proposed architecture provides very stable throughput and can predict any H.264 intra prediction mode within 66 cycles. Compared with previous designs, this feature guarantees that the whole decoding pipeline works efficiently. The intra prediction engine is divided into two parallel pipelines: one is used for 4x4 block prediction loops and the other prepares data for MB loops. It can overlap data preparation time with prediction time, finishing data loading and storing within 2 cycles. Compared with an MB-pipeline-only architecture, it achieves more than 3.2 times higher throughput at a cost of 29.8K gates. The proposed architecture is verified to work at 175 MHz for our UHD decoder using the TSMC 90G process.
We have developed an interpretation of the modal μ-calculus over the min-plus algebra N∞, the set of all natural numbers extended with infinity ∞. Disjunctions are interpreted by min, and conjunctions by plus. This interpretation allows complex properties, such as the shortest path on a Kripke structure or the number of states that satisfy a specified condition, to be expressed with simple formulas. We define the semantics of the modal μ-calculus on the min-plus algebra, and then describe a model-checking algorithm for this semantics and its implementation. Although simple iterative computation of the least fixed-point generally does not terminate in N∞, we made model checking possible by reducing, via abstraction, the least fixed-point computation to a greatest fixed-point computation. Finally, we discuss the relationship between our semantics and the theory of Kripke structures over complete Heyting algebras.
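As a concrete illustration of the min-plus interpretation (disjunction as min, one unit added per modal step), the sketch below evaluates a shortest-path-style formula, roughly μZ. p ∨ ◇Z, on a tiny Kripke structure. Note the hedge: this uses direct fixed-point iteration, which converges on finite structures, rather than the paper's reduction of the least fixed-point to a greatest fixed-point computation:

```python
INF = float("inf")  # stands in for infinity in N_inf

def shortest_to(states, succ, holds_p):
    """Min-plus value of (roughly) mu Z. p \/ <>Z at each state:
    the length of the shortest path to a state where p holds."""
    val = {s: (0 if holds_p(s) else INF) for s in states}
    changed = True
    while changed:
        changed = False
        for s in states:
            # modal step: 1 + min over successors; disjunction: min
            step = min((1 + val[t] for t in succ[s]), default=INF)
            new = min(val[s], step)
            if new < val[s]:
                val[s] = new
                changed = True
    return val

# Tiny Kripke structure: 0 -> 1 -> 2, 0 -> 2; p holds only at state 2.
states = [0, 1, 2]
succ = {0: [1, 2], 1: [2], 2: []}
dist = shortest_to(states, succ, lambda s: s == 2)
```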
If a given problem instance is partially solved, we want to minimize the effort needed to solve the problem using that information. In this paper we introduce a measure of entropy, H(S), for uncertainty in partially solved input data S(X) = (X1, . . . , Xk), where X is the entire data set and each Xi is already solved. We propose a generic algorithm that merges the Xi's repeatedly and finishes when k becomes 1. We use the entropy measure to analyze three example problems: sorting, shortest paths, and minimum spanning trees. For sorting, Xi is an ascending run, and for minimum spanning trees, Xi is interpreted as a partially obtained minimum spanning tree of a subgraph. For shortest paths, Xi is an acyclic part of the given graph; when k is small, the graph can be regarded as nearly acyclic. The entropy measure H(S) is defined by regarding pi = |Xi|/|X| as a probability measure, that is, H(S) = -n (p1 log p1 + . . . + pk log pk), where n = |X1| + . . . + |Xk|. We show that we can sort the input data S(X) in O(H(S)) time, and that we can complete the minimum cost spanning tree in O(m + H(S)) time, where m is the number of edges. We then solve the shortest path problem in O(m + H(S)) time. Finally, we define a dual entropy on the partitioning process, with which we give time bounds for a generic quicksort and for the shortest path problem on another kind of nearly acyclic graph.
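The entropy measure and the merge-based sorting case can be sketched as follows. The Huffman-style merge order (always merging the two shortest runs) is one standard way to keep the total merge cost within O(H(S)) comparisons; it is an illustrative assumption about the generic algorithm, not a reproduction of the paper's:

```python
import heapq
import math

def entropy(runs):
    """H(S) = -n * sum p_i log p_i with p_i = |X_i|/|X| (base-2 log assumed)."""
    n = sum(len(r) for r in runs)
    return -n * sum((len(r) / n) * math.log2(len(r) / n) for r in runs if r)

def merge(a, b):
    """Standard two-way merge of sorted lists, |a| + |b| comparisons at most."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

def sort_runs(runs):
    """Sort partially solved input (ascending runs) by repeatedly merging
    the two shortest runs until k becomes 1."""
    heap = [(len(r), idx, r) for idx, r in enumerate(runs)]
    heapq.heapify(heap)
    counter = len(runs)
    while len(heap) > 1:
        _, _, a = heapq.heappop(heap)
        _, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (len(a) + len(b), counter, merge(a, b)))
        counter += 1
    return heap[0][2]
```

Fully sorted input (k = 1) gives H(S) = 0, matching the intuition that no work remains.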
Document clustering is the process of partitioning a set of unlabeled documents into clusters such that the documents within each cluster share some common concepts. To analyze the clusters easily, it is convenient to represent the concepts by some key terms. However, by using terms as features, text data are represented in a very high-dimensional vector space, and the computational cost is high. Note that text data are highly sparse, and not all weights in the cluster centers are important for classification. Based on this observation, we propose a comparative-advantage-based clustering algorithm that can identify the relative strengths between clusters and preserve and enlarge those strengths. Since the vectors are represented by term frequencies, the clustering results are more comprehensible than those of dimensionality reduction methods. Experimental results show that the proposed algorithm retains the characteristics of the k-means algorithm at a much lower computational cost. Moreover, we also found that the proposed method has a higher chance of obtaining better results.
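For reference, here is a minimal sketch of the k-means baseline on sparse term-frequency vectors (documents as term→count dicts). The paper's comparative-advantage weighting itself is not reproduced; this only shows the setting the algorithm operates in:

```python
def dist2(a, b):
    """Squared Euclidean distance between two sparse term-frequency vectors."""
    keys = set(a) | set(b)
    return sum((a.get(t, 0) - b.get(t, 0)) ** 2 for t in keys)

def mean_vec(vs):
    """Centroid of a list of sparse vectors."""
    out = {}
    for v in vs:
        for t, x in v.items():
            out[t] = out.get(t, 0) + x
    return {t: x / len(vs) for t, x in out.items()}

def kmeans_sparse(docs, k, iters=10):
    """Plain k-means on sparse dicts; naive initialization from the first k docs."""
    centers = [dict(d) for d in docs[:k]]
    assign = [0] * len(docs)
    for _ in range(iters):
        for i, d in enumerate(docs):
            assign[i] = min(range(k), key=lambda c: dist2(d, centers[c]))
        for c in range(k):
            members = [docs[i] for i in range(len(docs)) if assign[i] == c]
            if members:
                centers[c] = mean_vec(members)
    return assign, centers
```

Since every distance and centroid update iterates only over non-zero terms, sparsity directly bounds the per-step cost, which is the observation the paper builds on.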
We sometimes encounter an experiment whose rate constants cannot be determined from that experiment alone; such an experiment is called underdetermined. One way to overcome underdetermination is to combine the results of multiple experiments. Multiple experiments, however, give rise to a large number of parameters and variables to analyze, and often admit a complicated solution set with multiple solutions, a situation that is unknown to us beforehand. These two difficulties, underdetermination and multiple solutions, lead to confusion as to whether the rate constants can intrinsically be determined through experiment or not. To analyze such experiments, we use prime ideal decomposition to decompose a solution into simpler solutions. It is, however, hard to decompose a set of polynomials with a large number of parameters and variables. Taking a bio-imaging problem as an example, we propose one tip and one technique using resultants from a biological viewpoint.
As demand for high-fidelity multimedia content has soared, content distribution has emerged as a critical application. Large multimedia files require effective content distribution services such as content distribution networks (CDNs). A recent trend in CDN development is the use of peer-to-peer (P2P) techniques. P2P-based CDNs have several advantages over conventional non-P2P-based CDNs in scalability, fault resilience, and cost-effectiveness. Unfortunately, P2P-based content distribution poses a crucial problem in that update propagation is quite difficult to accomplish, because peers cannot obtain a global view of replica locations on the network. Conventional approaches to update propagation still have several issues: they degrade the scalability, fault resilience, and cost-effectiveness of P2P-based content distribution, consume excessive network bandwidth, or take a long time. In this paper, we propose the speculative update, which quickly propagates an update to replicas with less bandwidth consumption in a pure P2P fashion. The speculative update enables fast update propagation on structured P2P-based CDNs. Each server attempts to determine the directions in which replicas are likely to exist and speculatively relays update messages in those directions. Simulation results demonstrate that our mechanism quickly propagates an update to replicas with less bandwidth consumption. The speculative update completes update propagation as fast as simple gossip-based update propagation with up to 69% fewer messages per second, and completes an update propagation up to 92% faster than the convergence-guaranteed random walk.
We propose a type inference algorithm for a polymorphic type system which provides improved error messages. While the standard type inference algorithms often produce unnecessarily long or incomplete error messages, our algorithm provides relevant and complete information for type errors. It is relevant in the sense that all the program points and types in the output of our algorithm contribute to some type error, and is complete in the sense that, for each type error, our algorithm identifies not only two conflicting types, but also all types which conflict with each other. The latter property is particularly useful for debugging programs with lists or case branches. Our algorithm keeps track of the set of program points that are relevant to each type. To achieve completeness, we introduce a new type variable which represents a conflict among two or more incompatible types, and extend the unification algorithm to handle the special type variable appropriately. Finally, we argue that our algorithm is more efficient than those in the literature when there are more than two conflicting types in the given expression.
For fast ε-similarity search, various index structures have been proposed. Yi et al. proposed the concept of multi-modality support and suggested inequalities by which ε-similarity search under the L1, L2, and L∞ norms can be realized. We previously proposed an extended inequality that realizes ε-similarity search under an arbitrary Lp norm using an index based on the Lq norm. In these investigations, a search radius under one norm is converted into that under another norm. In this paper, we propose an index structure, called mm-GNAT (multi-modality support GNAT), that supports search under an arbitrary Lp norm by extending the ranges of GNAT instead of extending the search radius. The index structure is based on GNAT (Geometric Near-neighbor Access Tree). We show that ε-similarity search under an arbitrary Lp norm is realized on mm-GNAT. In addition, we performed search experiments on mm-GNAT with artificial data and music data. The results show that search under an arbitrary Lp norm is realized and that the index structure has better search performance than Yi's method, except for search under the L2 norm.
Array CGH is a useful technology for detecting copy number aberrations on a genome-wide scale. We study the problem of detecting differentially aberrant genomic regions in two or more groups of CGH arrays and estimating the statistical significance of those regions. An important property of array CGH data is that there are spatial correlations among probes, and we need to take this fact into consideration when developing a computational algorithm for array CGH data analysis. In this paper we first discuss three difficult issues underlying this problem, and then introduce a nearest-neighbor multivariate test to alleviate these difficulties. Our proposed approach has three advantages. First, it can incorporate the spatial correlation among probes. Second, genomic regions of different sizes can be analyzed on a common ground. Finally, the computational cost can be considerably reduced with the use of a simple trick. We demonstrate the effectiveness of our approach through an application to a previously published array CGH data set on 75 malignant lymphoma patients.
A protein expresses various functions by interacting with chemical compounds. Protein functions are clarified by protein structure analysis, and the obtained knowledge has been reported in a large number of documents. Extracting this function information and constructing a database from it are useful for various application fields, such as drug discovery and the understanding of life phenomena. However, it is impractical to extract the function information manually from such a large number of documents, which strongly motivates the study of automatic extraction of function information. Extraction of protein function information is considered as a classification problem: whether or not each sentence in the target document includes function information is determined. Typically, for such a classification problem, a classifier is learned from previously given training data. However, the accuracy is not high when the training data are not large enough. In such a case, we attempt to improve the classification accuracy by extending the training data. Sentences that are effective for achieving high accuracy are selected from reference data separate from the training data set and added to the training data. To select such effective sentences, we introduce the reliability of temporary labels assigned to sentences in the reference data. Sentences with low-reliability temporary labels are presented to users, assigned true labels as user feedback, and added to the training data. In addition, a classifier is learned from the training data together with the sentences carrying high-reliability temporary labels. By iterating this process, we attempt to improve the accuracy steadily. In our experiments, compared with a related approach, the accuracy is higher when the number of feedback iterations and the number of sentences labeled by user feedback are small. Thus, it is confirmed that the training data are appropriately extended based on user feedback by the proposed method. In addition, this result serves the purpose of reducing the users' load.
We propose a technique called short-term principal component analysis (ST-PCA) to analyze motion capture (MoCap) data of realistic movements, which form a high-dimensional time series. Our ST-PCA method is successfully applied to motion beat induction, an important aspect of human perception. ST-PCA performs PCA in a sliding window to locally extract the major variance of the movement into the coordinates of the first principal component, thus accurately determining the desired motion beats. Our approach differs from conventional methods in that we estimate the motion beats by analyzing the motion signals as a whole rather than individually in each channel. Moreover, our algorithm is carefully designed around three characteristics of MoCap data: hierarchical structure, spatial correlation, and temporal coherence. Experimental results demonstrate that the proposed method outputs much more accurate motion beats over a wide range of motion categories, including complicated dances, than current state-of-the-art alternatives.
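The sliding-window extraction of the first principal component can be sketched as below. This is a hypothetical minimal version using power iteration; the paper's handling of hierarchical structure, spatial correlation, and temporal coherence, and its beat-picking step, are omitted:

```python
def pc1_scores(window):
    """window: list of d-dimensional samples. Returns the projection of each
    (centered) sample onto the first principal component, found by power
    iteration on the covariance matrix."""
    d = len(window[0])
    n = len(window)
    mean = [sum(x[j] for x in window) / n for j in range(d)]
    centered = [[x[j] - mean[j] for j in range(d)] for x in window]
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(200):                      # power iteration
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5 or 1.0
        v = [x / norm for x in w]
    return [sum(r[j] * v[j] for j in range(d)) for r in centered]

def st_pca(signal, win):
    """Short-term PCA: PC1 scores for each sliding window of the series."""
    return [pc1_scores(signal[i:i + win]) for i in range(len(signal) - win + 1)]
```

Beat candidates would then be extracted from extrema of the resulting one-dimensional score signal.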
Synthetic focusing methods, which superimpose pre-acquired multi-view images onto post-defined surfaces, are intended to simulate a camera with a large aperture and variable focus. We developed a synthetic focusing system with increased flexibility. The flexibility comes from independent control of the optical focuses of the input multi-view image. We arranged liquid lenses in a compact 8×8 array, in which focal lengths can be changed independently using electric signals, and placed a video camera behind the lens array to capture multi-view images. We found that if the optical and synthetic focuses coincide with each other, we can obtain images with more natural blurring effects. Furthermore, we can also set the optical focuses to multiple levels to increase the target depths of the synthetic focusing.
A common task of tone mapping algorithms is to reproduce high dynamic range (HDR) images on low dynamic range (LDR) display devices such as printers and monitors. We present a new tone mapping algorithm for the display of HDR images that is inspired by the adaptive processes of the human visual system. The proposed algorithm is based on center/surround Retinex processing and has two novel aspects. First, the input luminance image is compressed by a global tone mapping curve. The curvature of the compression curve is adapted locally based on the pseudo-Hilbert scan technique, so it provides a better overall impression before the subsequent local processing. Second, the local details are enhanced by a non-linear adaptive spatial (Gaussian) filter, whose shape (filter variance) is adapted to the high-contrast edges of the image. The proposed method takes advantage of the properties of both global and local processing while overcoming their respective disadvantages. The algorithm can therefore preserve the visibility and contrast impression of high dynamic range scenes on standard display devices. We tested the proposed method on a variety of HDR images and also compared it with previous research. The results indicate that our method is effective for displaying images with high visual quality.
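A rough sketch of the global-plus-local scheme might look like the following. The logarithmic curve, the fixed 3-tap Gaussian surround, and the `key` detail gain are illustrative stand-ins only; the paper's pseudo-Hilbert-scan-adapted curve and edge-adaptive filter variance are not reproduced:

```python
import math

def tone_map(lum, sigma=1.0, key=0.18):
    """Hedged sketch: global log compression followed by a center/surround
    (Retinex-style) detail term, on a 2-D luminance grid (list of rows)."""
    h, w = len(lum), len(lum[0])
    # 1) global compression: L -> log(1 + L)
    g = [[math.log1p(lum[y][x]) for x in range(w)] for y in range(h)]
    # 2) surround: separable 3-tap Gaussian blur with edge clamping
    k = [math.exp(-(d * d) / (2 * sigma * sigma)) for d in (-1, 0, 1)]
    s = sum(k)
    k = [v / s for v in k]

    def blur1d(row):
        return [sum(k[d + 1] * row[min(max(x + d, 0), len(row) - 1)]
                    for d in (-1, 0, 1))
                for x in range(len(row))]

    blur_h = [blur1d(row) for row in g]                       # blur rows
    cols = [blur1d([blur_h[y][x] for y in range(h)])          # then columns
            for x in range(w)]
    surround = [[cols[x][y] for x in range(w)] for y in range(h)]
    # 3) output: compressed base plus amplified center-minus-surround detail
    return [[g[y][x] + key * (g[y][x] - surround[y][x]) for x in range(w)]
            for y in range(h)]
```

On a flat region the center equals the surround, so the output reduces to the globally compressed value, i.e. no spurious detail is introduced.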
In this paper, we propose a novel method for a robot to detect robot-directed speech: to distinguish speech that users speak to a robot from speech that users speak to other people or to themselves. The originality of this work is the introduction of a multimodal semantic confidence (MSC) measure, which is used for domain classification of input speech based on the decision on whether the speech can be interpreted as a feasible action under the current physical situation in an object manipulation task. This measure is calculated by integrating speech, object, and motion confidence with weightings that are optimized by logistic regression. Then we integrate this measure with gaze tracking and conduct experiments under conditions of natural human-robot interactions. Experimental results show that the proposed method achieves a high performance of 94% and 96% in average recall and precision rates, respectively, for robot-directed speech detection.
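The logistic-regression combination of the three confidence scores can be sketched as follows. The weights and bias are assumed to have been learned offline, and all names and values below are hypothetical, not the paper's:

```python
import math

def msc(conf_speech, conf_object, conf_motion, weights, bias):
    """Multimodal semantic confidence as a logistic combination of the
    speech, object, and motion confidence scores (weights learned offline)."""
    z = bias + sum(w * c for w, c in
                   zip(weights, (conf_speech, conf_object, conf_motion)))
    return 1.0 / (1.0 + math.exp(-z))   # logistic (sigmoid) function

def is_robot_directed(confs, weights, bias, threshold=0.5):
    """Classify an utterance as robot-directed when the MSC clears a threshold."""
    return msc(*confs, weights, bias) >= threshold
```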
The goal of this paper is to build a bridge between social relationship and cultural variation in order to predict conversants' non-verbal behaviors. This idea serves as the basis of a parameter-based socio-cultural model, which determines the non-verbal expressive parameters that specify the shapes of an agent's non-verbal behaviors in HAI. As a first step, a comparative corpus analysis is conducted for two cultures in two specific social relationships. Next, by integrating the cultural and social factors with the empirical data from the corpus analysis, we establish a model that predicts posture. The predictions from our model successfully demonstrate that both cultural background and social relationship moderate communicative non-verbal behaviors.
There are many studies aimed at using port-scan traffic data for fast and accurate detection of rapidly spreading worms. This paper proposes two new methods for reducing the traffic data to a simplified form comprising significant components of smaller dimensionality. (1) Dimension reduction via Principal Component Analysis (PCA), widely used as a tool in exploratory data analysis, enables estimation of how uniformly the sensors are distributed over the reduced coordinate system. PCA yields a scatter plot for the sensors, which helps to detect abnormal behavior in both the source address space and the destination port space. (2) A further significant application of PCA is to reduce the number of sensors without losing estimation accuracy. Our proposed PCA-based method allows redundant sensors to be discarded and the number of packets to be estimated, even when half of the sensors are unavailable, with an error of less than 3% of the total number of packets. In addition to our proposals, we report on experiments that use the Internet Scan Data Acquisition System (ISDAS) distributed observation data from the Japan Computer Emergency Response Team (JPCERT).
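A minimal sketch of the PCA step is shown below: the dominant component is found by power iteration on the covariance of a per-sensor packet-count matrix, and the explained-variance ratio of that component indicates how much sensor redundancy could be removed. This illustrates standard PCA under stated assumptions, not the paper's exact procedure:

```python
def pca_first_component(data):
    """data: list of observations (rows), each a list of per-sensor counts.
    Returns (dominant eigenvector, fraction of total variance it explains)."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    c = [[row[j] - mean[j] for j in range(d)] for row in data]
    cov = [[sum(r[i] * r[j] for r in c) / n for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    lam = 0.0
    for _ in range(200):                      # power iteration
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        lam = sum(x * x for x in w) ** 0.5 or 1.0
        v = [x / lam for x in w]
    total = sum(cov[i][i] for i in range(d))  # total variance = trace
    return v, (lam / total if total else 0.0)
```

When two sensors observe proportional traffic, the first component explains nearly all variance, which is the sense in which one of them is redundant.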
Creating a security policy for SELinux is difficult because the number of access rules often exceeds 10,000 and the elements in the rules, such as permissions and types, are understandable only to SELinux experts. The most popular way to facilitate creating a security policy is refpolicy, which is composed of macros and sample configurations. However, describing and verifying refpolicy-based configurations is difficult because the complexity of the configuration elements remains, using the macros requires expertise, and there are more than 100,000 configuration lines. The memory footprint of refpolicy, around 5 MB by default, is also a problem for resource-constrained devices. We propose a system called SEEdit which facilitates creating a security policy by means of a higher-level language called SPDL and associated SPDL tools. SPDL reduces the number of permissions through integrated permissions and removes type configurations. The SPDL tools generate security policy configurations from access logs and the tool user's knowledge about applications. Experimental results on an embedded system and a PC system show that practical security policies can be created with SEEdit: describing configurations is semi-automated, the created security policies consist of fewer than 500 configuration lines and 100 configuration elements, and the memory footprint on the embedded system is less than 500 KB.
The purpose of this paper is to propose a quantitative approach for the effective and efficient assessment of risks related to information security. Though several other approaches have been proposed to measure information security (IS) related risk, they are either inapplicable to real enterprises' IT landscapes or are of a qualitative nature, i.e., based on subjective decisions of the implementation team, and thus could suffer from a significant degree of speculation. In contrast, our approach is based on objective statistical data, provides quantitative results, and can easily be applied to any enterprise in any industry or to any non-profit organization. An example of the application of the proposed approach to a real enterprise is also provided. The only prerequisite for the proposed methodology is a sufficient amount of incident statistics collected under the conditions described later in this paper. The motivation for this research is that performing IS-related risk assessment is one of the procedures required to manage information security, and the process of IS management has recently become one of the highest concerns for most organizations and enterprises. This is caused not only by the growth of hackers' activity but also by increasing legal requirements and compliance issues.