Multi-Processor Systems-on-Chips (MPSoCs) have emerged as the predominant platform for embedded real-time applications. A large variety of ubiquitous services must be implemented by embedded systems in a cost- and power-efficient way, while providing a maximum degree of performance, usability and dependability. By using a scalable Network-on-Chip (NoC) architecture, which replaces traditional point-to-point and bus connections, in conjunction with high-performance IP cores, it is possible to use the available performance to consolidate functionality on a single MPSoC platform. But especially when uncritical best-effort applications (e.g., entertainment) and critical applications (e.g., pedestrian detection, electronic stability control) are combined on the same architecture (mixed criticality), validation faces new challenges. Due to complex resource sharing in MPSoCs, the timing behavior becomes more complex and requires new analysis methods. Additionally, applications that may exhibit multiple behaviors corresponding to different operating modes (e.g., initialization mode, fault-recovery mode) also need to be considered in the design of mixed-critical MPSoCs. In this paper, challenges in the design of mixed-critical systems are discussed, and formal analysis solutions are proposed that consider shared resources, NoC communication, multi-mode applications and their reliability.
Delay testing is one of the key processes in production test to ensure high quality and high reliability of logic circuits. Test escapes, i.e., defective chips that pass testing, can be reduced by introducing delay testing. On the other hand, we must be concerned about the yield loss caused by delay testing, i.e., over-testing. Many methods and techniques have been developed to solve problems in delay testing. In this paper, we introduce fundamental techniques of delay testing and survey recent problems and solutions. In particular, we focus on techniques to enhance test quality, to avoid over-testing, and to make test design efficient by treating circuits described at the register transfer level.
We propose a novel method to generate partial products for reduced-area parallel multipliers. Our method reduces the total number of partial product bits in parallel multiplication by about half. We call the partial products generated by our method Compound Partial Products (CPPs). Each CPP has four candidate values: zero, a part of the multiplicand, a part of the multiplier, and a part of the sum of the two operands. Our method selects one of the four candidates according to a pair of one multiplicand bit and one multiplier bit. Multipliers employing CPPs are approximately 30% smaller than array multipliers without radix-4 Booth's method, and up to approximately 10% smaller than array multipliers with radix-4 Booth's method. We also propose a method to accelerate multipliers using CPPs.
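The four-way selection described in the abstract can be pictured with a small sketch. Note that the mapping from bit pairs to candidates below is a hypothetical assignment chosen for illustration only; the actual encoding used by the CPP method may differ.

```python
# Toy illustration of Compound Partial Product (CPP) selection: a pair of
# one multiplicand bit and one multiplier bit picks one of four candidates.
# The concrete mapping here is hypothetical, not the paper's encoding.

def select_cpp(a_bit, b_bit, part_a, part_b):
    """Pick one of four candidate values based on a bit pair.

    part_a: a part (bit slice) of the multiplicand
    part_b: a part (bit slice) of the multiplier
    """
    if (a_bit, b_bit) == (0, 0):
        return 0                    # candidate 1: zero
    if (a_bit, b_bit) == (0, 1):
        return part_a               # candidate 2: part of the multiplicand
    if (a_bit, b_bit) == (1, 0):
        return part_b               # candidate 3: part of the multiplier
    return part_a + part_b          # candidate 4: part of the sum of operands
```

Because one selected value stands in for what would otherwise be two partial product rows, roughly half as many partial product bits need to be summed.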
Since the target applications running on an embedded processor are limited, we can optimize its cache configuration with respect to the number of sets, the block size, and the associativity. An extremely fast cache configuration simulation method, CRCB (Configuration Reduction approach by the Cache Behavior), has recently been proposed, which can calculate cache hit/miss counts exactly for the possible cache configurations obtained by varying the three parameters above. The CRCB method assumes an LRU-based (Least Recently Used) cache, but many recent processors use a FIFO-based (First In First Out) or PLRU-based (Pseudo LRU) cache because of their lower hardware cost. In this paper, we propose exact and fast L1 cache configuration simulation algorithms for embedded applications that use PLRU or FIFO as the cache replacement policy. First, we prove that the CRCB method can be applied not only to LRU but also to other cache replacement policies, including FIFO and PLRU. Second, we prove several properties of FIFO- and PLRU-based caches and propose associated cache simulation algorithms that can exactly simulate more than one cache configuration with different associativities simultaneously for FIFO or PLRU. Finally, experimental results demonstrate that our cache configuration simulation algorithms obtain exact cache hit/miss counts and run up to 249 times faster than a conventional cache simulator.
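To see why the replacement policy matters for hit/miss counts, the following minimal single-set simulator (a textbook sketch, not the CRCB algorithm itself, run on a hypothetical tag trace) contrasts LRU, FIFO, and 4-way tree-PLRU:

```python
from collections import OrderedDict, deque

def hits_lru(trace, ways):
    s, hits = OrderedDict(), 0
    for tag in trace:
        if tag in s:
            hits += 1
            s.move_to_end(tag)          # a hit refreshes the LRU order
        else:
            if len(s) == ways:
                s.popitem(last=False)   # evict least recently used
            s[tag] = None
    return hits

def hits_fifo(trace, ways):
    s, hits = deque(), 0
    for tag in trace:
        if tag in s:
            hits += 1                   # a hit does NOT change FIFO order
        else:
            if len(s) == ways:
                s.popleft()             # evict the oldest-inserted line
            s.append(tag)
    return hits

def hits_plru4(trace):
    """One 4-way set with tree-PLRU: 3 bits point toward the victim way."""
    ways, bits, hits = [None] * 4, [0, 0, 0], 0
    def touch(w):                       # point every tree bit away from w
        if w < 2:
            bits[0], bits[1] = 1, 1 - w
        else:
            bits[0], bits[2] = 0, 3 - w
    for tag in trace:
        if tag in ways:
            hits += 1
            touch(ways.index(tag))
        else:
            v = (bits[1] if bits[0] == 0 else 2 + bits[2])
            ways[v] = tag
            touch(v)
    return hits

trace = ["A", "B", "A", "C", "A"]       # hypothetical tag trace for one set
# In a 2-way set, LRU keeps "A" resident (2 hits) while FIFO evicts it (1 hit).
```

The same trace yields different hit counts under each policy, which is why a simulator proven exact for LRU cannot simply be reused for FIFO or PLRU without the kinds of proofs the paper provides.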
In this paper, we present a prototype MIPS R3000 processor that integrates a fine-grained power gating technique into its functional units. To reduce leakage power consumption, functional units such as the multiplier and the divider can be power-gated individually according to the workload of the executed program. The prototype chip, Geyser-1, has been implemented in Fujitsu's 65nm CMOS technology, and to facilitate the design process with fine-grained power gating, a fully automated design flow has also been proposed. Comprehensive real-chip evaluations have been performed to verify the leakage reduction efficiency. According to the evaluation results with benchmark programs, fine-grained power gating can reduce the power of the processor by 5% at 25°C and 23% at 80°C.
In recent application-specific integrated circuit design, using transparent latches as storage elements has been intensively studied, since designs using latches (latch-based designs) have a large potential to improve the performance yield. However, latch-based design is prone to violating the hold constraint, because it is difficult for a latch to hold its output data when the input switches during the transparent state. To address this hold problem, this paper proposes a novel hold-guarantee framework for latch-based design based on the lifetime extension approach. Since lifetime-extension-based design suffers from an increase in registers due to its strict sharing condition, another method, referred to as minimum-delay compensation (MDC), is introduced to promote register sharing. Excessive use of MDC, on the contrary, increases the total design cost; this paper therefore formulates the problem of minimizing the MDC cost (in particular, the area cost) under a specified number of registers, and reveals the computational complexity of the problem. To tackle the problem, two algorithms are proposed: a left-edge-based algorithm and an integer-linear-programming-based algorithm. They are applied to benchmark circuits, and experimental results show that the proposed framework can reduce the area cost compared to conventional design while keeping the hold constraint.
Message sequence charts (MSCs) and high-level MSCs (HMSCs) have been standardized to model interactions of parallel processes as message exchanges. MSCs can flexibly express parallel behaviors, but this flexibility also makes it possible to put unintended orders of messages into the MSCs. This paper focuses on detecting such unintended orders, which we call discords. We propose an encoding scheme in which the analysis of an HMSC is converted into a Boolean SAT problem. Experimental results show that our approach achieves efficient analysis of HMSCs that have a large number of processes or large graphs. This contributes to the efficient analysis of specifications of complex interactions.
This paper proposes simultaneous virtual-machine logging and replay. It performs logging and replay simultaneously on the same machine through the use of two virtual machines, one for the primary execution and the other for the backup execution. While the primary execution produces the execution history, the backup execution consumes the history by replaying it. The execution log can thus be limited to a certain size, and huge storage devices become unnecessary. We developed such a logging and replaying feature in a VMM; it can log and replay the execution of the Linux operating system. Our experimental results show that the overhead of the primary execution is only fractional.
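The producer/consumer relationship between the two executions can be sketched as follows. This toy model, with a bounded in-memory log and random numbers standing in for nondeterministic events, only illustrates the idea; it is not the VMM mechanism itself.

```python
import random
from collections import deque

def primary(log, steps):
    """Primary execution: records each nondeterministic event in the log."""
    out = []
    for _ in range(steps):
        event = random.randrange(100)   # stand-in for a nondeterministic input
        log.append(event)               # produce the execution history
        out.append(event * 2)           # some deterministic computation
    return out

def backup(log, steps):
    """Backup execution: consumes the log, replaying the recorded events."""
    out = []
    for _ in range(steps):
        event = log.popleft()           # replay instead of re-drawing
        out.append(event * 2)
    return out

log = deque(maxlen=64)  # bounded log: the backup drains what the primary fills
```

Because the backup consumes entries as the primary produces them, the log never needs to grow beyond its fixed capacity, which is the point made in the abstract about avoiding huge storage devices.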
This paper describes our approach to making the Gandalf Virtual Machine Monitor (VMM) interruptible. Gandalf is designed to be a lightweight VMM for use in embedded systems. Hardware interrupts are delivered directly to the guest operating system (OS) kernel without intervention by the VMM; the VMM only processes the exceptions caused by the guest kernel. Since the VMM processes those exceptions with interrupts disabled, a detailed performance analysis using Performance Monitoring Counters (PMCs) revealed that the durations during which interrupts are disabled are rather long. By making Gandalf interruptible, we can make VMM-based systems more suitable for embedded systems. We analyzed the requirements for making Gandalf interruptible, and designed and implemented mechanisms to achieve this. The experimental results show that making Gandalf interruptible significantly reduces the execution time spent with interrupts disabled while not impacting performance.
This paper proposes SPLiT (Scalable Performance Library Tool), a methodology to improve the performance of applications on multicore processors through on-the-fly CPU and cache optimizations. SPLiT is designed to relieve the difficulty of performance optimization of parallel applications on multicore processors: all programmers have to do to benefit from SPLiT is add a few library calls to let SPLiT know which parts of the application should be analyzed. This simple but compelling optimization library helps enrich pervasive servers on multicore processors, which are strong candidates for the architecture of information appliances in the near future. SPLiT analyzes and predicts application behaviors based on CPU cycle counts and cache misses. According to the analysis and predictions, SPLiT tries to allocate processes and threads sharing data onto the same physical cores in order to enhance cache efficiency. SPLiT also tries to separate cache-effective code from code with more cache misses to avoid cache pollution, which results in performance degradation. Empirical experiments assuming web applications validated the efficiency of SPLiT: the performance of the web application improved by 26%.
This paper describes an investigation of the impact of selecting a specific defect type in software inspection. We conducted a controlled experiment on the effect of selecting a defect type in software inspection. In the experiment, 32 practitioners were randomly assigned to three groups performing (a) inspection without a selected defect type, (b) inspection with a selected defect type determined at the beginning of the inspection, and (c) inspection with a selected defect type determined at the beginning of the inspection, with verification at every defect detection of whether the detected defect is categorized as the selected defect type. The results of the experiment show that specifying the selected defect type increases the proportion of detected defects of the selected type relative to other detected defects. The difference between the results of groups (a) and (b) indicates that in ordinary inspections, the procedure employed by group (b) should provide higher efficiency.
Designated Verifier Signature (DVS) guarantees that only a verifier designated by the signer can verify the validity of a signature. In this paper, we propose a new variant of DVS: Proxiable Designated Verifier Signature (PDVS), in which the verifier can commission a third party (i.e., a proxy) to perform part of the verification. In the PDVS system, the verifier can reduce his computational cost by delegating part of the verification without revealing the validity of the signature to the proxy. In all DVS systems, the validity of a signature means that the signature satisfies two properties: (1) the signature is judged “accept” by a decision algorithm, and (2) the signature is confirmed to have been generated by the signer. In the PDVS system, the verifier can thus commission the proxy to check only property (1). In the proposed PDVS model, we divide the verifier's secret key into two parts: one is a key for performing the decision algorithm, and the other is a key for generating a dummy signature, which prevents a third party from being convinced of property (2). We also define security requirements for PDVS and propose a PDVS scheme that satisfies all the security requirements we define.
Let 𝔾 be an additive group generated by an element G of prime order r. The discrete logarithm problem with auxiliary input (DLPwAI) is the problem of finding α on inputs G, αG, α^d G ∈ 𝔾 for a positive integer d dividing r-1. The infeasibility of DLPwAI ensures the security of some pairing-based cryptographic schemes. In 2006, Cheon proposed an algorithm for solving DLPwAI that works better than conventional algorithms. In this paper, we report our experimental results of Cheon's algorithm on a pairing-friendly elliptic curve defined over GF(3^127). Moreover, based on our experimental results, we estimate the cost of Cheon's algorithm for solving DLPwAI on some pairing-friendly elliptic curves over finite fields of characteristic 3. Our estimation implies that DLPwAI on some pairing-friendly curves can be solved at reasonable cost when the optimal parameter d is chosen.
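Why choosing d well matters can be seen from the known asymptotic cost of Cheon's algorithm, roughly O(√((r-1)/d) + √d) group operations when d divides r-1, versus O(√r) for a generic algorithm such as Pollard's rho. The back-of-the-envelope sketch below ignores constant factors and uses a small toy prime, not the curves studied in the paper:

```python
import math

def rho_ops(r):
    """Approximate group operations for a generic O(sqrt(r)) DLP algorithm."""
    return math.isqrt(r)

def cheon_ops(r, d):
    """Approximate group operations for Cheon's algorithm, requiring d | r-1."""
    assert (r - 1) % d == 0
    return math.isqrt((r - 1) // d) + math.isqrt(d)

# Toy example: a 31-bit Mersenne prime, with d chosen near sqrt(r-1).
r = 2**31 - 1               # r - 1 = 2 * 3^2 * 7 * 11 * 31 * 151 * 331
d = 2 * 9 * 7 * 11 * 31    # a divisor of r-1 close to sqrt(r-1)
```

With d near √(r-1), both terms are about r^(1/4), so the estimated work drops from roughly √r operations to roughly 2·r^(1/4), which is why the abstract stresses choosing the optimal parameter d.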
The reachability problem, given an initial term, a goal term, and a rewrite system, is to decide whether the goal term is reachable from the initial term by the rewrite system. The innermost reachability problem is to decide whether the goal term is reachable from the initial term by innermost reductions of the rewrite system. A context-sensitive term rewriting system (CS-TRS) is a pair of a term rewriting system and a mapping that specifies the arguments of each function symbol and thereby determines the rewritable positions of terms. In this paper, we show that both reachability for right-linear right-shallow CS-TRSs and innermost reachability for shallow CS-TRSs are decidable. We prove these claims by presenting algorithms that construct a tree automaton accepting the set of terms reachable from a given term by (innermost) reductions of a given CS-TRS.
GMount is a high-performance distributed file system with locality-aware metadata lookups and a small installation effort. GMount organizes compute nodes in a decentralized hierarchical overlay to unify separate local file systems into a global shared namespace and to achieve locality-aware metadata lookups. GMount offers not only better performance when applications execute with considerable data access locality, but also the ability to effortlessly and rapidly enable data sharing among clusters, clouds, and supercomputers. This paper presents a performance evaluation of the latest GMount implementation using both micro-benchmarks and real-world data-intensive applications. Experimental results demonstrate that GMount has highly scalable metadata and I/O operation performance when data access locality is common, and that its performance is practical for routine data-intensive computing.
There are several ways to represent 3D scene information. One popular way is the N-view plus N-depth representation. In applications based on this representation, efficient compression of both the views and the depth maps is important. In this paper, we present a depth coding method that downsamples the depth maps and uses a novel depth upsampling filter guided by the color view. Our method of depth down- and up-sampling reconstructs clear object boundaries in the upsampled depth maps, and therefore achieves better coding efficiency. Our experimental results show that the proposed depth resampling filter, used in combination with a standard state-of-the-art video encoder, can increase both coding efficiency and rendering quality, particularly at lower bit rates.
Recently, Automated Trust Negotiation (ATN) has played an important role for two participants who want to automatically establish a trust relationship with each other in an open system so that they can receive services or information. For example, when a person wants to buy a product from a website, he needs to know whether the website can be trusted. In ATN, both parties (i.e., the person and the website, in this example) automatically exchange their credentials and access control policies with each other to ensure that the required policies of each party are met, upon which the trust negotiation is established. Our proposed scheme allows both parties to learn whether or not they agree to establish a trust relationship, and after the scheme is performed, no policy has been disclosed to the other party. In this paper, we provide the building blocks used to construct our proposed scheme and describe the basic ideas for hiding access control policies and for implementing a conditional transfer. We also describe the steps of our protocol with a numerical example. Moreover, we evaluate the computation cost of our scheme through mathematical analysis and through an implementation using a binary tree model of credentials and policies. Finally, we show that our scheme can be performed securely.
A botnet is a network of compromised computers infected with malware that are controlled remotely via public communications media. Many attempts at botnet detection have been made, including heuristic analyses of traffic. In this study, we propose a new method for identifying independent botnets in the CCC Dataset 2009, the log of download servers observed by distributed honeypots, by applying Principal Component Analysis. Our main results include distinguishing four independent botnets when the year is divided into five phases.
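As a rough illustration of the analysis technique (not the study's actual features or data), the sketch below applies Principal Component Analysis, via eigendecomposition of the covariance matrix, to a hypothetical per-host feature matrix in which two synthetic clusters stand in for two distinct botnets:

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)                    # center each feature
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)           # eigh returns ascending order
    top = vecs[:, np.argsort(vals)[::-1][:k]]  # top-k eigenvectors
    return Xc @ top, np.sort(vals)[::-1]

# Hypothetical per-host features: (downloads/day, distinct servers contacted,
# mean inter-access interval in minutes). Two synthetic clusters.
rng = np.random.default_rng(0)
botnet_a = rng.normal([50, 3, 10], 1.0, size=(20, 3))
botnet_b = rng.normal([5, 20, 120], 1.0, size=(20, 3))
X = np.vstack([botnet_a, botnet_b])
proj, variances = pca(X, 1)
# Hosts from the two clusters separate along the first principal component.
```

When infection logs do contain such structure, the leading components concentrate the between-group variance, which is what lets independent botnets be told apart.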
Aiming to achieve sensing coverage of given Areas of Interest (AoIs) over time at low cost in a people-centric sensing manner, we propose the concept of (α, T)-coverage of a target field, in which each point in the field is sensed by at least one mobile node with probability at least α during a time period T. Our goal is to achieve (α, T)-coverage of a given AoI with a minimal set of mobile nodes. In this paper, we propose two algorithms: an inter-location algorithm, which selects a minimal number of mobile nodes from the nodes inside the AoI based on the distances between them, and an inter-meeting-time algorithm, which selects nodes based on the expected meeting time between nodes. To cope with cases in which there is an insufficient number of nodes inside the AoI, we propose an extended algorithm that considers nodes both inside and outside the AoI. To improve the accuracy of the proposed algorithms, we also propose an updating mechanism that adapts the number of selected nodes based on their latest locations during the time period T. In our simulation-based performance evaluation, our algorithms achieved (α, T)-coverage with good accuracy for various values of α, T, AoI size, and moving probability.
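The selection objective can be made concrete with a small sketch. Assuming, purely for illustration (this is not the paper's algorithm), that node i independently covers a given point during period T with probability p_i, the point is (α, T)-covered once 1 - ∏(1 - p_i) ≥ α, and a greedy selector picks the fewest nodes needed to reach that threshold:

```python
def min_nodes_for_alpha(cover_probs, alpha):
    """Greedily pick nodes (highest coverage probability first) until the
    probability that at least one selected node covers the point reaches alpha.

    cover_probs[i]: probability that node i senses the point during period T,
    assumed independent across nodes (an illustrative simplification).
    """
    selected = []
    miss = 1.0                        # probability no selected node covers it
    for p in sorted(cover_probs, reverse=True):
        if 1.0 - miss >= alpha:
            break
        selected.append(p)
        miss *= (1.0 - p)
    if 1.0 - miss < alpha:
        raise ValueError("alpha unreachable with the given nodes")
    return selected

# e.g. probs [0.5, 0.4, 0.3, 0.2] with alpha = 0.8:
# coverage grows 0.5 -> 0.70 -> 0.79 -> 0.832, so all four nodes are needed.
```

The paper's inter-location and inter-meeting-time algorithms refine the same idea by estimating these coverage contributions from node distances and expected meeting times rather than assuming them given.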
This paper proposes a video communication assistance system that uses a companion robot coordinated with the user's conversational attitude toward the communication. To maintain a conversation and achieve comfortable communication, it is necessary to provide assistance that is aware of the user's attitude. First, the system estimates the user's conversational state using a machine learning method. Then, corresponding to the current context of the user's attitude, the robot expresses active listening behaviors, such as nodding and gaze turns, to compensate for the listener's attitude when he/she is not really listening to the other user's speech; the robot shows communication-evoking behaviors (topic provision) to compensate for the lack of a topic; and the system switches the camera images to create an illusion of eye contact. From empirical studies and a demonstration experiment, we found that i) both the robot's active listening behaviors and the switching of the camera images compensate for the other person's attitude, ii) elderly people prefer long intervals between the robot's behaviors, and iii) the topic provision function is effective during awkward silences.