IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Online ISSN : 1745-1337
Print ISSN : 0916-8508
Volume E100.A, Issue 7
Showing 1-35 of 35 articles from the selected issue
Special Section on Design Methodologies for System on a Chip
  • Makoto Ikeda
    2017 Volume E100.A Issue 7 p. 1362
    Published: 2017/07/01
    Released: 2017/07/01
    Journal, restricted access
  • Hiroshi SAITO, Masashi IMAI, Tomohiro YONEDA
    Article type: PAPER
    2017 Volume E100.A Issue 7 p. 1363-1373
    Published: 2017/07/01
    Released: 2017/07/01
    Journal, restricted access

    In this paper, we propose a redundant task allocation method for multi-core systems based on the Duplication with Temporary Triple-Modular Redundancy and Reconfiguration (DTTR) scheme. The proposed method determines the allocation of a given task graph to a given multi-core system model from task scheduling under given fault patterns. A fault pattern, as defined in this paper, consists of a set of faulty cores and a set of surviving cores. To optimize the average failure rate of the system, task scheduling minimizes the execution time of the task graph while preserving the property of the DTTR scheme. In addition, we propose a method for selecting the fault patterns to be scheduled, which reduces the task allocation time. In the experiments, we first evaluate the proposed fault-pattern selection method in terms of task allocation time. Then, we compare the average failure rate of the proposed method with those of a task allocation method that packs tasks into particular cores as much as possible, a method based on Simulated Annealing (SA), a method based on Integer Linear Programming (ILP), and a method based on task scheduling that does not consider the property of the DTTR scheme. The experimental results show that the proposed method achieves nearly the same average failure rate as the SA-based method with a shorter task allocation time.

  • Moon Gi SEOK, Tag Gon KIM, Daejin PARK
    Article type: PAPER
    2017 Volume E100.A Issue 7 p. 1374-1383
    Published: 2017/07/01
    Released: 2017/07/01
    Journal, restricted access

    The rapid prototyping of a mixed-signal system-on-chip (SoC) has been enabled by reusing predesigned intellectual properties (IPs) and by integrating newly designed IPs into the top-level design of the SoC. The IPs are designed at various hardware description levels, which makes the simulations that evaluate the prototype challenging. One traditional solution is to convert these heterogeneous IP models into equivalent models described in a single description language. This conversion approach often requires manual rewriting of existing IPs, which results in description loss during the model projection because automatic conversion tools are absent. The other solutions are co-simulation/emulation approaches based on coupling multiple simulators/emulators through connection modules. The conventional methods lack a formal theoretical background and an explicit interface for integrating a simulator into their solutions. In this paper, we propose a general co-simulation approach based on the high-level architecture (HLA) and a newly defined programming language interface for interoperation (PLI-I) between heterogeneous IPs as a formal simulator interface. Based on the proposed PLI-I and HLA, we introduce formal procedures for integration and interoperation. To reduce integration costs, we split these procedures into two parts: a reusable common library and an additional model-dependent signal-to-event (SE) converter that handles differently abstracted in/out signals between the coupled IPs. During interoperation, to resolve the different time-advance mechanisms and increase computation concurrency between digital and analog simulators, the proposed co-simulation approach performs an advanced HLA-based synchronization using pre-simulation concepts. The case study validates the interoperation behaviors between the heterogeneous IPs in mixed-signal SoC design and demonstrates the reduced integration effort and the synchronization speedup obtained with the proposed approach.

  • Shanlin XIAO, Tsuyoshi ISSHIKI, Dongju LI, Hiroaki KUNIEDA
    Article type: PAPER
    2017 Volume E100.A Issue 7 p. 1384-1395
    Published: 2017/07/01
    Released: 2017/07/01
    Journal, restricted access

    Object detection is at the heart of nearly all computer vision systems. Standard off-the-shelf embedded processors struggle to meet the trade-offs among performance, power consumption and flexibility required by object detection applications. Therefore, this paper presents an Application Specific Instruction set Processor (ASIP) for object detection using an AdaBoost-based learning algorithm with Haar-like features as weak classifiers. Algorithm optimizations are employed to reduce memory bandwidth requirements without losing reliability. In the proposed ASIP, a Single Instruction Multiple Data (SIMD) architecture is adopted to fully exploit the data-level parallelism inherent in the target algorithm. By adding pipeline stages, application-specific hardware components and custom instructions, the AdaBoost algorithm is accelerated by a factor of 13.7x compared to the optimized pure software implementation. Compared with ARM946 and TMS320C64+, our ASIP shows 32x and 7x better throughput, 10x and 224x better area efficiency, and 6.8x and 18.8x better power efficiency, respectively. Furthermore, compared to hard-wired designs, evaluation results show that the proposed architecture has an advantage in chip area efficiency while maintaining reliable performance and achieving real-time object detection at 32 fps on VGA video.

  • Rei UENO, Naofumi HOMMA, Takafumi AOKI, Sumio MORIOKA
    Article type: PAPER
    2017 Volume E100.A Issue 7 p. 1396-1408
    Published: 2017/07/01
    Released: 2017/07/01
    Journal, restricted access

    This paper presents an automatic hierarchical formal verification method for arithmetic circuits over Galois fields (GFs), which are dedicated digital circuits for GF arithmetic operations used in cryptographic processors. The proposed verification method combines a word-level computer algebra procedure with a bit-level PPRM (Positive Polarity Reed-Muller) expansion procedure. While the proposed method is not limited to cryptographic processors, these processors are important targets because complicated implementation techniques, such as field conversions, are frequently used for side-channel-resistant, compact and low-power designs. In the proposed method, the correctness of the entire datapath is verified at the GF(2^m) level, i.e., the word level. A datapath implementation is represented hierarchically as a set of components' functional descriptions over GF(2^m) and their wiring connections. We verify that the implementation satisfies a given total-functional specification over GF(2^m) by using an automatic algebraic method based on the Gröbner basis and polynomial reduction. Then, to verify that each component circuit is correctly implemented by a combination of GF(2) operations, i.e., logic gates at the bit level, we use our fast PPRM expansion procedure, which is customized for handling large-scale Boolean expressions with many variables. We have applied the proposed method to a complicated AES (Advanced Encryption Standard) circuit with a masking countermeasure against side-channel attacks. The results show that the proposed method can verify such a practical circuit automatically within 4 minutes, whereas no single conventional verification method finishes even after a day or more.
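
    To make the bit-level step concrete, the following is a minimal sketch of a PPRM expansion for a small Boolean function given as a truth table. It is not the authors' fast procedure: it uses the textbook in-place XOR (Möbius) transform, and the function names `pprm_coefficients` and `pprm_terms` are hypothetical. Because the PPRM form is canonical, two bit-level implementations of a component compute the same function exactly when their coefficient lists match.

```python
def pprm_coefficients(truth_table):
    """Return the PPRM (XOR-of-AND-products) coefficients of a Boolean function.

    truth_table[i] is f(x), where bit j of index i is the value of variable x_j.
    coeffs[m] == 1 means the product of all x_j with bit j set in m appears in
    the expansion (m == 0 is the constant term 1).
    """
    n = len(truth_table)
    if n == 0 or n & (n - 1):
        raise ValueError("truth table length must be a power of two")
    coeffs = list(truth_table)
    step = 1
    while step < n:
        # Fold the cofactor with x_j = 0 into the cofactor with x_j = 1.
        for i in range(n):
            if i & step:
                coeffs[i] ^= coeffs[i ^ step]
        step <<= 1
    return coeffs


def pprm_terms(truth_table):
    """Return the expansion as a list of readable product terms."""
    terms = []
    for mask, coeff in enumerate(pprm_coefficients(truth_table)):
        if coeff:
            names = [f"x{j}" for j in range(mask.bit_length()) if (mask >> j) & 1]
            terms.append("*".join(names) if names else "1")
    return terms


if __name__ == "__main__":
    # Two-input OR, indexed as [f(0,0), f(x0=1,x1=0), f(x0=0,x1=1), f(1,1)].
    print(" ^ ".join(pprm_terms([0, 1, 1, 1])))
```

    This transform touches every truth-table entry, so it only scales to a handful of variables; the abstract's point is that the authors' procedure is customized for much larger expressions.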

  • Theint Theint THU, Jimpei HAMAMURA, Rie SOEJIMA, Yuichiro SHIBATA, Kiy ...
    Article type: PAPER
    2017 Volume E100.A Issue 7 p. 1409-1417
    Published: 2017/07/01
    Released: 2017/07/01
    Journal, restricted access

    Field Programmable Gate Array (FPGA) based robust model fitting enjoys immense popularity in image processing because of its high efficiency. This paper focuses on the tradeoff analysis of real-time FPGA implementation of robust circle and ellipse estimations based on the random sample consensus (RANSAC) algorithm, which estimates parameters of a statistical model from a data set of feature points which contains outliers. In particular, this paper mainly highlights implementation alternatives for solvers of simultaneous equations and compares Gauss-Jordan elimination and Cramer's rule by changing matrix size and arithmetic processes. Experimental evaluation shows a Cramer's rule approach coupled with long integer arithmetic can reduce most hardware resources without unacceptable degradation of estimation accuracy compared to floating point versions.
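
    As a software illustration of the solver choice discussed above, the sketch below fits a circle through three sampled points by solving the 3x3 system with Cramer's rule and wraps it in a small RANSAC loop. It is an assumption-laden sketch, not the paper's FPGA design: the circle is written as x^2 + y^2 + D*x + E*y + F = 0, Python's `Fraction` stands in for long integer arithmetic (determinants stay integral and division happens only at the end), and all function names, the tolerance, and the iteration count are hypothetical.

```python
import random
from fractions import Fraction


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def circle_through(p1, p2, p3):
    """Solve D*x + E*y + F = -(x^2 + y^2) for three points with Cramer's rule.

    Returns (cx, cy, r_squared) as exact fractions, or None if the points
    are collinear (zero determinant).
    """
    pts = (p1, p2, p3)
    a = [[x, y, 1] for x, y in pts]
    rhs = [-(x * x + y * y) for x, y in pts]
    d = det3(a)
    if d == 0:
        return None

    def replaced(col):
        # Copy of the matrix with one column swapped for the right-hand side.
        return [[rhs[r] if c == col else a[r][c] for c in range(3)] for r in range(3)]

    big_d, big_e, big_f = (Fraction(det3(replaced(c)), d) for c in range(3))
    cx, cy = -big_d / 2, -big_e / 2
    return cx, cy, cx * cx + cy * cy - big_f


def ransac_circle(points, iterations=50, tol=Fraction(1, 2), seed=0):
    """Keep the sampled circle with the most points within tol (squared residual)."""
    rng = random.Random(seed)
    best_model, best_inliers = None, -1
    for _ in range(iterations):
        model = circle_through(*rng.sample(points, 3))
        if model is None:
            continue
        cx, cy, r2 = model
        inliers = sum(
            1 for x, y in points if abs((x - cx) ** 2 + (y - cy) ** 2 - r2) <= tol
        )
        if inliers > best_inliers:
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

    Cramer's rule needs only multiplications, additions, and one final division per unknown, which is one plausible reason it pairs well with integer arithmetic, whereas Gauss-Jordan elimination divides at every pivot step.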

  • Toshiki HIGASHI, Hiroyuki OCHI
    Article type: PAPER
    2017 Volume E100.A Issue 7 p. 1418-1426
    Published: 2017/07/01
    Released: 2017/07/01
    Journal, restricted access

    This paper proposes the 0-1-A-Ā LUT, a new programmable logic using atom switches, and a delay-optimal mapping algorithm for it. An atom switch is a non-volatile memory device of very small geometry that is fabricated between the metal layers of a VLSI, and it can be used as a switch device with very small on-resistance and parasitic capacitance. While considerable area reduction of the Look Up Tables (LUTs) used in conventional Field Programmable Gate Arrays (FPGAs) has been achieved by simply replacing each SRAM element with a memory element using a pair of atom switches, our 0-1-A-Ā LUT achieves further area and delay reduction. Unlike the conventional atom-switch-based LUT, in which all k input signals are fed to a MUX, one of the input signals is fed to the switch array. This reduces area, because the number of MUX inputs drops from 2^k to 2^(k-1), and reduces delay, because the fanout load of the input buffers is smaller. Since the fanout of these input buffers depends on the mapped logic function, this paper also proposes technology mapping algorithms that select logic functions with fewer input-buffer fanouts to achieve further delay reduction. In our experiments, the circuit delay using our k-LUT is 0.94% smaller in the best case than with the conventional atom-switch-based k-LUT.
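
    One way to read the name is through Shannon expansion: for each assignment of the other k-1 inputs, the function restricted to input A is one of 0, 1, A, or Ā, so a table of half the size suffices. The sketch below illustrates that reading only; it is an assumption about the idea, not the paper's circuit or mapping algorithm, and the names `to_01aa_entries` and `a_buffer_fanout` (with `~A` standing for Ā) are hypothetical.

```python
def to_01aa_entries(truth_table, k):
    """Compress a k-input truth table into 2^(k-1) entries from {0, 1, A, ~A}.

    Index convention (an assumption of this sketch): bit 0 of the truth-table
    index is input A, bits 1..k-1 are the remaining inputs that drive the MUX.
    """
    if len(truth_table) != 1 << k:
        raise ValueError("truth table must have 2^k entries")
    symbols = {(0, 0): "0", (1, 1): "1", (0, 1): "A", (1, 0): "~A"}
    entries = []
    for rest in range(1 << (k - 1)):
        f_a0 = truth_table[rest << 1]        # value when A = 0
        f_a1 = truth_table[(rest << 1) | 1]  # value when A = 1
        entries.append(symbols[(f_a0, f_a1)])
    return entries


def a_buffer_fanout(entries):
    """Count entries wired to A or ~A, a crude proxy for the input-buffer load."""
    return sum(1 for e in entries if e in ("A", "~A"))


if __name__ == "__main__":
    xor2 = [0, 1, 1, 0]   # f(A, B) = A xor B
    and2 = [0, 0, 0, 1]   # f(A, B) = A and B
    print(to_01aa_entries(xor2, 2), a_buffer_fanout(to_01aa_entries(xor2, 2)))
    print(to_01aa_entries(and2, 2), a_buffer_fanout(to_01aa_entries(and2, 2)))
```

    Under this reading, which input plays the role of A changes how many entries reference A or Ā, which is consistent with the abstract's remark that the buffer fanout depends on the mapped function.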

  • Kento HASEGAWA, Masao YANAGISAWA, Nozomu TOGAWA
    Article type: PAPER
    2017 Volume E100.A Issue 7 p. 1427-1438
    Published: 2017/07/01
    Released: 2017/07/01
    Journal, restricted access

    Due to the increase in outsourcing by IC vendors, we face a serious risk that malicious third-party vendors can very easily insert hardware Trojans into IC products. However, detecting hardware Trojans is very difficult because today's ICs are huge and complex. In this paper, we propose a hardware-Trojan classification method for gate-level netlists that identifies hardware-Trojan infected nets (or Trojan nets) using a support vector machine (SVM) or a neural network (NN). First, we extract five hardware-Trojan features from each net in a netlist. These feature values are too complicated to be separated by simple, fixed threshold values. Hence, we next represent them as a five-dimensional vector and train an SVM or NN on these vectors. Finally, we classify all the nets in an unknown netlist into Trojan nets and normal nets using the learned classifiers. We have applied our machine-learning-based hardware-Trojan classification method to the Trust-HUB benchmarks. The results demonstrate that our method increases the true positive rate compared to the existing state-of-the-art results in most cases. In some cases, our method achieves a true positive rate of 100%, which means that all the Trojan nets in an unknown netlist are detected by our method.

  • Koki IGAWA, Masao YANAGISAWA, Nozomu TOGAWA
    Article type: PAPER
    2017 Volume E100.A Issue 7 p. 1439-1451
    Published: 2017/07/01
    Released: 2017/07/01
    Journal, restricted access

    In this paper, we propose a floorplan-aware high-level synthesis algorithm with body biasing for delay variation compensation, which minimizes the average leakage energy of manufactured chips. To realize floorplan-aware high-level synthesis, we utilize a huddle-based distributed register architecture (HDR architecture). The HDR architecture divides the chip area into small partitions called huddles, and a body bias voltage can be controlled for every huddle. During high-level synthesis, we iteratively obtain the expected leakage energy of every huddle when a body bias voltage is applied. A huddle with smaller expected leakage energy contributes more to reducing the expected leakage energy of the entire circuit but can increase the latency. We assign control-data flow graph (CDFG) nodes in non-critical paths to the huddles with larger expected leakage energy and those in critical paths to the huddles with smaller expected leakage energy. In this way, we aim to minimize the total leakage energy of a manufactured chip without increasing its latency. Experimental results show that our algorithm reduces the average leakage energy by up to 39.7% without latency or yield degradation compared with typical-case design with body biasing.

  • Yutaka MASUDA, Takao ONOYE, Masanori HASHIMOTO
    Article type: PAPER
    2017 Volume E100.A Issue 7 p. 1452-1463
    Published: 2017/07/01
    Released: 2017/07/01
    Journal, restricted access

    Software-based error detection techniques, which include error detection mechanism (EDM) transformation, are used for error localization in post-silicon validation. This paper evaluates the performance of EDM for timing error localization with a noise-aware logic simulator and 65-nm test chips, assuming the following two EDM usage scenarios: (1) localizing a timing error that occurred in the original program, and (2) localizing as many potential timing errors as possible. Simulation results show that the EDM transformation customized for quick error detection cannot locate electrical timing errors in the original program in the first scenario, but it detects 86% of non-masked errors (potential bugs) in the second scenario, which means that the EDM performs well at detecting electrical timing errors that affect execution results. Hardware measurement results show that the EDM detects 25% of original timing errors and 56% of non-masked errors. These hardware measurement results are not consistent with the simulation results. To investigate the reason, we focus on the following two differences between hardware and simulation: (1) the design of the power distribution network, and (2) the definition of timing error occurrence frequency. We update the simulation setup to close these differences and re-execute the simulation. We confirm that the updated simulation results and the chip measurement results are consistent.

  • Shumpei MORITA, Song BIAN, Michihiro SHINTANI, Masayuki HIROMOTO, Taka ...
    Article type: PAPER
    2017 Volume E100.A Issue 7 p. 1464-1472
    Published: 2017/07/01
    Released: 2017/07/01
    Journal, restricted access

    Replacing highly stressed logic gates with internal node control (INC) logic is known to be an effective way to alleviate timing degradation due to NBTI. We propose a path clustering approach to accelerate the search for effective replacement gates. Based on the observation that some paths always become timing critical after aging, critical path candidates are clustered and a representative path is selected in each cluster. Together with an efficient data structure that further reduces timing calculation, this makes INC logic optimization tractable in practical time for the first time. In experiments using a processor, a 171x speedup is demonstrated while retaining almost the same level of mitigation gain.

  • Takeshi IHARA, Toshiyuki HONGO, Atsushi TAKAHASHI, Chikaaki KODAMA
    Article type: PAPER
    2017 Volume E100.A Issue 7 p. 1473-1480
    Published: 2017/07/01
    Released: 2017/07/01
    Journal, restricted access

    Self-Aligned Quadruple Patterning (SAQP) is an important manufacturing technique for sub-14 nm technology nodes. Although various routing algorithms for SAQP have been proposed, it is not easy to find a dense SAQP-compliant routing pattern efficiently. Even though a grid for SAQP-compliant routing patterns was proposed, it is not easy to find a valid routing pattern on the grid. The routing pattern of SAQP on the grid consists of three types of routing. Among them, the third type has a turn prohibition constraint on the grid. Typical routing algorithms often fail to find a valid routing for the third type. In this paper, a simple directed grid-graph for the third type is proposed. Valid SAQP-compliant two-dimensional routing patterns are found effectively by utilizing the proposed directed grid-graph. Experiments show that SAQP-compliant routing patterns are found efficiently by our proposed method.

  • Yosuke KAKIUCHI, Kiyoharu HAMAGUCHI
    Article type: PAPER
    2017 Volume E100.A Issue 7 p. 1481-1487
    Published: 2017/07/01
    Released: 2017/07/01
    Journal, restricted access

    Verification of logic designs has been a long-standing bottleneck in the hardware design process, and there is strong demand for automating it and improving its efficiency. Simulation-based verification has mainly been used for this purpose, and recently coverage-driven verification, which aims to improve a metric called coverage, has become widely used. Our target metric is toggle coverage. To find input patterns that cause toggles on each signal, a SAT solver could be used, but this is computationally costly. In this paper, we study the effect of combining random simulation with the use of a SAT solver. In particular, we use a SAT solver that can find multiple “diverse” solutions. With this solver, we can avoid generating similar patterns, which are unlikely to improve coverage. The experimental results show that a small number of SAT solver calls can improve overall toggle coverage effectively compared with simple random simulation.
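
    The random-simulation half of this flow can be sketched as follows. The code measures toggle coverage, counted here as the fraction of observed 0-to-1 and 1-to-0 transitions over all signals, and reports which signals remain untoggled; those are the signals for which a SAT solver call would be worth its cost. The toy netlist in `toy_circuit`, the coverage definition, and all names are assumptions of this sketch, and the SAT step itself is not shown.

```python
import random


def toy_circuit(bits):
    """Hypothetical 8-input netlist: 'rare' is an 8-input AND, hard to toggle randomly."""
    rare = int(all(bits))
    return {"in0": bits[0], "rare": rare, "y": bits[1] ^ rare}


def toggle_coverage(patterns, simulate):
    """Apply patterns in order; return (coverage, set of not fully toggled signals)."""
    rose, fell = set(), set()
    prev = None
    for pattern in patterns:
        cur = simulate(pattern)
        if prev is not None:
            for name, value in cur.items():
                if prev[name] == 0 and value == 1:
                    rose.add(name)
                elif prev[name] == 1 and value == 0:
                    fell.add(name)
        prev = cur
    if prev is None:
        return 0.0, set()
    signals = set(prev)
    untoggled = {s for s in signals if s not in rose or s not in fell}
    return (len(rose) + len(fell)) / (2 * len(signals)), untoggled


def random_patterns(count, width, seed=0):
    """Uniformly random input patterns from a seeded generator."""
    rng = random.Random(seed)
    return [tuple(rng.randint(0, 1) for _ in range(width)) for _ in range(count)]


if __name__ == "__main__":
    coverage, missing = toggle_coverage(random_patterns(100, 8), toy_circuit)
    print(f"coverage={coverage:.2f}, still untoggled: {sorted(missing)}")
```

    In `toy_circuit`, a uniformly random pattern sets `rare` to 1 with probability 2^-8, so short random runs tend to leave it untoggled; a solver that returns diverse satisfying patterns for such signals is the complement the abstract describes.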

  • Masayuki ARAI, Kazuhiko IWASAKI
    Article type: PAPER
    2017 Volume E100.A Issue 7 p. 1488-1495
    Published: 2017/07/01
    Released: 2017/07/01
    Journal, restricted access

    Shrinking feature sizes and higher levels of integration in semiconductor device manufacturing technologies are increasingly causing a gap between the defect levels estimated in the design stage and those reported for fabricated devices. In this paper, we propose a unified weighted fault coverage approach that includes both bridge and open faults, treating the critical area as the incidence rate of each fault. We then propose a test pattern reordering scheme that incorporates our weighted fault coverage with the aim of reducing test costs. Here we apply a greedy algorithm to reorder test patterns generated by the bridge and stuck-at automatic test pattern generator (ATPG), evaluating the relationship between the number of patterns and the weighted fault coverage. Experimental results show that by applying this reordering scheme, the number of test patterns was reduced, on average, by approximately 50%. Our results also indicate that relaxing coverage constraints can drastically reduce test pattern set sizes to a level comparable to traditional 100% coverage stuck-at pattern sets, while targeting the majority of bridge faults and keeping the defect level to no more than 10 defective parts per million (DPPM) with a 99% manufacturing yield.
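
    The greedy reordering idea can be sketched as follows: at each step, pick the pattern that adds the most not-yet-covered fault weight, and record the cumulative weighted coverage so the pattern set can be cut off once a relaxed target is reached. This is a generic greedy sketch under stated assumptions, not the authors' implementation: the fault weights stand in for critical areas, the detection sets would come from fault simulation, and all names and the sample data are hypothetical.

```python
def greedy_reorder(detects, weight):
    """Order patterns by marginal weighted fault coverage.

    detects: dict mapping pattern name -> set of faults it detects.
    weight:  dict mapping fault -> weight (e.g. critical area).
    Returns (ordered pattern names, cumulative weighted coverage after each).
    """
    total = sum(weight.values())
    remaining = dict(detects)
    covered = set()
    order, curve = [], []
    while remaining:
        # Ties go to the pattern listed first, since max keeps the first maximum.
        best = max(
            remaining,
            key=lambda p: sum(weight[f] for f in remaining[p] - covered),
        )
        covered |= remaining.pop(best)
        order.append(best)
        curve.append(sum(weight[f] for f in covered) / total)
    return order, curve


def patterns_needed(curve, target):
    """Smallest prefix length whose weighted coverage reaches target, else None."""
    for count, value in enumerate(curve, start=1):
        if value >= target:
            return count
    return None


if __name__ == "__main__":
    detects = {"p0": {"f1"}, "p1": {"f1", "f2"}, "p2": {"f3"}}
    weight = {"f1": 1, "f2": 1, "f3": 5}
    order, curve = greedy_reorder(detects, weight)
    print(order, curve, patterns_needed(curve, 0.7))
```

    Lowering the target passed to `patterns_needed` shortens the retained prefix, which mirrors the abstract's observation that relaxing the coverage constraint shrinks the pattern set.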

  • Takahiro YAMAMOTO, Ittetsu TANIGUCHI, Hiroyuki TOMIYAMA, Shigeru YAMAS ...
    Article type: LETTER
    2017 Volume E100.A Issue 7 p. 1496-1499
    Published: 2017/07/01
    Released: 2017/07/01
    Journal, restricted access

    Approximate computing is considered a promising approach to the design of power- or area-efficient digital circuits. This paper proposes a systematic methodology for the design and worst-case accuracy analysis of approximate array multipliers. Our methodology systematically designs a series of approximate array multipliers with different area, delay, power and accuracy characteristics so that an LSI designer can select the one that best fits the requirements of her/his applications. Our experiments explore the trade-offs among area, delay, power and accuracy of the approximate multipliers.

  • Yining XU, Ittetsu TANIGUCHI, Hiroyuki TOMIYAMA
    Article type: LETTER
    2017 Volume E100.A Issue 7 p. 1500-1502
    Published: 2017/07/01
    Released: 2017/07/01
    Journal, restricted access

    Task mapping is one of the most important design processes in embedded manycore systems. This paper proposes a static task mapping technique for manycore real-time systems. The technique minimizes the number of cores while satisfying deadline constraints of individual tasks.

  • Kana SHIMADA, Shogo KITANO, Ittetsu TANIGUCHI, Hiroyuki TOMIYAMA
    Article type: LETTER
    2017 Volume E100.A Issue 7 p. 1503-1505
    Published: 2017/07/01
    Released: 2017/07/01
    Journal, restricted access

    Task scheduling is one of the most important processes in the design of multicore computing systems. This paper presents a technique for scheduling of malleable tasks. Our scheduling technique decides not only the execution order of the tasks but also the number of cores assigned to the individual tasks, simultaneously. We formulate the scheduling problem as an integer linear programming (ILP) problem, and the optimal schedule can be obtained by solving the ILP problem. Experiments using a standard task-set suite clarify the strength of this work.

  • Junghoon OH, Mineo KANEKO
    Article type: LETTER
    2017 Volume E100.A Issue 7 p. 1506-1510
    Published: 2017/07/01
    Released: 2017/07/01
    Journal, restricted access

    This letter proposes a heuristic algorithm to select check variables, which are points of comparison for error detection, for soft-error tolerant datapaths. Our soft-error tolerance scheme is based on check-and-retry computation and an efficient resource management technique named speculative resource sharing (SRS). Starting with the smallest set of check variables, the proposed algorithm repeatedly adds new check variables one at a time and finds the minimum-latency solution among the series of generated solutions. During the process, each new check variable is selected so that the opportunity for SRS is enlarged. Experimental results show that improvements in latency are achieved compared with the choice of the smallest set of check variables.

  • Yusuke HIBINO, Hirofumi IKEO, Nagisa ISHIURA
    Article type: LETTER
    2017 Volume E100.A Issue 7 p. 1511-1512
    Published: 2017/07/01
    Released: 2017/07/01
    Journal, restricted access

    This letter presents a test suite CF3 designed to find bugs in arithmetic optimizers of C compilers. It consists of 13,720 test programs containing all the expression patterns covering all the permutations of 3 operators from 14 operators. CF3 detected more than 70 errors in GCC 4.2-4.5 within 2 hours.

Regular Section