-
HIDEHARU AMANO
2013 Volume E96.D Issue 12 Pages 2513
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
-
Xi ZHANG, Chuanyi LIU, Zhenyu LIU, Dongsheng WANG
Article type: PAPER
2013 Volume E96.D Issue 12 Pages 2514-2523
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
As the number of applications running concurrently on chip multiprocessors (CMPs) increases, efficient management of the shared last-level cache (LLC) is crucial to guarantee overall performance. Recent studies have shown that cache partitioning can provide benefits in throughput, fairness and quality of service. Most prior work applies true Least Recently Used (LRU) as the underlying cache replacement policy and relies on its stack property to work properly. However, in commodity processors, pseudo-LRU policies without the stack property are commonly used instead of LRU for their simplicity and low storage overhead. Therefore, this study sets out to understand whether LRU-based cache partitioning techniques can be applied to commodity processors. In this work, we instead propose a cache partitioning mechanism for two popular pseudo-LRU policies: Not Recently Used (NRU) and Binary Tree (BT). Without the help of true LRU's stack property, we propose a profiling logic that applies curve approximation methods to derive the hit curve (hit counts under varied way allocations) for an application. We then propose a hybrid partitioning mechanism, which mitigates the gap between the predicted hit curve and the actual statistics. Simulation results demonstrate that our proposal can improve throughput by 15.3% on average and outperforms the stack-estimate proposal by 12.6% on average. Similar results can be achieved in weighted speedup. For the cache configurations under study, it requires less than 0.5% storage overhead compared to the last-level cache. In addition, we also show that a profiling mechanism with only one true LRU ATD achieves comparable performance and can further reduce the hardware cost by nearly two thirds compared with the hybrid mechanism.
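To illustrate the stack property that LRU-based partitioning relies on, the following minimal Python sketch derives a hit curve for a single cache set under true LRU from a stack-distance histogram. It is not the paper's NRU/BT profiling logic; the function name `lru_hit_curve` and the toy access trace are assumptions made for illustration only.

```python
from collections import Counter


def lru_hit_curve(accesses, max_ways):
    """Return hits[w] for w = 0..max_ways for one cache set under true LRU.

    Stack property: an access that hits at stack distance d (0-based)
    hits in every cache with more than d ways, so a single pass over
    the trace yields the hit count for every way allocation.
    """
    stack = []             # most recently used block first
    dist_hist = Counter()  # stack distance -> number of accesses
    for block in accesses:
        if block in stack:
            d = stack.index(block)
            dist_hist[d] += 1
            stack.pop(d)
        stack.insert(0, block)
        del stack[max_ways:]  # blocks beyond max_ways would be evicted
    hits = [0] * (max_ways + 1)
    for w in range(1, max_ways + 1):
        hits[w] = hits[w - 1] + dist_hist[w - 1]
    return hits


# Hypothetical trace of block addresses that map to the same set.
curve = lru_hit_curve(["a", "b", "a", "c", "b", "a"], max_ways=4)
print(curve)  # [0, 0, 1, 3, 3]
```

Pseudo-LRU policies such as NRU and BT do not maintain such a total recency order, which is why a single pass of this kind is not directly available to them.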
-
Shouyi YIN, Rui SHI, Leibo LIU, Shaojun WEI
Article type: PAPER
2013 Volume E96.D Issue 12 Pages 2524-2535
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
Coarse-grained Reconfigurable Architecture (CGRA) is a parallel computing platform that provides both the high performance of hardware and the high flexibility of software. It is becoming a promising platform for embedded and mobile applications. Since embedded and mobile devices are usually battery-powered, improving battery lifetime becomes one of the primary design issues in using CGRAs. In this paper, we propose a battery-aware task-mapping method to optimize energy consumption and improve battery lifetime. The proposed method mainly addresses two problems: task partitioning and task scheduling when mapping applications onto a CGRA. Task partitioning and scheduling are formulated as a joint optimization problem of minimizing the energy consumption. The nonlinear effects of a real battery are taken into account in the problem formulation. Using the insights from the problem formulation, we design the task-mapping algorithm. We have used several real-world benchmarks to test the effectiveness of the proposed method. Experimental results show that our method can dramatically lower the energy consumption and prolong the battery life.
-
Noboru TANABE, Atsushi OHTA
Article type: PAPER
2013 Volume E96.D Issue 12 Pages 2536-2544
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
Most scientists, other than computer scientists, do not want to spend effort on performance tuning by rewriting their MPI applications. In addition, the number of processing elements available to them is increasing year by year. On large-scale parallel systems, the number of messages accumulated in a message buffer tends to increase in some of their applications. Since searching the message queue in MPI is time-consuming, scalable system-side acceleration is needed for those systems. In this paper, a support function named LHS (Limited-length Head Separation) is proposed. Its performance in searching the message buffer and its hardware cost are evaluated. LHS accelerates message buffer searching by switching the location in which limited-length heads of messages are stored. It exploits effects such as an increased cache hit rate on the host, together with partial off-loading to hardware. When the order of message reception differs from the receiver's expectation, message buffer searching is accelerated 14.3 times with LHS on an FPGA-based network interface card (NIC) named DIMMnet-2. This absolute performance is 38.5 times higher than that of IBM BlueGene/P, although the frequency is 8.5 times lower than that of BlueGene/P. LHS has higher scalability than ALPU in performance per frequency. Since these results were obtained with partially on-loaded linear searching on an old Pentium®4, the performance gap will increase with a state-of-the-art CPU. Therefore, LHS is more suitable for larger parallel systems. Discussions on adopting the proposed method in state-of-the-art processors and systems are also presented.
-
Ahmadou Dit Adi CISSE, Michihiro KOIBUCHI, Masato YOSHIMI, Hidetsugu I ...
Article type: PAPER
2013 Volume E96.D Issue 12 Pages 2545-2554
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
Silicon photonics Network-on-Chips (NoCs) have emerged as an attractive solution to alleviate the high power consumption of traditional electronic interconnects. In this paper, we propose a fully optical ring NoC that combines static and dynamic wavelength allocation communication mechanisms. A different wavelength-channel is statically allocated to each destination node for lightweight communication. Contention among simultaneous communication requests from multiple source nodes to the same destination is resolved by token-based arbitration for the particular wavelength-channel. For heavy-load communication, a multiwavelength-channel is available; a source node requests it at execution time from a special node that manages dynamic allocation of the shared multiwavelength-channel among all nodes. We combine these static and dynamic communication mechanisms in the same network, which introduces selection techniques based on message size and congestion information. Using a photonic NoC simulator based on Phoenixsim, we evaluate our architecture under uniform random, neighbor, and hotspot traffic patterns. Simulation results show that our proposed fully optical ring NoC achieves good performance by utilizing adequate static and dynamic channels based on the selection techniques. We also show that our architecture can reduce the energy consumption necessary for arbitration by more than half compared to hybrid photonic ring and mesh NoCs. A comparison with several previous works in terms of architecture hardware cost shows that our architecture can be an attractive, cost-performance efficient interconnection infrastructure for future SoCs and CMPs.
-
Takashi KITAMURA, Keishi OKAMOTO
Article type: PAPER
2013 Volume E96.D Issue 12 Pages 2555-2564
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
In this paper, we propose and implement an automated route planning framework for milk-run transport logistics by applying model checking techniques. First, we develop a formal specification framework for milk-run transport logistics. The framework adopts LTL (Linear Temporal Logic), a language based on temporal logics, as a specification language so that users can flexibly and formally specify complex delivery requirements for trucks. Then, by applying the bounded semantics of LTL, the framework defines the notion of “optimal truck routes”, which are truck routes on a given route map that satisfy given delivery requirements (specified in LTL) with the minimum cost. We implement the framework as an automated route planner using the NuSMV model checker, a state-of-the-art bounded model checker. The automated route planner, given a route map and delivery requirements, automatically finds optimal truck routes on the route map that satisfy the given delivery requirements. The feasibility of the implementation design is investigated by analysing its computational complexity and by showing experimental results.
-
Putthiphong KIRDPIPAT, Sakchai THIPCHAKSURAT
Article type: PAPER
2013 Volume E96.D Issue 12 Pages 2565-2574
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
The route discovery process is a major mechanism in most routing protocols for Mobile Ad Hoc Networks (MANETs). Routing overhead is one of the problems caused by broadcasting the route discovery packet. To reduce the routing overhead, location-based routing schemes have been proposed. In this paper, we propose a scheme called Location-based Routing scheme with Adaptive Request Zone (LoRAReZ). In the LoRAReZ scheme, the size of the expected zone is set adaptively depending on the distance between the source and destination nodes. Computer simulation has been conducted to show the effectiveness of our proposed scheme. We evaluate the performance of the LoRAReZ scheme in terms of packet delivery fraction (PDF), routing overhead, average end-to-end delay, throughput, packet collision, average hop count, average route setup time, and power consumption. We compare these performance metrics with those of the Location Aided Routing (LAR) and Location Aware Routing Protocol with Dynamic Adaptation of Request Zone (LARDAR) protocols. The simulation results show that LoRAReZ provides better performance than the LAR and LARDAR schemes on all of these metrics.
-
Keisuke IWAI, Naoki NISHIKAWA, Takakazu KUROKAWA
Article type: PAPER
2013 Volume E96.D Issue 12 Pages 2575-2586
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
Many-core computer systems with GPUs are coming into mainstream use, from high-end computing, including supercomputers, to embedded processors. Consequently, the implementation of cryptographic methods on GPGPU is also becoming popular because of such systems' performance. However, many factors affect the performance of GPUs. To cope with this problem, we developed a new translator, HiCrypt, which can generate an optimized GPGPU program written in both CUDA and OpenCL from a cipher program written in standard C with directives. Users need to annotate only the variables and the encoding/decoding function, which are characteristic of cipher programs, with directives. To evaluate the translator, five representative cipher programs are translated into CUDA and OpenCL programs. The generated programs achieve high throughput, almost identical to that of hand-optimized programs, for all five cipher programs. HiCrypt will contribute to the development and evaluation of new and various symmetric block ciphers using GPGPU.
-
Jun CHAI, Mei WEN, Nan WU, Dafei HUANG, Jing YANG, Xing CAI, Chunyuan ...
Article type: PAPER
2013 Volume E96.D Issue 12 Pages 2587-2595
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
This paper presents a study of the applicability of clusters of GPUs to high-resolution 3D simulations of cardiac electrophysiology. By experimenting with representative cardiac cell models and ODE solvers, in association with solving the monodomain equation, we quantitatively analyze the obtainable computational capacity of GPU clusters. It is found that for a 501×501×101 3D mesh, which entails a 0.1 mm spatial resolution, a 128-GPU cluster only needs a few minutes to carry out a 100,000-time-step cardiac excitation simulation that involves a four-variable cell model. Even higher spatial and temporal resolutions are achievable for such simplified mathematical models. On the other hand, our experiments also show that a dramatically larger cluster of GPUs is needed to handle a very detailed cardiac cell model.
-
Yasuaki ITO, Koji NAKANO
Article type: PAPER
2013 Volume E96.D Issue 12 Pages 2596-2603
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
This paper presents a GPU (Graphics Processing Unit) implementation of dynamic programming for optimal polygon triangulation. GPUs can now be used for general-purpose parallel computation. Users can develop parallel programs running on GPUs using a programming architecture called CUDA (Compute Unified Device Architecture) provided by NVIDIA. The optimal polygon triangulation problem for a convex polygon is an optimization problem to find a triangulation with minimum total weight. It is known that this problem for a convex $n$-gon can be solved using the dynamic programming technique in $O(n^3)$ time using a work space of size $O(n^2)$. In this paper, we propose an efficient parallel implementation of this $O(n^3)$-time algorithm on the GPU. In our implementation, we have used two new ideas to accelerate the dynamic programming. The first idea (adaptive granularity) is to partition the dynamic programming algorithm into many sequential kernel calls of CUDA, and to select the best parameters for the size and the number of blocks for each kernel call. The second idea (sliding and mirroring arrangements) is to arrange the working data for coalesced access of the global memory in the GPU to minimize the memory access overhead. Our implementation using these two ideas solves the optimal polygon triangulation problem for a convex 8192-gon in 5.57 seconds on the NVIDIA GeForce GTX 680, while a conventional CPU implementation runs in 1939.02 seconds. Thus, our GPU implementation attains a speedup factor of 348.02.
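For orientation, the sequential dynamic programming that the abstract refers to can be sketched in Python as follows. This is a plain CPU reference, not the paper's CUDA implementation, and it assumes that the total weight of a triangulation is the sum of the weights of its chords; the function name `optimal_triangulation` and the `weight` callback are hypothetical.

```python
def optimal_triangulation(n, weight):
    """Minimum total chord weight over all triangulations of a convex n-gon.

    Vertices are numbered 0..n-1. weight(i, j) gives the weight of chord
    (i, j) with i < j; polygon sides are assumed to have weight 0.
    Runs in O(n^3) time with an O(n^2) table.
    """
    def w(i, j):
        # Sides of the polygon, including side (0, n-1), are not chords.
        if j - i == 1 or (i == 0 and j == n - 1):
            return 0
        return weight(i, j)

    # cost[i][j]: best cost for the sub-polygon i, i+1, ..., j,
    # including the weight of the chord (i, j) that closes it.
    cost = [[0] * n for _ in range(n)]
    for span in range(2, n):
        for i in range(n - span):
            j = i + span
            cost[i][j] = w(i, j) + min(
                cost[i][k] + cost[k][j] for k in range(i + 1, j)
            )
    return cost[0][n - 1]


# Square with two possible chords: (0, 2) of weight 5 and (1, 3) of weight 3.
chords = {(0, 2): 5, (1, 3): 3}
print(optimal_triangulation(4, lambda i, j: chords[(i, j)]))  # 3
```

All entries with the same `span` are independent of one another, which is the kind of parallelism a GPU implementation can exploit.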
-
Fumihiko INO, Shinta NAKAGAWA, Kenichi HAGIHARA
Article type: PAPER
2013 Volume E96.D Issue 12 Pages 2604-2616
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
This paper presents a stream programming framework, named GPU-chariot, for accelerating stream applications running on graphics processing units (GPUs). The main contribution of our framework is that it realizes efficient software pipelines on multi-GPU systems by enabling out-of-order execution of CPU functions, kernels, and data transfers. To achieve this out-of-order execution, we apply a runtime scheduler that not only maximizes the utilization of system resources but also encapsulates the number of GPUs available in the system. In addition, we implement a load-balancing capability to flow data efficiently through multiple GPUs. Furthermore, a callback interface enables overlapping execution of functions in third-party libraries. By using kernels with different performance bottlenecks, we show that our out-of-order execution is up to 20% faster than in-order execution. Finally, we conduct several case studies on a 4-GPU system and demonstrate the advantages of GPU-chariot over a manually pipelined code. We conclude that GPU-chariot can be useful when developing stream applications with software pipelines on multiple GPUs and CPUs.
-
Akihiko KASAGI, Koji NAKANO, Yasuaki ITO
Article type: PAPER
2013 Volume E96.D Issue 12 Pages 2617-2625
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
The Discrete Memory Machine (DMM) is a theoretical parallel computing model that captures the essence of the shared memory access of GPUs. Bank conflicts should be avoided for maximizing the bandwidth of the shared memory access. Offline permutation of an array is a task to copy all elements in array $a$ into array $b$ along a permutation given in advance. The main contribution of this paper is to implement a conflict-free permutation algorithm on the DMM in a GPU. We have also implemented straightforward permutation algorithms on the GPU. The experimental results for 1024 double (64-bit) numbers on NVIDIA GeForce GTX-680 show that the straightforward permutation algorithm takes 247.8 ns for the random permutation and 1684 ns for the worst permutation that involves the maximum bank conflicts. Our conflict-free permutation algorithm runs in 167 ns for any permutation including the random permutation and the worst permutation, although it performs more memory accesses. It follows that our conflict-free permutation is 1.48 times faster for the random permutation and 10.0 times faster for the worst permutation.
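A short Python sketch may help show why the straightforward permutation is sensitive to the permutation itself. It assumes a simplified bank model in which address `x` lives in bank `x mod width` and threads are grouped into warps of `width` consecutive threads; the helper names `warp_congestion` and `write_congestion` are hypothetical, and the sketch only counts conflicts for the writes `b[perm[i]] = a[i]` rather than implementing the conflict-free algorithm.

```python
from collections import Counter


def warp_congestion(addresses, width):
    """Largest number of distinct addresses that fall into one bank."""
    banks = Counter(addr % width for addr in set(addresses))
    return max(banks.values())


def write_congestion(perm, width):
    """Congestion of the writes b[perm[i]] = a[i], one value per warp."""
    return [
        warp_congestion(perm[start:start + width], width)
        for start in range(0, len(perm), width)
    ]


width = 4
identity = list(range(16))
# A transpose-like permutation sends each warp's writes to a single bank.
transpose = [(i % width) * width + i // width for i in range(16)]

print(write_congestion(identity, width))   # [1, 1, 1, 1]
print(write_congestion(transpose, width))  # [4, 4, 4, 4]
```

A congestion of 1 means the warp's accesses are conflict-free under this model, while a congestion equal to the width means they are fully serialized.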
-
Koji NAKANO
Article type: PAPER
2013 Volume E96.D Issue 12 Pages 2626-2634
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
The main contribution of this paper is to show optimal parallel algorithms to compute the sum, the prefix-sums, and the summed area table on two memory machine models, the Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM). The DMM and the UMM are theoretical parallel computing models that capture the essence of the shared memory and the global memory of GPUs. These models have three parameters: the number $p$ of threads, the width $w$ of the memory, and the memory access latency $l$. We first show that the sum of $n$ numbers can be computed in $O({n\over w}+{nl\over p}+l\log n)$ time units on the DMM and the UMM. We then go on to show that $\Omega({n\over w}+{nl\over p}+l\log n)$ time units are necessary to compute the sum. We also present a parallel algorithm that computes the prefix-sums of $n$ numbers in $O({n\over w}+{nl\over p}+l\log n)$ time units on the DMM and the UMM. Finally, we show that the summed area table of size $\sqrt{n}\times\sqrt{n}$ can be computed in $O({n\over w}+{nl\over p}+l\log n)$ time units on the DMM and the UMM. Since the computation of the prefix-sums and the summed area table is at least as hard as the sum computation, these parallel algorithms are also optimal.
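As a reminder of what is being computed, the following sequential Python sketch defines the prefix-sums and the summed area table. It is only a reference for the definitions, not one of the DMM/UMM algorithms; the function names are hypothetical.

```python
from itertools import accumulate


def prefix_sums(values):
    """Entry i of the result is values[0] + ... + values[i]."""
    return list(accumulate(values))


def summed_area_table(matrix):
    """Entry (i, j) is the sum of matrix[r][c] over all r <= i and c <= j.

    Computed as row-wise prefix-sums followed by column-wise prefix-sums.
    """
    rows = [prefix_sums(row) for row in matrix]
    cols = [prefix_sums(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


print(prefix_sums([1, 2, 3, 4]))            # [1, 3, 6, 10]
print(summed_area_table([[1, 2], [3, 4]]))  # [[1, 3], [4, 10]]
```

The last prefix-sum and the bottom-right table entry both equal the total sum, which is why the sum computation is no harder than either of these problems.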
-
Tianyang DONG, Jianwei SHI, Jing FAN, Ling ZHANG
Article type: PAPER
2013 Volume E96.D Issue 12 Pages 2635-2644
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
Rule engine technologies have been widely used in the development of enterprise information systems. However, these rule-based systems may suffer from low performance when a large amount of fact data must be matched against the rules. Constructing rule engines on a cluster or grid can flexibly expand system processing capability by increasing the cluster scale, and achieve shorter response times. In order to speed up pattern matching in the rule engine, this paper proposes a double hash filter approach for the alpha network, combined with beta node indexing, to improve the Rete algorithm. By using fact type nodes in the Rete network, a hash map of ‘fact type - fact type node’ is built in the root node, and hash maps of ‘attribute constraint - alpha node’ are constructed in the fact type nodes. This double hash mechanism can speed up the filtering of facts in the alpha network. Meanwhile, hash tables whose indexes are calculated from fact objects are built in the memories of beta nodes, to avoid unnecessary iteration in the join operations of beta nodes. In addition, a rule engine based on this improved Rete algorithm is applied to enterprise information systems. The experimental results show that this method can effectively speed up pattern matching and significantly decrease the response time of the application systems.
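The two-level lookup described above can be sketched in Python as follows. This is a simplified illustration that assumes only equality constraints of the form attribute = value and represents each alpha node by its memory (a list of facts); the class `AlphaNetwork` and its methods are hypothetical and are not taken from the paper or from any rule engine's API.

```python
class AlphaNetwork:
    """Toy double hash filter: fact type -> type node -> alpha memory."""

    def __init__(self):
        # First hash map (in the root node): fact type -> fact type node.
        # Each fact type node is a second hash map:
        # (attribute, value) constraint -> alpha memory.
        self.type_nodes = {}

    def add_constraint(self, fact_type, attribute, value):
        type_node = self.type_nodes.setdefault(fact_type, {})
        type_node.setdefault((attribute, value), [])

    def assert_fact(self, fact_type, fact):
        """Store the fact in every matching alpha memory; return the keys."""
        type_node = self.type_nodes.get(fact_type)
        if type_node is None:
            return []  # no rule mentions this fact type
        matched = []
        for key in fact.items():  # key is an (attribute, value) pair
            if key in type_node:
                type_node[key].append(fact)
                matched.append(key)
        return matched


net = AlphaNetwork()
net.add_constraint("Order", "status", "open")
net.add_constraint("Order", "region", "EU")
print(net.assert_fact("Order", {"status": "open", "region": "US"}))
# [('status', 'open')]
```

Each fact is routed with hash lookups instead of being tested against every alpha node in turn, which is the effect the double hash mechanism aims for.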
-
Masaki KOHANA, Shusuke OKAMOTO, Atsuko IKEGAMI
Article type: PAPER
2013 Volume E96.D Issue 12 Pages 2645-2653
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
This paper describes a near-optimal allocation method for web-based multi-player online role-playing games (MORPGs), which must be able to cope with a large number of users and a high frequency of user requests. Our previous work introduced a dynamic data reallocation method. It uses multiple web servers and divides the entire game world into small blocks. Ownership of each block is allocated to a web server, and the ownership is reallocated to another web server according to the users' requests. Furthermore, this block allocation was formulated as a combinatorial optimization problem. A simulation-based experiment with an exact algorithm showed that our system could perform 31% better than an ad-hoc approach. However, the exact algorithm takes too much time when the problem size is large. This paper proposes a meta-heuristic approach based on tabu search to solve the problem quickly. A simulation result shows that our tabu search algorithm can generate solutions whose average correctness differs by only 1% from that of the exact algorithm. In addition, the average calculation time for 50 users on a system with five web servers is about 25.67 msec, while the exact algorithm takes about 162 msec. An evaluation of a web-based MORPG system with our tabu search shows that it could achieve a capacity of 420 users, compared with 320 for our previous system.
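For readers unfamiliar with tabu search, a generic Python sketch applied to a toy block-to-server allocation is given below. The objective used here (minimizing the largest per-server load) and all names (`tabu_search`, `make_problem`, `block_loads`) are assumptions made for illustration; the abstract does not state the paper's actual objective function or neighbourhood.

```python
from collections import deque


def tabu_search(initial, cost, neighbors, iterations=100, tenure=5):
    """Generic tabu search: always move to the best non-tabu neighbour."""
    best = current = initial
    tabu = deque([initial], maxlen=tenure)  # recently visited solutions
    for _ in range(iterations):
        candidates = [s for s in neighbors(current) if s not in tabu]
        if not candidates:
            break
        current = min(candidates, key=cost)  # may be worse than before
        tabu.append(current)
        if cost(current) < cost(best):
            best = current
    return best


def make_problem(block_loads, num_servers):
    """Toy objective: the largest total load placed on any one server."""
    def cost(alloc):
        loads = [0] * num_servers
        for block, server in enumerate(alloc):
            loads[server] += block_loads[block]
        return max(loads)

    def neighbors(alloc):
        # Reassign the ownership of exactly one block to another server.
        for block in range(len(alloc)):
            for server in range(num_servers):
                if server != alloc[block]:
                    yield alloc[:block] + (server,) + alloc[block + 1:]

    return cost, neighbors


cost, neighbors = make_problem(block_loads=[4, 3, 3, 2], num_servers=2)
best = tabu_search((0, 0, 0, 0), cost, neighbors)
print(cost(best))  # 6
```

The tabu list is what lets the search accept a worse neighbour without immediately moving back, so it can leave local optima that a plain greedy descent would stay in.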
-
Hui ZHAO, Shuqiang YANG, Hua FAN, Zhikun CHEN, Jinghu XU
Article type: PAPER
2013 Volume E96.D Issue 12 Pages 2654-2662
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
Scheduling plays a key role in MapReduce systems. In this paper, we explore the efficiency of a MapReduce cluster running many independent and continuously arriving MapReduce jobs. Data locality and load balancing are two important factors in improving computation efficiency in MapReduce systems for data-intensive computations. Traditional cluster scheduling technologies are not well suited to the MapReduce environment. There are some schedulers in use for the popular open-source Hadoop MapReduce implementation; however, they cannot optimize both factors well. Our main objective is to minimize the total flowtime of all jobs. Given that this is a strongly NP-hard problem, we adopt some effective heuristics to seek a satisfactory solution. In this paper, we formalize the scheduling problem as a job selection problem and propose a load-balance-aware job selection algorithm. At the task level, we design a strict data-locality task scheduling algorithm for map tasks on map machines and a load-balance-aware scheduling algorithm for reduce tasks on reduce machines. Comprehensive experiments have been conducted to compare our scheduling strategy with well-known Hadoop scheduling strategies. The experimental results validate the efficiency of our proposed scheduling strategy.
-
Takahiro HIROFUCHI, Mauricio TSUGAWA, Hidemoto NAKADA, Tomohiro KUDOH, ...
Article type: PAPER
2013 Volume E96.D Issue 12 Pages 2663-2674
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
Wide-area VM migration is a technology with potential to aid IT services recovery since it can be used to evacuate virtualized servers to safe locations upon a critical disaster. However, the amount of data involved in a wide-area VM migration is substantially larger compared to VM migrations within LAN due to the need to transfer virtualized storage in addition to memory and CPU states. This increase of data makes it challenging to relocate VMs under a limited time window with electrical power. In this paper, we propose a mechanism to improve live storage migration across WAN. The key idea is to reduce the amount of data to be transferred by proactively caching virtual disk blocks to a backup site during regular VM operation. As a result of pre-cached disk blocks, the proposed mechanism can dramatically reduce the amount of data and consequently the time required to live migrate the entire VM state. The mechanism was evaluated using a prototype implementation under different workloads and network conditions, and we confirmed that it dramatically reduces the time to complete a VM live migration. By using the proposed mechanism, it is possible to relocate a VM from Japan to the United States in just under 40 seconds. This relocation would otherwise take over 1500 seconds, demonstrating that the proposed mechanism was able to reduce the migration time by 97.5%.
-
Ryousei TAKANO, Hidemoto NAKADA, Takahiro HIROFUCHI, Yoshio TANAKA, To ...
Article type: PAPER
2013 Volume E96.D Issue 12 Pages 2675-2683
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
Virtual machine (VM) migration is useful for improving flexibility and maintainability in cloud computing environments. However, VM monitor (VMM)-bypass I/O technologies, including PCI passthrough and SR-IOV, which significantly reduce the overhead of I/O virtualization, make VM migration impossible. This paper proposes a novel and practical mechanism, called Symbiotic Virtualization (SymVirt), for enabling migration and checkpoint/restart on a virtualized cluster with VMM-bypass I/O devices, without virtualization overhead during normal operations. SymVirt allows a VMM to cooperate with a message passing layer on the guest OS; it then realizes VM-level migration and checkpoint/restart by using a combination of user-level dynamic device configuration and coordination of distributed VMMs. We have implemented the proposed mechanism on top of QEMU/KVM and the Open MPI system. All PCI devices, including Infiniband, Ethernet, and Myrinet, are supported without implementing specific para-virtualized drivers, and it is not necessary to modify either the MPI runtime or applications. Using the proposed mechanism, we demonstrate reactive and proactive FT mechanisms on a virtualized Infiniband cluster. We have confirmed the effectiveness using both a memory-intensive micro benchmark and the NAS parallel benchmark.
-
Naoya MAKI, Ryoichi SHINKUMA, Tatsuro TAKAHASHI
Article type: PAPER
2013 Volume E96.D Issue 12 Pages 2684-2695
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
Our prior papers proposed a traffic engineering scheme to further localize traffic in peer-assisted content delivery networks (CDNs). This scheme periodically combines content files and allows clients to obtain the combined content files while keeping the price unchanged from the single-content price, in order to induce altruistic clients to download content files that are most likely to contribute to localizing network traffic. However, the selection algorithm in our prior work determined which content files should be combined, and when, according to the cache states of all clients, which is a rather unrealistic assumption in terms of computational complexity. This paper proposes the new concept of a virtual local server to reduce the computational complexity. We could say that the source server in our mechanism has a virtual caching network inside that reflects the cache states of all clients in the ‘actual’ caching network, and that it combines content files based on the virtual caching network. In this paper, instead of determining the virtual caching network according to the cache states of all clients, we approximately estimate it from the cache state of the virtual local server of the local domain, which is the aggregated cache state of only the altruistic clients in a local domain. Furthermore, we propose a content selection algorithm based on the virtual caching network. As the content model, we use a news life-cycle model, which exhibits severe changes in cache states and is a striking instance of dynamic content models. Computer simulations confirmed that our proposed algorithm successfully localized network traffic.
-
Ervianto ABDULLAH, Satoshi FUJITA
Article type: PAPER
2013 Volume E96.D Issue 12 Pages 2696-2703
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
Recently, Peer-to-Peer Content Delivery Networks (P2P CDNs) have attracted considerable attention as a cost-effective way to disseminate digital contents to paid users in a scalable and dependable manner. However, due to their peer-to-peer nature, they face a threat from “colluders” who have paid for the contents but illegally share them with unauthorized peers. This means that the detection of colluders is a crucial task for P2P CDNs in order to preserve the rights of content holders and paid users. In this paper, we propose two colluder detection schemes for P2P CDNs. The first scheme is based on the reputation collected from all peers participating in the network, and the second scheme improves the quality of colluder identification by using a technique that is well known in the field of system-level diagnosis. The performance of the schemes is evaluated by simulation. The simulation results indicate that even when 10% of authorized peers are colluders, our schemes identify all colluders without causing misidentifications.
-
Fang ZUO, Wei ZHANG
Article type: PAPER
2013 Volume E96.D Issue 12 Pages 2704-2712
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
In P2P applications, networks are formed by devices belonging to independent users. Therefore, routing hotspots or routing congestion are typically created by an unanticipated new event that triggers an unanticipated surge of users requesting streaming service from some particular nodes. A challenging problem is how to provide incentive mechanisms that allocate bandwidth more fairly in order to avoid congestion and other drawbacks for P2P QoS. In this paper, we study the P2P bandwidth game, that is, bandwidth allocation in P2P networks. Unlike previous works, which focus either on routing or on forwarding, this paper investigates a game-theoretic mechanism that incentivizes nodes to reveal their real bandwidth demands, and proposes a novel method that avoids congestion proactively, that is, prior to a congestion event. More specifically, we define an incentive-compatible pricing vector explicitly and give theoretical proofs to demonstrate that our mechanism can provide incentives for nodes to report their true bandwidth demand. In order to apply this mechanism to P2P distribution applications, we evaluate our mechanism by NS-2 simulations. The simulation results show that the incentive pricing mechanism can distribute the bandwidth fairly and effectively and can also avoid routing hotspots and congestion effectively.
-
Ryusuke UEDERA, Satoshi FUJITA
Article type: PAPER
2013 Volume E96.D Issue 12 Pages 2713-2719
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
In this paper, we propose a new buffer map notification scheme for Peer-to-Peer Video-on-Demand systems (P2P VoDs) which support VCR operations such as fast-forward, fast-backward, and seek. To enhance the fluidity of such VCR operations, we need to make each piece as small as possible. However, such a refinement significantly degrades the performance of buffer map notification schemes with respect to overhead, piece availability and the efficiency of resource utilization. The basic idea behind our proposed scheme is to use a piece-based buffer map and a segment-based buffer map in a complementary manner. Simulation results indicate that the proposed scheme increases the accuracy of the information on piece availability in the neighborhood at a sufficiently low cost, which reduces the intermittent waiting time of each peer by more than 40% even in a situation in which 50% of peers conduct the fast-forward operation over a range of 30% of the entire video.
-
Cheol-Ho HONG, Chuck YOO
Article type: LETTER
2013 Volume E96.D Issue 12 Pages 2720-2723
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
In this paper, we propose a synchronization-aware VM scheduler for parallel applications in Xen. The proposed scheduler prevents threads from waiting for a significant amount of time during synchronization. For this purpose, we propose an identification scheme that can identify the threads that have awaited other threads for a long time. In this scheme, a detection module that can infer the internal status of guest OSs was developed. We also present a scheduling policy that can accelerate bottlenecks of concurrent VMs. We implemented our VM scheduler in the recent Xen hypervisor with para-virtualized Linux-based operating systems. We show that our approach can improve the performance of concurrent VMs by up to 43% as compared to the credit scheduler.
-
Yan-Tsung PENG, Fan-Chieh CHENG, Shanq-Jang RUAN
Article type: LETTER
2013Volume E96.DIssue 12 Pages
2724-2725
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
Contrast enhancement methods are usually applied to the image files shown on display devices to bring out visual details and achieve better visual quality. However, when applied to high-resolution images, contrast enhancement entails high computation costs, mostly due to histogram computations. Therefore, this letter proposes a parallel histogram calculation algorithm that uses column histograms and difference histograms to reduce histogram computations. Experimental results show that the proposed algorithm is effective for histogram-based image contrast enhancement.
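As a rough illustration of the column-histogram idea only, the sketch below builds one histogram per image column and sums them into the full-image histogram; since the columns are independent, they can be processed in parallel. The function names and the thread-pool layout are hypothetical, and the letter's difference histograms are not reproduced here.

```python
from concurrent.futures import ThreadPoolExecutor


def column_histogram(image, col, levels):
    """Histogram of a single image column (image is a list of rows)."""
    hist = [0] * levels
    for row in image:
        hist[row[col]] += 1
    return hist


def image_histogram(image, levels=256):
    """Sum independent per-column histograms into a full-image histogram."""
    width = len(image[0])
    with ThreadPoolExecutor() as pool:
        # Each column is handled as an independent task.
        col_hists = list(
            pool.map(lambda c: column_histogram(image, c, levels), range(width))
        )
    total = [0] * levels
    for hist in col_hists:
        for level, count in enumerate(hist):
            total[level] += count
    return total
```

A contrast enhancement step such as histogram equalization would then consume the summed histogram.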
-
Guifang SHAO, Wupeng HONG, Tingna WANG, Yuhua WEN
Article type: PAPER
Subject area: Fundamentals of Information Systems
2013Volume E96.DIssue 12 Pages
2726-2732
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
An improved genetic algorithm is employed to optimize the structure of (C60)N (N≤25) fullerene clusters with the lowest energy. First, crossover with variable precision, realized by introducing the Hamming distance, is developed to provide a faster search mechanism. Second, the bit string mutation and feedback mutation are incorporated to maintain the diversity in the population. The interaction between C60 molecules is described by the Pacheco and Ramalho potential derived from first-principles calculations. We compare the performance of the Improved GA (IGA) with that of the Standard GA (SGA). The numerical and graphical results verify that the proposed approach is faster and more robust than the SGA. The second finite difference of the total energy shows that the (C60)N clusters with N=7, 13, 22 are particularly stable. Performance with the lowest energy is achieved in this work.
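The stability criterion mentioned above can be sketched as follows, assuming the usual definition of the second finite difference, E(N+1) + E(N-1) - 2E(N), where a positive peak marks a cluster size that is stable relative to its neighbors. The function name is hypothetical and the energies are arbitrary toy numbers, not C60 data.

```python
def second_difference(energy):
    """energy maps cluster size N to total energy E(N).

    Returns E(N+1) + E(N-1) - 2*E(N) for every N with both neighbors present.
    """
    return {
        n: energy[n + 1] + energy[n - 1] - 2 * energy[n]
        for n in energy
        if n - 1 in energy and n + 1 in energy
    }


# Arbitrary toy energies for illustration only (not taken from the paper).
toy = {2: -1.0, 3: -3.0, 4: -4.0, 5: -6.0}
print(second_difference(toy))
```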
-
Yasuhiro TAJIMA
Article type: PAPER
Subject area: Fundamentals of Information Systems
2013Volume E96.DIssue 12 Pages
2733-2742
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
We show teachability of a subclass of simple deterministic languages. The subclass we define is called stack uniform simple deterministic languages. Teachability is derived by showing a query learning algorithm for this language class. Our learning algorithm uses membership, equivalence and superset queries, and it terminates in polynomial time. It is already known that simple deterministic languages are polynomial time query learnable by context-free grammars. In contrast, our algorithm guesses a hypothesis by a stack uniform simple deterministic grammar, thus our result is strict teachability of the subclass of simple deterministic languages. In addition, we discuss parameters of the polynomial for teachability. The “thickness” is an important parameter for parsing and it should be one of the parameters used to evaluate the time complexity.
-
Hiroshi MATSUURA
Article type: PAPER
Subject area: Fundamentals of Information Systems
2013Volume E96.DIssue 12 Pages
2743-2752
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
There is a well-known Steiner tree algorithm called the minimum-cost paths heuristic (MPH), which is used for many multicast network operations and is considered a benchmark for other Steiner tree algorithms. MPH's average case time complexity is O(m(l + n log n)), where m is the number of end nodes, n is the number of nodes, and l is the number of links in the network, because MPH has to run Dijkstra's algorithm as many times as the number of end nodes. The author recently proposed a Steiner tree algorithm called branch-based multi-cast (BBMC), which produces exactly the same multicast tree as MPH in a constant processing time irrespective of the number of multicast end nodes. However, the theoretical result for the average case time complexity of BBMC was expressed as O(log m (l + n log n)) and could not accurately reflect the above experimental result. This paper proves that the average case time complexity of BBMC can be shortened to O(l + n log n), which is independent of the number of end nodes, when there is an upper limit of the node degree, which is the number of links connected to a node. In addition, a new parameter β is applied to BBMC, so that the multicast tree created by BBMC has fewer links. Even though the tree costs increase due to this parameter, the tree cost increase rates are much smaller than the link decrease rates.
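To make the cost of MPH concrete, here is a minimal sketch of the heuristic as it is commonly described: starting from one node, it repeatedly runs a Dijkstra search from the current tree to the nearest remaining end node and adds that path. One search per end node is what gives the factor m in the bound above (the binary heap used here costs O(l log n) per search rather than the Fibonacci-heap bound). The graph format and function names are assumptions, and BBMC itself is not reproduced.

```python
import heapq


def nearest_end_node_path(graph, tree_nodes, remaining):
    """Multi-source Dijkstra from the tree; returns the path to the closest end node."""
    dist = {v: 0 for v in tree_nodes}
    prev = {}
    heap = [(0, v) for v in tree_nodes]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if u in remaining:
            path = [u]
            while path[-1] in prev:  # walk back until a tree node is reached
                path.append(prev[path[-1]])
            return path[::-1]
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    raise ValueError("some end node is unreachable")


def mph_tree(graph, source, end_nodes):
    """graph: {node: {neighbor: link_cost}}; returns the set of tree links."""
    tree_nodes = {source}
    tree_edges = set()
    remaining = set(end_nodes) - tree_nodes
    while remaining:  # one Dijkstra search per end node still to connect
        path = nearest_end_node_path(graph, tree_nodes, remaining)
        tree_edges.update(frozenset(pair) for pair in zip(path, path[1:]))
        tree_nodes.update(path)
        remaining -= tree_nodes
    return tree_edges
```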
-
Hao ZHANG, Hiroki MATSUTANI, Yasuhiro TAKE, Tadahiro KURODA, Hideharu ...
Article type: PAPER
Subject area: Computer System
2013Volume E96.DIssue 12 Pages
2753-2764
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
We propose low-power techniques for wireless three-dimensional Network-on-Chips (wireless 3-D NoCs), in which the connections among routers on the same chip are wired while the routers on different chips are connected wirelessly using inductive coupling. The proposed low-power techniques stop the clock and power supplies to the transmitter of the wireless vertical links only when their utilizations are higher than the threshold. Meanwhile, the whole wireless vertical link will be shut down when the utilization is lower than the threshold in order to reduce the power consumption of wireless 3-D NoCs. This paper uses an on-demand method, in which the dormant data transmitter or the whole vertical link is activated as soon as a flit arrives. Full-system many-core simulations using power parameters derived from a real chip implementation show that the proposed low-power techniques reduce the power consumption by 23.4%-29.3%, while the performance overhead is less than 2.4%.
-
Lei GUO, Yuhua TANG, Yong DOU, Yuanwu LEI, Meng MA, Jie ZHOU
Article type: PAPER
Subject area: Computer System
2013Volume E96.DIssue 12 Pages
2765-2775
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
The effective bandwidth of the dynamic random-access memory (DRAM) for the alternate row-wise/column-wise matrix access (AR/CMA) mode, which is a basic characteristic in scientific and engineering applications, is very low. Therefore, we propose the window memory layout scheme (WMLS), which is a matrix layout scheme that does not require transposition, for AR/CMA applications. This scheme maps one row of a logical matrix into a rectangular memory window of the DRAM to balance the bandwidth of the row- and column-wise matrix access and to increase the DRAM IO bandwidth. The optimal window configuration is theoretically analyzed to minimize the total number of no-data-visit operations of the DRAM. Different WMLS implementations are presented according to the memory structure of field-programmable gate array (FPGA), CPU, and GPU platforms. Experimental results show that the proposed WMLS can significantly improve DRAM bandwidth for AR/CMA applications. Speedup factors of 1.6× and 2.0× are achieved for the general-purpose CPU and GPU platforms, respectively. For the FPGA platform, the WMLS DRAM controller is custom. The maximum bandwidth for the AR/CMA mode reaches 5.94 GB/s, which is a 73.6% improvement compared with that of the traditional row-wise access mode. Finally, we apply the WMLS scheme to a Chirp Scaling SAR application; compared with the traditional access approach, maximum speedup factors of 4.73×, 1.33×, and 1.56× are achieved for the FPGA, CPU, and GPU platforms, respectively.
-
Boseon YU, Hyunduk KIM, Wonik CHOI, Dongseop KWON
Article type: PAPER
Subject area: Data Engineering, Web Information Systems
2013Volume E96.DIssue 12 Pages
2776-2785
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
Recently, various research efforts have been conducted to develop strategies for accelerating multi-dimensional query processing using the graphics processing units (GPUs). However, well-known multi-dimensional access methods such as the R-tree, B-tree, and their variants are hardly applicable to GPUs in practice, mainly due to the characteristics of a hierarchical index structure. More specifically, the hierarchical structure not only causes frequent transfers of small volumes of data but also provides limited opportunity to exploit the advanced data parallelism of GPUs. To address these problems, we propose an approach that uses GPUs as a buffer. The main idea is that object entries in recently visited leaf nodes are buffered in the global memory of GPUs and processed by massive parallel threads of the GPUs. Through extensive performance studies, we observed that the proposed approach achieved query performance up to five times higher than that of the original R-tree.
-
Xiang WANG, Yan JIA, Ruhua CHEN, Hua FAN, Bin ZHOU
Article type: PAPER
Subject area: Artificial Intelligence, Data Mining
2013Volume E96.DIssue 12 Pages
2786-2794
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
Text categorization, especially short text categorization, is a difficult and challenging task since the text data is sparse and multidimensional. In traditional text classification methods, document texts are represented with “Bag of Words (BOW)” text representation schema, which is based on word co-occurrence and has many limitations. In this paper, we mapped document texts to Wikipedia concepts and used the Wikipedia-concept-based document representation method to take the place of traditional BOW model for text classification. In order to overcome the weakness of ignoring the semantic relationships among terms in document representation model and utilize rich semantic knowledge in Wikipedia, we constructed a semantic matrix to enrich Wikipedia-concept-based document representation. Experimental evaluation on five real datasets of long and short text shows that our approach outperforms the traditional BOW method.
-
Quoc Huy DO, Seiichi MITA, Keisuke YONEDA
Article type: PAPER
Subject area: Artificial Intelligence, Data Mining
2013Volume E96.DIssue 12 Pages
2795-2804
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
This paper proposes a novel practical path planning framework for autonomous parking in cluttered environments with narrow passages. The proposed global path planning method is based on an improved Fast Marching algorithm to generate a path while considering forward and backward maneuvers. In addition, the Support Vector Machine is utilized to provide the maximum clearance from obstacles, considering the vehicle dynamics, to provide a safe and feasible path. The algorithm considers the most critical points in the map, and the complexity of the algorithm is not affected by the shape of the obstacles. We also propose an autonomous parking scheme for different parking situations. The method is implemented on an autonomous vehicle platform and validated in a real environment with narrow passages.
-
Yan LI, Zhen QIN, Weiran XU, Heng JI, Jun GUO
Article type: PAPER
Subject area: Pattern Recognition
2013Volume E96.DIssue 12 Pages
2805-2813
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
Text sentiment classification aims to automatically classify subjective documents into different sentiment-oriented categories (e.g. positive/negative). Given the high dimensionality of features describing documents, how to effectively select the most useful ones, referred to as sentiment-bearing features, with a lack of sentiment class labels is crucial for improving the classification performance. This paper proposes an unsupervised sentiment-bearing feature selection method (USFS), which incorporates sentiment discriminant analysis (SDA) into sentiment strength calculation (SSC). SDA applies traditional linear discriminant analysis (LDA) in an unsupervised manner without losing local sentiment information between documents. We use SSC to calculate the overall sentiment strength for each single feature based on its affinities with some sentiment priors. Experiments, performed using benchmark movie reviews, demonstrated the superior performance of USFS.
-
Jing-Chao LI, Yi-Bing LI, Shouhei KIDERA, Tetsuo KIRIMOTO
Article type: PAPER
Subject area: Pattern Recognition
2013Volume E96.DIssue 12 Pages
2814-2819
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
As a consequence of recent developments in communications, the parameters of communication signals, such as the modulation parameter values, are becoming unstable because of time-varying SNR under electromagnetic conditions. In general, it is difficult to classify target signals that have time-varying parameters using traditional signal recognition methods. To overcome this problem, this study proposes a novel recognition method that works well even for such time-dependent communication signals. This method is mainly composed of feature extraction and classification processes. In the feature extraction stage, we adopt Shannon entropy and index entropy to obtain the stable features of modulated signals. In the classification stage, the interval gray relation theory is employed as suitable for signals with time-varying parameter spaces. The advantage of our method is that it can deal with time-varying SNR situations, which cannot be handled by existing methods. The results from numerical simulation show that the proposed feature extraction algorithm, based on entropy characteristics in time-varying SNR situations, offers accurate clustering performance, and the classifier, based on interval gray relation theory, can achieve a recognition rate of up to 82.9%, even when the SNR varies from -10 to -6 dB.
-
Huawei TIAN, Yao ZHAO, Zheng WANG, Rongrong NI, Lunming QIN
Article type: PAPER
Subject area: Image Processing and Video Processing
2013Volume E96.DIssue 12 Pages
2820-2829
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
With the rapid development of multi-view video coding (MVC) and light field rendering (LFR), Free-View Television (FTV) has emerged as new entertainment equipment, which can bring more immersive and realistic feelings to TV viewers. In an FTV broadcasting system, the TV viewer can freely watch a realistic arbitrary view of a scene generated from a number of original views. In such a scenario, the ownership of the multi-view video should be verified not only on the original views, but also on any virtual view. However, the capacity of existing watermarking schemes used as copyright protection methods for LFR-based FTV is only one bit, i.e., presence or absence of the watermark, which seriously limits their usage in practical scenarios. In this paper, we propose a robust multi-bit watermarking scheme for LFR-based free-view video. The direct-sequence code division multiple access (DS-CDMA) watermark is constructed according to the multi-bit message and embedded into the DCT domain of each view frame. The message can be extracted bit-by-bit from a virtual frame generated at an arbitrary view-point with a correlation detector. Furthermore, we mathematically prove that the watermark can be detected from any virtual view. Experimental results also show that the watermark in FTV can be successfully detected from a virtual view. Moreover, the proposed watermark method is robust against common signal processing attacks, such as Gaussian filtering, salt-and-pepper noise, JPEG compression, and center cropping.
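The general DS-CDMA principle behind such a multi-bit watermark can be sketched as below: each message bit modulates its own pseudo-random ±1 spreading code, the modulated codes are added to the host coefficients, and each bit is read back from the sign of a correlation. This is a generic illustration under assumptions, not the paper's scheme: the host is a flat list standing in for DCT coefficients, the function names and the embedding strength are hypothetical, and no view rendering is modeled.

```python
import random


def spreading_codes(num_bits, length, seed):
    """One pseudo-random +/-1 code per message bit (reproducible from the seed)."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(length)] for _ in range(num_bits)]


def embed(host, bits, codes, strength=1.0):
    """Add every bit's modulated code to the host coefficients."""
    marked = list(host)
    for bit, code in zip(bits, codes):
        sign = 1 if bit else -1
        for i, chip in enumerate(code):
            marked[i] += strength * sign * chip
    return marked


def extract(signal, codes):
    """Correlation detector: the sign of each correlation gives one bit."""
    return [
        1 if sum(s * c for s, c in zip(signal, code)) > 0 else 0
        for code in codes
    ]
```

Longer codes make the cross-correlation between different bits, and between the codes and the host, small relative to the embedded signal, which is what allows bit-by-bit extraction.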
-
Xinyuan CAI, Chunheng WANG, Baihua XIAO, Yunxue SHAO
Article type: PAPER
Subject area: Image Recognition, Computer Vision
2013Volume E96.DIssue 12 Pages
2830-2838
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
Face verification is the task of determining whether two given face images represent the same person or not. It is a very challenging task, as the face images, captured in the uncontrolled environments, may have large variations in illumination, expression, pose, background, etc. The crucial problem is how to compute the similarity of two face images. Metric learning has provided a viable solution to this problem. Until now, many metric learning algorithms have been proposed, but they are usually limited to learning a linear transformation. In this paper, we propose a nonlinear metric learning method, which learns an explicit mapping from the original space to an optimal subspace using deep Independent Subspace Analysis (ISA) network. Compared to the linear or kernel based metric learning methods, the proposed deep ISA network is a deep and local learning architecture, and therefore exhibits more powerful ability to learn the nature of highly variable dataset. We evaluate our method on the Labeled Faces in the Wild dataset, and results show superior performance over some state-of-the-art methods.
-
Qieshi ZHANG, Sei-ichiro KAMATA
Article type: PAPER
Subject area: Image Recognition, Computer Vision
2013Volume E96.DIssue 12 Pages
2839-2849
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
This paper proposes an improved color barycenter model (CBM) and its separation for automatic road sign (RS) detection. The previous version of CBM can find the colors of RS, but its accuracy is not high enough to separate the magenta and blue regions, and the influence of the number of pixels with the same color is not considered. In this paper, the improved CBM expands the barycenter distribution to a cylindrical coordinate system (CCS) and takes the number of colors at each position into account for clustering. Under this distribution, the color information can be represented more clearly for analysis. Then, targeting the characteristics of the barycenter distribution in CBM (CBM-BD), a constrained clustering method is presented to cluster the CBM-BD in CCS. Although the proposed clustering method resembles conventional K-means in some respects, it overcomes some limitations of K-means in our research. The experimental results show that the proposed method is able to detect RS with high robustness.
-
Tsuyoshi TASAKI, Akihisa MORIYA, Aira HOTTA, Takashi SASAKI, Haruhiko ...
Article type: PAPER
Subject area: Multimedia Pattern Processing
2013Volume E96.DIssue 12 Pages
2850-2856
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
A novel depth perception control method for a monocular head-up display (HUD) in a car, called the dynamic perspective method, has been developed. The method changes the size and position of a HUD image, such as an arrow, for depth perception and achieves a depth perception position of 120 [m] within an error of 30% in a simulation. However, it is difficult to achieve accurate depth perception in the real world because of car vibration. To solve this problem, we focus on the property that people complement hidden images using previous, continuously observed images. We hide the image on the HUD when the car vibrates strongly. We aim to point at the accurate depth position by using see-through HUD images while having users complement the hidden image positions based on the continuous images seen before the car vibration. We developed a car that detects large vibrations with an acceleration sensor and is equipped with our monocular HUD. Our new method pointed at the depth position more accurately than the previous method, which was confirmed by a t-test.
-
Tetsuya KANDA, Yuki MANABE, Takashi ISHIO, Makoto MATSUSHITA, Katsuro ...
Article type: LETTER
Subject area: Software Engineering
2013Volume E96.DIssue 12 Pages
2857-2859
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
It is not always easy for an Android user to choose the most suitable application for a particular task from the great number of applications available. In this paper, we propose a semi-automatic approach to extract feature names from Android applications. The case study verifies that we can associate common sequences of Android API calls with feature names.
-
Xiao XIA, Xinye LIN, Xiaodong WANG, Xingming ZHOU, Deke GUO
Article type: LETTER
Subject area: Information Network
2013Volume E96.DIssue 12 Pages
2860-2864
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
To facilitate the discovery of mobile apps in personal devices, we present the personalized live homescreen system. The system mines the usage patterns of mobile apps, generates personalized predictions, and then makes apps available at users' hands whenever they want them. Evaluations have verified the promising effectiveness of our system.
-
Yuelei XIAO, Yumin WANG, Liaojun PANG, Shichong TAN
Article type: LETTER
Subject area: Information Network
2013Volume E96.DIssue 12 Pages
2865-2869
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
To solve the problems of the existing trusted network access protocols for Wireless Local Area Network (WLAN) mesh networks, we propose a new trusted network access protocol for WLAN mesh networks, which is abbreviated as WMN-TNAP. This protocol implements mutual user authentication and Platform-Authentication between the supplicant and Mesh Authenticator (MA), and between the supplicant and Authentication Server (AS) of a WLAN mesh network, establishes the key management system for the WLAN mesh network, and effectively prevents the platform configuration information of the supplicant, MA and AS from leaking out. Moreover, this protocol is proved secure based on the extended Strand Space Model (SSM) for trusted network access protocols.
-
YoungLok PARK, MyungKeun YOON
Article type: LETTER
Subject area: Dependable Computing
2013Volume E96.DIssue 12 Pages
2870-2872
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
When attackers compromise a client system, they can steal user input. We propose a distributed one-time keyboard system to prevent information leakage via keyboard typing. We define the problem of secure keyboard arrangement over distributed multi-devices and channels. An analytical model is proposed for the optimal keyboard layout.
-
Kyoung-Soo HAN
Article type: LETTER
Subject area: Artificial Intelligence, Data Mining
2013Volume E96.DIssue 12 Pages
2873-2876
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
Pieces of personal information, such as personal names and relationships, are crucial in text mining applications. Obituaries are good sources for this kind of information. This study proposes an effective method for extracting various facts about people from obituary Web pages. Experiments show that the proposed method achieves high performance in terms of recall and precision.
-
Kihong KIM, SeongOun HWANG
Article type: LETTER
Subject area: Artificial Intelligence, Data Mining
2013Volume E96.DIssue 12 Pages
2877-2881
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
The robot covering problem has gained attention for its promising real-life applications. The previous spanning tree coverage algorithm addressed this problem well in a static environment, but not in a dynamic one. In this paper, we present and analyze an algorithm that works in a dynamic environment with fewer shadow areas.
-
Ahmed BOUDISSA, Joo Kooi TAN, Hyoungseop KIM, Takashi SHINOMIYA, Seiji ...
Article type: LETTER
Subject area: Pattern Recognition
2013Volume E96.DIssue 12 Pages
2882-2887
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
This paper introduces a simple algorithm for pedestrian detection on low resolution images. The main objective is to create a successful means for real-time pedestrian detection. While the framework of the system consists of edge orientations combined with the local binary patterns (LBP) feature extractor, a novel way of selecting the threshold is introduced. Using the mean-variance of the background examples, this threshold significantly improves the detection rate as well as the processing time. Furthermore, it makes the system robust to uniformly cluttered backgrounds, noise and light variations. The test data is the INRIA pedestrian dataset, and for the classification, a support vector machine with a radial basis function (RBF) kernel is used. The system performs at state-of-the-art detection rates while being intuitive as well as very fast, which leaves sufficient processing time for further operations such as tracking and danger estimation.
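For readers unfamiliar with LBP, the sketch below computes the basic 8-bit code of one pixel from its 3x3 neighborhood by comparing each neighbor with the center value. This is the textbook operator, shown under the assumption of a clockwise bit order starting at the top-left corner; the letter's threshold selection based on background statistics is not reproduced, and the function name is hypothetical.

```python
def lbp_code(patch):
    """patch: 3x3 list of gray values; returns the 8-bit LBP code of the center pixel."""
    center = patch[1][1]
    # Neighbors visited clockwise, starting at the top-left corner.
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        if patch[r][c] >= center:  # neighbor at least as bright as the center
            code |= 1 << bit
    return code
```

A histogram of these codes over an image window is what typically serves as the feature vector passed to a classifier.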
-
Ji-Hyun SONG, Sangmin LEE
Article type: LETTER
Subject area: Speech and Hearing
2013Volume E96.DIssue 12 Pages
2888-2891
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
In this paper, we propose a novel voice activity detection (VAD) algorithm based on the generalized normal-Laplace (GNL) distribution to provide enhanced performance in adverse noise environments. Specifically, the probability density function (PDF) of a noisy speech signal is represented by the GNL distribution; the variances of the speech and noise of the GNL distribution are estimated using higher-order moments. After in-depth analysis of the estimated variances, a feature that is useful for discrimination between speech and noise at low SNRs is derived and compared to a threshold to detect speech activity. To consider the inter-frame correlation of speech activity, the result from the previous frame is employed in the decision rule of the proposed VAD algorithm. The performance of our proposed VAD algorithm is evaluated in terms of receiver operating characteristics (ROC) and detection accuracy. Results show that the proposed method yields better results than conventional VAD algorithms.
-
Baeksop KIM, Jiseong KIM, Jungmin SO
Article type: LETTER
Subject area: Image Processing and Video Processing
2013Volume E96.DIssue 12 Pages
2892-2895
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
This letter presents a scheme to improve the running time of exemplar-based image inpainting, first proposed by Criminisi et al. In exemplar-based image inpainting, a patch that contains unknown pixels is compared to all the patches in the known region in order to find the best match. This is very time-consuming and hinders the use of Criminisi's method in real time. We show that a simple bounding algorithm can significantly reduce the number of distance calculations, and thus the running time. The performance of the bounding algorithm is affected by the order in which patches are compared, as well as the order of pixels in a patch. We present pixel and patch ordering schemes that improve the performance of bounding algorithms. Experiments with well-known images used in the inpainting literature show that the proposed reordering scheme can reduce the running time of the bounding algorithm by up to 50%.
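One plausible reading of the bounding idea is partial-distance early termination: while accumulating the sum of squared differences for a candidate patch, stop as soon as the running total reaches the best distance found so far. The sketch below illustrates this under that assumption; patches are flat lists, `None` marks an unknown pixel, and all names are hypothetical. The letter's specific pixel and patch ordering schemes are not reproduced.

```python
def bounded_ssd(patch, candidate, bound):
    """Sum of squared differences, abandoned once it reaches `bound`."""
    total = 0
    for p, q in zip(patch, candidate):
        if p is None:  # unknown pixel in the target patch: not compared
            continue
        total += (p - q) ** 2
        if total >= bound:
            return None  # this candidate cannot beat the current best
    return total


def best_match(patch, candidates):
    """Return (index, distance) of the closest candidate patch."""
    best_index, best_dist = None, float("inf")
    for index, candidate in enumerate(candidates):
        dist = bounded_ssd(patch, candidate, best_dist)
        if dist is not None:
            best_index, best_dist = index, dist
    return best_index, best_dist
```

The earlier a good match is found and the earlier large pixel differences are visited, the sooner poor candidates are abandoned, which is why the comparison order affects the running time.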
-
Wen ZHOU, Chunheng WANG, Baihua XIAO, Zhong ZHANG, Yunxue SHAO
Article type: LETTER
Subject area: Image Recognition, Computer Vision
2013Volume E96.DIssue 12 Pages
2896-2899
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
Recognizing human action in complex scenes is a challenging problem in computer vision. Some action-unrelated concepts, such as camera position features, could significantly affect the appearance of local spatio-temporal features, and therefore the performance of methods based on low-level features degrades. In this letter, we define an action-unrelated concept, the camera position, as a high-level feature. We observe that such features can serve as a prior to local spatio-temporal features for human action recognition. We encode this prior by modeling interactions between spatio-temporal features and camera position features. We infer camera position features from local spatio-temporal features via these interactions. The parameters of this model are estimated by a new max-margin algorithm. We evaluate the proposed method on the KTH, IXMAS and Youtube actions datasets. Experimental results show the effectiveness of the proposed method.
-
Chunxiao LIU, Guijin WANG, Xinggang LIN
Article type: LETTER
Subject area: Image Recognition, Computer Vision
2013Volume E96.DIssue 12 Pages
2900-2903
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
Learning an appearance model for person re-identification from multiple images is challenging due to the corrupted images caused by occlusion or false detection. Furthermore, different persons may wear similar clothes, making appearance features less discriminative. In this paper, we first introduce the concept of multiple instance to handle corrupted images. Then a novel pairwise comparison based multiple instance learning framework is proposed to deal with visual ambiguity, by selecting robust features through pairwise comparison. We demonstrate the effectiveness of our method on two public datasets.
-
Lingyu LIANG, Lianwen JIN
Article type: LETTER
Subject area: Computer Graphics
2013Volume E96.DIssue 12 Pages
2904-2907
Published: December 01, 2013
Released on J-STAGE: December 01, 2013
JOURNAL
FREE ACCESS
We propose a new face relighting method using an illuminance template generated from a single reference portrait. First, the reference is warped according to the shape of the target. Second, we employ a new spatially variant edge-preserving smoothing filter to remove the facial identity and texture details of the warped reference, and obtain the illumination template. Finally, we relight the target with the template in CIELAB color space. Experiments show the effectiveness of our method for both grayscale and color faces taken from different databases, and the comparisons with previous works demonstrate a better relighting effect produced by our method.