IPSJ Online Transactions
Online ISSN : 1882-6660
ISSN-L : 1882-6660
Volume 5
Displaying 1-23 of 23 articles from this issue
  • Satoshi Iwata, Kenji Kono
    Article type: High Reliability
    Subject area: Regular Paper
    2012 Volume 5 Pages 1-12
    Published: 2012
    Released on J-STAGE: January 26, 2012
    JOURNAL FREE ACCESS
    Performance anomalies in web applications are becoming a huge problem, and the increasing complexity of modern web applications has made it much more difficult to identify their root causes. The first step toward hunting for root causes is to narrow down the suspicious components that cause performance anomalies. However, even this is difficult when several performance anomalies occur simultaneously in a web application; we have to determine whether or not their root causes are the same. We propose a novel method, called performance anomaly clustering, that helps us narrow down suspicious components by clustering anomalies based on their root causes. If two anomalies are clustered together, they are affected by the same root cause; otherwise, they are affected by different root causes. The key insight behind our method is that anomaly measurements that are negatively affected by the same root cause deviate similarly from standard measurements. We compute the similarity in deviations from the non-anomalous distribution of measurements, and cluster anomalies based on this similarity. The results from case studies, conducted using RUBiS, an auction prototype modeled after eBay.com, are encouraging. Our clustering method produced clusters that were crucial in the search for root causes. Guided by the clustering results, we searched for components exclusively used by each cluster and successfully determined suspicious components, such as the Apache web server, Enterprise Beans, and methods in Enterprise Beans. The root causes we found included a shortage of network connections, inadequate indices in the database, and incorrectly issued SQL statements.
    Download PDF (976K)
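    A minimal sketch (not the paper's implementation) of the deviation-similarity idea: each anomaly is assumed to be summarized by per-metric measurements, its z-score deviation from a non-anomalous baseline is computed, and two anomalies are grouped when the cosine similarity of their deviation vectors is high. All metric values below are hypothetical.
      /* Sketch: compare two anomalies by the cosine similarity of their
       * z-score deviations from a non-anomalous baseline (hypothetical data). */
      #include <math.h>
      #include <stdio.h>

      #define N_METRICS 4

      static void deviation(const double meas[], const double mean[],
                            const double std[], double out[])
      {
          for (int i = 0; i < N_METRICS; i++)
              out[i] = (meas[i] - mean[i]) / std[i];      /* z-score per metric */
      }

      static double cosine(const double a[], const double b[])
      {
          double dot = 0, na = 0, nb = 0;
          for (int i = 0; i < N_METRICS; i++) {
              dot += a[i] * b[i];
              na  += a[i] * a[i];
              nb  += b[i] * b[i];
          }
          return dot / (sqrt(na) * sqrt(nb));
      }

      int main(void)
      {
          double mean[N_METRICS] = { 50, 10, 0.20, 300 };  /* baseline (hypothetical) */
          double std[N_METRICS]  = {  5,  2, 0.05,  30 };
          double a1[N_METRICS]   = { 80, 11, 0.21, 310 };  /* anomaly 1 (hypothetical) */
          double a2[N_METRICS]   = { 75, 10, 0.22, 305 };  /* anomaly 2 (hypothetical) */
          double d1[N_METRICS], d2[N_METRICS];

          deviation(a1, mean, std, d1);
          deviation(a2, mean, std, d2);
          double sim = cosine(d1, d2);
          printf("similarity = %.3f -> %s root cause\n",
                 sim, sim > 0.9 ? "likely same" : "likely different");
          return 0;
      }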
  • Yuan He, Hiroki Matsutani, Hiroshi Sasaki, Hiroshi Nakamura
    Article type: Interconnection Network
    Subject area: Regular Paper
    2012 Volume 5 Pages 13-20
    Published: 2012
    Released on J-STAGE: January 26, 2012
    JOURNAL FREE ACCESS
    The three-dimensional Network-on-Chip (3D NoC) is an emerging research topic exploring the network architecture of 3D ICs that stack several wafers or dies. As such architectures are studied extensively, the negative impacts of 3D NoCs' vertical interconnects are raising concerns, considering their footprint sizes and the routability degradation they cause. In our evaluation, we found that such vertical bandwidth limitations can dramatically degrade system performance, by up to 2.3×. Since these limitations come from physical design constraints, the only way to mitigate the performance degradation is to reduce the amount of on-chip communication data, especially data moving vertically. In this paper, therefore, we carry out a study of data compression on 3D NoC architectures with a comprehensive set of scientific workloads. Firstly, we propose an adaptive data compression scheme for 3D NoCs that takes account of the vertical bandwidth limitation and data compressibility. Secondly, we evaluate our proposal on a 3D NoC platform and observe that compressibility-based adaptive compression is very useful against incompressible data, while location-based adaptive compression is more effective as the 3D NoC gains more layers. Thirdly, we find that in a bandwidth-limited situation, such as a CMP with a 3D NoC having multiple connected layers, adaptive data compression with location-based control, or with both compressibility- and location-based control, is very promising as the number of layers grows.
    Download PDF (2276K)
  • Hikaru Horie, Masato Asahara, Hiroshi Yamada, Kenji Kono
    Article type: Distributed System
    Subject area: Regular Paper
    2012 Volume 5 Pages 21-33
    Published: 2012
    Released on J-STAGE: March 27, 2012
    JOURNAL FREE ACCESS
    This paper presents MashCache, a scalable client-side architecture for taming the harsh effects of flash crowds. Like previous systems, MashCache makes extensive use of client resources to cache and share frequently accessed contents. To design MashCache, we carefully investigated quantitative analyses of flash crowds in the wild and found that flash crowds have three features that can be used to simplify the overall design of MashCache. First, flash crowds start up very slowly, so MashCache has enough time to prepare for the coming flash crowd. Second, flash crowds last for a short period; since the accessed content is rarely updated in this short period, MashCache can employ a simple mechanism for cache consistency. Finally, the clients issue the same requests to the target web sites. This feature simplifies the cache management of MashCache because cache explosion can be avoided. By exploiting these features of flash crowds, MashCache advances the state of the art of P2P-based caching systems for flash crowds; MashCache is a pure P2P system that combines 1) aggressive caching, 2) query-origin keys, 3) two-phase delta consistency, and 4) carefully designed cache metadata. MashCache has another advantage: since it works completely on the client side, MashCache can tolerate flash crowds even if the external services the mashup depends on are not equipped with flash-crowd-resistant mechanisms. In experiments with up to 2,500 emulated clients, a prototype of MashCache reduced the number of requests to the original web servers by 98.2% with moderate overheads.
    Download PDF (1221K)
  • Takahiro Hirofuchi, Hidemoto Nakada, Satoshi Itoh, Satoshi Sekiguchi
    Article type: Virtualization
    Subject area: Regular Paper
    2012 Volume 5 Pages 34-46
    Published: 2012
    Released on J-STAGE: March 27, 2012
    JOURNAL FREE ACCESS
    Dynamic consolidation of virtual machines (VMs) through live migration is a promising technology for IaaS datacenters. VMs are dynamically packed onto fewer server nodes, thereby eliminating excessive power consumption. Existing studies on VM consolidation, however, are based on precopy live migration, which requires dozens of seconds to switch the execution hosts of VMs. It is difficult to optimize VM locations quickly in response to sudden load changes, resulting in serious violations of VM performance criteria. In this paper, we propose an advanced VM consolidation system exploiting postcopy live migration, which greatly alleviates performance degradation. VM locations are reactively optimized in response to ever-changing resource usage. Sudden overloading of server nodes is promptly resolved by quickly switching the execution hosts of VMs. We have developed a prototype of our consolidation system and evaluated its feasibility through experiments. We confirmed that our consolidation system achieved a higher degree of performance assurance than one using precopy migration. Our micro benchmark program, designed for the metric of performance assurance, showed that performance degradation was only 12% or less, even for memory-intensive workloads, which was less than half the level observed with precopy live migration. The SPECweb benchmark showed that performance degradation was approximately 10%, a great improvement over the case of using precopy live migration (21%).
    Download PDF (1556K)
  • Daniel Sangorrin, Shinya Honda, Hiroaki Takada
    Article type: Virtualization
    Subject area: Regular Paper
    2012 Volume 5 Pages 47-58
    Published: 2012
    Released on J-STAGE: March 27, 2012
    JOURNAL FREE ACCESS
    Virtualization solutions aimed at consolidating a real-time operating system (RTOS) and a general-purpose operating system (GPOS) onto the same platform are gaining momentum as high-end embedded systems increase their computational power. Among them, the most widespread approach for scheduling both operating systems consists of executing the GPOS only when the RTOS becomes idle. Although this approach can guarantee the real-time performance of the RTOS tasks and interrupt handlers, the responsiveness of GPOS time-sensitive activities is negatively affected when the RTOS contains compute-bound activities executing at low priority. In this paper, we modify a reliable hardware-assisted dual-OS virtualization technique to implement an integrated scheduling architecture in which the execution priority levels of GPOS and RTOS activities can be mixed at a fine granularity. The evaluation results show that the proposed approach is suitable for enhancing the responsiveness of GPOS time-sensitive activities without compromising the reliability and real-time performance of the RTOS.
    Download PDF (1211K)
  • Kazuya Yamakita, Hiroshi Yamada, Kenji Kono
    Article type: Virtualization
    Subject area: Regular Paper
    2012 Volume 5 Pages 59-70
    Published: 2012
    Released on J-STAGE: March 27, 2012
    JOURNAL FREE ACCESS
    Although operating systems (OSes) are crucial to achieving high availability of computer systems, modern OSes are far from bug-free. Rebooting the OS is simple, powerful, and sometimes the only remedy for kernel failures. Once we accept reboot-based recovery as a fact of life, we should try to ensure that the downtime caused by reboots is as short as possible. This paper presents “phase-based” reboots, which shorten the downtime caused by reboot-based recovery. The key idea is to divide a boot sequence into phases. A phase-based reboot reuses a system state from the previous boot if the next boot would reproduce the same state. A prototype of the phase-based reboot was implemented on Xen 3.4.1 running para-virtualized Linux 2.6.18. Experiments with the prototype show that it successfully recovered from transient kernel failures inserted by a fault injector, and that its downtime was 34.3% to 93.6% shorter than that of normal reboot-based recovery.
    Download PDF (794K)
  • Tomoharu Ugawa, Hideya Iwasaki, Taiichi Yuasa
    Article type: Regular Papers
    Subject area: Regular Paper
    2012 Volume 5 Pages 71-78
    Published: 2012
    Released on J-STAGE: April 10, 2012
    JOURNAL FREE ACCESS
    Mark-sweep garbage collection (GC) is usually implemented using a mark stack for the depth-first search that marks all objects reachable from the root set. However, the required size of the mark stack depends on the application, and its upper bound is proportional to the size of the heap. It is not acceptable in most systems to reserve memory for such a large mark stack. To avoid unacceptable memory overhead, some systems limit the size of the mark stack. If the mark stack overflows, the system scans the entire heap to find objects that could not be pushed due to the overflow and traverses their children. Since this scanning takes a long time, the technique is inefficient for applications that are likely to cause overflows. In this research, we propose a technique for recording the rough locations of objects that failed to be pushed so that they can be found without scanning the entire heap. We use a technique similar to the card table of mostly-concurrent GC to record these rough locations.
    Download PDF (355K)
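    A minimal sketch, under assumed parameters (heap size, card size, stack limit), of how the rough location of an object that could not be pushed can be recorded card-table style, so that only dirty cards need re-scanning afterwards; this is illustrative and not the paper's implementation.
      #include <stddef.h>
      #include <stdint.h>

      #define HEAP_SIZE   (1 << 20)      /* assumed 1 MiB heap */
      #define CARD_SHIFT  9              /* assumed 512-byte cards */
      #define N_CARDS     (HEAP_SIZE >> CARD_SHIFT)
      #define STACK_LIMIT 256            /* assumed fixed mark-stack size */

      static char    heap[HEAP_SIZE];
      static uint8_t overflow_card[N_CARDS];  /* 1 = card contains an unpushed object */
      static void   *mark_stack[STACK_LIMIT];
      static int     sp;

      /* Push an object for depth-first marking; on overflow, only remember the
       * card (heap region) containing it instead of scanning the whole heap later. */
      static void push_or_record(void *obj)
      {
          if (sp < STACK_LIMIT) {
              mark_stack[sp++] = obj;
          } else {
              size_t off = (size_t)((char *)obj - heap);
              overflow_card[off >> CARD_SHIFT] = 1;
          }
      }

      int main(void)
      {
          for (int i = 0; i < 400; i++)   /* more pushes than the stack can hold */
              push_or_record(&heap[i * 64]);
          return 0;                       /* a real GC would now rescan only dirty cards */
      }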
  • Munehiro Takimoto
    Article type: Regular Papers
    Subject area: Regular Paper
    2012 Volume 5 Pages 79-86
    Published: 2012
    Released on J-STAGE: April 10, 2012
    JOURNAL FREE ACCESS
    Partial dead code elimination (PDE) is a powerful code optimization technique that extends dead code elimination with code motion. PDE eliminates assignments that are dead on some execution paths and alive on others. Hence, it can not only eliminate partially dead assignments but also move loop-invariant assignments out of loops. These effects are achieved by interleaving dead code elimination and code sinking, so it is important to capture the second-order effects between them, which can be done by repeated application; however, repetition is costly. This paper proposes a technique that applies PDE to each assignment on demand. Our technique checks the safety of each code motion so that no execution path becomes longer. Because checking occurs on a demand-driven basis, the checking range can be restricted. In addition, because a demand-driven analysis can check whether an assignment should be inserted at the blocking point of the code motion, the PDE analysis can be localized to a restricted region. Furthermore, using the demand-driven property, our technique can be applied to each statement in a reverse postorder over the reverse control flow graph, allowing it to capture many second-order effects. We have implemented our technique as a code optimization phase and compared it with previous studies in terms of the optimization and execution costs of the target code. As a result, our technique is as efficient as a single application of PDE and as effective as multiple applications of PDE.
    Download PDF (584K)
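    A small illustrative C example (not taken from the paper) of a partially dead assignment and the effect of sinking it: the assignment to t is dead on the path where c == 0, so PDE moves it into the branch where t is actually used.
      #include <stdio.h>

      /* before: the assignment to t is dead on the else-path (partially dead) */
      int before(int a, int b, int c)
      {
          int t = a * b;           /* computed even when c == 0 and t is unused */
          if (c)
              return t + 1;
          return 0;
      }

      /* after PDE: the assignment is sunk to the only path on which t is live */
      int after(int a, int b, int c)
      {
          if (c) {
              int t = a * b;
              return t + 1;
          }
          return 0;
      }

      int main(void)
      {
          printf("%d %d\n", before(2, 3, 1), after(2, 3, 1));   /* both print 7 */
          return 0;
      }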
  • Katsuaki Ikegami, Kenjiro Taura
    Article type: Regular Papers
    Subject area: Regular Paper
    2012 Volume 5 Pages 87-95
    Published: 2012
    Released on J-STAGE: April 10, 2012
    JOURNAL FREE ACCESS
    The message passing model is a popular programming model in which users explicitly write both “send” and “receive” operations in programs and place shared data in each process's local memory. Consequently, it is difficult to write programs with complicated algorithms using the message passing model. This problem can be solved by a partitioned global address space (PGAS) model, which provides a virtual global address space so that users can effortlessly write programs with complex data sharing. The PGAS model hides network communication as implicit global memory accesses. This leads to better programmability, but it can incur additional network communication overhead compared to the message passing model. The overhead can be reduced if programs read or write global memory in bulk, but doing so complicates writing programs. This paper presents a programming language and its runtime that achieve both programmability and performance through automatic communication aggregation. The programmer can write global memory accesses in the normal memory access style; the compiler and runtime then aggregate the communication. Because the most time-consuming network accesses occur in loops, this paper describes how the compiler detects global memory accesses in loops and aggregates them, and how to implement that idea.
    Download PDF (873K)
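    A conceptual C sketch of the aggregation the compiler and runtime aim for; remote_get() and remote_get_bulk() are hypothetical stand-ins for a PGAS runtime's fine-grained and bulk transfer primitives, not the paper's actual API. Hoisting the communication out of the loop replaces N small transfers with one aggregated transfer.
      #include <stddef.h>
      #include <stdio.h>
      #include <string.h>

      #define N 1024
      static double remote_array[N];     /* stand-in for data owned by another node */

      /* hypothetical fine-grained access: one network round trip per element */
      static double remote_get(size_t i) { return remote_array[i]; }

      /* hypothetical bulk access: one aggregated transfer for the whole range */
      static void remote_get_bulk(double *dst, size_t n)
      {
          memcpy(dst, remote_array, n * sizeof(double));
      }

      static double sum_fine_grained(void)   /* N remote accesses inside the loop */
      {
          double s = 0;
          for (size_t i = 0; i < N; i++)
              s += remote_get(i);
          return s;
      }

      static double sum_aggregated(void)     /* the shape the compiler/runtime aims for */
      {
          double buf[N], s = 0;
          remote_get_bulk(buf, N);           /* communication hoisted out of the loop */
          for (size_t i = 0; i < N; i++)
              s += buf[i];
          return s;
      }

      int main(void)
      {
          printf("%f %f\n", sum_fine_grained(), sum_aggregated());
          return 0;
      }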
  • Akihisa Yamada, Keiichirou Kusakari, Toshiki Sakabe, Masahiko Sakai, N ...
    Article type: Regular Papers
    Subject area: Regular Paper
    2012 Volume 5 Pages 96-104
    Published: 2012
    Released on J-STAGE: April 10, 2012
    JOURNAL FREE ACCESS
    Dynamically typed languages such as Scheme are widely adopted because of their rich expressiveness. However, there is the drawback that dynamic typing cannot detect runtime errors at compile time. In this paper, we propose a type system which enables static detection of runtime errors. The key idea of our approach is to introduce a special type, called the error type, for expressions that cause runtime errors. The proposed type system brings out the benefit of the error type with set-theoretic union, intersection and complement types, recursive types, parametric polymorphism and subtyping. While existing type systems usually ensure that evaluation never causes runtime errors for typed expressions, our system ensures that evaluation always causes runtime errors for expressions typed with the error type. Likewise, our system also ensures that evaluation never causes errors for expressions typed with any type that does not contain the error type. Under the usual definition of subtyping, it is difficult to syntactically prove the soundness of our type system. We redefine subtyping by introducing the notion of intransitive subtyping, and syntactically prove the soundness under the new definition.
    Download PDF (280K)
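    A schematic restatement of the guarantees above in LaTeX; the notation (with \mathbf{Err} for the error type) is assumed here rather than taken from the paper:
      \vdash e : \mathbf{Err}
        \;\Longrightarrow\; \text{evaluation of } e \text{ always causes a runtime error}
      \vdash e : \tau,\ \ \tau \le \neg\mathbf{Err}
        \;\Longrightarrow\; \text{evaluation of } e \text{ never causes a runtime error}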
  • Takayuki Koai, Makoto Tatsuta
    Article type: Regular Papers
    Subject area: Regular Paper
    2012 Volume 5 Pages 105-113
    Published: 2012
    Released on J-STAGE: April 10, 2012
    JOURNAL FREE ACCESS
    The Substitution Theorem is a new theorem in untyped lambda calculus, proved in 2006. It states that, for a given lambda term and given free variables in it, the term becomes weakly normalizing when we substitute arbitrary weakly normalizing terms for these free variables, provided the term becomes weakly normalizing when we substitute a single arbitrary weakly normalizing term for these free variables. This paper formalizes and verifies this theorem using the higher-order theorem prover HOL. A control path, which is the key notion in the proof, explicitly uses the names of bound variables in lambda terms, and it is defined only for lambda terms without bound variable renaming. Lambda terms without bound variable renaming are formalized using the HOL package based on contextual alpha-equivalence. The verification uses 10,119 lines of HOL code and 326 lemmas.
    Download PDF (174K)
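    The statement as paraphrased from the abstract, in LaTeX (\mathrm{WN} denotes the set of weakly normalizing terms; the notation is assumed here): for a lambda term M with chosen free variables x_1, ..., x_n,
      \text{if } M[x_1 := N, \ldots, x_n := N] \in \mathrm{WN} \text{ for every } N \in \mathrm{WN},
      \text{ then } M[x_1 := N_1, \ldots, x_n := N_n] \in \mathrm{WN} \text{ for all } N_1, \ldots, N_n \in \mathrm{WN}.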
  • Mahito Sugiyama, Kentaro Imajo, Keisuke Otaki, Akihiro Yamamoto
    Article type: Regular Papers
    Subject area: Regular Paper
    2012 Volume 5 Pages 114-123
    Published: 2012
    Released on J-STAGE: June 19, 2012
    JOURNAL FREE ACCESS
    To date, numerous studies have been devoted to investigating the biochemical functions of receptors, which play crucial roles in signal processing in organisms. Ligands are key tools in such experiments, since receptor specificity with respect to ligands enables us to control the activity of receptors. However, finding ligands is difficult: choosing ligand candidates relies on the expert knowledge of biologists, and conducting test experiments in vivo or in vitro has a high cost. Here we investigate the ligand finding problem with a machine learning approach by formalizing it as a multi-label classification problem, mainly discussed in the area of preference learning. In this paper we develop a new algorithm, LIFT (Ligand FInding via Formal ConcepT Analysis), for multi-label classification, which can treat ligand data in databases in a semi-supervised manner. The key to LIFT is to achieve clustering by putting the original dataset on lattices using the data analysis technique of Formal Concept Analysis (FCA), and then obtaining the preference for each label using the lattice structure. Experiments using real data on ligands and receptors in the IUPHAR database show that LIFT solves the task more effectively than other machine learning algorithms.
    Download PDF (419K)
  • Hajime Morita, Tetsuya Sakai, Manabu Okumura
    Article type: Research Papers
    Subject area: Regular Paper
    2012 Volume 5 Pages 124-129
    Published: 2012
    Released on J-STAGE: July 02, 2012
    JOURNAL FREE ACCESS
    We propose a new method for query-oriented extractive multi-document summarization. To enrich the information need representation of a given query, we build a co-occurrence graph to obtain words that augment the original query terms. We then formulate the summarization problem as a Maximum Coverage Problem with Knapsack Constraints based on word pairs rather than single words. Our experiments with the NTCIR ACLIA question answering test collections show that our method achieves a pyramid F3-score of up to 0.313, a 36% improvement over a baseline using Maximal Marginal Relevance.
    Download PDF (286K)
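    A generic maximum-coverage-with-knapsack formulation over word pairs, written in LaTeX to illustrate the problem shape; the symbols are assumed here and not taken from the paper (x_s: sentence s selected, z_p: word pair p covered, w_p: weight of pair p, c_s: length of s, L: length budget):
      \max \sum_{p} w_p z_p
      \quad \text{s.t.} \quad \sum_{s} c_s x_s \le L,
      \qquad z_p \le \sum_{s \ni p} x_s,
      \qquad x_s, z_p \in \{0, 1\}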
  • Ramon Francisco Mejia, Yuichi Kaji, Hiroyuki Seki
    Article type: Research Papers
    Subject area: Regular Paper
    2012 Volume 5 Pages 130-138
    Published: 2012
    Released on J-STAGE: July 02, 2012
    JOURNAL FREE ACCESS
    Monochrome two-dimensional barcodes are rapidly becoming a de facto standard for distributing digital data through printed media because of their low cost and portability. Increasing the data density of these barcodes improves the flexibility and effectiveness of existing barcode applications and has the potential to create novel means of data transmission and conservation. However, printing and scanning equipment introduce uncontrollable effects on the image at a very fine level, and these effects are beyond the scope of the error control mechanisms of existing barcode schemes. To realize high-density barcodes, it is essential to develop a novel symbology and error control mechanisms that can manage these kinds of effects and provide practical reliability. To tackle this problem, this paper studies the communication channel defined by high-density barcodes and proposes several error control techniques to increase the robustness of the barcode scheme. Some of these techniques convert the peculiar behavior of printing equipment to the well-studied model of the additive white Gaussian noise (AWGN) channel. The use of low-density parity-check codes is also investigated, as they perform much better than conventional Reed-Solomon codes, especially for AWGN channels. Experimental evaluation shows that the proposed error control techniques can be essential components in realizing high-density barcodes.
    Download PDF (1514K)
  • Haruna Nishiwaki, Tomoharu Ugawa, Seiji Umatani, Masahiro Yasugi, Taii ...
    Article type: Regular Papers
    Subject area: Regular Paper
    2012 Volume 5 Pages 139-144
    Published: 2012
    Released on J-STAGE: August 22, 2012
    JOURNAL FREE ACCESS
    A static analysis tool has been developed for finding common mistakes in programs that use the Java Native Interface (JNI). Violations of JNI-specific rules are not caught by C++ and other compilers; this tool targets the rules about references to Java objects, which are passed to native methods as local references. Local references become invalid when the native method returns. To keep them valid after the return, the programmer should convert them into global references. If they are not converted, the garbage collector may malfunction and may, for example, fail to mark referenced objects. The developed static analysis tool finds assignments of local references to locations other than local variables, such as global variables and structure fields. The tool was implemented as a plug-in for Clang, a compiler front end for LLVM. Application of this tool to native Android code demonstrated its effectiveness.
    Download PDF (184K)
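    A minimal C example of the mistake class the tool targets; the class and method names are hypothetical and not from the paper. Storing a local reference in a global variable is invalid once the native method returns, so it must be promoted with NewGlobalRef (and later released with DeleteGlobalRef).
      #include <jni.h>

      static jobject cached;    /* a location other than a local variable */

      JNIEXPORT void JNICALL
      Java_Example_cacheBad(JNIEnv *env, jobject thisObj, jobject obj)
      {
          cached = obj;         /* BUG: local reference escapes the native call */
      }

      JNIEXPORT void JNICALL
      Java_Example_cacheGood(JNIEnv *env, jobject thisObj, jobject obj)
      {
          cached = (*env)->NewGlobalRef(env, obj);    /* stays valid after return */
      }

      JNIEXPORT void JNICALL
      Java_Example_release(JNIEnv *env, jobject thisObj)
      {
          if (cached != NULL) {
              (*env)->DeleteGlobalRef(env, cached);   /* avoid leaking the global ref */
              cached = NULL;
          }
      }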
  • Yasushi Miyata, Motoki Obata, Tomoya Ohta, Hiroyasu Nishiyama
    Article type: Regular Papers
    Subject area: Regular Paper
    2012 Volume 5 Pages 145-155
    Published: 2012
    Released on J-STAGE: August 22, 2012
    JOURNAL FREE ACCESS
    Server memory sizes are continuously growing, and accelerating the business data processing of Java-based enterprise systems requires the use of large memory. One example of such use is an object cache for accelerating read access to a large volume of data. However, Java incorporates a garbage collection (GC) mechanism for reclaiming unused objects, and a typical GC algorithm must find references from old objects to young objects in order to identify unused objects. This means that enlarging the heap memory increases the time spent finding references. We propose a GC time reduction algorithm for large object cache systems that eliminates the need to find references from a specific object cache region. This algorithm is premised on treating cached objects as immutable objects that allow only READ and REMOVE operations. It also divides the object cache into two regions. The first is a closed region, which contains only immutable objects. The other is an unclosed region, which contains mutable objects that have survived GC. When an unclosed region becomes full, it is changed into a closed region. When an immutable object is modified, the object is copied to the unclosed region and the modification is applied to the copy. This restricts references from the object cache region to the region for young objects and excludes changes to the objects in the closed region. Experimental evaluation showed that the proposed algorithm can reduce GC time by 1/4 and improve throughput by 40% compared to traditional generational GC algorithms.
    Download PDF (848K)
  • Hidekazu Tadokoro, Kenichi Kourai, Shigeru Chiba
    Article type: Regular Papers
    Subject area: Virtualization
    2012 Volume 5 Pages 156-166
    Published: 2012
    Released on J-STAGE: August 27, 2012
    JOURNAL FREE ACCESS
    Infrastructure as a Service (IaaS) provides virtual machines (VMs) to users, and its system administrators often manage the user VMs using privileged VMs called management VMs. However, the administrators are not always trustworthy from the users' point of view. If the administrators allow outside attackers to intrude into the management VM, the attackers can easily steal sensitive information from user VMs' memory. In this paper, we propose VMCrypt, which preserves the secrecy of VM memory using a trusted virtual machine monitor. VMCrypt provides a dual memory view: a normal view for a user VM and an encrypted view for the management VM. The encrypted view prevents sensitive information from leaking to the management VM. To support existing management software for para-virtualization, VMCrypt exceptionally provides a normal view to the management VM for several memory regions, which are automatically identified and maintained during the life cycle of a user VM. We have implemented VMCrypt in Xen, and our experimental results show that the downtime due to live migration was still less than one second.
    Download PDF (707K)
  • Keisuke Watanabe, Susumu Nishimura
    Article type: Regular Papers
    Subject area: Regular Paper
    2012 Volume 5 Pages 167-176
    Published: 2012
    Released on J-STAGE: September 06, 2012
    JOURNAL FREE ACCESS
    We present a game semantics for an Algol-like language with shared-variable parallelism. In contrast to deterministic sequential programs, whose semantics can be characterized by observing termination behaviors, it is crucial for parallel programs to observe not only termination but also divergence, because of the nondeterministic scheduling of parallel processes. In order to give a more appropriate foundation for modeling parallelism, we base our development on Harmer's game semantics, which concerns not only may-convergence but also must-convergence for a nondeterministic programming language EIA. The game semantics for the Algol-like parallel language is shown to be fully abstract, which indicates that the parallel command of our Algol-like language adds no extra power beyond the nondeterminism provided by EIA. We also sketch how the equivalence of two parallel programs can be reasoned about based on the game-semantical interpretation.
    Download PDF (317K)
  • Yasuto Arakaki, Hayaru Shouno, Kazuyuki Takahashi, Takashi Morie
    Article type: Regular Papers
    Subject area: Regular Paper
    2012 Volume 5 Pages 177-185
    Published: 2012
    Released on J-STAGE: October 03, 2012
    JOURNAL FREE ACCESS
    For the detection of generic objects in image processing, histograms of orientation gradients (HOG) have been discussed in recent years, and classification systems using HOG show good results. However, the performance of the HOG descriptor is influenced by the size of the object to be detected. To overcome this problem, we introduce a kind of hierarchy inspired by the convolution-net, a model of the visual processing system in the brain. The hierarchical HOG (H-HOG) integrates several scales of HOG descriptors in its architecture and represents the input image as a combination of more complex features rather than of orientation gradients alone. We investigate the performance of H-HOG and compare it with the conventional HOG. As a result, we obtain better performance than the conventional HOG; in particular, the size of the representation dimension is much smaller than that of the conventional HOG without reducing detection performance.
    Download PDF (1446K)
  • Jun Kitazono, Toshiaki Omori, Toru Aonishi, Masato Okada
    Article type: Regular Papers
    Subject area: Regular Paper
    2012 Volume 5 Pages 186-191
    Published: 2012
    Released on J-STAGE: October 03, 2012
    JOURNAL FREE ACCESS
    With developments in optical imaging over the past decade, statistical methods for estimating dendritic membrane resistance from observed noisy signals have been proposed. In most previous studies, membrane resistance over a dendritic tree was assumed to be constant, or membrane resistance at a single point rather than over a dendrite was investigated. Membrane resistance, however, is actually not constant over a dendrite. In a previous study, a method was proposed in which the membrane resistance value is expressed as a non-constant function of position on the dendrite and the parameters of the function are estimated. Although this method is effective, it is applicable only when the appropriate function is known. We propose a statistical method for estimating membrane resistance over a dendrite from observed membrane potentials that does not express membrane resistance as a function of position. We use a Markov random field (MRF) as the prior distribution of the membrane resistance. In the MRF, membrane resistance is not expressed as a function of position on the dendrite but is assumed to vary smoothly along the dendrite. We apply our method to synthetic data to evaluate its efficacy and show that, even when we do not know the appropriate function, our method can accurately estimate the membrane resistance.
    Download PDF (266K)
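    One generic way to write the smoothness assumption as an MRF prior, in LaTeX; the symbols are assumed here and the paper's exact likelihood and prior are not reproduced (R_i: membrane resistance at compartment i along the dendrite, V: observed noisy membrane potentials, \beta: smoothness hyperparameter):
      p(R \mid V) \;\propto\; p(V \mid R)\, p(R),
      \qquad
      p(R) \;\propto\; \exp\!\Big( -\beta \sum_{i} (R_{i+1} - R_i)^2 \Big)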
  • Hiroyuki Shindo, Tsutomu Hirao, Jun Suzuki, Akinori Fujino, Masaaki Na ...
    Article type: Research Papers
    Subject area: Regular Paper
    2012 Volume 5 Pages 192-202
    Published: 2012
    Released on J-STAGE: October 03, 2012
    JOURNAL FREE ACCESS
    Itemset mining is one of the most essential tasks in the field of data mining. In this paper, we focus on minimum length as a mining measure for closed itemset mining. That is, our task is formalized as follows: given a database and a user-specified minimum length threshold, find all closed itemsets whose length is at least the minimum length. Closed itemset mining based on a minimum length threshold is preferable when it is difficult for users to determine an appropriate minimum support value. For this task, we propose TripleEye, an efficient algorithm for closed itemset mining that is based on the intersection of transactions in a database. Our algorithm utilizes information about inclusion relations between itemsets to avoid generating duplicate itemsets and to reduce the computational cost of intersection. During the mining procedure, the inclusion-relation information is maintained in a novel tree structure called the Ordered Inclusion Tree. Experiments show that our algorithm dramatically reduces the computational cost compared with a naive intersection-based algorithm. Our algorithm also achieves up to twice the running speed of conventional algorithms on dense databases.
    Download PDF (1064K)
  • Damien Eklou, Yasuhito Asano, Masatoshi Yoshikawa
    Article type: Research Papers
    Subject area: Regular Paper
    2012 Volume 5 Pages 203-213
    Published: 2012
    Released on J-STAGE: October 03, 2012
    JOURNAL FREE ACCESS
    With the huge amount of data on the Web, looking for desired information can be a time-consuming task. Wikipedia is a helpful tool because it is the largest and most popular general reference site on the Internet, and most search engines rank Wikipedia pages among the top listed results. However, because many articles on Wikipedia are manually updated by users, several articles lack information and must be updated. The information needed for such updates can sometimes be found on the Web, but extracting it from the Web involves a time-consuming process of reading, analyzing, and summarizing the information for the user. To support the user's search process and to help Wikipedia contributors update articles, we propose a method of finding, on the Web, valuable information that complements Wikipedia articles. Experiments showed that our method was quite effective in retrieving important complementary information from Web pages.
    Download PDF (531K)
  • Jiyi Li, Qiang Ma, Yasuhito Asano, Masatoshi Yoshikawa
    Article type: Research Papers
    Subject area: Regular Paper
    2012 Volume 5 Pages 214-222
    Published: 2012
    Released on J-STAGE: October 03, 2012
    JOURNAL FREE ACCESS
    With the recent rapid growth of social image hosting websites such as Flickr, it has become easier to construct large databases of socially tagged images. We propose an unsupervised approach for automatically ranking social images to improve content-based social image retrieval. We construct an image-tag relationship graph model containing both social images and tags. The approach extracts visual and textual information and combines them for ranking by propagating them through the graph links with an optimized mutual reinforcement process. We conduct experiments showing that our approach can successfully use social tags to rank and improve content-based social image search results, and that it performs better than other approaches.
    Download PDF (5018K)
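    A generic mutual-reinforcement iteration on an image-tag graph, in LaTeX, shown only to illustrate the propagation idea; the symbols are assumed and the paper's exact update rules are not reproduced (M: normalized image-tag relation matrix, v_I and v_T: initial visual and textual scores, \lambda: propagation weight):
      r_I^{(k+1)} = \lambda\, M\, r_T^{(k)} + (1 - \lambda)\, v_I,
      \qquad
      r_T^{(k+1)} = \lambda\, M^{\top} r_I^{(k)} + (1 - \lambda)\, v_T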