Information and Media Technologies
Online ISSN : 1881-0896
ISSN-L : 1881-0896
Volume 2, Issue 1
Displaying 1-35 of 35 articles from this issue
Hardware and Devices
  • Keitaro Yamashita, Toshiya Inada, Steven Deane, Paul Collins, Satoshi ...
    2007 Volume 2 Issue 1 Pages 1-5
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    We have developed two 1.8" QVGA (222dpi) transflective a-Si AMLCDs with integrated gate drivers. One is optimized for a very low power normal operating mode. The other can switch to an arbitrary partial scan mode. Gate driver integration results in narrow, symmetric margins, reduced cost, and fewer interconnections. Also, we minimized the number of connections to the display while retaining some to achieve very significant power reduction.
Computing
  • Takashi Amisaki, Shin-ichi Fujiwara
    2007 Volume 2 Issue 1 Pages 6-16
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    This paper reports a protein-simulation grid that uses grid remote procedure calls (GridRPCs) to a special-purpose cluster machine for molecular dynamics simulations. The grid was implemented using Ninf-G, Torque, LAM, and the Globus Toolkit. To avoid the inefficiency of a single GridRPC session using all the nodes of the cluster, we designed the grid so that it works efficiently when multiple GridRPC sessions share the cluster. This was done by putting the dedicated nodes (PCs with special computation boards) under the management of the Torque system, thus enabling the manager to dynamically configure a cluster with the requested number of dedicated nodes. In addition, a new job type was added to the Globus Toolkit and a new backend procedure was added to Ninf-G. The Ninf-G stub was separated from the processes that actually perform the force evaluation on the dedicated nodes. Simulations for two proteins gave promising results. Simulations performed using a four-node cluster and a 100-Mbps LAN for GridRPC sessions were 4.6-17.0 times faster than the same simulation performed on the local client PC, while their communication overhead was less than 20% of total execution time. Even when the four-node cluster machine was shared between two distinct simulations of proteins, the two GridRPC communications did not interfere with each other. This showed the efficacy of multiple GridRPC sessions.
  • Kento Aida, Yoshiaki Futakata, Tomotaka Osumi
    2007 Volume 2 Issue 1 Pages 17-30
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    This paper proposes a parallel branch and bound algorithm that runs efficiently on the Grid. The proposed algorithm is parallelized with the hierarchical master-worker paradigm in order to efficiently compute fine-grain tasks on the Grid. The hierarchical algorithm performs master-worker computing at two levels, computing among PC clusters on the Grid and computing among the nodes in each PC cluster, and reduces communication overhead by localizing frequent communication within tightly coupled computing resources, that is, within a PC cluster. On each PC cluster, the granularity of tasks dispatched to computing nodes is adaptively adjusted to obtain the best performance. The algorithm is implemented on the Grid testbed by using the GridRPC middleware Ninf-G and Ninf. In the implementation, communication among PC clusters is securely performed via Ninf-G using the Grid Security Infrastructure, and fast communication in each PC cluster is performed via Ninf. The experimental results showed that parallelization with the hierarchical master-worker paradigm using a combination of Ninf-G and Ninf effectively utilized computing resources on the Grid in order to run a fine-grain application. The results also showed that the adaptive task granularity control automatically gave the same or better performance compared to performance with manual control.
  • Ta Quoc Viet, Tsutomu Yoshinaga
    2007 Volume 2 Issue 1 Pages 31-39
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    This study proposes asynchronous MPI, a simple and effective parallel programming model for SMP clusters, to reimplement the High Performance Linpack benchmark. The proposed model forces processors of an SMP node to work in different phases, thereby avoiding unnecessary communication and computation bottlenecks. As a result, we can achieve significant improvements in performance with minimal programming effort. In comparison with a de facto flat MPI solution, our algorithm can yield a 20.6% performance improvement for a 16-node cluster of Xeon dual-processor SMPs.
  • Fuminori Adachi, Takashi Washio, Hiroshi Motoda
    2007 Volume 2 Issue 1 Pages 40-52
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    This paper proposes a novel approach to discovering dynamic laws and models, represented by simultaneous time differential equations that include hidden states, from time series data measured in an objective process. This task has not been addressed in past work, although it is essential to scientific discovery, since the behavior of any objective process emerges through time evolution. The promising performance of the proposed approach is demonstrated through the analysis of synthetic data.
  • Yuki Chiba, Takahito Aoto, Yoshihito Toyama
    2007 Volume 2 Issue 1 Pages 53-67
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    We propose a framework in this paper for transforming programs with templates based on term rewriting. The programs are given by term rewriting systems. We discuss how to validate the correctness of program transformation within our framework. We introduce a notion of developed templates and a simple method of constructing such templates without explicit use of induction. We then show that in any transformation of programs using the developed templates, their correctness can be verified automatically. The correctness of program transformation within our framework is discussed based on operational semantics. We also present some examples of program transformations in our framework.
  • Hijiri Maeno, Md. Altaf-Ul-Amin, Yoko Shinbo, Ken Kurokawa, Naotake Og ...
    2007 Volume 2 Issue 1 Pages 68-78
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    Gene classification based on orthologous relations is an important problem for understanding species-universal or species-specific conservation of genes in genomes associated with phenotype in species. In the present study, we proposed a classification system of genes based on the configuration of networks concerning bidirectional best-hit relations (called orthologous relation groups), which makes it possible to compare multiple genomes. We have applied this method to five Bacillus species (B. subtilis, B. anthracis, B. cereus, B. halodurans, and B. thuringiensis). With regard to the five species, 4,776 orthologous relation groups have been obtained, and those are classified into 113 isomorphic groups. An isomorphic group may contain only orthologs or a combination of orthologs and paralogs. Gene functions and conservativeness are discussed in view of the configuration of orthologous relation groups.
  • Yuki Kato, Hiroyuki Seki, Tadao Kasami
    2007 Volume 2 Issue 1 Pages 79-88
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    Many attempts have so far been made at modeling RNA secondary structure by formal grammars. In a grammatical approach, secondary structure prediction can be viewed as a parsing problem. However, there may be many different derivation trees for an input sequence. Thus, it is necessary to have a method of extracting biologically realistic derivation trees among them. One solution to this problem is to extend a grammar to a probabilistic model and find the most likely derivation tree, and another is to take free energy minimization into account. One simple formalism for describing RNA folding is context-free grammars (CFGs), but it is known that CFGs cannot represent pseudoknots. Therefore, several formal grammars have been proposed for modeling RNA pseudoknotted structure. In this paper, we focus on multiple context-free grammars (MCFGs), which are a natural extension of CFGs and can represent pseudoknots, and extend MCFGs to a probabilistic model called stochastic MCFG (SMCFG). We present a polynomial time parsing algorithm for finding the most probable derivation tree, which is applicable to RNA secondary structure prediction including pseudoknots. Also, we propose a probability parameter estimation algorithm based on the EM (expectation maximization) algorithm. Finally, we show some experimental results on RNA pseudoknot prediction using the SMCFG parsing algorithm, which show good prediction accuracy.
  • Md. Ahaduzzaman Munna, Takenao Ohkawa
    2007 Volume 2 Issue 1 Pages 89-97
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    We are developing PROFESS, a system to assist with the extraction of protein functional site information from the literature related to protein structural analysis. In this system, the sentences with functional information are first extracted. This paper proposes the complementary use of protein structure data, keywords, and patterns to extract the target sentences. In the proposed method, the sentences in the literature are expressed as vectors using these three features, which are learned by an SVM. As the accuracy of the SVM depends on the number of effective vector elements, we propose a method to automatically extract patterns to add as new vector elements and obtain higher accuracy. A problem arises in matching the patterns to the sentences when a proper noun tag appears adjacent to a residue tag. We defined two rules to eliminate these unnecessary tags so that the patterns can match the sentences. The proposed method was applied to five documents related to structural analysis of proteins for extracting sentences with protein functional information, where eight documents were used for the feedback for each of the experiment documents. The average recall value and F value were 0.96 and 0.69, respectively. It was confirmed that increasing the number of vector elements led to higher performance in sentence extraction.
  • Hisashi Tuji, Md. Altaf-Ul-Amin, Masanori Arita, Hirokazu Nishio, Yoko ...
    2007 Volume 2 Issue 1 Pages 98-108
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    A protein-protein interaction network, which we call a PPI network, is considered an important source of information for the prediction of protein functions. However, it is quite difficult to analyze such networks because of their complexity. We expected that if we could develop a good visualization method for PPI networks, we could predict protein functions visually because of the close relation between protein functions and protein interactions. Previously, we proposed such a method, which is based on clustering concepts, by extracting clusters defined as relatively densely connected groups of nodes. But the results of visualizing a network differ greatly depending on the clustering algorithm. Therefore, in this paper, we compare the outcomes of two different clustering algorithms, namely the DPClus and Newman algorithms, by applying them to a PPI network, and point out some advantages and limitations of both.
  • Yukako Tohsato
    2007 Volume 2 Issue 1 Pages 109-114
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    Comparative analyses of the metabolic networks among different species provide important information regarding the evolution of organisms as well as pharmacological targets. In this paper, a method is proposed for comparing metabolic networks based on enzymatic reactions within different species. Specifically, metabolic networks are handled as sets of enzymatic reactions. Based on the presence or absence of metabolic reactions, the metabolic network of an organism is represented by a bit string comprised of the digits “1” and “0,” called the “reaction profile.” Then, the degree of similarity between bit strings is defined, followed by clustering of metabolic networks by different species. By applying our method to the metabolic networks of 33 representative organisms selected from bacteria, archaea, and eukaryotes in the MetaCyc database, a phylogenetic tree was reconstructed that represents the similarity of metabolic networks based on metabolic phenotypes.
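As a rough illustration of the reaction-profile idea in the abstract above, the following sketch encodes each organism as a bit string over a fixed reaction list and compares two profiles. The abstract does not state which similarity measure is used, so the Jaccard coefficient here is an assumption, and the reaction identifiers and organism names are hypothetical.

```python
from typing import Dict, List, Set


def reaction_profile(reactions: Set[str], universe: List[str]) -> str:
    """Bit string with '1' where the organism has the reaction, '0' otherwise."""
    return "".join("1" if r in reactions else "0" for r in universe)


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of two equal-length bit strings (assumed measure)."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    both = sum(x == "1" and y == "1" for x, y in zip(a, b))
    either = sum(x == "1" or y == "1" for x, y in zip(a, b))
    return both / either if either else 1.0


# Hypothetical reaction identifiers and organisms, for illustration only.
UNIVERSE = ["R1", "R2", "R3", "R4", "R5"]
ORGANISMS: Dict[str, Set[str]] = {
    "org_a": {"R1", "R2", "R3"},
    "org_b": {"R1", "R2", "R4"},
    "org_c": {"R4", "R5"},
}
PROFILES = {name: reaction_profile(rs, UNIVERSE) for name, rs in ORGANISMS.items()}
```

A pairwise similarity matrix built this way could then be passed to any hierarchical clustering routine to obtain a tree; which clustering method the paper uses is not stated in the abstract.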
  • Yasuyuki Tomita, Hiroyuki Asano, Hideo Izawa, Mitsuhiro Yokota, Takesh ...
    2007 Volume 2 Issue 1 Pages 115-133
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    Multifactorial diseases, such as lifestyle-related diseases, for example, cancer, diabetes mellitus, and myocardial infarction, are believed to be caused by the complex interactions between various environmental factors on a polygenic basis. In addition, it is believed that genetic risk factors for the same disease differ on an individual basis according to their susceptible environmental factors. In the present study, to predict the development of myocardial infarction (MI) and classify the subjects into personally optimum development patterns, we have extracted risk factor candidates (RFCs) that comprised a state that is a derivative form of polymorphisms and environmental factors using a statistical test. We then selected the risk factors using a criterion for detecting personal group (CDPG), which is defined in the present study. By using CDPG, we could predict the development of MI in blinded subjects with an accuracy greater than 75%. In addition, the risk percentage for MI was higher with an increase in the number of selected risk factors in the blinded data. Since sensitivity using the CDPG was high, it can be an effective and useful tool in preventive medicine and its use may provide a high quality of life and reduce medical costs.
  • Shigeyuki Oba, Nobumoto Tomioka, Miki Ohira, Shin Ishii
    2007 Volume 2 Issue 1 Pages 134-143
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    The recently developed array-based comparative genomic hybridization (array CGH) technique measures DNA copy number aberrations that occur as causes or consequences of cell diseases such as cancers. Conventional array CGH analysis classifies DNA copy number aberrations into three categories: no significant change, significant gain, and significant loss. However, recent improvements in microarray measurement precision enable more quantitative analysis of copy number aberrations. We propose a method, called comb fitting, that extracts a quantitative interpretation from array CGH data. We also propose modifications that allow us to apply comb fitting to cases featuring heterogeneity of local aberrations in DNA copy numbers. By using comb fitting, we can correct the baseline of the fluorescence ratio data measured by array CGH and simultaneously translate them into the amount of changed copy numbers for each small part of the chromosome, such as 0, ±1, ±2, ···. Comb fitting is applicable even when a considerable amount of contamination by normal cells exists and when heterogeneity in the ploidy number cannot be neglected.
  • Takao Shimayoshi, Kazuhiro Komurasaki, Akira Amano, Takeshi Iwashita, ...
    2007 Volume 2 Issue 1 Pages 144-153
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    The development of physiological cell models to support the understanding of biological mechanisms is gaining increasing importance. Due to the complexity of biological systems, whole cell models, which are composed of many imported component models of functional elements, become quite complex, making modifications difficult. Here, we propose a method to facilitate structural changes of cell models, employing the markup languages CellML and our original PMSML (Physiological Model Structure Markup Language), in addition to a new ontology for cell physiological modelling, the Cell Model Ontology. In particular, a method to make references from CellML files to the ontology and a method to assist with manipulation of model structures using PMSML together with the Cell Model Ontology are reported. Using these methods, two software utilities are implemented: an interactive ontology ID assigner, the CellML Ontologizer, and a graphical cell model editor, the Cell Structure Editor. Experimental results showed that the proposed method and the implemented software are useful for the modification of physiological models.
  • Takeshi Ogasawara, Hideaki Komatsu, Toshio Nakatani
    2007 Volume 2 Issue 1 Pages 154-162
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    We propose a new algorithm that is effective for objects that are shared among threads but are not contended for in SMP environments. We can remove the overhead of the serialization between lock and other non-lock operations and avoid the latency of complex atomic operations in most cases. We established the safety of the algorithm by using a software tool called Spin. The experimental results from our benchmarking on an SMP machine using Intel Xeon processors revealed that our algorithm could significantly improve efficiency, by 80% on average, compared to using a complex atomic instruction.
  • Jiahong Wang, Yoshiaki Asanuma, Eiichiro Kodama, Toyoo Takata, Jie Li
    2007 Volume 2 Issue 1 Pages 163-177
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    Sequential pattern mining is a useful technique used to discover frequent subsequences as patterns in a sequence database. Depending on the application, sequence databases vary by number of sequences, number of individual items, average length of sequences, and average length of potential patterns. In addition, to discover the necessary patterns in a sequence database, the support threshold may be set to different values. Thus, a sequential pattern-mining algorithm should be responsive across all of these factors. For that purpose, we propose a candidate-driven pattern-growth sequential pattern-mining algorithm called FSPM (Fast Sequential Pattern Mining). A useful property of FSPM is that the sequential patterns concerning a user-specified item can be mined directly. Extensive experimental results show that, in most cases, FSPM outperforms existing algorithms. An analytical performance study shows that it is the inherent potential of FSPM that makes it more effective.
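To make the terms "frequent subsequence" and "support threshold" in the abstract above concrete, here is a minimal sketch of support counting over a toy sequence database. It is not the FSPM algorithm, whose details the abstract does not give; it assumes each sequence is a list of single items, and the database contents are hypothetical.

```python
from typing import List, Sequence


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if the items of pattern occur in sequence in order, gaps allowed."""
    it = iter(sequence)
    # `item in it` advances the iterator, so matches must appear in order.
    return all(item in it for item in pattern)


def support(pattern: Sequence[str], database: List[Sequence[str]]) -> float:
    """Fraction of sequences in the database that contain the pattern."""
    if not database:
        return 0.0
    return sum(is_subsequence(pattern, s) for s in database) / len(database)


def is_frequent(pattern: Sequence[str], database: List[Sequence[str]],
                min_support: float) -> bool:
    """A pattern is frequent when its support reaches the threshold."""
    return support(pattern, database) >= min_support


# Hypothetical sequence database, for illustration only.
DB = [["a", "b", "c"], ["a", "c"], ["b", "a", "c"], ["b", "c"]]
```

With this database, the pattern `["a", "c"]` is contained in three of the four sequences, so it is frequent at a threshold of 0.5, while `["b", "a"]` is contained in only one.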
  • Nobutaka Suzuki
    2007 Volume 2 Issue 1 Pages 178-190
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    Finding an edit script between data has played an important role in data retrieval and data transformation. So far many methods for finding an edit script between two XML documents have been proposed, but few studies on finding an edit script between an XML document and a DTD have been made. In this paper, we first present a polynomial-time algorithm for finding an edit script between an XML document and a DTD, which is optimum under some restrictions on operations. We next prove the correctness of the algorithm.
Media (processing) and Interaction
  • Toshiki Iso, Hiroki Suzuki, Atsuki Tomioka, Shoji Kurakake
    2007 Volume 2 Issue 1 Pages 191-199
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    We introduce the “Hands-Free Video Phone System” that uses a visual sensor network. The system makes it easy for a user in his house to take a video phone call without disturbing his current activity. This system uses fisheye lens cameras as visual sensors because they offer both sensing and image generation and are cheaper than other types of sensors for tracking people, such as pressure sensors or infrared sensors. Its key advance is an algorithm that detects user location based on disparity maps of multiple stereo cameras, and selects the best shot camera based on criteria that consider optical properties. We describe a prototype system and a field trial that confirms the feasibility of the system in an experimental house.
  • Hideki Hirakawa
    2007 Volume 2 Issue 1 Pages 200-228
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    Preference Dependency Grammar (PDG) is a framework for the morphological, syntactic and semantic analysis of natural language sentences. PDG provides packed shared data structures that encompass the various ambiguities at each level of sentence analysis with preference scores, and a method for calculating the most plausible interpretation of a sentence. This paper proposes the Graph Branch Algorithm for computing the optimum dependency tree (the most plausible interpretation of a sentence) from a scored dependency forest, which is a packed shared data structure encompassing all possible dependency trees (interpretations) of a sentence. The graph branch algorithm adopts the branch and bound principle for managing arbitrary arc co-occurrence constraints, including the single valence occupation constraint, which is a basic semantic constraint in PDG. This paper also reports an experiment using English texts that shows the computational complexity and behavior of the graph branch algorithm.
  • Xinyu Deng, Jun-ichi Nakamura
    2007 Volume 2 Issue 1 Pages 229-251
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    This paper describes the microplanner of the SILK system, which can generate texts appropriate for intermediate non-native users at the discourse level. Four factors (i.e., nucleus position, between-text-span punctuation, embedded discourse markers, and punctuation pattern) are considered to affect readability at the discourse level. It is the preferences among these factors that determine readability. Since the number of possible combinations of the preferences is huge, we use a genetic algorithm to solve this problem. We adopt two methods to evaluate the system: one evaluates the reliability of the SILK system by analysing how often it re-generates corpus texts, and the other has human subjects judge readability. The evaluation results show that the system is reliable and that the generated texts are appropriate for intermediate non-native speakers at the discourse level.
  • Masaki Murata, Toshiyuki Kanamaru, Tamotsu Shirado, Hitoshi Isahara
    2007 Volume 2 Issue 1 Pages 252-278
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    Patent processing is important in various fields such as industry, business, and law. We used F-terms (Schellner 2002) to classify patent documents using the k-nearest neighborhood method. Because the F-term categories are fine-grained, they are useful when we classify patent documents. We clarified the following three points using experiments: i) which variations of the k-nearest neighborhood method are the best for patent classification, ii) which methods of calculating similarity are the best for patent classification, and iii) from which regions of a patent terms should be extracted. In our experiments, we used the patent data used in the F-term categorization task in the NTCIR-5 Patent Workshop (NTCIR committee 2005; Iwayama, Fujii, and Kando 2005). We found that the method of adding the scores of k extracted documents to classify patent documents was the most effective among the variations of the k-nearest neighborhood method used in this study. We also found that SMART (Singhal, Buckley, and Mitra 1996; Singhal, Choi, Hindle, and Pereira 1997), which is known to be effective in information retrieval, was the most effective method of calculating similarity. Finally, when extracting terms, we found that using the abstract and claim regions together was the best method among all the combinations of using abstract, claim, and description regions. The results were confirmed using a statistical test. Moreover, we experimented with changing the amount of training data and found that we obtained better performance when we used more data, which was limited to that provided in the NTCIR-5 Patent Workshop.
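The score-adding k-nearest-neighbor variant mentioned in the abstract above can be sketched roughly as follows: each category receives the sum of the similarity scores of the k retrieved documents that carry it. This is only a sketch under stated assumptions. It uses plain cosine similarity over term counts rather than the SMART weighting the abstract names, and the documents and category labels (`F1`, `F2`, `F3`) are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_score_sum(query: Counter,
                  training: List[Tuple[Counter, Set[str]]],
                  k: int) -> Dict[str, float]:
    """Sum the similarity scores of the k nearest documents per category."""
    ranked = sorted(((cosine(query, doc), labels) for doc, labels in training),
                    key=lambda pair: pair[0], reverse=True)[:k]
    scores: Dict[str, float] = defaultdict(float)
    for similarity, labels in ranked:
        for label in labels:
            scores[label] += similarity
    return dict(scores)


# Hypothetical training documents and category labels, for illustration only.
TRAIN = [
    (Counter("lens optical focus".split()), {"F1"}),
    (Counter("lens camera shutter".split()), {"F1", "F2"}),
    (Counter("engine piston fuel".split()), {"F3"}),
]
SCORES = knn_score_sum(Counter("camera lens focus".split()), TRAIN, k=2)
```

Categories can then be ranked by their summed score, which lets a document receive several categories at once.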
  • Tetsuro Kitahara, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hir ...
    2007 Volume 2 Issue 1 Pages 279-291
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    This paper presents a new technique for recognizing musical instruments in polyphonic music. Since conventional musical instrument recognition in polyphonic music is performed notewise, i.e., for each note, accurate estimation of the onset time and fundamental frequency (F0) of each note is required. However, these estimations are generally not easy in polyphonic music, and thus estimation errors severely degrade the recognition performance. Without these estimations, our technique calculates the temporal trajectory of instrument existence probabilities for every possible F0. The instrument existence probability is defined as the product of a nonspecific instrument existence probability calculated using the PreFEst and a conditional instrument existence probability calculated using hidden Markov models. The instrument existence probability is visualized as a spectrogram-like graphical representation called the instrogram and is applied to MPEG-7 annotation and instrumentation-similarity-based music information retrieval. Experimental results from both synthesized music and real performance recordings have shown that instrograms achieved MPEG-7 annotation (instrument identification) with a precision rate of 87.5% for synthesized music and 69.4% for real performances on average and that the instrumentation similarity measure reflected the actual instrumentation better than an MFCC-based measure.
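The product definition stated in the abstract above can be written out as a single equation. The symbols are assumptions chosen for illustration and do not come from the abstract: $\omega_i$ denotes the $i$-th instrument, $X$ the event that some instrument is sounding, $t$ the time, and $f$ the candidate F0.

```latex
% Assumed notation: \omega_i is the i-th instrument, X is the event
% "some instrument is sounding", t is time, f is the candidate F0.
p(\omega_i;\, t, f) \;=\; p(X;\, t, f)\; p(\omega_i \mid X;\, t, f)
```

Here the first factor corresponds to the nonspecific instrument existence probability and the second to the conditional instrument existence probability.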
  • Tetsuji Kuboyama, Kouichi Hirata, Hisashi Kashima, Kiyoko F. Aoki-Kino ...
    2007 Volume 2 Issue 1 Pages 292-299
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    Learning from tree-structured data has received increasing interest with the rapid growth of tree-encodable data in the World Wide Web, in biology, and in other areas. Our kernel function measures the similarity between two trees by counting the number of shared sub-patterns called tree q-grams, and runs, in effect, in linear time with respect to the number of tree nodes. We apply our kernel function with a support vector machine (SVM) to classify biological data, the glycans of several blood components. The experimental results show that our kernel function performs as well as one exclusively tailored to glycan properties.
  • Shin-ichi Minato, Kimihito Ito
    2007 Volume 2 Issue 1 Pages 300-308
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    In this paper, we present a method of finding symmetric items in a combinatorial item set database. The techniques for finding symmetric variables in Boolean functions have been studied for a long time in the area of VLSI logic design, and BDD (Binary Decision Diagram)-based methods have been presented to solve such a problem. Recently, we have developed an efficient method for handling databases using ZBDDs (Zero-suppressed BDDs), a particular type of BDD. In our ZBDD-based data structure, the symmetric item sets can be found as efficiently as for Boolean functions. We implemented a program for symmetric item set mining, and applied it to actual biological data on the amino acid sequences of influenza viruses. We found a number of symmetric items in the database, some of which indicate interesting relationships in the amino acid mutation patterns. The result shows that our method is helpful for extracting hidden interesting information in real-life databases.
  • Shin-ichi Minato, Hiroki Arimura
    2007 Volume 2 Issue 1 Pages 309-316
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    Frequent item set mining is one of the fundamental techniques for knowledge discovery and data mining. In the last decade, a number of efficient algorithms for frequent item set mining have been presented, but most of them focused on just enumerating the item set patterns that satisfy the given conditions, and how to store and index the resulting patterns for efficient data analysis was a separate matter. Recently, we proposed a fast algorithm for extracting all frequent item set patterns from transaction databases and simultaneously indexing the huge number of resulting patterns using Zero-suppressed BDDs (ZBDDs). That method, ZBDD-growth, not only enumerates the patterns efficiently but also indexes the output data compactly in memory so that it can be analyzed with various algebraic operations. In this paper, we present a variation of the ZBDD-growth algorithm to generate frequent closed item sets. This is quite a simple modification of ZBDD-growth, and the additional computation cost is relatively small compared with the original algorithm for generating all patterns. Our method can conveniently be utilized in the environment of ZBDD-based pattern indexing.
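For readers unfamiliar with the term, a frequent item set is closed when no proper superset has the same support count. The brute-force sketch below illustrates that definition on a toy transaction database. It is not ZBDD-growth and uses no ZBDD data structure; the transactions are hypothetical, and the enumeration is exponential in the number of items, so it is only suitable for tiny examples.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def frequent_itemsets(transactions: List[Set[str]],
                      min_count: int) -> Dict[FrozenSet[str], int]:
    """Enumerate every item set contained in at least min_count transactions."""
    items = sorted(set().union(*transactions))
    result: Dict[FrozenSet[str], int] = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = sum(candidate <= t for t in transactions)
            if count >= min_count:
                result[candidate] = count
    return result


def closed_itemsets(frequent: Dict[FrozenSet[str], int]) -> Dict[FrozenSet[str], int]:
    """Keep item sets that have no proper superset with the same count."""
    return {
        s: c for s, c in frequent.items()
        if not any(s < other and c == other_count
                   for other, other_count in frequent.items())
    }


# Hypothetical transaction database, for illustration only.
TRANSACTIONS = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
FREQUENT = frequent_itemsets(TRANSACTIONS, min_count=2)
CLOSED = closed_itemsets(FREQUENT)
```

In this example `{b}` is frequent but not closed, because its superset `{a, b}` occurs in exactly the same two transactions.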
  • Kenichi Kurihara, Yoshitaka Kameya, Taisuke Sato
    2007 Volume 2 Issue 1 Pages 317-325
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    Clustering word co-occurrences has been studied to discover clusters as latent concepts. Previous work has applied the semantic aggregate model (SAM), and reports that discovered clusters seem semantically significant. The SAM assumes a co-occurrence arises from one latent concept. This assumption seems moderately natural. However, to analyze latent concepts more deeply, the assumption may be too restrictive. We propose to make clusters for each part of speech from co-occurrence data. For example, we make adjective clusters and noun clusters from adjective-noun co-occurrences while the SAM builds clusters of “co-occurrences.” The proposed approach allows us to analyze adjectives and nouns independently.
    To take this approach, we propose a frequency-based infinite relational model (FIRM) for word co-occurrences. The FIRM is a stochastic blockmodel that takes into account the frequency of observations, whereas traditional stochastic blockmodels ignore it. The FIRM also utilizes the Dirichlet process so that the number of clusters is inferred. We derive a variational inference algorithm for the model so that it can be applied to a large dataset. Experimental results show that the FIRM is more helpful for analyzing adjectives and nouns independently, and that the FIRM clusters capture the SAM clusters better than a stochastic blockmodel.
  • Nozomi Kobayashi, Kentaro Inui, Yuji Matsumoto
    2007 Volume 2 Issue 1 Pages 326-337
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    The task of opinion extraction and structurization is the key component of opinion mining, which allows Web users to retrieve and summarize people's opinions scattered over the Internet. Our aim is to develop a method for extracting opinions that represent evaluations of consumer products in a structured form. To achieve this goal, we need to consider some issues that are relevant to the extraction task: how the task of opinion extraction and structurization should be designed, and how to extract the opinions that we have defined. We define an opinion unit consisting of a quadruple, that is, the opinion holder, the subject being evaluated, the part or the attribute in which it is evaluated, and the evaluation that expresses positive or negative assessment. In this task, we focus on two subtasks: (a) extracting subject/aspect-evaluation relations, and (b) extracting subject/aspect-aspect relations. We approach each extraction task using a machine learning-based method. In this paper, we discuss how customer reviews in web documents can be best structured. We also report on the results of our experiments and discuss future directions.
    Download PDF (400K)
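The opinion unit described in the abstract above is a quadruple of holder, subject, aspect, and evaluation. A minimal sketch of such a record follows; the class name `OpinionUnit`, the field names, the helper `aspect_evaluation_relation`, and the sample values are hypothetical illustrations, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class OpinionUnit:
    """Hypothetical container for the quadruple named in the abstract."""
    holder: str            # who expresses the opinion
    subject: str           # the product or entity being evaluated
    aspect: Optional[str]  # the part or attribute evaluated (may be absent)
    evaluation: str        # the expression of positive or negative assessment


def aspect_evaluation_relation(unit: OpinionUnit) -> Tuple[str, str]:
    """Return the (subject-or-aspect, evaluation) pair, as in subtask (a).

    Falls back to the subject when no aspect is given (an assumption).
    """
    target = unit.aspect if unit.aspect is not None else unit.subject
    return (target, unit.evaluation)


# Invented sample review fragment, for illustration only.
unit = OpinionUnit(holder="reviewer", subject="camera",
                   aspect="battery life", evaluation="excellent")
pair = aspect_evaluation_relation(unit)
```

Keeping the aspect optional lets the same record cover opinions that evaluate the subject as a whole.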
Computer Networks and Broadcasting
  • Shinta Sugimoto, Francis Dupont, Ryoji Kato
    2007 Volume 2 Issue 1 Pages 338-346
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    We specified a mechanism with which Mobile IPv6 and IPsec/IKE can work together efficiently. The interaction is necessary for updating the endpoint address of an IPsec tunnel in accordance with the movement of a mobile node. Based on an analysis of the needs for interaction between Mobile IPv6 and IPsec/IKE, we designed and implemented a mechanism that is an extension to the PF_KEY framework. The proposed mechanism allows Mobile IPv6 to inform IPsec/IKE of the movement so that IPsec/IKE can make the necessary updates to the security policy database and security association database. This notification helps IKE to update its internal state. The mechanism is also applicable to other scenarios, such as NEMO, Mobile VPN and its variants.
    Download PDF (423K)
  • Peter Ivo Racz, Takahiro Matsuda, Miki Yamamoto
    2007 Volume 2 Issue 1 Pages 347-355
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    Self-similar traffic patterns have been observed in many measurements of Internet traffic. Self-similarity is very detrimental to the performance of packet networks and recent research has focused on understanding and reducing its amount in Internet traffic. TCP Reno has been identified as being one of the primary sources of self-similarity. We explore the potential of another version of TCP in this paper to reduce the degree of self-similarity in aggregated TCP traffic. We decompose both TCP Reno and TCP Vegas to demonstrate and explain their underlying mechanisms, and separately measure what effects congestion-avoidance and timeouts/exponential backoff mechanisms have on the self-similarity in aggregated TCP flows. We reveal how TCP Vegas reduces the degree of self-similarity and eventually completely eliminates it from aggregated TCP flows at low levels of packet loss. However, at high levels of packet loss we show that TCP Vegas is detrimental, because it increases the degree of aggregated TCP-flow self-similarity.
    Download PDF (549K)
  • Kazuhide Fukushima, Shinsaku Kiyomoto, Toshiaki Tanaka
    2007 Volume 2 Issue 1 Pages 356-367
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    Copyright protection is a major issue in online content-distribution services and many key-management schemes have been proposed for protecting content. Key-distribution processes impose large burdens even though the communications bandwidth itself is restricted in the distribution of mobile content provided to millions of users. Mobile devices also have low computational capacities. Thus, a new scheme of key management, where the load on the key-distribution server is optimal and loads on clients are practical, is required for services. Tree-based schemes aim at reducing the load on the server and do not take reducing the load on clients into account. The load on clients is minimized in a star-based scheme, on the other hand, while the load on the server increases in proportion to the number of clients. These structures are far from being scalable. We first discuss a relaxation of conventional security requirements for key-management schemes in this paper and define new requirements to improve the efficiency of the schemes. We next propose the τ-gradual key-management scheme. Our scheme satisfies the new security requirements and loads on the server, and it has far fewer clients than conventional schemes. It uses an intermediate configuration between that of a star- and a tree-structure that allows us to continuously change it by controlling the number of clients in a group, m_max. The scheme can be classified as τ-star-based, τ-tree-based, or τ-intermediate depending on the parameter, m_max. We then present a quantitative evaluation of the load on the server and clients using all our schemes based on practical assumptions. The load on the server and that on clients involves a trade-off with the τ-intermediate scheme. We can construct an optimal key-management structure according to system requirements using our schemes, while maintaining security. We describe a concrete strategy for setting parameter m_max. Finally, we present general parameter settings by which loads on both the server and clients using the τ-intermediate scheme are lower than those using the τ-tree-based scheme.
    Download PDF (365K)
  • Yujin Noishiki, Hidetoshi Yokota, Akira Idoue
    2007 Volume 2 Issue 1 Pages 368-376
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    In on-demand ad-hoc routing protocols, control packets are flooded into a network during path discovery. Especially in dense ad-hoc networks, these protocols generate a large number of broadcast packets that cause contention, packet collisions and battery power wastage in mobile nodes. We propose an efficient route establishment method that adaptively lowers re-broadcasting overhead based on the number of adjacent nodes and the number of routes that these nodes accommodate. Through simulation, we demonstrate that our proposed method is especially efficient in densely populated areas. It achieves efficiency by lowering the number of control packets for path discovery without causing a drop in the path discovery success ratio. Also, by taking path concentration into account, our method further improves packet delivery. We provide useful guidelines based on simulation results.
    Download PDF (535K)
  • Jing Cai, Tsutomu Terada, Takahiro Hara, Shojiro Nishio
    2007 Volume 2 Issue 1 Pages 377-388
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    With recent advances in wireless technologies and mobile terminals, mobile users equipped with mobile devices are able to access wireless services through a 3G cellular network, WiFi hotspot, or WiMAX link, as well as through satellite or terrestrial digital broadcast. By effectively taking advantage of these complementary communication modes, we explore a new hybrid data delivery model, the Hybrid Wireless Broadcast (HWB) model, which benefits from the optimal combination of push-based and pull-based broadcast and on-demand point-to-point wireless communication. The HWB model can provide a flexible and complementary information service in different bandwidths and service ranges, and greatly improve the responsiveness, scalability, and efficiency of the system. The results of a simulation study show that the proposed HWB approach achieves a significant improvement in system performance.
    Download PDF (635K)
  • Damdinsuren Amarmend, Masayoshi Aritsugi, Yoshinari Kanamori
    2007 Volume 2 Issue 1 Pages 389-400
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    With the advent of mobile technology, broadcasting data over one or more wireless channels has been considered an excellent way to efficiently disseminate data to a large number of mobile users. In such a wireless broadcast environment, the minimization of both access and tuning times is an important problem because mobile devices usually have limited battery power. In this paper, we propose an index allocation method for data access over multiple channels. Our method first derives external index information from the scheduled data, and then allocates it over the channels, which have shorter broadcast cycles and hotter data items. Moreover, local exponential indexes with different parameters are built within each channel for local data search. Experiments are performed to compare the effectiveness of our approach with an existing approach. The results show that our method outperforms the existing method by 30% in average access time and by 16% in average tuning time when the data skew is high and data item size is much bigger than index node size.
    Download PDF (453K)
  • Toshihiko Yamakami
    2007 Volume 2 Issue 1 Pages 401-407
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    As a sub-day-scale behavior analysis, the length of sessions in multiple mobile Internet services was examined. Using mobile clickstreams with user identifiers, two analyses were performed: a preparatory study for timeout values in session identification in 2000 and a long-term observation of session lengths and clicks per session in 2001. The first study showed that 10 minutes is a suitable timeout value for the observed mobile web. The second produced inter-service comparisons and showed the effects of different mobile-Internet-specific factors. The limitations and challenges for mobile-clickstream-based session identification are also discussed.
    Download PDF (316K)
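The timeout-based session identification mentioned in the abstract above can be sketched as follows. This is a minimal illustration, assuming a clickstream of `(user_id, timestamp_in_seconds)` pairs and the common rule that a gap longer than the timeout starts a new session; the function name `split_sessions`, the data layout, and the sample clicks are hypothetical, and only the 10-minute value comes from the abstract.

```python
from collections import defaultdict

TIMEOUT_SECONDS = 10 * 60  # 10-minute timeout value reported in the abstract


def split_sessions(clicks, timeout=TIMEOUT_SECONDS):
    """Group clicks into per-user sessions.

    clicks: iterable of (user_id, timestamp_seconds) pairs, in any order.
    Returns {user_id: [[timestamp, ...], ...]}, one inner list per session.
    A gap strictly greater than `timeout` starts a new session (assumption).
    """
    by_user = defaultdict(list)
    for user_id, ts in clicks:
        by_user[user_id].append(ts)

    sessions = {}
    for user_id, stamps in by_user.items():
        stamps.sort()
        user_sessions = [[stamps[0]]]
        for prev, cur in zip(stamps, stamps[1:]):
            if cur - prev > timeout:
                user_sessions.append([cur])    # gap too long: open a new session
            else:
                user_sessions[-1].append(cur)  # same session continues
        sessions[user_id] = user_sessions
    return sessions


# Invented sample clickstream, for illustration only.
clicks = [("u1", 0), ("u1", 300), ("u1", 1000), ("u2", 50)]
result = split_sessions(clicks)
# Session length in seconds and clicks per session for user "u1".
lengths = [s[-1] - s[0] for s in result["u1"]]
clicks_per_session = [len(s) for s in result["u1"]]
```

Session length and clicks per session, the two quantities observed in the second study, fall out directly from the grouped timestamps.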
Information Systems and Applications
  • Takashi Maeno, Susumu Date, Yoshiyuki Kido, Shinji Shimojo
    2007 Volume 2 Issue 1 Pages 408-419
    Published: 2007
    Released on J-STAGE: March 15, 2007
    JOURNAL FREE ACCESS
    Demand for efficient drug design has been increasing with the advancement of computing technology and bioinformatics. A variety of information technologies pertaining to drug design have been proposed recently, and most of them contribute to drug design research. Molecular docking simulation is a promising application for drug design and can be realized with current information technology. However, although docking simulation and the related information technology have advanced in recent years, scientists still have difficulty finding a suitable parameter set for accurate docking simulations. The parameter-tuning step takes a long time, and existing computing technology can hardly assist in this step. This is because the parameter-tuning step involves factors that are difficult to automate with computers. In this paper, we propose a new architecture for assisting procedures that require the decisions of scientists, especially when they need to tune parameters in a docking simulation.
    Download PDF (540K)