-
Tho Hoan Pham, Dang Hung Tran, Tu Bao Ho, Kenji Satou, Gabriel Valien ...
2005 Volume 16 Issue 2 Pages
3-11
Published: 2005
Released on J-STAGE: July 11, 2011
JOURNAL
FREE ACCESS
Eukaryotic genomes are packaged by the wrapping of DNA around histone octamers to form nucleosomes. Nucleosome occupancy, acetylation, and methylation, which have a major impact on all nuclear processes involving DNA, have been recently mapped across the yeast genome using chromatin immunoprecipitation and DNA microarrays. However, this experimental protocol is laborious and expensive. Moreover, experimental methods often produce noisy results. In this paper, we introduce a computational approach to the qualitative prediction of nucleosome occupancy, acetylation, and methylation areas in DNA sequences. Our method uses support vector machines to discriminate between DNA areas with high and low relative occupancy, acetylation, or methylation, and to rank k-gram features based on their support for these DNA modifications. Experimental results on the yeast genome reveal genetic area preferences of nucleosome occupancy, acetylation, and methylation that are consistent with previous studies. Supplementary files are available from http://www.jaist.ac.jp/tran/nucleosome/.
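The k-gram representation mentioned in this abstract can be illustrated with a minimal sketch. It only shows how a DNA sequence might be turned into a fixed-length vector of overlapping k-gram counts that a classifier such as an SVM could consume. The function name `kgram_features`, the default `k=3`, and the example sequence are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
from itertools import product


def kgram_features(sequence, k=3, alphabet="ACGT"):
    """Return counts of overlapping k-grams as a fixed-length vector.

    Hypothetical helper: the paper's exact feature definition may differ.
    """
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Enumerate all possible k-grams in a fixed lexicographic order so that
    # every sequence maps to a vector of the same length (len(alphabet) ** k).
    return [counts["".join(p)] for p in product(alphabet, repeat=k)]


# Example with a made-up sequence: the 2-grams are AC, CG, GT, TA, AC.
vec = kgram_features("ACGTAC", k=2)
```

Each position of the vector corresponds to one k-gram, so per-feature weights learned by a linear classifier can in principle be read back as a ranking of k-grams.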
-
Marcos J. Araúzo-Bravo, Akinori Sarai
2005 Volume 16 Issue 2 Pages
12-21
Published: 2005
Released on J-STAGE: July 11, 2011
JOURNAL
FREE ACCESS
A simple knowledge-based method for predicting DNA atomic structure from nucleic acid sequence is presented. We used free B-DNA crystal structures to estimate the distributions of the conformational coordinates of trinucleotide base pairs and tetranucleotide base-pair steps. We used these distributions as a basis to predict the 3D positions of the non-hydrogen atoms of the nucleic bases of an arbitrary DNA sequence of any length. The only constraint imposed was that the structure is a B-DNA one with Watson-Crick complementary base pairs. The method was tested on unseen DNA structures with sequence lengths varying from 6 bp to 12 bp. The predictions have an RMSE of around 0.5 Å for the translational conformational coordinates and around 5° for the rotational ones. For the estimation of the nucleic base non-hydrogen atom coordinates, the RMSE is around 1.1 Å. The knowledge-based method outperformed a technique based on genetic algorithms in the prediction of B-DNA structures.
-
Mitsuru Kato, Masao Nagasaki, Atsushi Doi, Satoru Miyano
2005 Volume 16 Issue 2 Pages
22-31
Published: 2005
Released on J-STAGE: July 11, 2011
JOURNAL
FREE ACCESS
An automatic graph drawing function for biopathways is indispensable for biopathway databases and software. This paper proposes a new grid-based algorithm for biopathway layout that considers (a) edge-edge crossing, (b) node-edge crossing, and (c) distance measures between nodes as its costs, and (d) subcellular localization information from Gene Ontology as its constraints. For this algorithm, we newly define cost functions, devise an efficient method for computing the costs (a)-(c) by employing a matrix representing the difference between two layouts, and adopt a steepest descent method for searching for locally optimal solutions and a multi-step layout method for finding better solutions. We implemented this algorithm on Cell Illustrator, which is biopathway modeling and simulation software. The algorithm is applied to a signal transduction pathway of apoptosis induced by fas ligand. We compare our layout with that of the grid-based algorithm by Li and Kurata (Bioinformatics 21 (9): 2036-2042, 2005). The result shows that our algorithm reduces edge-edge crossings and node-edge crossings, and solves the “isolated island problem”, that is, despite the intention, some groups of nodes are apart from other nodes in the layout. As a result, the biological understandability of the layout is fairly improved.
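Cost (a), edge-edge crossing, can be made concrete with a small sketch that counts proper crossings between straight-line edges of a grid layout. This is a naive pairwise count under stated assumptions, not the paper's efficient difference-matrix method. The names `count_edge_crossings`, `positions`, and `edges`, and the example layout, are hypothetical.

```python
from itertools import combinations


def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, -1, or 0 if collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def count_edge_crossings(positions, edges):
    """Count pairs of edges whose straight segments properly cross.

    positions: dict mapping node name -> (x, y) grid coordinates.
    edges: list of (node, node) pairs.
    Edges sharing a node are skipped; touching or collinear overlaps
    are not counted (a simplifying assumption).
    """
    crossings = 0
    for (u1, v1), (u2, v2) in combinations(edges, 2):
        if {u1, v1} & {u2, v2}:
            continue
        a, b = positions[u1], positions[v1]
        c, d = positions[u2], positions[v2]
        # Two segments properly cross when each one separates the
        # endpoints of the other.
        if (_orient(a, b, c) * _orient(a, b, d) < 0
                and _orient(c, d, a) * _orient(c, d, b) < 0):
            crossings += 1
    return crossings


# Made-up layout: A-B and C-D are the two diagonals of a square.
positions = {"A": (0, 0), "B": (2, 2), "C": (0, 2), "D": (2, 0), "E": (3, 0)}
edges = [("A", "B"), ("C", "D"), ("D", "E")]
n_crossings = count_edge_crossings(positions, edges)
```

A layout search such as steepest descent would re-evaluate a cost like this after each candidate move of a node, which is why an incremental way of computing it matters in practice.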
-
Olivo Miotto, Tin Wee Tan, Vladimir Brusic
2005 Volume 16 Issue 2 Pages
32-44
Published: 2005
Released on J-STAGE: July 11, 2011
JOURNAL
FREE ACCESS
Curators of biological databases transfer knowledge from scientific publications, a laborious and expensive manual process. Machine learning algorithms can reduce the workload of curators by filtering relevant biomedical literature, though their widespread adoption will depend on the availability of intuitive tools that can be configured for a variety of tasks. We propose a new method for supporting curators by means of document categorization, and describe the architecture of a curator-oriented tool implementing this method using techniques that require no computational linguistic or programming expertise. To demonstrate the feasibility of this approach, we prototyped an application of this method to support a real curation task: identifying PubMed abstracts that contain allergen cross-reactivity information. We tested the performance of two different classifier algorithms (CART and ANN), applied to both composite and single-word features, using several feature scoring functions. Both classifiers exceeded our performance targets, the ANN classifier yielding the best results. These results show that the method we propose can deliver the level of performance needed to assist database curation.
-
José C. Clemente, Kenji Satou, Gabriel Valiente
2005 Volume 16 Issue 2 Pages
45-55
Published: 2005
Released on J-STAGE: July 11, 2011
JOURNAL
FREE ACCESS
There has been much interest in the structural comparison and alignment of metabolic pathways. Several techniques have been conceived to assess the similarity of metabolic pathways of different organisms. In this paper, we show that the combination of a new heuristic algorithm for the comparison of metabolic pathways together with any of three enzyme similarity measures (hierarchical, information content, and gene ontology) can be used to derive a metabolic pathway similarity measure that is suitable for reconstructing phylogenetic relationships from metabolic pathways. Experimental results on the Glycolysis pathway of 73 organisms representing the three domains of life show that our method outperforms previous techniques.
-
Naoki Sato, Masayuki Ishikawa, Makoto Fujiwara, Kintake Sonoike
2005 Volume 16 Issue 2 Pages
56-68
Published: 2005
Released on J-STAGE: July 11, 2011
JOURNAL
FREE ACCESS
Chloroplasts originate from an ancient cyanobacteria-like endosymbiont. Several tens of chloroplast proteins are encoded by the chloroplast genome, while many hundreds are encoded by the nuclear genome in plants and algae, but the exact number and identity of nuclear-encoded chloroplast proteins are still unknown. We describe here attempts to identify a large number of unidentified chloroplast proteins of endosymbiont origin (CPRENDOs). Our strategy consists of whole-genome protein clustering by the homolog group method, which is optimized for organism number, and phylogenetic profiling that extracts groups conserved in cyanobacteria and photosynthetic eukaryotes. An initial minimal set of CPRENDOs was predicted without targeting prediction and experimentally validated.
-
Kishore R. Sakharkar, Vincent T.K. Chow
2005 Volume 16 Issue 2 Pages
69-75
Published: 2005
Released on J-STAGE: July 11, 2011
JOURNAL
FREE ACCESS
Niche-dependent differential gene loss and overlapping genes have been proposed as means of achieving genome reduction by retaining indispensable genes and compressing the maximum amount of information into the available sequence space. Herein, we analyzed differential gene loss and overlapping genes in bacterial genomes with different lifestyles. Our results suggest that gene loss and overlapping genes could be a result of evolutionary pressure to minimize genome size. Comparative analysis shows that the genomes display marked similarities in patterns of protein length and frequency. Our analysis indicates that habitat is a major factor contributing to genome reduction. These comparisons increase our knowledge of the forces that drive the extreme specialization of the bacteria and their association with the host.
-
Mihoko Otake, Toshihisa Takagi
2005 Volume 16 Issue 2 Pages
76-85
Published: 2005
Released on J-STAGE: July 11, 2011
JOURNAL
FREE ACCESS
The importance of modeling and simulation of biological processes is growing for further understanding of living systems at all scales, from the molecular and cellular to organs and individuals. In the field of neuroscience, there are so-called platform simulators, the de facto standard neural simulators. More than a hundred neural models are registered in the model database. These models are executable in the corresponding simulation environments. However, the usability of the registered models is not sufficient. In order to make use of a model, users have to identify the input, output, and internal state variables and the parameters of the model. The roles and units of each variable and parameter are not explicitly defined in the model files; they are only suggested implicitly in the papers where the simulation results are demonstrated. In this study, we propose a novel method for reassembling and interfacing models registered in biological model databases. The method was applied to the neural models registered in one of the typical biological model databases, ModelDB. The results are described in detail for the hippocampal pyramidal neuron model. The model is executable in the NEURON simulator environment and demonstrates that somatic EPSP amplitude is independent of synapse location. Input and output parameters and variables were identified successfully, and the results of the simulation were recorded in an organized form with annotations.
-
Shugo Hamahashi, Shuichi Onami
2005 Volume 16 Issue 2 Pages
86-93
Published: 2005
Released on J-STAGE: July 11, 2011
JOURNAL
FREE ACCESS
The spindle orientation is a crucial piece of information for understanding the development of the embryo. The spindle forms during cell division and the cell divides along the spindle axis. Spindle orientation has been measured in many different mutant embryos of Caenorhabditis elegans. However, the objectivity and the productivity of these measurements were low because they were made manually. Here we present a system that automates the measurement of the spindle orientation in the C. elegans embryo. Automation increases the objectivity and productivity of the measurement. We confirmed the applicability of the system by applying it to spindles during the second divisions in wild-type and mutant C. elegans embryos.
-
A New Approach from the Poly-tRNA Theory
Koji Ohnishi, Madoka Ohshima, Naotaka Furuichi
2005 Volume 16 Issue 2 Pages
94-103
Published: 2005
Released on J-STAGE: July 11, 2011
JOURNAL
FREE ACCESS
Poly-tRNA theory has revealed that the tRNA gene clusters in the Bacillus subtilis trrnD- and rrnB-operons are relics of early peptide-synthesizing RNA apparatus. The trrnD-type and rrnB-type poly-tRNA models were re-analyzed by using recent databases. The results elucidated that the 16-amino-acid (aa) trrnD- and the 21-aa rrnB-peptides (whose aa sequences are in the order of aa specificities of tRNAs in the respective tRNA cluster) are really relics of earliest peptides encoded by most primitive mRNAs, trrnD-mRNA and rrnB-mRNA, which are homologous to tRNA(Gly) and tRNA(His), respectively. Genes encoding various protein superfamilies (including pgtB protein, glycyl-tRNA synthetase alpha, C-type lectin, F1-ATP-synthase gamma, etc.) were concluded to have derived from the tRNA(Gly)-tRNA(Cys)-tRNA(Leu) region (including the trrnD-mRNA region) in the trrnD-poly-tRNA. Genes for another group of protein superfamilies (including adenylate kinase, glyceraldehyde-3-phosphate dehydrogenase, helix-turn-helix DNA-binding domains, etc.) were found to have derived from rrnB-mRNA, which is most plausibly homologous to a region containing the tRNA(His) of the rrnB-poly-tRNA. Thus proto-tRNA(Gly), reconstructed from tRNA(Gly) and other tRNAs, strongly suggested that proto-tRNA was most plausibly a viroid-like, possibly self-cleavable, replicable ribozyme possessing a possible hammerhead-like structure.
-
Biased Fragment Replacement for Searching Low-Energy Conformation
Sung-Joon Park
2005 Volume 16 Issue 2 Pages
104-113
Published: 2005
Released on J-STAGE: July 11, 2011
JOURNAL
FREE ACCESS
A novel fragment replacement strategy for fragment-based protein structure prediction is proposed. Despite the recent advance of de novo prediction of protein tertiary structure, intricate protein topologies are still predicted at unsatisfactory quality. Although this difficulty is in part due to the accuracy of energy functions, it also relates to the search ability of sampling methods. To enhance the global optimization method that finds low-energy conformations, this study tests a biased sampling approach. The proposed approach is inspired by the fact that local structures of a protein have geometrical rigidity and flexibility. To capture the pivotal local structures that generate various topologies, this approach first measures the energetic fluctuation of target fragments on the dihedral angles of a protein, and then converts this quantity to a probability used for the probabilistic selection of fragment replacement. Due to the requirement of the dihedral angles, a Genetic Algorithm (GA) implements the proposed idea, and experimental results show that the GA is capable of providing the dihedral angles as template-like proteins. The results suggest that the proposed approach can reach low-energy conformations with prediction quality comparable to that of an existing method. Interestingly, the low-energy states were associated with the frequent replacement of fragments in natively-coil regions. However, unfavorable compactification of the predicted models was observed. All experimental data are available at http://www.proteinsilico.org/PRO/.
-
Luonan Chen, Ling-Yun Wu, Ruiqi Wang, Yong Wang, Shihua Zhang, Xiang-S ...
2005 Volume 16 Issue 2 Pages
114-124
Published: 2005
Released on J-STAGE: July 11, 2011
JOURNAL
FREE ACCESS
We propose a novel method for solving the structure comparison problem for proteins, based on a decomposition technique. We define the structure alignment as a multi-objective optimization problem with both discrete and continuous variables, i.e., maximizing the number of aligned atoms and minimizing their root mean square distance. By controlling a single distance-related parameter, theoretically we can obtain a variety of optimal alignments corresponding to different optimal matching patterns, i.e., from a large matching portion to a small portion. The number of variables in our algorithm increases with the number of atoms of protein pairs in almost a linear manner. The software is available upon request, or from http://zhangroup.aporc.org/bioinfo/samo/.
-
Yutaka Ueno, Katsunori Isono, Katsutoshi Takahashi, Yukio Shimonohara, ...
2005 Volume 16 Issue 2 Pages
125-135
Published: 2005
Released on J-STAGE: July 11, 2011
JOURNAL
FREE ACCESS
We examined the statistical performance of clustering single particle molecular images by bottom-up clustering, a hierarchical algorithm, using simulated protein images with a low signal-to-noise ratio. Using covariance as the measure of similarity together with iterative alignment, our method was found to be fairly robust against noise. Clustering tests of four known protein structures were performed at three levels of noise and with three levels of smoothing. A significant effect of smoothing was confirmed in our results for images with noise, suggesting an effective degree of smoothing that depends on the noise and the structural features of the target molecule. The consistency of clustering results was evaluated by the average solid angle of projection, and the precision of our clustering results was checked by the average image correlation between the obtained cluster image and the true projection. Once image features are extracted appropriately, the average solid angle also represents the degree of clustering precision.
-
Woo-Hyuk Jang, Dong-Soo Han, Hong-Soog Kim, Sung-Doke Lee
2005 Volume 16 Issue 2 Pages
136-147
Published: 2005
Released on J-STAGE: July 11, 2011
JOURNAL
FREE ACCESS
The Domain Combination based Protein-Protein Interaction Prediction (DCPPIP) method has been shown to achieve outstanding prediction accuracy for Yeast proteins. However, it is not yet apparent whether the method is still valid and can achieve comparable prediction accuracy for the proteins of other species. In this paper, we report the validation results of applying the DCPPIP method to Fly and Human proteins. We also report the results of inter-species validation, in which protein interaction and domain data of other species are used as the learning set. 10,351 interacting protein pairs are used for the validation for Fly and 2,345 protein pairs for Human. 80% of the data are used as learning sets and 20% are reserved as test sets. High prediction accuracies (Fly: sensitivity = 17%, specificity = 92%; Human: sensitivity = 96%, specificity = 95%) are achieved in both the Fly and Human cases. Interactions of proteins in Human, Mouse, H. pylori, E. coli, and C. elegans are predicted and validated using the protein interaction and domain data in Yeast, Fly, and the combination of Yeast and Fly, respectively. Again, good prediction accuracy is achieved when the test protein pair has common domains with the proteins in a learning set. A notion of Domain Overlapping Rate (DOR) among species is newly developed in this paper, and the correlation between DOR and prediction accuracy is examined. According to our test results, there exists a fairly obvious correlation between DOR and prediction accuracy.
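For readers less familiar with the evaluation measures quoted in this abstract, the following minimal sketch shows the conventional definitions of sensitivity and specificity computed from confusion-matrix counts. The function name and the counts in the example are hypothetical and are not data from the paper, whose exact evaluation protocol may differ.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Conventional definitions from confusion-matrix counts.

    sensitivity = TP / (TP + FN): fraction of true interactions recovered.
    specificity = TN / (TN + FP): fraction of non-interactions rejected.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts, chosen only to illustrate the arithmetic.
sens, spec = sensitivity_specificity(tp=96, fn=4, tn=95, fp=5)
```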
-
Carlos A. Del Carpio M., Abdul Rajjak Shaikh, Eichiro Ichiishi, Michih ...
2005 Volume 16 Issue 2 Pages
148-160
Published: 2005
Released on J-STAGE: July 11, 2011
JOURNAL
FREE ACCESS
Hitherto, analyses of protein complexes have frequently been confined to the changes in the interface of the interacting protein subunits, while the holistic picture of the structural transformation of the protein monomers, or the pervasive rigidity adopted by the newly formed complex, is more often than not improperly evaluated. This is in spite of the multiple and deep insights that they can yield about the interaction process itself at the molecular level, or at the higher level of genomic functional analyses, for which relevant systems biological information can be obtained. To address this aspect of protein-protein interaction, we propose in this work a newly developed algorithm that is based on graph theoretical instances and makes possible the evaluation of the changes in the flexibility of the interacting molecules and the rigidity adopted at complex formation. Since one can also consider the opposite process, i.e. that in which the complex decomposes into its constituent subunits, each of which may accomplish another vital role in the organism, the methodology proposed here is also able to address that problem. The algorithm we propose performs a rigidity and/or flexibility evaluation of every node (atom) on the network constituted by the entire set of intra- and intermolecular inter-atomic interactions. Comparison of flexible or rigid molecular regions or domains within the complex with those in the respective isolated monomers leads to quantification of the loss (or gain) in the number of degrees of freedom at complex formation and their effects on protein complex formation mechanisms. This index is also valuable in the identification of collective motions within the protein that may play a critical role in the process of complex formation, and the influences they may have on the behavior and function of the complex (as well as the subunits constituting it) within the organism.
Furthermore, the methodology, embedded in protein docking algorithms, allows the development of a framework for categorizing and ranking decoys output by widely used grid-scoring-type algorithms, one of which is the protein-protein interaction system MIAX, which has been under continuous development in recent years.
-
Kazuya Sumikoshi, Tohru Terada, Shugo Nakamura, Kentaro Shimizu
2005 Volume 16 Issue 2 Pages
161-173
Published: 2005
Released on J-STAGE: July 11, 2011
JOURNAL
FREE ACCESS
We describe a fast protein-protein docking algorithm using a series expansion in terms of newly designed bases to efficiently search the entire six-dimensional conformational space of rigid body molecules. This algorithm is an ab initio docking algorithm designed to list candidates of putative conformations from a global conformational space for unbound docking. In our algorithm, a scoring function is constructed from terms that are the inner products of two scalar fields expressing individual molecules. The mapping from a molecule to a scalar field can be arbitrarily defined to express an energy term. Since this scoring scheme has the same expressiveness as that of a method using a fast Fourier transform (FFT), it has the flexibility to introduce various physicochemical energies. Currently, we are using scalar fields that approximate desolvation free energy and steric hindrance energy. Fast calculation of the scoring function for each conformation of the six-dimensional search space is realized by expansion of the fields in terms of basis functions which are combinations of spherical harmonics and modified Legendre polynomials, and the use of only low-order terms, which carry most of the information on the scalar field. We have implemented this algorithm and evaluated the computation time and precision by using actual protein structure data of complexes and their monomers. This paper presents the results for six unbound cases and in all the cases we obtained at least one conformation close to the native structures (interface RMSD<3.0Å) within the top 1000 candidates with about 40 seconds of computation time using a single Pentium4 2.4 GHz CPU.
-
Hisayuki Horai, Kouichi Doi, Hirofumi Doi
2005 Volume 16 Issue 2 Pages
174-182
Published: 2005
Released on J-STAGE: July 11, 2011
JOURNAL
FREE ACCESS
Our research on compiling a lexicon of relatively short oligopeptides has been one of the first steps toward viewing the proteome from the perspective of oligopeptides. As an application of the lexicon, we propose a new method for the prediction of protein function, especially Gene Ontology terms (GO terms), based on statistical characteristics of oligopeptides. In the lexicon, a known function of a protein is inherited by its oligopeptides, and the correspondence between oligopeptides and the function is calculated over all proteins. In our method, unknown functions of proteins are predicted automatically by means of this correspondence. We measured the prediction performance for several GO terms with recall-precision graphs, using all 28,520 human proteins registered in RefSeq. The GO terms include ‘membrane’, ‘nucleus’, ‘ATP binding’, ‘hydrolase activity’, ‘GTP binding’, ‘intracellular signaling cascade’ and ‘ubiquitin cycle’. In most cases, the method scores 70% recall with 80% precision. The prediction for ATP binding and GTP binding achieves quite high performance: it scores 80% recall with 80% precision. Even in the worst case (ubiquitin cycle), it scores 62.6% recall with 80% precision. These results suggest that the proposed method is quite effective for predicting GO terms.
-
A New Algorithm for Repeated Measurements in Gene Expression Data
Shinya Matsumoto, Ken-ichi Aisaki, Jun Kanno
2005 Volume 16 Issue 2 Pages
183-194
Published: 2005
Released on J-STAGE: July 11, 2011
JOURNAL
FREE ACCESS
The availability of whole-genome sequence data and high-throughput techniques such as DNA microarrays enables researchers to monitor the alteration of gene expression in a certain organ or tissue in a comprehensive manner. The quantity of gene expression data can be greater than 30,000 genes per measurement, making data clustering methods essential for analysis. Biologists usually design experimental protocols so that statistical significance can be evaluated; often, they conduct experiments in triplicate to generate a mean and standard deviation. Existing clustering methods usually use these mean or median values, rather than the original data, and take significance into account by omitting data showing large standard deviations, which eliminates potentially useful information. We propose a clustering method that uses each of the triplicate data sets as a probability distribution function instead of pooling data points into a median or mean. This method permits truly unsupervised clustering of the data from DNA microarrays.
-
Jung Hun Oh, Jean Gao, Animesh Nandi, Prem Gurnani, Lynne Knowles, Jo ...
2005 Volume 16 Issue 2 Pages
195-204
Published: 2005
Released on J-STAGE: July 11, 2011
JOURNAL
FREE ACCESS
Surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) mass spectrometry data have been increasingly analyzed to identify biomarkers that help the early detection of disease. Ovarian cancer commonly recurs, at a rate of 75%, within a few months to several years after standard treatment. Since recurrent ovarian cancer is relatively difficult to diagnose and small tumors generally respond better to treatment, new methods for the detection of early relapse in ovarian cancer are urgently needed. Here, we propose a new algorithm, SVM-MB/RFE (SVM-Markov Blanket/Recursive Feature Elimination), based on SVM-RFE, which identifies biomarkers for predicting the early recurrence of ovarian cancer. In this approach, we first apply a t-test for feature pruning and then binning, using 5-fold cross validation. Finally, 58 peaks are obtained from the 27000 of the raw data. Such dramatically reduced features relax the computational burden in the next step of our algorithm. We compare the performance of three feature selection algorithms and demonstrate that SVM-MB/RFE outperforms the other methods.
-
Nasimul Noman, Hitoshi Iba
2005 Volume 16 Issue 2 Pages
205-214
Published: 2005
Released on J-STAGE: July 11, 2011
JOURNAL
FREE ACCESS
This paper proposes an improved evolutionary method for constructing the underlying network structure and inferring effective kinetic parameters from the time series data of gene expression using the decoupled S-system formalism. We employed Trigonometric Differential Evolution (TDE) as the optimization engine of our algorithm for capturing the dynamics in gene expression data. A more effective fitness function for attaining the sparse structure, which is the hallmark of biological networks, has been applied. Experiments on an artificial genetic network show the power of the algorithm in constructing the network structure and predicting the regulatory parameters. The method is used to evaluate interactions between genes in the SOS signaling pathway in Escherichia coli using gene expression data.
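As background to the S-system formalism named in this abstract, the standard S-system form is dX_i/dt = alpha_i * prod_j X_j^g_ij - beta_i * prod_j X_j^h_ij, where the exponents g_ij and h_ij are the kinetic orders that encode the network structure. The sketch below evaluates the right-hand side of this standard form. The function name and the two-gene parameter values are made up for illustration and do not come from the paper.

```python
import math


def s_system_derivatives(x, alpha, beta, g, h):
    """Right-hand side of a standard S-system.

    dX_i/dt = alpha_i * prod_j X_j**g[i][j] - beta_i * prod_j X_j**h[i][j]
    """
    n = len(x)
    dx = []
    for i in range(n):
        production = alpha[i] * math.prod(x[j] ** g[i][j] for j in range(n))
        degradation = beta[i] * math.prod(x[j] ** h[i][j] for j in range(n))
        dx.append(production - degradation)
    return dx


# Hypothetical two-gene network: gene 0 activates gene 1 (g[1][0] = 1),
# and each gene degrades in proportion to its own level.
dx = s_system_derivatives(
    x=[2.0, 1.0],
    alpha=[1.0, 1.0],
    beta=[1.0, 1.0],
    g=[[0, 0], [1, 0]],
    h=[[1, 0], [0, 1]],
)
```

In a decoupled setting, each equation i would be fitted on its own against the observed time series of X_i, so an optimizer only searches the parameters of one row of g and h at a time. A zero kinetic order means no regulatory edge, which is how sparsity of the network shows up in the parameters.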
-
The Gene Regulatory Networks Case
Raffaella Gentilini
2005 Volume 16 Issue 2 Pages
215-224
Published: 2005
Released on J-STAGE: July 11, 2011
JOURNAL
FREE ACCESS
We consider the problem of integrating different systems biology formalisms, namely, the process calculi based formalism, the modeling approach based on systems of differential equations, and the one relying on automata-like descriptions (and model checking). Specifically, we define automatic procedures for translating stochastic π-calculus descriptions of gene regulatory networks to S-systems differential equations. Tools for extracting and reasoning on (approximate) solutions of S-systems have been recently developed in the literature, and can be exploited to establish a link with automata-based systems biology and model checking techniques.
-
Dace Ruklisa, Alvis Brazma, Juris Viksna
2005 Volume 16 Issue 2 Pages
225-236
Published: 2005
Released on J-STAGE: July 11, 2011
JOURNAL
FREE ACCESS
We study the Finite State Linear Model (FSLM) for modelling gene regulatory networks proposed by A. Brazma and T. Schlitt in [4]. The model incorporates a biologically intuitive gene regulatory mechanism similar to that in Boolean networks, and can also describe continuous changes in protein levels. We consider several theoretical properties of this model; in particular, we show that the problem of whether a particular gene will reach an active state is algorithmically unsolvable. This imposes some practical difficulties on the simulation and reverse engineering of FSLM networks. Nevertheless, our simulation experiments show that sufficiently many FSLM networks exhibit regular behaviour and that the model is still quite adequate to describe biological reality.
We also propose a comparatively efficient $O(2^K n^{K+1} M^{2K} m \log m)$ time algorithm for reconstruction of FSLM networks from experimental data. Experiments on reconstruction of random networks are performed to estimate the running time of the algorithm in practice, as well as the number of measurements needed for successful network reconstruction.
-
Dan He, Abdullah N. Arslan
2005 Volume 16 Issue 2 Pages
237-246
Published: 2005
Released on J-STAGE: July 11, 2011
JOURNAL
FREE ACCESS
The constrained pairwise sequence alignment (CPSA) problem aims to align two given sequences by aligning their similar subsequences in the same region under the guidance of a given pattern (constraint). Let the lengths of the sequences be m and n, where n ≤ m, and let r ≥ n be the length of the given pattern. The optimum constrained pairwise alignment score can be computed using O(rn) space by a naive dynamic programming solution. If an optimal alignment path is desired, then the space requirement of the naive dynamic programming algorithm is O(rnm). There is a divide-and-conquer algorithm that reduces the memory requirement of finding an optimal alignment for the CPSA problem to O(rn). In this paper, we present a space-efficient CPSA algorithm that returns an optimal alignment. Our analysis on real protein sequences suggests that our algorithm requires only O(n) space in practice. This algorithm is not only space efficient but also very fast. A generalization of the CPSA problem for multiple sequences is called the constrained multiple sequence alignment (CMSA) problem. Our CPSA algorithm also improves the space requirement of progressive CMSA algorithms that use solutions of CPSA problems.
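The score-only computation in O(rn) space mentioned in this abstract can be sketched as a three-index dynamic program that keeps just two layers. The sketch assumes a simplified, exact-match form of the constraint: every pattern character must appear, in order, in a column where both sequences carry that same character. This formulation, the scoring values, and the function name are assumptions for illustration. The paper's similarity-based constraint and its space-efficient algorithm are not reproduced here.

```python
NEG_INF = float("-inf")


def constrained_alignment_score(a, b, pattern, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of a and b subject to the pattern
    constraint (exact-match variant), or -inf if it cannot be satisfied.

    Only two layers indexed by (position in the shorter sequence,
    number of pattern characters covered) are stored, hence O(rn) space.
    """
    if len(a) < len(b):
        a, b = b, a  # iterate rows over the longer sequence
    n, r = len(b), len(pattern)
    # prev[j][k]: best score for the prefix of a seen so far vs b[:j],
    # with the first k pattern characters already covered.
    prev = [[NEG_INF] * (r + 1) for _ in range(n + 1)]
    for j in range(n + 1):
        prev[j][0] = j * gap
    for i in range(1, len(a) + 1):
        cur = [[NEG_INF] * (r + 1) for _ in range(n + 1)]
        cur[0][0] = i * gap
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            for k in range(r + 1):
                best = max(prev[j - 1][k] + s,   # align a[i-1] with b[j-1]
                           prev[j][k] + gap,     # gap in b
                           cur[j - 1][k] + gap)  # gap in a
                if k > 0 and a[i - 1] == b[j - 1] == pattern[k - 1]:
                    # This matched column covers pattern character k.
                    best = max(best, prev[j - 1][k - 1] + s)
                cur[j][k] = best
        prev = cur
    return prev[n][r]


# Made-up example: three matches and one gap, with G and T covering "GT".
score = constrained_alignment_score("ACGT", "AGT", "GT")
```

Recovering the alignment itself, rather than only its score, is what drives the naive space requirement up to O(rnm), since traceback needs the layers for every row. That gap is what divide-and-conquer and the space-efficient approach described in the abstract address.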
-
Hongwei Wu, Fenglou Mao, Zhengchang Su, Victor Olman, Ying Xu
2005 Volume 16 Issue 2 Pages
247-259
Published: 2005
Released on J-STAGE: July 11, 2011
JOURNAL
FREE ACCESS
We present a computational method for predicting functional modules that can be applied directly to newly sequenced microbial genomes to predict gene functions and the component genes of biological pathways. We first quantify the functional relatedness among genes based on their distribution (i.e., their presence and order) across multiple microbial genomes, and obtain a gene network in which every pair of genes is associated with a score representing their functional relatedness. We then apply a threshold-based clustering algorithm to this gene network and obtain modules in which the number of genes is bounded from above by a pre-specified value and the component genes are more strongly functionally related to each other than to genes in other predicted modules. In particular, when the module size is bounded by 130, we obtain 167 functional modules covering 813 genes for Escherichia coli K12, and 138 functional modules covering 731 genes for Bacillus subtilis subsp. subtilis str. 168. We have used gene ontology (GO) information to assess the prediction results. The GO similarities among the genes of the same functional module are compared with the GO similarities among genes that are randomly clustered together. This comparison reveals that our predicted functional modules are statistically and biologically significant, and that the genes of the same functional module share more commonality in terms of biological process than in terms of molecular function or cellular component. We have also examined the predicted functional modules that are common to both Escherichia coli K12 and Bacillus subtilis subsp. subtilis str. 168, and provide explanations for some functional modules.
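The abstract does not spell out the clustering procedure, so the following is only a minimal sketch of one way a threshold-based clustering with a module-size bound could work. It assumes pairwise relatedness scores are given as a dictionary, and it splits oversized groups by re-clustering them at a stricter threshold. The function names, the threshold schedule, and the toy scores are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def components(group, edges, threshold):
    """Connected components of `group`, keeping only edges scoring >= threshold."""
    adj = defaultdict(set)
    for (u, v), score in edges.items():
        if score >= threshold and u in group and v in group:
            adj[u].add(v)
            adj[v].add(u)
    seen, comps = set(), []
    for start in sorted(group):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            u = stack.pop()
            comp.add(u)
            for w in adj[u] - seen:
                seen.add(w)
                stack.append(w)
        comps.append(comp)
    return comps


def bounded_modules(genes, edges, max_size, thresholds):
    """Modules of 2..max_size genes; oversized groups are re-split at the next threshold."""
    modules = []

    def split(group, level):
        for comp in components(group, edges, thresholds[level]):
            if len(comp) < 2:
                continue  # a single gene is not reported as a module
            if len(comp) <= max_size:
                modules.append(sorted(comp))
            elif level + 1 < len(thresholds):
                split(comp, level + 1)
            # otherwise the group is still too large and is left unassigned

    split(set(genes), 0)
    return sorted(modules)


# Toy relatedness scores (hypothetical data).
scores = {("a", "b"): 0.9, ("b", "c"): 0.8, ("c", "d"): 0.3, ("d", "e"): 0.9}
modules = bounded_modules("abcde", scores, max_size=3, thresholds=[0.2, 0.5])
print(modules)
```

In this toy example all five genes are connected at the loose threshold, which exceeds the size bound, so the group is split again at the stricter threshold. The weak c-d link is then dropped and two modules remain.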
-
Xiao-Li Li, Chuan-Sheng Foo, Soon-Heng Tan, See-Kiong Ng
2005 Volume 16 Issue 2 Pages
260-269
Published: 2005
Released on J-STAGE: July 11, 2011
JOURNAL
FREE ACCESS
While recent technological advances have made available large datasets of experimentally detected pairwise protein-protein interactions, there is still a lack of experimentally determined protein complex data. To make up for this lack, we explore the mining of existing protein interaction graphs for protein complexes. This paper proposes a novel graph mining algorithm to detect the dense neighborhoods (highly connected regions) in an interaction graph, which may correspond to protein complexes. Our algorithm first locates local cliques for each graph vertex (protein) and then merges the detected local cliques according to their affinity to form maximal dense regions. We present experimental results with yeast protein interaction data to demonstrate the effectiveness of our proposed method. Compared with other existing techniques, our predicted complexes match or overlap significantly better with the known protein complexes in the MIPS benchmark database. Novel protein complexes were also predicted to help biologists in their search for new protein complexes.
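The two stages described above (local cliques per vertex, then affinity-based merging) can be illustrated with a small sketch. The abstract gives no formulas, so both the greedy clique heuristic and the affinity measure used here (squared overlap divided by the product of the two sizes) are assumptions made for illustration, as are all function names and the toy graph.

```python
def local_clique(adj, v):
    """Greedily shrink v's closed neighborhood until every pair is connected."""
    group = {v} | adj[v]
    while True:
        deg = {u: len(adj[u] & group) for u in group}
        if all(d == len(group) - 1 for d in deg.values()):
            return frozenset(group)
        # Drop the least-connected vertex other than v (ties broken by name).
        worst = min((u for u in group if u != v), key=lambda u: (deg[u], u))
        group.remove(worst)


def affinity(a, b):
    """Assumed overlap measure between two vertex sets, in [0, 1]."""
    return len(a & b) ** 2 / (len(a) * len(b))


def merge_regions(regions, min_affinity):
    """Repeatedly merge any two regions whose affinity reaches min_affinity."""
    regions = set(regions)
    merged = True
    while merged:
        merged = False
        items = sorted(regions, key=sorted)
        for i, a in enumerate(items):
            for b in items[i + 1:]:
                if affinity(a, b) >= min_affinity:
                    regions -= {a, b}
                    regions.add(a | b)
                    merged = True
                    break
            if merged:
                break
    return regions


# Toy interaction graph (hypothetical): two triangles sharing edge b-c, plus pair e-f.
adj = {
    "a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b", "d"},
    "d": {"b", "c"}, "e": {"f"}, "f": {"e"},
}
cliques = {local_clique(adj, v) for v in adj}
dense = merge_regions(cliques, min_affinity=0.4)
print(sorted(sorted(r) for r in dense))
```

The two triangles share two of their three proteins, giving an affinity of 4/9, so they merge into one dense region at a cutoff of 0.4 but stay separate at 0.5.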
-
Anatolij P. Potapov, Nico Voss, Nicole Sasse, Edgar Wingender
2005 Volume 16 Issue 2 Pages
270-278
Published: 2005
Released on J-STAGE: July 11, 2011
JOURNAL
FREE ACCESS
We present a first attempt to evaluate the generic topological principles underlying the mammalian transcriptional regulatory networks. The transcription networks (TN) studied here are represented as graphs in which vertices are genes coding for transcription factors and edges are causal links between the genes, each edge combining both gene expression and trans-regulation events. Two transcription networks were retrieved from the TRANSPATH® database. The first one, TN_RN, is a 'complete' transcription network referred to as a reference network. The second one, TN_p53, is a particular transcriptional sub-network centered on the p53 gene. We found these networks to be fundamentally non-random and inhomogeneous. Their topology follows a power-law degree distribution and is best described by the scale-free model. The shortest-path-length distribution and the average clustering coefficient indicate a small-world feature of these networks. The networks show a dependence of the clustering coefficient on the degree of a vertex, thereby indicating the presence of hierarchical modularity. A clear positive correlation between betweenness and vertex degree has been observed in both networks. The top list of genes displaying high degree and high betweenness, such as p53, c-fos, c-jun and c-myc, is enriched with genes that are known to have tumor-suppressor or proto-oncogene properties, which supports the biological significance of the identified key topological elements.
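As a rough illustration of the quantities named in this abstract, the sketch below computes vertex degree, the local clustering coefficient, and the average shortest-path length on a small graph. It treats the network as undirected and unweighted, which is a simplifying assumption and may differ from the networks analyzed in the paper. Betweenness is omitted for brevity, and the toy graph and function names are hypothetical.

```python
from collections import deque


def degrees(adj):
    """Number of neighbors of each vertex."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def clustering(adj, v):
    """Fraction of pairs of v's neighbors that are themselves linked."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u in nbrs for w in nbrs if u < w and w in adj[u])
    return 2.0 * links / (k * (k - 1))


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs of distinct vertices."""
    total, pairs = 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


# Toy graph (hypothetical): triangle a-b-c with a pendant vertex d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
deg = degrees(adj)
avg_clustering = sum(clustering(adj, v) for v in adj) / len(adj)
avg_path = average_path_length(adj)
print(deg, avg_clustering, avg_path)
```

A small-world network, in the sense used above, combines a short average path length with a high average clustering coefficient. Plotting `clustering(adj, v)` against `deg[v]` over all vertices is the kind of check that reveals the degree dependence mentioned in the abstract.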
-
Pierre Baldi
2005 Volume 16 Issue 2 Pages
281-285
Published: 2005
Released on J-STAGE: July 11, 2011
JOURNAL
FREE ACCESS
-
Getting the Best of Two Worlds
Gunnar Von Heijne
2005 Volume 16 Issue 2 Pages
286
Published: 2005
Released on J-STAGE: July 11, 2011
JOURNAL
FREE ACCESS
Membrane protein research has gained a lot of momentum in recent years: high-resolution structures are being produced at an increasing rate, membrane proteomics is coming on line, and membrane proteins are recognized as drug targets of major importance. Bioinformatics has always been an integral part of the developments in the field, and today provides the tools necessary to identify the membrane complement of proteomes and to predict topologies and, in lucky cases, full 3D models of membrane proteins.
As in so many other areas, much is to be gained from a tighter integration between bioinformatics and experimental studies of membrane proteins. In our own work, we are reaching towards proteome-wide studies of membrane proteins, an area where experimental and theoretical approaches must be combined to push forward.
-
Shigeru Kondo
2005 Volume 16 Issue 2 Pages
287-291
Published: 2005
Released on J-STAGE: July 11, 2011
JOURNAL
FREE ACCESS
-
Reinhart Heinrich, Hiroshi Mamitsuka
2005 Volume 16 Issue 2 Pages
v
Published: 2005
Released on J-STAGE: July 11, 2011
JOURNAL
FREE ACCESS