IPSJ Transactions on Bioinformatics

Detecting Fusion Genes in Long-Read Transcriptome Sequencing Data with FUGAREC

Keigo Masuda, Yoshiaki Sota, Hideo Matsuda

Article type: Original Paper
Subject area: Original Paper
2024 Volume 17 Pages 1-9
Published: 2024
Released on J-STAGE: February 22, 2024

DOI: https://doi.org/10.2197/ipsjtbio.17.1

JOURNAL FREE ACCESS

Fusion genes are important targets and biomarkers for cancer therapy, and methods for detecting them accurately are needed in clinical practice. RNA-seq is widely used to detect active fusion genes. Because long-read RNA-seq can sequence the full length of mRNA, it is expected to detect fusion genes that cannot be detected by short-read RNA-seq. However, long-read RNA-seq has high basecalling error rates, and gap sequences that are not aligned to the genome may occur near the breakpoints of long reads. When gap sequences occur, existing methods cannot identify the correct fusion gene or breakpoint. To address these challenges, we introduce a novel algorithm, FUGAREC (fusion detection with gap re-alignment and breakpoint clustering), which combines gap sequence re-alignment with breakpoint clustering. This approach not only enables the detection of previously undetectable fusion genes but also significantly reduces false positives. We demonstrate that FUGAREC has high fusion gene detection performance on both simulated data and sequenced data of a breast cancer cell line.
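
As a rough illustration of what breakpoint clustering can mean, the sketch below groups candidate breakpoint coordinates that lie close together on one chromosome. It is not FUGAREC's algorithm, which the abstract does not specify; the function name `cluster_breakpoints` and the `max_gap` parameter are assumptions made for this example.

```python
def cluster_breakpoints(positions, max_gap=10):
    """Group breakpoint coordinates into clusters of nearby positions.

    positions: iterable of integer coordinates on a single chromosome.
    max_gap: hypothetical tolerance; a position joins the current cluster
             when it is at most this far from the cluster's last member.
    """
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)   # close enough: extend current cluster
        else:
            clusters.append([pos])     # too far: start a new cluster
    return clusters


# Made-up coordinates, for illustration only.
clusters = cluster_breakpoints([100, 103, 250, 98, 255], max_gap=10)
```

Reads whose noisy breakpoints fall in the same cluster could then be treated as supporting a single breakpoint, which is one plausible way that clustering helps against basecalling errors.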

Exploring The Interplay Between Scoring Functions and Physico-chemical Properties in Antibody-antigen Docking

Sangeetha Ratnayake, Axel Martinelli, Toshinori Endo, Naoki Osada

Article type: Original Paper
Subject area: Original Paper
2024 Volume 17 Pages 10-17
Published: 2024
Released on J-STAGE: February 22, 2024

DOI: https://doi.org/10.2197/ipsjtbio.17.10

JOURNAL FREE ACCESS

The advent of antibody therapy has changed the treatment of diseases. The efficacy of antibody modeling relies on the intricate atomic interactions between antibodies and antigens. Traditional methods for determining antibody structures, such as X-ray crystallography, are costly and time-consuming. Computational docking offers a faster and more cost-effective approach to obtaining structures of antibody-antigen complexes, even in challenging scenarios. Rosetta, a widely employed software package for protein structure modeling, incorporates a scoring function specifically tailored for modeling antibody-antigen interactions. However, the unique characteristics of the antibody-antigen interface can result in inaccurate predictions. Therefore, it is essential to understand the existing scoring function and the behavior of the antibody-antigen interface. In this study, we evaluated specific parameters within Rosetta-derived scoring functions, with a particular focus on the energy landscape of the structures they generated. We found that performance in antibody-antigen docking simulations could be enhanced by omitting parameters related to solvation. We also examined the physico-chemical properties of antibody-antigen interfaces, paying special attention to the complementarity-determining regions and epitopes. Our exploration helped identify certain parameters that significantly influence docking simulation performance. These insights pave the way for the creation of more accurate scoring functions tailored for specific antibody-antigen interactions.

A Novel Approach to Detection Algorithms for Artifacts in Clinical Endoscopy

Seiryo Watanabe, Hironori Shigeta, Shigeto Seno, Hideo Matsuda

Article type: Original Paper
Subject area: Original Paper
2024 Volume 17 Pages 18-26
Published: 2024
Released on J-STAGE: April 30, 2024

DOI: https://doi.org/10.2197/ipsjtbio.17.18

JOURNAL FREE ACCESS

This study compares the accuracy of two state-of-the-art algorithms, YOLOv3 and Faster region-based convolutional neural network (Faster R-CNN), in detecting endoscopy artifacts using the EAD2019 dataset. YOLOv3, primarily used for real-time tasks, and Faster R-CNN, which employs a two-step object detection process, exhibit variable performance based on the object characteristics. The analysis performed in this study focuses on identifying the objects or classes where each algorithm performs better. We conduct experiments to support our findings. We introduce a novel metric that quantifies the difference in average pixel intensities inside and outside the bounding boxes of detected objects. This metric forms the basis of a proposed ensemble method, allowing the method to effectively utilize either YOLOv3 or Faster R-CNN, depending on the characteristics of each class. The proposed method demonstrates an improved average precision score compared to using either algorithm separately. This research provides valuable insights into object detection in endoscopy, potentially enhancing artifact detection accuracy in medical imaging.
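
The inside/outside intensity metric described in this abstract can be sketched as follows. This is a minimal illustration on a grayscale image stored as nested lists. The function name `intensity_contrast`, the half-open box convention, and the use of the whole remaining image as the "outside" region are assumptions, since the abstract does not give the exact definition.

```python
from statistics import mean


def intensity_contrast(image, box):
    """Mean pixel intensity inside a bounding box minus the mean outside it.

    image: 2D list of grayscale intensities (rows of equal length).
    box: (x_min, y_min, x_max, y_max); the max edges are exclusive
         (an assumed convention for this sketch).
    """
    x_min, y_min, x_max, y_max = box
    inside, outside = [], []
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if x_min <= x < x_max and y_min <= y < y_max:
                inside.append(value)
            else:
                outside.append(value)
    if not inside or not outside:
        raise ValueError("box must cover part, but not all, of the image")
    return mean(inside) - mean(outside)


# Toy 4x4 image with a bright 2x2 patch in the middle (made-up data).
toy = [
    [0, 0, 0, 0],
    [0, 10, 10, 0],
    [0, 10, 10, 0],
    [0, 0, 0, 0],
]
contrast = intensity_contrast(toy, (1, 1, 3, 3))
```

Averaging such a value over all detections of a class would give one per-class number that an ensemble could use to choose between detectors; how the paper maps that number to YOLOv3 or Faster R-CNN is not stated in the abstract.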

Protein-compound Interaction Prediction Using Microbial Chemical Communication Network

Hongyi Shen, Yutaka Saito

Article type: Original Paper
Subject area: Original Paper
2024 Volume 17 Pages 27-32
Published: 2024
Released on J-STAGE: April 30, 2024

DOI: https://doi.org/10.2197/ipsjtbio.17.27

JOURNAL FREE ACCESS

Protein-compound interaction prediction is an important problem in drug discovery. Numerous machine learning methods have been proposed using protein sequences and compound structures as features. Several methods have used biological network information, including protein-protein interactions and compound bioactivities, as additional features. However, previous studies have only used network data from mammals such as humans and mice. Here we develop a new method for protein-compound interaction prediction that uses features learned from the relationships between microorganisms and secondary metabolites in nature (microbial chemical communication network; MCCN). We used node2vec representation learning to extract compound features from the MCCN, and deep canonical correlation analysis (CCA) to obtain the features for compounds not included in the MCCN. By incorporating these MCCN-derived features into an existing protein-compound interaction prediction method, we showed that prediction performance was improved in several benchmark experiments. We also discussed how to improve our method by incorporating microbiome co-occurrence information into the MCCN.

Segmentation of Mouse Brain Slices with Unsupervised Domain Adaptation Considering Cross-sectional Locations

Yuki Shimojo, Kazuki Suehara, Tatsumi Hirata, Yukako Tohsato

Article type: Original Paper
Subject area: Original Paper
2024 Volume 17 Pages 33-39
Published: 2024
Released on J-STAGE: April 30, 2024

DOI: https://doi.org/10.2197/ipsjtbio.17.33

JOURNAL FREE ACCESS

Images of mouse brain slices, obtained under slightly different experimental conditions, are available in 84 datasets in the NeuroGT database (https://ssbd.riken.jp/neurogt/). Our goal was to obtain semantic segmentation results for eight brain anatomical regions. However, out of 84 datasets, only one dataset had true labels that could be used to train a convolutional neural network (CNN), and it was incomplete (131 out of 162 images). A segmentation model trained with the labeled images was less accurate on other images obtained under different experimental conditions because of differences in image properties. We therefore tried Unsupervised Domain Adaptation (UDA), wherein the parameters of the CNN trained on the labeled images (source) were transferred to the unlabeled images (target). We used the positional information of the sample slices associated with each image to propose a novel loss function that approximated the class occurrence probabilities of segmentation results obtained from source and target images of brain samples at similar sliced locations, and we introduced it into the UDA. The proposed UDA method achieved an mIoU of 78.34%, which was 8% more accurate than previous UDA methods such as Contrastive Learning and Self-Training (CLST) and Maximum Classifier Discrepancy (MCD). We demonstrated experimentally that the proposed method was useful for segmenting biomedical images with a small amount of incomplete training data.
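
For readers unfamiliar with the mIoU figure quoted above, the sketch below computes mean intersection over union for two flat label sequences. It uses the common definition and skips classes absent from both prediction and ground truth. Whether the paper averages in exactly this way is not stated in the abstract, and the function name `mean_iou` is chosen for this example.

```python
def mean_iou(pred, truth, num_classes):
    """Mean intersection over union across classes.

    pred, truth: equal-length sequences of integer class labels, one per pixel.
    Classes that appear in neither sequence are skipped (assumed convention).
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union:
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class present in pred or truth")
    return sum(ious) / len(ious)


# Four pixels, two classes (made-up labels).
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.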

Advancing Antibody-antigen Interface Analysis in Docking Scoring Functions for Precision Docking Analysis

Sangeetha Ratnayake, Axel Martinelli, Toshinori Endo, Naoki Osada

Article type: Original Paper
Subject area: Original Paper
2024 Volume 17 Pages 40-47
Published: 2024
Released on J-STAGE: April 30, 2024

DOI: https://doi.org/10.2197/ipsjtbio.17.40

JOURNAL FREE ACCESS

Molecular docking simulations utilizing scoring functions are pivotal for assessing the stability of complex formations. The unique biochemical characteristics of antibody-antigen interfaces, however, present challenges in applying general parameter sets of scoring functions to these molecules, necessitating the customization of the scoring function to enhance prediction accuracy for structural configurations and binding affinities. In response, we have developed models within the Rosetta software framework, widely recognized for its utility in predicting antibody-antigen docking, to optimize the parameters of its scoring function. Through a quantitative evaluation of the shape of the decoy distribution generated by Rosetta, we have been able to refine the parameters for each antibody-antigen complex, yielding a notable improvement in the prediction accuracy of the software for a given dataset. Furthermore, we have identified a distinct parameter set that is effective for the majority of complexes in our dataset, though not universally applicable. This study introduces a novel approach to customizing scoring functions, potentially contributing to advancements in drug discovery and deepening our understanding of the complexities inherent in antibody-antigen interactions at a molecular level.

Clarinet Plots: Alternative to Violin Plots to Display Zero-inflated Distribution of scRNA-seq Data

Makito Oku

Article type: Original Paper
Subject area: Original Paper
2024 Volume 17 Pages 48-54
Published: 2024
Released on J-STAGE: August 30, 2024

DOI: https://doi.org/10.2197/ipsjtbio.17.48

JOURNAL FREE ACCESS

Generally, scRNA-seq data contain many 0 values, and the expression of each gene shows a zero-inflated distribution. Violin plots are usually used to display distributions of scRNA-seq data because they can represent the shape of multi-modal distributions. However, when the proportion of 0 values is very large, the 0 peak becomes too large in a violin plot, and the shape of the distribution of non-zero values becomes difficult to see. To resolve this issue, this study proposes clarinet plots as an alternative to violin plots for displaying zero-inflated distributions of scRNA-seq data. In clarinet plots, each distribution is represented by a clarinet-like shape. The long axis corresponds to the quantile, and the width represents the magnitude of each data value. The straight line at the end corresponds to 0 values. By using a clarinet plot, the proportion of 0 values and the distribution of non-zero values can be displayed simultaneously and effectively. Examples of application to artificial data and real data are shown.
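
The geometry described above (quantile along the long axis, data value as the width) can be sketched without any plotting library. The function below only computes the outline points. The name `clarinet_outline` and the midpoint quantile convention `(i + 0.5) / n` are assumptions for this example, not details taken from the paper.

```python
def clarinet_outline(values):
    """Return (outline, zero_fraction) for a clarinet-style shape.

    outline: list of (quantile, width) pairs, one per data value, where the
             width equals the value itself, so 0 values form the straight
             zero-width segment at one end of the shape.
    zero_fraction: proportion of values equal to 0.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("values must not be empty")
    outline = [((i + 0.5) / n, v) for i, v in enumerate(ordered)]
    zero_fraction = sum(1 for v in ordered if v == 0) / n
    return outline, zero_fraction


# Made-up expression values with 60% zeros.
outline, zero_fraction = clarinet_outline([0, 4, 0, 2, 0])
```

Drawing the widths symmetrically around a central axis would then give a shape in which the length of the straight segment shows the proportion of 0 values and the flared part shows the non-zero values.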

Detection of Dispersed Repeats in the Genomes of Bacteria from Different Phyla

Eugene Korotkov, Maria Korotkova

Article type: Original Paper
Subject area: Original Paper
2024 Volume 17 Pages 55-63
Published: 2024
Released on J-STAGE: December 27, 2024

DOI: https://doi.org/10.2197/ipsjtbio.17.55

JOURNAL FREE ACCESS

In this study, we searched for dispersed repeats in the genomes of bacteria from 42 phyla using the iterative procedure method. The results revealed that each genome contained a family of repeats with lengths from 440 to 580 bases and copy numbers from 1.0 × 10³ to 1.4 × 10⁴, depending on the species. The detected repeats occupied from 17 to 72% of the bacterial genome, and more than 90% of them were superimposed as motifs on the coding sequences. The repeats contained conserved islands interspersed with weakly similar regions. Consensus sequences calculated for all the found repeats appeared to significantly differ among the bacteria. We hypothesize that the detected repeat families may be involved in the formation of the bacterial nucleoid.

Quick Screening Mild Cognitive Impairment and Dementia using Quantitative Evaluation of Motor Control Function

Kyota Aoki, Kenji Niijima, Tsutomu Yoshioka

Article type: Original Paper
Subject area: Original Paper
2024 Volume 17 Pages 64-71
Published: 2024
Released on J-STAGE: December 27, 2024

DOI: https://doi.org/10.2197/ipsjtbio.17.64

JOURNAL FREE ACCESS

The visual synchronization task (VST) evaluates motor control function, generates evaluation parameters, and takes only 25 seconds to complete. Evaluation parameters of the VST were tested as predictors of mild cognitive impairment (MCI) and dementia. The mean rotation speed parameter decreased in the order of the control, MCI, and dementia groups. For the mean rotation speed parameter of the right hand in the first period (denoted AveP3rR1), the p-value of the t-test between the control group and the dementia group is 8.37 × 10⁻¹³, and the AUC is 0.863. Between the control and dementia groups, with 0.765 as the cut-off value of AveP3rR1, specificity is 0.84 and sensitivity is 0.77. Using the mean rotation speed of the right hand in the first period as the evaluation parameter, the VST can be completed in less than 15 seconds, which enables easy screening for MCI.
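
How a cut-off value yields sensitivity and specificity can be sketched as follows. The direction of the rule (a value below the cut-off is flagged) is an assumption inferred from the statement that mean rotation speed decreases from the control group to the dementia group. The function name and the sample values are made up for illustration and are not data from the study.

```python
def sensitivity_specificity(control_values, patient_values, cutoff):
    """Evaluate a 'flag if value < cutoff' screening rule.

    control_values: parameter values for subjects without the condition.
    patient_values: parameter values for subjects with the condition.
    Returns (sensitivity, specificity).
    """
    if not control_values or not patient_values:
        raise ValueError("both groups must be non-empty")
    true_pos = sum(1 for v in patient_values if v < cutoff)
    true_neg = sum(1 for v in control_values if v >= cutoff)
    return true_pos / len(patient_values), true_neg / len(control_values)


# Hypothetical values; only the cut-off 0.765 comes from the abstract.
sens, spec = sensitivity_specificity(
    control_values=[0.9, 0.8, 0.7, 1.0],
    patient_values=[0.5, 0.6, 0.8, 0.7],
    cutoff=0.765,
)
```

With these toy inputs, three of four patients fall below the cut-off and three of four controls fall at or above it, so both measures are 0.75.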

AUTOEB: A Software for Systematically Evaluating Bipartitions in a Phylogenetic Tree Employing an Approximately Unbiased Test

Kohei Bamba, Ryo Harada, Yuji Inagaki

Article type: Original Paper
Subject area: Database/Software Paper
2024 Volume 17 Pages 72-82
Published: 2024
Released on J-STAGE: December 27, 2024

DOI: https://doi.org/10.2197/ipsjtbio.17.72

JOURNAL FREE ACCESS

The core of molecular phylogeny is the inference of a tree diagram representing the evolutionary relatedness among nucleotide or amino acid sequences. In addition, evaluating the credibility of “bipartitions,” each of which splits the inferred tree into two subtrees, is an indispensable part of modern phylogenetic studies. The most popular method for examining the credibility of bipartitions in a phylogenetic tree is the bootstrap. In the maximum likelihood framework, two alternatives to the bootstrap, UFBoot2 and SH-aLRT, are available. In this study, we propose new software, “AUTOEB,” which evaluates bipartitions in a given phylogenetic tree employing an approximately unbiased (AU) test. For each bipartition, the software generates two alternative trees from a given tree by disrupting the bipartition of interest with the minimum changes in tree topology and compares them by the AU test. If either or both alternative trees fail to be rejected, the software calls the particular bipartition “unresolved”; otherwise, it calls the bipartition “resolved.” We phylogenetically analyzed four empirical sequence datasets and demonstrated that AUTOEB can provide an alternative criterion for bipartitions that received high support values from the pre-existing methods, and help to avoid potential misinterpretations based on phylogenetic trees.
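
The resolved/unresolved rule stated in the abstract reduces to a small decision function. The sketch below assumes that each alternative tree comes with an AU-test p-value and that a tree is rejected when its p-value falls below a significance level. The function name and the default level of 0.05 are assumptions, and this is not AUTOEB's actual code or interface.

```python
def classify_bipartition(p_alt1, p_alt2, alpha=0.05):
    """Label a bipartition from the AU-test p-values of its two alternative trees.

    A bipartition is "resolved" only when both alternative trees are rejected
    (p-value below alpha, an assumed threshold); if either one fails to be
    rejected, it is "unresolved".
    """
    rejected1 = p_alt1 < alpha
    rejected2 = p_alt2 < alpha
    return "resolved" if rejected1 and rejected2 else "unresolved"


# Hypothetical p-values for illustration.
labels = [
    classify_bipartition(0.001, 0.010),  # both alternatives rejected
    classify_bipartition(0.001, 0.200),  # one alternative survives
    classify_bipartition(0.300, 0.400),  # both alternatives survive
]
```

Requiring both rejections makes the criterion conservative: a bipartition counts as resolved only when neither minimally different topology fits the data comparably well.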
