IPSJ Transactions on Bioinformatics
Online ISSN : 1882-6679
ISSN-L : 1882-6679
 
  • Keigo Masuda, Yoshiaki Sota, Hideo Matsuda
    Article type: Original Paper
    Subject area: Original Paper
    2024 Volume 17 Pages 1-9
    Published: 2024
    Released on J-STAGE: February 22, 2024
    JOURNAL FREE ACCESS

    Fusion genes are important targets and biomarkers for cancer therapy, and methods for detecting them accurately are needed in clinical practice. RNA-seq is widely used to detect active fusion genes. Because long-read RNA-seq can sequence the full length of mRNA, it is expected to detect fusion genes that short-read RNA-seq cannot. However, long-read RNA-seq has high basecalling error rates, and gap sequences that are not aligned to the genome may occur near the breakpoints of long reads. When gap sequences occur, existing methods cannot identify the correct fusion gene or breakpoint. To address these challenges, we introduce a novel algorithm, FUGAREC (fusion detection with gap re-alignment and breakpoint clustering), which combines gap sequence re-alignment with breakpoint clustering. This approach not only enables the detection of previously undetectable fusion genes but also significantly reduces false positives. We demonstrate that FUGAREC has high fusion gene detection performance on both simulated data and sequenced data of a breast cancer cell line.

  • Sangeetha Ratnayake, Axel Martinelli, Toshinori Endo, Naoki Osada
    Article type: Original Paper
    Subject area: Original Paper
    2024 Volume 17 Pages 10-17
    Published: 2024
    Released on J-STAGE: February 22, 2024
    JOURNAL FREE ACCESS

    The advent of antibody therapy has changed the treatment of diseases. The efficacy of antibody modeling relies on the intricate atomic interactions between antibodies and antigens. Traditional methods for determining antibody structures, such as X-ray crystallography, are costly and time-consuming. Computational docking offers a faster and more cost-effective approach to obtaining antibody-antigen complex structures, even in challenging scenarios. Rosetta, a widely used software package for protein structure modeling, incorporates a scoring function specifically tailored for modeling antibody-antigen interactions. However, the unique characteristics of the antibody-antigen interface can result in inaccurate predictions. Therefore, it is essential to understand the existing scoring function and the behavior of the antibody-antigen interface. In this study, we evaluated specific parameters within Rosetta-derived scoring functions, with a particular focus on the energy landscape of the structures they generated. We found that performance in antibody-antigen docking simulations could be enhanced by omitting parameters related to solvation. We also examined the physico-chemical properties of antibody-antigen interfaces, paying special attention to the complementarity-determining regions and epitopes. This analysis helped identify certain parameters that significantly influence docking simulation performance. These insights pave the way for the creation of more accurate scoring functions tailored for specific antibody-antigen interactions.

  • Seiryo Watanabe, Hironori Shigeta, Shigeto Seno, Hideo Matsuda
    Article type: Original Paper
    Subject area: Original Paper
    2024 Volume 17 Pages 18-26
    Published: 2024
    Released on J-STAGE: April 30, 2024
    JOURNAL FREE ACCESS

    This study compares the accuracy of two state-of-the-art algorithms, YOLOv3 and Faster region-based convolutional neural network (Faster R-CNN), in detecting endoscopy artifacts using the EAD2019 dataset. YOLOv3, primarily used for real-time tasks, and Faster R-CNN, which employs a two-step object detection process, exhibit variable performance depending on the object characteristics. The analysis focuses on identifying the objects or classes on which each algorithm performs better, and we conduct experiments to support our findings. We introduce a novel metric that quantifies the difference in average pixel intensities inside and outside the bounding boxes of detected objects. This metric forms the basis of a proposed ensemble method that uses either YOLOv3 or Faster R-CNN, depending on the characteristics of each class. The proposed method achieves a higher average precision score than either algorithm used separately. This research provides valuable insights into object detection in endoscopy, potentially enhancing artifact detection accuracy in medical imaging.

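The abstract above describes a metric based on the difference in average pixel intensities inside and outside a bounding box, without giving its exact definition. The following is a minimal sketch of one possible reading, not the authors' implementation. The function name `inside_outside_intensity_gap`, the grayscale list-of-rows image format, the `(x_min, y_min, x_max, y_max)` box convention, and the choice of "all remaining pixels" as the outside region are assumptions made for illustration.

```python
def inside_outside_intensity_gap(image, box):
    """Mean intensity inside the box minus mean intensity outside it.

    Assumptions (not from the paper): `image` is a grayscale image given as
    a list of rows, `box` is (x_min, y_min, x_max, y_max) with exclusive
    upper bounds, and "outside" means every pixel of the image not in the box.
    """
    x_min, y_min, x_max, y_max = box
    inside, outside = [], []
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if x_min <= x < x_max and y_min <= y < y_max:
                inside.append(value)
            else:
                outside.append(value)
    if not inside or not outside:
        raise ValueError("box must cover part, but not all, of the image")
    return sum(inside) / len(inside) - sum(outside) / len(outside)


# Toy 4x4 image: a bright 2x2 object (intensity 10) on a dark background (0).
image = [
    [0, 0, 0, 0],
    [0, 10, 10, 0],
    [0, 10, 10, 0],
    [0, 0, 0, 0],
]
gap = inside_outside_intensity_gap(image, (1, 1, 3, 3))
print(gap)
```

A large absolute gap suggests an object that contrasts strongly with its surroundings. How such a value would be aggregated per class and used to choose between detectors is specified in the paper, not here.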
  • Hongyi Shen, Yutaka Saito
    Article type: Original Paper
    Subject area: Original Paper
    2024 Volume 17 Pages 27-32
    Published: 2024
    Released on J-STAGE: April 30, 2024
    JOURNAL FREE ACCESS

    Protein-compound interaction prediction is an important problem in drug discovery. Numerous machine learning methods have been proposed that use protein sequences and compound structures as features. Several methods have used biological network information, including protein-protein interactions and compound bioactivities, as additional features. However, previous studies have only used network data from mammals such as humans and mice. Here we develop a new method for protein-compound interaction prediction that uses features learned from the relationships between microorganisms and secondary metabolites in nature (microbial chemical communication network; MCCN). We used node2vec representation learning to extract compound features from the MCCN, and deep canonical correlation analysis (CCA) to obtain the features for compounds not included in the MCCN. By incorporating these MCCN-derived features into an existing protein-compound interaction prediction method, we showed that prediction performance was improved in several benchmark experiments. We also discussed how to improve our method by incorporating microbiome co-occurrence information into the MCCN.

  • Yuki Shimojo, Kazuki Suehara, Tatsumi Hirata, Yukako Tohsato
    Article type: Original Paper
    Subject area: Original Paper
    2024 Volume 17 Pages 33-39
    Published: 2024
    Released on J-STAGE: April 30, 2024
    JOURNAL FREE ACCESS

    Images of mouse brain slices, obtained under slightly different experimental conditions, are available in 84 datasets in the NeuroGT database (https://ssbd.riken.jp/neurogt/). Our goal was to obtain semantic segmentation results for eight brain anatomical regions. However, of the 84 datasets, only one had true labels that could be used to train a convolutional neural network (CNN), and it was incomplete (131 out of 162 images). A segmentation model trained with the labeled images was less accurate on other images obtained under different experimental conditions because of differences in image properties. We therefore tried Unsupervised Domain Adaptation (UDA), wherein the parameters of the CNN trained on the labeled images (source) were transferred to the unlabeled images (target). Using the positional information of the sample slices associated with each image, we proposed a novel loss function that approximated the class occurrence probabilities of segmentation results obtained from source and target images of brain samples at similar slice locations, and we introduced it into the UDA. The proposed UDA method achieved an mIoU of 78.34%, which was 8% more accurate than previous UDA methods such as Contrastive Learning and Self-Training (CLST) and Maximum Classifier Discrepancy (MCD). We demonstrated experimentally that the proposed method was useful for segmenting biomedical images with a small amount of incomplete training data.

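The abstract above reports results as mIoU (mean intersection over union). As a reminder of how this standard metric is usually computed, here is a minimal sketch over flattened label sequences. The function name `mean_iou` and the convention of skipping classes that appear in neither the labels nor the predictions are assumptions; the paper's exact evaluation protocol may differ.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Mean intersection over union across classes.

    `y_true` and `y_pred` are equal-length sequences of integer class labels
    (for example, flattened segmentation maps). Classes absent from both
    sequences are skipped, which is one common convention (an assumption here).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    ious = []
    for c in range(num_classes):
        # Pixels where both agree on class c.
        intersection = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        # Pixels where either one says class c.
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union > 0:
            ious.append(intersection / union)
    if not ious:
        raise ValueError("no class is present in the labels or predictions")
    return sum(ious) / len(ious)


# Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
print(round(score, 4))
```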
  • Sangeetha Ratnayake, Axel Martinelli, Toshinori Endo, Naoki Osada
    Article type: Original Paper
    Subject area: Original Paper
    2024 Volume 17 Pages 40-47
    Published: 2024
    Released on J-STAGE: April 30, 2024
    JOURNAL FREE ACCESS

    Molecular docking simulations that use scoring functions are pivotal for assessing the stability of complex formation. However, the unique biochemical characteristics of antibody-antigen interfaces make it difficult to apply the general parameter sets of scoring functions to these molecules, so the scoring function must be customized to improve prediction accuracy for structural configurations and binding affinities. To this end, we developed models within the Rosetta software framework, which is widely used for predicting antibody-antigen docking, to optimize the parameters of its scoring function. By quantitatively evaluating the shape of the decoy distribution generated by Rosetta, we refined the parameters for each antibody-antigen complex, which notably improved the prediction accuracy of the software on a given dataset. Furthermore, we identified a distinct parameter set that is effective for the majority of complexes in our dataset, though not universally applicable. This study introduces a novel approach to customizing scoring functions, potentially contributing to advances in drug discovery and to a deeper understanding of antibody-antigen interactions at the molecular level.

  • Makito Oku
    Article type: Original Paper
    Subject area: Original Paper
    2024 Volume 17 Pages 48-54
    Published: 2024
    Released on J-STAGE: August 30, 2024
    JOURNAL FREE ACCESS

    Generally, scRNA-seq data contain many 0 values, and the expression of each gene shows a zero-inflated distribution. Violin plots are usually used to display distributions of scRNA-seq data because they can represent the shape of multi-modal distributions. However, when the proportion of 0 values is very large, the 0 peak becomes too large in a violin plot, and the shape of the distribution of non-zero values becomes difficult to see. To resolve this issue, this study proposes clarinet plots as an alternative to violin plots for displaying zero-inflated distributions of scRNA-seq data. In clarinet plots, each distribution is represented by a clarinet-like shape. The long axis corresponds to the quantile, and the width represents the magnitude of each data value. The straight line at the end corresponds to 0 values. A clarinet plot can thus display the proportion of 0 values and the distribution of non-zero values simultaneously and effectively. Examples of application to artificial data and real data are shown.

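The description of the clarinet plot above (quantile along the long axis, width proportional to the data value, a straight line for the 0 values) can be sketched as a small coordinate computation. This is one possible reading of the abstract, not the author's implementation. The function names `clarinet_outline` and `zero_fraction`, the ascending sort order, the `(i + 0.5) / n` quantile positions, and the symmetric half-width are all assumptions made for illustration.

```python
def clarinet_outline(values):
    """Return (quantile, half_width) pairs for a clarinet-like shape.

    Assumptions (not from the paper): values are non-negative, sorted in
    ascending order, placed at quantile positions (i + 0.5) / n, and drawn
    symmetrically around the long axis, so each half-width is value / 2.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("values must not be empty")
    return [((i + 0.5) / n, v / 2.0) for i, v in enumerate(ordered)]


def zero_fraction(values):
    """Proportion of exact zeros, i.e. the length of the straight-line part."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(1 for v in values if v == 0) / len(values)


# Toy zero-inflated expression values for one gene.
expression = [0, 4, 0, 2, 0]
outline = clarinet_outline(expression)
print(outline)
print(zero_fraction(expression))
```

Plotting `half_width` and its negative against `quantile` would give a shape whose zero-width segment spans the fraction of cells with no expression, so the zero proportion and the non-zero distribution share one axis.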