IPSJ Transactions on Bioinformatics
Online ISSN : 1882-6679
ISSN-L : 1882-6679
Volume 15
  • Viet Toan Tran, Hoang D. Quach, Phuong V. D. Van, Van Hoai Tran
    Article type: Original Paper
    Subject area: Original Paper
    2022 Volume 15 Pages 1-8
    Published: 2022
    Released on J-STAGE: January 14, 2022
    JOURNAL FREE ACCESS

    Metagenomics studies microorganisms sampled directly from the environment, without traditional culturing. In such studies, the results of the binning step serve as input for subsequent steps of a metagenomic project, such as assembly and annotation. The main challenge in binning is the lack of explicit features in metagenomic reads, especially in short-read datasets. There are two approaches to binning: supervised and unsupervised learning. Unfortunately, only about 1% of microorganisms in nature are annotated, which causes problems for supervised approaches when the dataset under study contains unknown species. Previous studies usually assumed that reads with the same taxonomic label have similar k-mer distributions. Our new method uses Natural Language Processing (NLP) techniques to generate feature vectors. Additionally, the paper presents a comprehensive unsupervised framework for applying different embeddings, drawn from notable NLP techniques in topic modeling and sentence embedding. Experimental results on simulated datasets show that the proposed approach performs comparably to previous studies, demonstrating the feasibility of applying NLP to metagenomic binning. The program can be found at https://github.com/vandinhvyphuong/NLPBimeta.
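
The k-mer assumption mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' pipeline (see the linked repository for that): it only treats a read as a "sentence" of overlapping k-mer "words" and builds a normalized k-mer frequency vector, which is the kind of baseline feature that an NLP embedding would replace. The function names, the choice of k, and the toy reads are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product


def kmer_tokens(read, k=4):
    """Split a read into overlapping k-mers, skipping k-mers with non-ACGT characters."""
    read = read.upper()
    return [read[i:i + k] for i in range(len(read) - k + 1)
            if set(read[i:i + k]) <= set("ACGT")]


def kmer_frequency_vector(read, k=4):
    """Return relative k-mer frequencies over the full 4**k vocabulary."""
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(kmer_tokens(read, k))
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(vocab)
    return [counts[word] / total for word in vocab]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy reads (hypothetical, not taken from any dataset in the paper).
read_a = "ACGTACGTACGTTTGACA"
read_b = "ACGTACGAACGTTTGACC"
print(cosine_similarity(kmer_frequency_vector(read_a, 3),
                        kmer_frequency_vector(read_b, 3)))
```

In an unsupervised setting, vectors like these (or learned embeddings in their place) would then be passed to a clustering algorithm; that step is omitted here.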

  • Takahiro Nakamura, Toshinori Endo, Naoki Osada
    Article type: Original Paper
    Subject area: Original Paper
    2022 Volume 15 Pages 9-16
    Published: 2022
    Released on J-STAGE: March 15, 2022
    JOURNAL FREE ACCESS

    PR domain-containing 9 (PRDM9) is a zinc-finger protein that binds to specific DNA motifs and induces crossing-over between chromosomes, resulting in a high recombination rate around its binding sites. Currently, the binding sites of PRDM9 are predicted with methods based on motif matching and the Position-specific Weight Matrix (PWM). Meanwhile, the Convolutional Neural Network (CNN) has shown superior performance in recent studies at identifying protein-binding regions in general, and it is expected to perform well in PRDM9 binding site prediction. In this study, we compared the performance of PWM and CNN for predicting PRDM9 binding sites using not only test data but also the correlation between the prediction score for a fragment and the local recombination rate, so as to evaluate performance without overfitting effects. Approximately 170,000 genomic DNA fragments of the human genome containing a Chromatin Immuno-Precipitation with high-throughput sequencing (ChIP-seq) peak of PRDM9 were used for constructing the PWM and the CNN. We found that the CNN outperformed the PWM in terms of area under the ROC curve and other metrics. Furthermore, the prediction scores of the CNN correlated more strongly with the local recombination rate than those of the PWM. We suggest that the superior performance of the CNN is due in part to its ability to capture features of the sequences surrounding actual PRDM9-binding sites.
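
For readers unfamiliar with the PWM baseline named in the abstract above, the following is a minimal sketch of log-odds PWM construction and scanning. It assumes a uniform background, a pseudocount of 1, and toy aligned sites that are not the real PRDM9 motif; the function names are hypothetical and this is not the authors' implementation.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM from equal-length aligned binding sites."""
    width = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def score_window(pwm, window):
    """Sum the per-position log-odds scores for one window of the PWM's width."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, fragment):
    """Return (score, start) of the highest-scoring window in an ACGT-only fragment."""
    width = len(pwm)
    if len(fragment) < width:
        raise ValueError("fragment is shorter than the PWM")
    return max(
        (score_window(pwm, fragment[i:i + width]), i)
        for i in range(len(fragment) - width + 1)
    )


# Toy aligned sites (hypothetical; chosen only to make the example readable).
toy_sites = ["ACGT", "ACGT", "ACGA"]
toy_pwm = build_pwm(toy_sites)
score, start = best_hit(toy_pwm, "TTACGTTT")
print(start, round(score, 3))
```

A PWM scores each position independently and sees only the motif window, which is one reason a model that also reads flanking sequence, as the abstract suggests for the CNN, might rank fragments differently.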

  • Soki Marumoto, Takatomi Kubo, Makoto Tada, Kazushi Ikeda
    Article type: Original Paper
    Subject area: Original Paper
    2022 Volume 15 Pages 17-21
    Published: 2022
    Released on J-STAGE: August 09, 2022
    JOURNAL FREE ACCESS

    Fecal incontinence is a serious but common problem among elderly people, since it not only degrades their physical and mental quality of life but also increases the workload of caregivers. One promising tool to address this problem is a defecation prediction system, since a patient can go to the toilet if he/she knows the time of excretion in advance. Our approach to developing such a system is to measure bowel sounds (BSs) using a wearable device, predict the defecation time, and inform the user before defecation. As a first step toward this development, this paper shows that BSs contain information about the defecation time by classifying BSs recorded before and after defecation. The classification is made possible by detecting the change in power in the spectrogram of the BSs.
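
As a rough illustration of the spectrogram-power idea in the abstract above, the sketch below computes a short-time power spectrum with a plain DFT and compares the mean power in one frequency band between two segments. Everything concrete here is an assumption: the sample rate, frame length, frequency band, and the synthetic sine-wave "segments" are hypothetical stand-ins, not the authors' recordings or classifier.

```python
import cmath
import math


def frame_power_spectrum(frame):
    """Power spectrum (bins 0..n/2) of one Hann-windowed frame via a direct DFT."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def spectrogram(signal, frame_len=64, hop=32):
    """List of per-frame power spectra over overlapping frames."""
    return [frame_power_spectrum(signal[start:start + frame_len])
            for start in range(0, len(signal) - frame_len + 1, hop)]


def band_power(spec, sample_rate, frame_len, f_lo, f_hi):
    """Per-frame power summed over bins whose frequency lies in [f_lo, f_hi] Hz."""
    bins = [k for k in range(frame_len // 2 + 1)
            if f_lo <= k * sample_rate / frame_len <= f_hi]
    return [sum(frame[k] for k in bins) for frame in spec]


def mean(values):
    return sum(values) / len(values)


SAMPLE_RATE = 1000  # Hz, hypothetical
FRAME_LEN = 64

# Synthetic stand-ins for two recording segments: same 200 Hz tone, different amplitude.
tone = [math.sin(2 * math.pi * 200 * i / SAMPLE_RATE) for i in range(512)]
segment_a = [0.1 * x for x in tone]
segment_b = tone

power_a = mean(band_power(spectrogram(segment_a), SAMPLE_RATE, FRAME_LEN, 150, 250))
power_b = mean(band_power(spectrogram(segment_b), SAMPLE_RATE, FRAME_LEN, 150, 250))
print(power_b > power_a)
```

A simple classifier could threshold such a band-power feature; which band is informative and in which direction the power changes around defecation are questions for the paper itself, not something this sketch establishes.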

  • Hideki Kakeya, Yoshihisa Matsumoto
    Article type: Case Study Paper
    Subject area: Case Study Paper
    2022 Volume 15 Pages 22-29
    Published: 2022
    Released on J-STAGE: November 16, 2022
    JOURNAL FREE ACCESS

    A method for finding the probability that a given bias of mutations occurs naturally is proposed, in order to test whether a newly detected virus is a product of natural evolution or of a non-natural process such as genetic manipulation. The probability is calculated based on the neutral theory of molecular evolution and the binomial distribution of non-synonymous (N) and synonymous (S) mutations. Although most conventional analyses, including dN/dS analysis, assume that every kind of point mutation from one nucleotide to another occurs with the same probability, the proposed model takes the bias in mutations into account, using the equilibrium of mutations to estimate the probability of each mutation. The proposed method is applied to evaluate whether the Omicron variant strain of SARS-CoV-2, whose spike protein includes 29 N mutations and only one S mutation, can emerge through natural evolution. The binomial test based on the proposed model shows that the bias of N/S mutations in the Omicron spike can occur with a probability of 2.0 × 10⁻³ or less. Even with the conventional model, in which all kinds of mutations are equally probable, the strong N/S mutation bias in the Omicron spike can occur with a probability of 3.7 × 10⁻³, which means that the Omicron variant is highly likely a product of a non-natural process, including artifact.
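
The arithmetic behind a binomial test of this kind can be sketched as follows. The mutation counts (29 N, 1 S) come from the abstract above; the per-mutation probability of being non-synonymous, `P_NONSYN`, is a hypothetical placeholder, since the abstract does not state the values derived from the authors' models. The one-sided form of the test is also an assumption, and the printed value is not meant to reproduce the probabilities reported in the paper.

```python
from math import comb


def binomial_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_MUTATIONS = 29   # non-synonymous mutations in the spike (from the abstract)
S_MUTATIONS = 1    # synonymous mutations in the spike (from the abstract)
P_NONSYN = 0.75    # HYPOTHETICAL probability that a random mutation is non-synonymous

# Probability of observing at least 29 non-synonymous mutations out of 30
# if each mutation were independently non-synonymous with probability P_NONSYN.
p_value = binomial_upper_tail(N_MUTATIONS + S_MUTATIONS, N_MUTATIONS, P_NONSYN)
print(f"{p_value:.2e}")
```

The result depends strongly on `P_NONSYN`, which is why the abstract distinguishes between a model with equal mutation probabilities and one that accounts for mutation bias.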
