IPSJ Transactions on Bioinformatics
Online ISSN : 1882-6679
ISSN-L : 1882-6679
Predicting PRDM9 Binding Sites by a Convolutional Neural Network and Verification Using Genetic Recombination Map
Takahiro Nakamura, Toshinori Endo, Naoki Osada

2022 Volume 15 Pages 9-16

Abstract

PR domain-containing 9 (PRDM9) is a zinc-finger protein that binds to specific DNA motifs and induces crossing-over between chromosomes, resulting in a high recombination rate around its binding sites. Currently, PRDM9 binding sites are predicted with methods based on motif matching and the position-specific weight matrix (PWM). Meanwhile, convolutional neural networks (CNNs) have shown superior performance in recent studies at identifying protein-binding regions in general and are expected to perform well in PRDM9 binding site prediction. In this study, we compared the performance of PWM and CNN for predicting PRDM9 binding sites using not only test data but also the correlation between the prediction score for a fragment and the local recombination rate, which allows performance to be evaluated without overfitting effects. Approximately 170,000 DNA fragments of the human genome containing PRDM9 peaks from chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) were used to construct the PWM and the CNN. We found that the CNN outperformed the PWM in terms of area under the receiver operating characteristic (ROC) curve and other metrics. Furthermore, the prediction scores of the CNN correlated more strongly with the local recombination rate than those of the PWM. We suggest that the superior performance of the CNN is due in part to its ability to capture features of the sequences surrounding actual PRDM9-binding sites.
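
As a rough illustration of the PWM baseline and the two evaluation ideas mentioned in the abstract (area under the ROC curve and correlation of scores with an external quantity), the following minimal Python sketch may help. It is not the authors' implementation: the toy sequences, the pseudocount, the uniform background, the forward-strand-only scan, and the function names `build_pwm`, `score_fragment`, `roc_auc`, and `pearson` are all assumptions made for illustration, and the abstract does not state which correlation coefficient was used.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds PWM from equal-length aligned sites (uniform background assumed)."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm

def score_fragment(pwm, fragment):
    """Best log-odds score over all windows of the fragment (forward strand only)."""
    width = len(pwm)
    return max(
        sum(pwm[i][fragment[start + i]] for i in range(width))
        for start in range(len(fragment) - width + 1)
    )

def roc_auc(pos_scores, neg_scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

def pearson(xs, ys):
    """Pearson correlation, e.g. between prediction scores and local recombination rates."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy data (hypothetical; not the real PRDM9 motif or ChIP-seq fragments).
sites = ["CCTCC", "CCTCC", "CCACC", "CCTCC"]
bound = ["AAACCTCCAAA", "GGCCTCCGTT"]
unbound = ["AAAAAAAAAAA", "GTGTGTGTGTG"]

pwm = build_pwm(sites)
auc = roc_auc([score_fragment(pwm, s) for s in bound],
              [score_fragment(pwm, s) for s in unbound])
print(f"toy AUC: {auc:.2f}")
```

Because a PWM scores each position independently within a fixed-width window, it cannot use sequence context outside the motif, which is the kind of information the abstract suggests a CNN may be able to exploit.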

© 2022 by the Information Processing Society of Japan