CNN6mA: Interpretable neural network model based on position-specific CNN and cross-interactive network for 6mA site prediction

2022 Dec 28;21:644-654.

doi: 10.1016/j.csbj.2022.12.043.

eCollection 2023.



Sho Tsukiyama et al.

Comput Struct Biotechnol J.



N6-methyladenine (6mA) plays a critical role in various epigenetic processes, including DNA replication, DNA repair, silencing, and transcription, and in diseases such as cancer. To understand such epigenetic mechanisms, 6mA has been detected on a genome-wide scale at single-base resolution by high-throughput technologies, together with conventional methods such as immunoprecipitation, mass spectrometry, and capillary electrophoresis, but these experimental approaches are time-consuming and laborious. To complement these experimental approaches, we have developed a CNN-based 6mA site predictor, named CNN6mA, which introduces two new architectures: a position-specific 1-D convolutional layer and a cross-interactive network. In the position-specific 1-D convolutional layer, position-specific filters with different window sizes are applied to an inquiry sequence, instead of sharing the same filters over all positions, in order to extract position-specific features at different levels. The cross-interactive network explores the relationships between all the nucleotide patterns within the inquiry sequence. Consequently, CNN6mA outperformed the existing state-of-the-art models in many species and produced a contribution score vector that intelligibly interprets the prediction mechanism. The source code and web application of CNN6mA are freely accessible at https://github.com/kuratahiroyuki/CNN6mA.git and http://kurata35.bio.kyutech.ac.jp/CNN6mA/, respectively.
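
The idea of a position-specific 1-D convolution, in which each window position has its own filter rather than one filter shared along the sequence, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation (that lives in the repository linked above); the function names, the toy sequence, the random weights, and the window sizes 3 and 5 are all assumptions chosen for illustration, and each position here has a single filter with no bias or activation.

```python
import random

ALPHABET = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element one-hot rows (A, C, G, T)."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET] for base in seq]

def position_specific_conv1d(x, weights):
    """Apply a different filter at every window position (no weight sharing).

    x:       list of `length` rows, each with `channels` values
    weights: weights[p][w][c] is the weight used at output position p,
             window offset w, input channel c
    returns: one activation per output position
    """
    window = len(weights[0])
    positions = len(x) - window + 1
    assert len(weights) == positions, "need one filter per output position"
    out = []
    for p in range(positions):
        total = 0.0
        for w in range(window):
            for c, value in enumerate(x[p + w]):
                total += weights[p][w][c] * value
        out.append(total)
    return out

def random_filters(positions, window, channels, rng):
    """Hypothetical initializer: independent random weights for each position."""
    return [[[rng.uniform(-1, 1) for _ in range(channels)]
             for _ in range(window)]
            for _ in range(positions)]

rng = random.Random(0)
seq = "ACGTTGCAAGT"            # toy inquiry sequence, not taken from the paper
x = one_hot(seq)
features = {}
for window in (3, 5):          # illustrative window sizes (assumed)
    filters = random_filters(len(seq) - window + 1, window, len(ALPHABET), rng)
    features[window] = position_specific_conv1d(x, filters)
```

In an ordinary convolution, `weights[p]` would be the same object for every `p`; here each position learns its own pattern, so the same trinucleotide can contribute differently depending on where it sits relative to the candidate site. Using several window sizes yields one feature list per size, corresponding to the "different levels" mentioned in the abstract.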


6mA, N6-methyladenine; AUCs, Area under the curves; BERT, Bidirectional Encoder Representations from Transformers; CNN; CNN, Convolutional neural network; DNA modification; Deep learning; Interpretable prediction; LSTM, Long short-term memory; MCC, Matthews correlation coefficient; Machine learning; N6-methyladenine; RF, Random forest; SMRT, Single-molecule real-time; SN, Sensitivity; SP, Specificity; UMAP, Uniform manifold approximation and projection; t-SNE, t-distributed stochastic neighbor embedding.

Conflict of interest statement

All authors declare that they have no conflicts of interest.
