Uncovering Cas9 PAM diversity through metagenomic mining and machine learning

Dataemia





