A geometric foundation model for enzyme retrieval with evolutionary insights

Dataemia

Breaker, R. R. DNA enzymes. Nat. Biotechnol. 15, 427–431 (1997).
Knowles, J. R. Enzyme catalysis: not different, just better. Nature 350, 121–124 (1991).
Khosla, C. & Harbury, P. B. Modular enzymes. Nature 409, 247–252 (2001).
Chen, Y. & Li, F. Metabolomes evolve faster than metabolic network structures. Proc. Natl Acad. Sci. USA 121, e2400519121 (2024).
Kraut, J. How do enzymes work? Science 242, 533–540 (1988).
Murakami, Y., Kikuchi, J.-i, Hisaeda, Y. & Hayashida, O. Artificial enzymes. Chem. Rev. 96, 721–758 (1996).
Klibanov, A. M. Improving enzymes by using them in organic solvents. Nature 409, 241–246 (2001).
Copeland, R. A. Enzymes: A Practical Introduction to Structure, Mechanism, and Data Analysis (Wiley, 2023).
Nielsen, J. E. & McCammon, J. A. Calculating pKa values in enzyme active sites. Protein Sci. 12, 1894–1901 (2003).
Eisenmesser, E. Z. et al. Intrinsic dynamics of an enzyme underlies catalysis. Nature 438, 117–121 (2005).
Noraini, M., Ong, H. C., Badrul, M. J. & Chong, W. A review on potential enzymatic reaction for biofuel production from algae. Renew. Sustain. Energy Rev. 39, 24–34 (2014).
Ding, K. et al. Machine learning-guided co-optimization of fitness and diversity facilitates combinatorial library design in enzyme engineering. Nat. Commun. 15, 6392 (2024).
Bateman, A. et al. UniProt: the Universal protein knowledgebase in 2025. Nucleic Acids Res. 53, D609–D617 (2025).
Bansal, P. et al. Rhea, the reaction knowledgebase in 2022. Nucleic Acids Res. 50, D693–D700 (2022).
Caspi, R. et al. The MetaCyc database of metabolic pathways and enzymes—a 2019 update. Nucleic Acids Res. 48, D445–D453 (2020).
Rudroff, F. et al. Opportunities and challenges for combining chemo- and biocatalysis. Nat. Catal. 1, 12–22 (2018).
Li, W.-L. & Head-Gordon, T. Catalytic principles from natural enzymes and translational design strategies for synthetic catalysts. ACS Cent. Sci. 7, 72–80 (2020).
Vogt, C. & Weckhuysen, B. M. The concept of active site in heterogeneous catalysis. Nat. Rev. Chem. 6, 89–111 (2022).
Lonsdale, R., Harvey, J. N. & Mulholland, A. J. A practical guide to modelling enzyme-catalysed reactions. Chem. Soc. Rev. 41, 3025–3038 (2012).
Hay, S. & Scrutton, N. S. Good vibrations in enzyme-catalysed reactions. Nat. Chem. 4, 161–168 (2012).
Li, F. et al. Deep learning-based kcat prediction enables improved enzyme-constrained model reconstruction. Nat. Catal. 5, 662–672 (2022).
Martín, A. J., Mitchell, S., Mondelli, C., Jaydev, S. & Pérez-Ramírez, J. Unifying views on catalyst deactivation. Nat. Catal. 5, 854–866 (2022).
Goldman, S., Das, R., Yang, K. K. & Coley, C. W. Machine learning modeling of family wide enzyme-substrate specificity screens. PLoS Comput. Biol. 18, e1009853 (2022).
Kroll, A., Ranjan, S., Engqvist, M. K. & Lercher, M. J. A general model to predict small molecule substrates of enzymes based on machine and deep learning. Nat. Commun. 14, 2787 (2023).
González-Granda, S., Albarrán-Velo, J., Lavandera, I. & Gotor-Fernández, V. Expanding the synthetic toolbox through metal–enzyme cascade reactions. Chem. Rev. 123, 5297–5346 (2023).
Hua, C. et al. Reactzyme: a benchmark for enzyme–reaction prediction. Adv. Neural Inf. Process. Syst. 37, 26415–26442 (2024).
Ryu, J. Y., Kim, H. U. & Lee, S. Y. Deep learning enables high-quality and high-throughput prediction of enzyme commission numbers. Proc. Natl Acad. Sci. USA 116, 13996–14001 (2019).
Sanderson, T., Bileschi, M. L., Belanger, D. & Colwell, L. J. ProteInfer, deep neural networks for protein functional inference. eLife 12, e80942 (2023).
Li, Y. et al. DEEPre: sequence-based enzyme EC number prediction by deep learning. Bioinformatics 34, 760–769 (2018).
Dalkiran, A. et al. ECPred: a tool for the prediction of the enzymatic functions of protein sequences based on the EC nomenclature. BMC Bioinformatics 19, 1–13 (2018).
Gligorijević, V. et al. Structure-based protein function prediction using graph convolutional networks. Nat. Commun. 12, 3168 (2021).
Yu, T. et al. Enzyme function prediction using contrastive learning. Science 379, 1358–1363 (2023).
Xing, H. et al. High-throughput prediction of enzyme promiscuity based on substrate–product pairs. Brief. Bioinform. 25, bbae089 (2024).
Mikhael, P. G., Chinn, I. & Barzilay, R. CLIPZyme: reaction-conditioned virtual screening of enzymes. Int. Conf. Mach. Learn. 235, 35647–35663 (2024).
Yang, J. et al. CARE: a benchmark suite for the classification and retrieval of enzymes. Adv. Neural Inf. Process. Syst. 37, 3094–3121 (2024).
Rappoport, D. & Jinich, A. Enzyme substrate prediction from three-dimensional feature representations using space-filling curves. J. Chem. Inf. Model. 63, 1637–1648 (2023).
Salas-Nuñez, L. F. et al. Machine learning to predict enzyme–substrate interactions in elucidation of synthesis pathways: a review. Metabolites 14, 154 (2024).
Li, F., Chen, Y., Anton, M. & Nielsen, J. GotEnzymes: an extensive database of enzyme parameter predictions. Nucleic Acids Res. 51, D583–D586 (2023).
Hua, C. et al. EnzymeFlow: generating reaction-specific enzyme catalytic pockets through flow matching and co-evolutionary dynamics. Preprint at https://arxiv.org/abs/2410.00327 (2024).
Carbonell, P. et al. Selenzyme: enzyme selection tool for pathway design. Bioinformatics 34, 2153–2154 (2018).
Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
Tian, W. & Skolnick, J. How well is enzyme function conserved as a function of pairwise sequence identity? J. Mol. Biol. 333, 863–882 (2003).
Ma, F. et al. Sequence homolog-based molecular engineering for shifting the enzymatic pH optimum. Synth. Syst. Biotechnol. 1, 195–206 (2016).
Wang, J., Wu, Y., Sun, X., Yuan, Q. & Yan, Y. De novo biosynthesis of glutarate via α-keto acid carbon chain extension and decarboxylation pathway in Escherichia coli. ACS Synth. Biol. 6, 1922–1930 (2017).
Reynolds, E. et al. Elucidation of gene clusters underlying withanolide biosynthesis in ashwagandha through yeast metabolic engineering. Preprint at bioRxiv https://doi.org/10.1101/2024.12.24.630284 (2024).
Hekkelman, M. L., de Vries, I., Joosten, R. P. & Perrakis, A. AlphaFill: enriching AlphaFold models with ligands and cofactors. Nat. Methods 20, 205–213 (2023).
ESM Team. ESM Cambrian: revealing the mysteries of proteins with unsupervised learning. EvolutionaryScale https://evolutionaryscale.ai/blog/esm-cambrian (2024).
Probst, D., Schwaller, P. & Reymond, J.-L. Reaction classification and yield prediction using the differential reaction fingerprint DRFP. Digit. Discov. 1, 91–97 (2022).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2014).
Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017).
Song, Y. et al. Accurately predicting enzyme functions through geometric graph learning on ESMFold-predicted structures. Nat. Commun. 15, 8180 (2024).
Järvelin, K. & Kekäläinen, J. Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst. 20, 422–446 (2002).
Chen, H., Lyne, P. D., Giordanetto, F., Lovell, T. & Li, J. On evaluating molecular-docking methods for pose prediction and enrichment factors. J. Chem. Inf. Model. 46, 401–415 (2006).
Van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
Ribeiro, A. J. M. et al. Mechanism and Catalytic Site Atlas (M-CSA): a database of enzyme reaction mechanisms and active sites. Nucleic Acids Res. 46, D618–D623 (2018).
Zhang, Y. et al. P450Rdb: a manually curated database of reactions catalyzed by cytochrome P450 enzymes. J. Adv. Res. 63, 35–42 (2024).
Samusevich, R. et al. Discovery and characterization of terpene synthases powered by machine learning. Preprint at https://doi.org/10.1101/2024.01.29.577750 (2024).
Huang, H. et al. Panoramic view of a superfamily of phosphatases through substrate profiling. Proc. Natl Acad. Sci. USA 112, E1974–E1983 (2015).
Zheng, S. et al. Deep learning driven biosynthetic pathways navigation for natural products with BioNavi-NP. Nat. Commun. 13, 3342 (2022).
Zeng, T., Jin, Z., Zheng, S., Yu, T. & Wu, R. Developing BioNavi for hybrid retrosynthesis planning. JACS Au 4, 2492–2502 (2024).
van Beusekom, B. et al. Homology-based hydrogen bond information improves crystallographic structures in the PDB. Protein Sci. 27, 798–808 (2018).
Krivák, R. & Hoksza, D. P2Rank: machine learning based tool for rapid and accurate prediction of ligand binding sites from protein structure. J. Cheminform. 10, 39 (2018).
van Kempen, M. et al. Fast and accurate protein structure search with Foldseek. Nat. Biotechnol. 42, 243–246 (2024).
Jing, B. et al. Learning from protein structure with geometric vector perceptrons. Int. Conf. Learn. Represent. (2021).
Schütt, K. et al. SchNet: a continuous-filter convolutional neural network for modeling quantum interactions. Adv. Neural Inf. Process. Syst. 30 (2017).
Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017).


