PIGNet: a physics-informed deep learning model toward generalized drug-target interaction predictions

Chem Sci. 2022 Feb 7;13(13):3661-3673.

doi: 10.1039/d1sc06946b.

eCollection 2022 Mar 30.


Seokhyun Moon et al.

Chem Sci.



Recently, deep neural network (DNN)-based drug-target interaction (DTI) models have been highlighted for their high accuracy at affordable computational cost. Yet the models' insufficient generalization remains a challenging problem in the practice of in silico drug discovery. We propose two key strategies to enhance the generalization of a DTI model. The first is to predict atom-atom pairwise interactions via physics-informed equations parameterized with neural networks and to obtain the total binding affinity of a protein-ligand complex as their sum. The second is to augment the training data with a broader range of binding poses and ligands, which further improved generalization. We validated our model, PIGNet, on the comparative assessment of scoring functions (CASF) 2016 benchmark, where its docking and screening powers outperformed those of previous methods. Our physics-informing strategy also enables the interpretation of predicted affinities by visualizing the contribution of ligand substructures, providing insights for further ligand optimization.

Conflict of interest statement

There are no conflicts to declare.


Fig. 1. Our model architecture. A protein–ligand complex is represented in a graph and adjacency matrices are assigned from the binding structure of the complex. Each node feature is updated through neural networks to carry the information of covalent bonds and intermolecular interactions. Given the distance and final node features of each atom pair, four energy components are calculated from the physics-informed parameterized equations. The total binding affinity is obtained as a sum of pairwise binding affinities, which is a sum of the four energy components divided by an entropy term.
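The final aggregation step described in the Fig. 1 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `total_affinity`, the array layout, and in particular the form of the entropy term (`1 + c_rot * n_rotor`, with a hypothetical constant `c_rot`) are assumptions made only for illustration, since the caption does not specify them.

```python
import numpy as np

def total_affinity(pair_energies, n_rotor, c_rot=0.1):
    """Sum pairwise energies into one binding affinity.

    pair_energies: array of shape (n_pairs, 4), one row per atom pair and
                   one column per energy component (four components, as in
                   the caption; their order here is arbitrary).
    n_rotor, c_rot: ASSUMED inputs to a hypothetical entropy term; the
                    caption only says the sum is divided by an entropy term.
    """
    pair_energies = np.asarray(pair_energies, dtype=float)
    entropy_term = 1.0 + c_rot * n_rotor            # assumed functional form
    pairwise_affinity = pair_energies.sum(axis=1) / entropy_term
    return float(pairwise_affinity.sum())

# Two atom pairs, four components each (illustrative numbers only).
energies = [[-1.0, -0.5, 0.0, -0.5],
            [-2.0,  0.0, 0.0,  0.0]]
print(total_affinity(energies, n_rotor=0))   # no entropy penalty
print(total_affinity(energies, n_rotor=10))  # scaled down by the entropy term
```

Because each pairwise term is kept before the final sum, the same intermediate array could be grouped by ligand substructure, which is the kind of per-fragment interpretation shown in Fig. 3.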

Fig. 2. The training scheme of PIGNet. We use three types of data in model training: true binding complexes, true binder ligand–protein pairs in computer-generated binding poses, and non-binding decoy complexes. PIGNet predicts the binding free energy for each input. For a true binding complex, the model learns to predict its true binding energy. For a computer-generated binding pose, the model learns to predict an energy higher than the true binding energy; for a non-binding decoy complex, it learns to predict an energy higher than a threshold energy. Finally, PIGNet learns the proper correlation between ligand atom positions and binding affinity by minimizing the derivative loss.
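The three energy-based objectives in the Fig. 2 caption can be sketched as follows. This is a hedged illustration, not the published loss: the function names are hypothetical, the hinge (one-sided) form is an assumption consistent with "higher than", the threshold value is a placeholder, and the derivative loss is omitted because the caption does not define it.

```python
import numpy as np

def affinity_loss(pred, true):
    """True binding complexes: match the true binding energy (mean squared error)."""
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    return float(np.mean((pred - true) ** 2))

def pose_loss(pred_pose, true_energy):
    """Computer-generated poses: penalize only predictions that fall
    BELOW the true binding energy (assumed hinge form)."""
    pred_pose = np.asarray(pred_pose, float)
    true_energy = np.asarray(true_energy, float)
    return float(np.mean(np.maximum(true_energy - pred_pose, 0.0)))

def decoy_loss(pred_decoy, threshold):
    """Non-binding decoys: penalize only predictions that fall
    BELOW a threshold energy (threshold value is a placeholder)."""
    pred_decoy = np.asarray(pred_decoy, float)
    return float(np.mean(np.maximum(threshold - pred_decoy, 0.0)))

# Illustrative energies in kcal/mol; more negative means stronger binding.
print(affinity_loss([1.0, 3.0], [1.0, 1.0]))
print(pose_loss([-5.0, -9.0], [-8.0, -8.0]))   # only the -9.0 pose is penalized
print(decoy_loss([-4.0, -8.0], threshold=-6.0))  # only the -8.0 decoy is penalized
```

The one-sided form means a generated pose or decoy that already scores above its reference contributes nothing to the loss, so the model is not pushed toward any particular energy for those inputs.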

Fig. 3. Interpretation of the predicted outcomes. (a) Substructural analysis of ligands for two target proteins: protein-tyrosine phosphatase non-receptor type 1 (PTPN1) and platelet activating factor acetylhydrolase (PAF-AH). The blue and red circles indicate common and different substructures, respectively, and the predicted energy contribution (unit: kcal mol⁻¹) of each substructure is annotated. The inhibitory constant, Ki, indicates how potently the ligand binds to the target protein. (b) A distance–energy plot of carbon–carbon pairwise van der Waals (vdW) energy components in the test set. The red solid line illustrates the original distance–energy relation without any deviation induced by learnable parameters. The closer the color of a data point is to yellow, the larger the number of corresponding carbon–carbon pairs. (c) The average value of the corrected sum of vdW radii for different carbon–carbon pair types. Csp2–Csp2, Csp2–Csp3, and Csp3–Csp3 pairs are compared. The results include 95% confidence intervals.

Fig. 4. Plot of the average Pearson's correlation coefficients, R, of the 4-fold PIGNet model, with or without the uncertainty estimator, on datasets classified according to the total uncertainty. With the uncertainty estimator: low, the lowest third; random, a randomly selected third; high, the highest third of the uncertainty distribution. Without Monte Carlo dropout: baseline, the scores of a single PIGNet model shown in Table 1. The lower the uncertainty, the more likely the model is to have predicted the result correctly. Error bars represent 95% confidence intervals. PIGNet was tested at the 2300th training epoch with and without Monte Carlo dropout.
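The grouping behind Fig. 4 can be sketched as below: samples are ranked by an uncertainty score, split into thirds, and Pearson's R is computed within a group. The function name `pearson_by_uncertainty` and the input arrays are hypothetical, and how the total uncertainty itself is obtained from Monte Carlo dropout is not shown, because the caption does not describe it.

```python
import numpy as np

def pearson_by_uncertainty(pred, true, uncertainty):
    """Pearson's R on the lowest and highest thirds of the uncertainty ranking."""
    pred = np.asarray(pred, float)
    true = np.asarray(true, float)
    order = np.argsort(np.asarray(uncertainty, float), kind="stable")
    n = len(order) // 3                      # size of one third
    groups = {"low": order[:n], "high": order[-n:]}
    return {name: float(np.corrcoef(pred[idx], true[idx])[0, 1])
            for name, idx in groups.items()}

# Toy data: the low-uncertainty third agrees with the labels,
# the high-uncertainty third does not.
pred        = [1, 2, 3,  1, 2, 3,  1, 2, 3]
true        = [1, 2, 3,  0, 5, 1,  3, 2, 1]
uncertainty = [0.1, 0.1, 0.1,  0.5, 0.5, 0.5,  0.9, 0.9, 0.9]
print(pearson_by_uncertainty(pred, true, uncertainty))
```

A "random" group, as in the figure, could be drawn with `numpy.random.default_rng().choice(len(pred), n, replace=False)` and scored the same way.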


