Background and Aim: Studies combining machine learning (ML) with pharmacometrics have multiplied in recent years. ML can serve as an initial step for fast screening of covariates in population pharmacokinetic (popPK) models. The present study aimed to integrate covariates derived from different popPK models using ML.

Methods: Two published popPK models of valproic acid (VPA) in Chinese epileptic patients were used, in which the population parameters were influenced by several covariates. Based on these covariates and a one-compartment model describing the pharmacokinetics of VPA, a dataset was constructed by Monte Carlo simulation and used to develop an XGBoost model that estimates the steady-state concentrations (Css) of VPA. SHapley Additive exPlanation (SHAP) values were used to interpret the prediction model, and VPA exposure was estimated in four assumed scenarios involving different combinations of CYP2C19 genotypes and co-administered antiepileptic drugs. To obtain a model that is easy to use in the clinic, we built a simplified model from CYP2C19 genotypes and noninvasive clinical parameters, omitting features that are infrequently measured or whose clinically available values are inaccurate, and verified it on our independent external dataset.

Results: After data preprocessing, the combined dataset was divided into a derivation cohort and a validation cohort (8:2). The XGBoost model was developed in the derivation cohort and performed well in the validation cohort, with a mean absolute error of 2.4 mg/L, a root-mean-squared error of 3.3 mg/L, a mean relative error of 0%, and 98.85% of predictions within ±20% of the actual values. The SHAP analysis showed that daily dose, time, CYP2C19*2 and/or *3 variants, albumin, body weight, single dose, and the CYP2C19*1/*1 genotype were the seven most influential factors for the Css of VPA. Under the simulated dosage regimen of 500 mg twice daily (bid), VPA exposure in patients carrying CYP2C19*2 and/or *3 variants and receiving no carbamazepine, phenytoin, or phenobarbital was approximately 1.74-fold that in patients with the CYP2C19*1/*1 genotype co-administered carbamazepine + phenytoin + phenobarbital. The feasibility of the simplified model was demonstrated by its performance in our external dataset.

Conclusion: This study highlights the bridging role of ML between big data and pharmacometrics, achieved by integrating covariates derived from different popPK models.
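
The dataset construction described in the Methods can be illustrated with a minimal sketch. It assumes a one-compartment model with first-order absorption and elimination and log-normal between-subject variability on clearance and volume. Every numeric value below (typical clearance, volume, absorption rate, variability, genotype effect) and both function names are hypothetical placeholders chosen for illustration; they are not taken from the two published popPK models, and the abstract does not specify the absorption model.

```python
import math
import random


def css_one_compartment(dose_mg, tau_h, t_h, cl_l_per_h, v_l, ka_per_h, f=1.0):
    """Steady-state concentration (mg/L) at time t_h after a dose.

    Closed form for a one-compartment model with first-order absorption
    and elimination under repeated dosing every tau_h hours.
    """
    ke = cl_l_per_h / v_l  # elimination rate constant (1/h)
    if math.isclose(ka_per_h, ke):
        raise ValueError("closed form requires ka != ke")
    scale = f * dose_mg * ka_per_h / (v_l * (ka_per_h - ke))
    elim = math.exp(-ke * t_h) / (1.0 - math.exp(-ke * tau_h))
    absorb = math.exp(-ka_per_h * t_h) / (1.0 - math.exp(-ka_per_h * tau_h))
    return scale * (elim - absorb)


def simulate_patients(n, dose_mg=500.0, tau_h=12.0, seed=0):
    """Monte Carlo simulation of n virtual patients (illustrative only)."""
    rng = random.Random(seed)
    # Hypothetical population values, NOT from the published models.
    typical_cl = 0.5         # L/h
    typical_v = 12.0         # L
    ka = 2.0                 # 1/h
    omega_cl = 0.3           # SD of log-normal variability on CL
    omega_v = 0.2            # SD of log-normal variability on V
    variant_cl_factor = 0.8  # assumed lower CL for *2 and/or *3 carriers

    rows = []
    for _ in range(n):
        variant = rng.random() < 0.5
        cl = typical_cl * math.exp(rng.gauss(0.0, omega_cl))
        if variant:
            cl *= variant_cl_factor
        v = typical_v * math.exp(rng.gauss(0.0, omega_v))
        t = rng.uniform(0.0, tau_h)  # sampling time after the last dose
        rows.append({
            "variant": variant,
            "time_h": t,
            "css_mg_per_l": css_one_compartment(dose_mg, tau_h, t, cl, v, ka),
        })
    return rows


if __name__ == "__main__":
    for row in simulate_patients(3):
        print(row)
```

Rows produced this way could serve as training data for a gradient-boosted regressor such as XGBoost, with the covariates as features and the simulated Css as the target. The fitting step is omitted here to keep the sketch free of third-party dependencies.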
Keywords: Monte Carlo simulation; XGBoost; covariate; machine learning; population pharmacokinetic; SHAP; therapeutic drug monitoring; valproic acid.
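
The four validation metrics reported in the Results can be computed as in the sketch below. The abstract does not define them, so the definitions here are assumptions: mean relative error is taken as the signed mean of (predicted - actual) / actual, expressed in percent, and the last metric is the percentage of predictions within a relative tolerance of the actual value. The function name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(actual, predicted, tolerance=0.20):
    """Return MAE, RMSE, mean relative error (%) and % within tolerance.

    Assumed definitions (not stated in the abstract):
      - relative error = (predicted - actual) / actual, signed
      - "within tolerance" means |predicted - actual| <= tolerance * |actual|
    Actual values must be non-zero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mre_pct = 100.0 * sum(e / a for e, a in zip(errors, actual)) / n
    within_pct = 100.0 * sum(
        abs(e) <= tolerance * abs(a) for e, a in zip(errors, actual)
    ) / n
    return {
        "mae": mae,
        "rmse": rmse,
        "mre_pct": mre_pct,
        "within_pct": within_pct,
    }


if __name__ == "__main__":
    observed = [50.0, 100.0, 80.0, 60.0]    # mg/L, made-up values
    estimated = [55.0, 90.0, 80.0, 75.0]    # mg/L, made-up values
    print(regression_metrics(observed, estimated))
```

Because positive and negative relative errors cancel under the signed definition, a mean relative error near 0% indicates low bias rather than low error, which is why it is read alongside the mean absolute error and root-mean-squared error.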