Overview and Practical Recommendations on Using Shapley Values for Identifying Predictive Biomarkers via CATE Modeling.
Abstract
In recent years, two parallel research trends have emerged in machine learning, yet their intersection remains largely unexplored. On one hand, there has been a significant increase in literature on Individual Treatment Effect (ITE) modeling, particularly targeting the Conditional Average Treatment Effect (CATE) using meta-learner techniques; these approaches often aim to identify causal effects from observational data. On the other hand, the field of Explainable Machine Learning (XML) has gained traction, with various approaches developed to explain complex models and make their predictions more interpretable. A prominent technique in this area is Shapley Additive Explanations (SHAP), which has become mainstream in data science for analyzing supervised learning models. However, there has been limited exploration of SHAP's application to identifying predictive biomarkers through CATE models, a crucial task in pharmaceutical precision medicine. We address inherent challenges of the SHAP concept in multi-stage CATE strategies and introduce a surrogate estimation approach that is agnostic to the choice of CATE strategy, effectively reducing the computational burden in high-dimensional data. With this approach, we conduct a simulation benchmark evaluating how accurately predictive biomarkers can be identified from SHAP values derived from various CATE meta-learners and Causal Forest.
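To make the surrogate idea concrete, the following is a minimal sketch, not the authors' implementation: it assumes a simple T-learner as the CATE strategy, simulated trial data, and gradient-boosted trees throughout; only the shap and scikit-learn APIs are real, everything else (variable names, the data-generating model) is illustrative.

```python
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Simulated randomized trial: X holds biomarkers, w the treatment arm,
# y the outcome. Biomarker 1 is predictive (it modifies the effect).
n, p = 2000, 10
X = rng.normal(size=(n, p))
w = rng.integers(0, 2, size=n)
y = X[:, 0] + w * (2.0 * (X[:, 1] > 0)) + rng.normal(size=n)

# Stage 1: one possible CATE strategy, a T-learner with separate
# arm-wise outcome models (any meta-learner could be swapped in here).
m1 = GradientBoostingRegressor().fit(X[w == 1], y[w == 1])
m0 = GradientBoostingRegressor().fit(X[w == 0], y[w == 0])
cate_hat = m1.predict(X) - m0.predict(X)

# Stage 2: a single surrogate model regressing the CATE estimates on
# the biomarkers. SHAP is computed once, on this surrogate, regardless
# of how many fitted models the CATE strategy itself involved.
surrogate = GradientBoostingRegressor().fit(X, cate_hat)
shap_values = shap.TreeExplainer(surrogate).shap_values(X)

# Mean absolute SHAP value per biomarker as a predictive-importance score.
print(np.abs(shap_values).mean(axis=0).round(3))
```

Because the attribution step only ever sees the surrogate fitted to the CATE predictions, it is identical no matter which meta-learner or Causal Forest variant produced `cate_hat`, which is what makes a surrogate-based SHAP analysis agnostic to the CATE strategy and cheaper than explaining each stage of a multi-stage pipeline.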
(© 2026 John Wiley & Sons Ltd.)