ISSN: 2977-0041 | Open Access

Journal of Material Sciences and Engineering Technology

Volume: 3, Issue: 4

SHAP-Enhanced Machine Learning for Explainable Stroke Risk Prediction in Hypertensive Patients

David Olanrewaju Akinwale, Oluwasegun Anthony Bosede and Olushina Olawale Awe*

ABSTRACT
Background: Stroke remains a critical global health concern that disproportionately affects individuals with hypertension, a well-established and modifiable risk factor. Traditional risk scoring systems often fall short in accurately predicting stroke onset because they rely on fixed clinical thresholds and model few interactions between variables. As the complexity of health data increases, machine learning (ML) and explainable artificial intelligence (XAI) offer powerful tools to uncover hidden patterns and enable precision risk stratification.

Objectives: This study proposes a novel, interpretable machine learning framework that uses ensemble learning and SHapley Additive exPlanations (SHAP) to enhance stroke risk prediction in hypertensive patients. The objective is twofold: to improve the predictive power of stroke models and to provide clinically relevant insights that support real-time, data-driven decisions in preventive care.
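
To illustrate what a SHAP attribution represents, the following minimal sketch computes exact Shapley values for a toy risk score by averaging each feature's marginal contribution over all feature orderings. The function `toy_risk`, its coefficients, and the baseline and patient values are hypothetical and are not taken from the study; the SHAP library relies on faster estimators rather than this brute-force enumeration.

```python
from itertools import permutations

def toy_risk(features):
    """Hypothetical risk score with an age x glucose interaction (illustrative only)."""
    age = features["age"]
    glucose = features["glucose"]
    bmi = features["bmi"]
    return 0.02 * age + 0.01 * glucose + 0.005 * bmi + 0.0001 * age * glucose

def exact_shapley(model, instance, baseline):
    """Average each feature's marginal contribution over all feature orderings.

    Features not yet revealed in an ordering keep their baseline value.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = instance[name]   # reveal this feature's actual value
            value = model(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}

# Hypothetical reference patient and patient of interest.
baseline = {"age": 50.0, "glucose": 100.0, "bmi": 25.0}
patient = {"age": 70.0, "glucose": 180.0, "bmi": 30.0}

phi = exact_shapley(toy_risk, patient, baseline)

# Additivity: the attributions sum to the gap between the two scores.
gap = toy_risk(patient) - toy_risk(baseline)
print(phi, gap)
```

Because the attributions add up to the difference between the patient's score and the baseline score, each value can be read as that feature's share of the individual prediction, which is the property that makes per-patient explanations possible.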

Methods: We utilized a real-world clinical dataset encompassing demographic, physiological, and behavioral variables associated with stroke. Data preprocessing included k-nearest neighbor imputation for missing values, normalization of continuous features, and class balancing via Synthetic Minority Oversampling Technique (SMOTE). A hybrid feature selection pipeline, combining the sparsity-enforcing capabilities of LASSO regression with the iterative refinement of Recursive Feature Elimination (RFE), was employed to identify the most salient predictors. Multiple ML models, including logistic regression, deep neural networks, random forests, and gradient boosting machines, were trained and validated using cross-validation. SHAP values were computed post-training to enable individualized, interpretable model outputs.
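
The class-balancing step can be sketched as follows. This is a simplified, pure-Python illustration of the interpolation idea behind SMOTE, not the implementation used in the study: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. The function name `smote_like_oversample`, the value of `k`, and the toy feature vectors (assumed to be already normalized) are assumptions made for illustration.

```python
import random

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by neighbour interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the chosen sample (excluding itself)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: euclidean(base, p))[:k]
        neighbour = rng.choice(neighbours)
        u = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + u * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Hypothetical, already-normalized minority-class (stroke) feature vectors.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote_like_oversample(minority, n_new=6)
print(synthetic)
```

A common precaution, which the abstract does not state explicitly, is to apply oversampling inside each cross-validation training fold only, so that synthetic samples never leak into the validation data.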

Results: Ensemble models, particularly Gradient Boosting and Random Forest, demonstrated superior discriminative performance, achieving AUC-ROC scores above 0.78 following class balancing. The integrated LASSO-RFE approach revealed age, hypertension status, and average glucose levels as dominant predictors across models. SHAP visualizations confirmed the influence of these features, while also highlighting nuanced interactions involving lifestyle and socioeconomic variables. Logistic Regression, when optimized for recall, achieved the highest balanced accuracy (0.77), reinforcing the clinical utility of simpler models when interpretability is paramount.
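
For readers less familiar with the two metrics reported above, the sketch below computes balanced accuracy (the mean of sensitivity and specificity) and AUC-ROC (the probability that a randomly chosen positive case is scored above a randomly chosen negative case) on a small, entirely hypothetical set of labels and scores. It also shows how lowering the decision threshold, one simple way of favouring recall, changes balanced accuracy without changing AUC-ROC. None of the numbers come from the study.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

def auc_roc(y_true, scores):
    """Fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def predict(scores, threshold):
    """Turn risk scores into 0/1 labels at a given decision threshold."""
    return [1 if s >= threshold else 0 for s in scores]

# Hypothetical imbalanced sample: 2 stroke cases among 10 patients.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.8, 0.3, 0.2, 0.1, 0.35, 0.05, 0.15, 0.25]

auc = auc_roc(y_true, scores)                                   # threshold-free
bal_default = balanced_accuracy(y_true, predict(scores, 0.50))  # default cut-off
bal_recall = balanced_accuracy(y_true, predict(scores, 0.35))   # recall-oriented cut-off
print(auc, bal_default, bal_recall)
```

On imbalanced data such as this, plain accuracy can look high even when half of the positive cases are missed, which is why balanced accuracy and AUC-ROC are the more informative summaries.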

Conclusion: This study introduces a transparent and high-performing machine learning framework for stroke risk prediction in hypertensive individuals. By integrating ensemble learning, hybrid feature selection, and explainable AI, the framework bridges the gap between predictive modeling and clinical applicability. These findings support the deployment of interpretable ML tools in routine care, enabling proactive interventions, personalized patient education, and ultimately, reduced stroke incidence.
