Title:
Leveraging Modern Machine Learning to Improve Early Warning Systems and Reduce Chronic Absenteeism in Early Childhood. EdWorkingPaper No. 24-1081
Language:
English
Source:
Annenberg Institute for School Reform at Brown University. 2024.
Availability:
Annenberg Institute for School Reform at Brown University. Brown University Box 1985, Providence, RI 02912. Tel: 401-863-7990; Fax: 401-863-1290; e-mail: AISR_Info@brown.edu; Web site: http://www.annenberginstitute.org
Peer Reviewed:
N
Page Count:
50
Publication Date:
2024
Sponsoring Agency:
Institute of Education Sciences (ED)
Contract Number:
R305A220036
R305B200011
Document Type:
Report Reports - Research
Education Level:
Elementary Education
Early Childhood Education
Kindergarten
Primary Education
Grade 1
Grade 2
Grade 3
IES Funded:
Yes
Entry Date:
2025
Accession Number:
ED663646
Database:
ERIC

Further Information

Chronic absenteeism is a critical issue that has been linked to many adverse student outcomes. The current study focuses on improving a key system already in place in many school districts--early warning systems (EWSs)--in order to decrease chronic absenteeism in students' earliest schooling years. Using a demographically diverse population of students followed from PreK to third grade in Boston Public Schools (N=6,698), we demonstrate how and why two modern machine learning algorithms--the Synthetic Minority Oversampling Technique (SMOTE) and Extreme Gradient Boosting (XGBoost)--can improve EWS accuracy in proactively identifying students who are at risk of becoming chronically absent. The best-performing XGBoost model with SMOTE was approximately 52 percentage points more accurate (in terms of recall rate) than the logistic regression model closest to those used in current EWSs in correctly predicting students who would be chronically absent in third grade. Our analyses introduce varying probability thresholds and the incorporation of different years of data, showing the potential of these models to cater to school districts aiming to leverage machine learning predictions while adhering to budgetary or intervention constraints.

As Provided
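
Illustrative Sketch (not part of the record or the paper)

The abstract names three ideas: oversampling the rare class with SMOTE, measuring accuracy as recall, and varying the probability threshold so the number of flagged students fits an intervention budget. The Python sketch below illustrates the first and third on invented toy data. It is not the authors' implementation: no XGBoost model is fitted, the predicted probabilities are made up, and the function names (smote_oversample, recall_at_threshold, flagged_count) and feature columns are hypothetical. In practice one would more likely reach for established libraries such as imbalanced-learn and xgboost than hand-rolled code like this.

```python
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority rows, SMOTE-style.

    Each synthetic row lies on the line segment between a randomly chosen
    minority row and one of its k nearest minority-class neighbors.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to base.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbor)])
    return synthetic


def recall_at_threshold(y_true, y_prob, threshold):
    """Share of truly positive cases that are flagged at this threshold."""
    total_pos = sum(y_true)
    hits = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    return hits / total_pos if total_pos else 0.0


def flagged_count(y_prob, threshold):
    """Number of cases flagged at this threshold (a proxy for intervention cost)."""
    return sum(1 for p in y_prob if p >= threshold)


# Invented minority-class rows: [prior-year absence rate, days absent].
minority = [[0.10, 20.0], [0.15, 25.0], [0.12, 30.0]]
synthetic = smote_oversample(minority, n_new=4)

# Invented labels (1 = chronically absent) and invented model probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.6, 0.35, 0.4, 0.2, 0.1, 0.55, 0.15]

for t in (0.3, 0.5, 0.7):
    print(t, flagged_count(y_prob, t), recall_at_threshold(y_true, y_prob, t))
```

On this toy data, lowering the threshold from 0.7 to 0.3 raises recall from 0.25 to 0.75 while the number of flagged cases grows from 1 to 5. That is the kind of trade-off a district with a fixed intervention budget would need to weigh.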