Investigating Gender Differential Item Functioning (DIF) in the eTIMSS 2019 Math Assessment
Abstract
The Trends in International Mathematics and Science Study (TIMSS) has long been a crucial tool for evaluating students' mathematical achievement on a global scale, and the reliability, validity, and fairness of its score interpretations have been extensively researched. With the introduction of the digital version (eTIMSS) and the addition of innovative Problem Solving and Inquiry (PSI) tasks in 2019, it is essential to examine how these changes affect item and assessment quality. Differential Item Functioning (DIF) analysis is a well-established method in educational assessment for detecting potentially biased items across demographic groups, such as gender groups (boys and girls). However, previous research on math assessments has yielded inconclusive findings about which item features may be sources of gender-related DIF. Drawing on eTIMSS 2019 United States student response and demographic data, this study has two main objectives: (1) to identify DIF items and estimate their magnitudes using both non-Item Response Theory (non-IRT) and Item Response Theory (IRT)-based DIF detection methods, and (2) to systematically code item features through consensus coding and explore their associations with DIF patterns using stepwise linear regression. The findings indicated that gender DIF patterns were related to the number of clauses, the presence of context, the involvement of construction, and the number of metric system units. In addition, the newly added PSI booklets exhibited a higher percentage of DIF items than the regular digital items. The inconsistency between regular digital math items and PSI items in how item features related to DIF estimates underscores the need for further research on how item features shape student responses to different task types, especially in the era of digital technology.
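
As a concrete illustration of the non-IRT side of objective (1), the sketch below shows a Mantel-Haenszel DIF check for a single dichotomous item in Python. It is a minimal, hypothetical example: the function name, the gender group labels, and the use of the raw total score as the matching variable are assumptions made for illustration, not the procedure or code used in the thesis.

import numpy as np
import pandas as pd

def mantel_haenszel_dif(item_correct, group, total_score,
                        ref="boys", focal="girls"):
    """Mantel-Haenszel DIF statistic for one dichotomous (0/1) item.

    Examinees are stratified on the matching variable (here the total
    test score); a 2x2 group-by-response table is formed in each stratum.
    """
    df = pd.DataFrame({"y": item_correct, "g": group, "s": total_score})
    num, den = 0.0, 0.0
    for _, k in df.groupby("s"):                  # one stratum per score level
        a = ((k.g == ref) & (k.y == 1)).sum()     # reference group correct
        b = ((k.g == ref) & (k.y == 0)).sum()     # reference group incorrect
        c = ((k.g == focal) & (k.y == 1)).sum()   # focal group correct
        d = ((k.g == focal) & (k.y == 0)).sum()   # focal group incorrect
        n = a + b + c + d
        if n == 0:
            continue
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:                      # only degenerate strata observed
        return np.nan, np.nan
    alpha_mh = num / den                          # MH common odds ratio
    delta_mh = -2.35 * np.log(alpha_mh)           # ETS delta-scale effect size
    return alpha_mh, delta_mh

# Hypothetical usage with a 0/1 response matrix `responses` (rows = students),
# a gender vector `gender`, and the studied item in column "item_01":
# alpha, delta = mantel_haenszel_dif(responses["item_01"], gender,
#                                    responses.sum(axis=1))

Under the common ETS classification, an absolute delta-MH of roughly 1.5 or more (category C) is typically treated as a large DIF effect, which is one conventional way to flag items for the kind of feature-level follow-up described above.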
Description
Thesis (Master's)--University of Washington, 2025
