OPTIMIZING SCORING RELIABILITY FOR CREATIVE MATHEMATICAL PROBLEM-SOLVING ASSESSMENTS: A GENERALIZABILITY THEORY APPROACH
Keywords:
Generalizability Theory, creative problem-solving, constructed-response test, inter-rater reliability, essay test.

Abstract
Rater-induced error poses a significant challenge to the scoring reliability of creative mathematical problem-solving assessments. This study applied Generalizability Theory to analyze score variance from 140 students and 3 raters across three scoring designs. The Generalizability (G) study identified the person-by-rater interaction as the largest source of error (35.50–35.90%), indicating inconsistent rater judgments across students. A Decision (D) study showed that increasing the number of raters from one to three substantially improved reliability (relative G-coefficient: .45 to .71). Notably, a nested design in which each rater scores a distinct subset of items (p × (i:r)) yielded the highest absolute reliability (.69). These findings provide empirical guidance for designing scoring procedures that enhance the reliability of complex skill assessments.
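To make the D-study projection concrete, the sketch below shows how relative and absolute G-coefficients respond to the number of raters under a simplified crossed person-by-rater (p × r) design. The variance components var_p, var_r, and var_pr are hypothetical values, chosen only so that the relative coefficient reproduces the reported .45 (one rater) and .71 (three raters); they are not the study's estimates, and the study's actual designs also include an item facet.

# A minimal D-study sketch, assuming a simplified p x r design.
# var_p, var_r, var_pr are ILLUSTRATIVE variance components (person,
# rater, and person-by-rater plus residual), not the study's estimates.

def relative_g(var_p, var_pr, n_r):
    # Relative G-coefficient: only error that changes the rank order of
    # persons (the person-by-rater interaction) enters the denominator.
    return var_p / (var_p + var_pr / n_r)

def absolute_g(var_p, var_r, var_pr, n_r):
    # Absolute G-coefficient (phi): the rater main effect also counts
    # as error, since absolute decisions depend on the score level.
    return var_p / (var_p + (var_r + var_pr) / n_r)

var_p, var_r, var_pr = 0.45, 0.05, 0.55  # hypothetical components

for n_r in (1, 2, 3):
    print(f"raters={n_r}  relative={relative_g(var_p, var_pr, n_r):.2f}"
          f"  absolute={absolute_g(var_p, var_r, var_pr, n_r):.2f}")

Averaging over n_r raters divides the rater-linked error variances by n_r, which is why adding raters raises both coefficients; with these illustrative components, the relative coefficient rises from .45 to .71 as raters increase from one to three.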
License

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.