ASSESSING AI-GENERATED MATH ITEMS: EVIDENCE FROM EIGHTH-GRADE TRIANGLE CONGRUENCE

Authors

  • Majed M. Aljodeh, Department of Education and Psychology, University of Tabuk, Tabuk, Saudi Arabia

Keywords:

AI-generated assessment; ChatGPT; curriculum alignment; human-AI collaboration; mathematics education.

Abstract

The rapid advancement of generative AI has prompted new inquiries into its role in educational assessment. This study evaluates the quality of a mathematics achievement test on triangle congruence generated by ChatGPT and aligned with the eighth-grade Jordanian curriculum. The objective was to assess the validity, curricular relevance, clarity, and grade-level appropriateness of AI-generated test items. A quantitative design was employed: 72 educational professionals, including supervisors and mathematics teachers, rated 20 multiple-choice items on a five-point Likert scale across four evaluation criteria. Findings showed that 80% of the items met or exceeded expert expectations, 85% demonstrated high content validity, and 75% matched the expected difficulty level in terms of cognitive demand. However, three items raised concerns about language clarity or contextual drift, indicating a continuing need for human moderation in culturally specific educational contexts. These findings suggest that human-AI co-design models can enhance assessment efficiency while safeguarding pedagogical standards. Implications for integrating generative AI into curriculum-based assessment frameworks, along with recommendations for educator training, ethical oversight, and prompt refinement, are discussed.

How to Cite

Aljodeh, M. M. (2025). Assessing AI-generated math items: Evidence from eighth-grade triangle congruence. TPM – Testing, Psychometrics, Methodology in Applied Psychology, 32(S2), 251–265. Posted 09 June 2025. Retrieved from https://tpmap.org/submission/index.php/tpm/article/view/219