Item Analysis of Mid-Term Science Examination Questions in Junior Secondary Education: Evidence from Indonesia
Abstract
Assessment quality is a crucial element in ensuring that science education achieves its intended learning objectives, particularly at the junior secondary level, where foundational concepts are introduced. This study aimed to evaluate the quality of teacher-constructed mid-term examination items in Grade VII science, focusing on item difficulty, discrimination index, and overall reliability. Using a quantitative descriptive design, data were collected from 68 students' responses to a 25-item test comprising 20 multiple-choice and 5 essay questions and analyzed through classical test theory. The findings showed that 28% of the items were classified as easy, 52% as moderate, and 20% as difficult; 60% demonstrated acceptable or good discrimination power, and the KR-20 reliability coefficient was 0.72, indicating adequate internal consistency. Although the results suggest that the test achieved a balanced level of difficulty and acceptable reliability, the presence of items with poor discrimination and extreme difficulty levels reveals weaknesses in test construction. The discussion underscores that systematic item analysis is essential for refining teacher-made assessments and aligning them with both curriculum standards and international benchmarks. The novelty of this study lies in its focus on teacher-constructed science tests in Indonesian junior secondary schools, a context that remains underexplored in the literature. The implications of this research point to the need for enhanced teacher assessment literacy, institutional support, and continuous evaluation practices to improve the validity and reliability of classroom-based examinations.
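The abstract reports item difficulty, discrimination indices, and a KR-20 coefficient derived from classical test theory, but the computations themselves are not shown there. The Python sketch below is a minimal illustration of how these indices are commonly obtained for dichotomously scored items, assuming a 27% upper/lower group split for the discrimination index and a 0.30 cut-off as one common benchmark for acceptable discrimination; the function, the simulated response matrix, and all variable names are hypothetical and are not taken from the study's data.

import numpy as np

def item_analysis(responses):
    """Classical test theory indices for dichotomously scored items.
    responses: array of shape (n_students, n_items), 1 = correct, 0 = incorrect."""
    n_students, n_items = responses.shape

    # Item difficulty (p): proportion of examinees answering each item correctly.
    difficulty = responses.mean(axis=0)

    # Discrimination index (D): proportion correct in the upper 27% group minus
    # proportion correct in the lower 27% group, ranked by total score.
    totals = responses.sum(axis=1)
    order = np.argsort(totals)
    k = max(1, int(round(0.27 * n_students)))
    discrimination = responses[order[-k:]].mean(axis=0) - responses[order[:k]].mean(axis=0)

    # KR-20: (n / (n - 1)) * (1 - sum(p*q) / variance of total scores).
    p, q = difficulty, 1.0 - difficulty
    kr20 = (n_items / (n_items - 1)) * (1.0 - (p * q).sum() / totals.var(ddof=1))

    return difficulty, discrimination, kr20

# Hypothetical demonstration: 68 simulated examinees and 20 dichotomous items,
# generated from a simple latent-ability model so the indices are non-trivial.
rng = np.random.default_rng(0)
ability = rng.normal(size=(68, 1))
item_location = rng.normal(size=(1, 20))
prob_correct = 1.0 / (1.0 + np.exp(-(ability - item_location)))
demo = (rng.random((68, 20)) < prob_correct).astype(int)

difficulty, discrimination, kr20 = item_analysis(demo)
print("Mean difficulty:", difficulty.mean().round(2))
print("Items with D >= 0.30:", int((discrimination >= 0.30).sum()))
print("KR-20:", round(kr20, 2))

Under this convention, difficulty is the proportion of examinees answering an item correctly (higher values indicate easier items), and a KR-20 of roughly 0.70 or above is typically read as adequate internal consistency for classroom tests, which is consistent with the 0.72 reported above.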
Copyright (c) 2023 Iis Nopita Sari, Messy Putri Anggraini, Rasmita Maryoni, Ahmad Walid

This work is licensed under a Creative Commons Attribution 4.0 International License.