Item Analysis of Mid-Term Science Examination Questions in Junior Secondary Education: Evidence from Indonesia

  • Iis Nopita Sari, Universitas Islam Negeri Fatmawati Soekarno Bengkulu
  • Messy Putri Anggraini, Universitas Islam Negeri Fatmawati Soekarno Bengkulu
  • Rasmita Maryoni, Universitas Islam Negeri Fatmawati Soekarno Bengkulu
  • Ahmad Walid, Universitas Islam Negeri Fatmawati Soekarno Bengkulu
Keywords: Assessment Literacy, Item Analysis, Junior Secondary Education, Science Examination, Test Reliability

Abstract

Assessment quality is a crucial element in ensuring that science education achieves its intended learning objectives, particularly at the junior secondary level where foundational concepts are introduced. This study aimed to evaluate the quality of teacher-constructed mid-term examination items in Grade VII science by focusing on item difficulty, discrimination index, and overall reliability. Using a quantitative descriptive design, data were collected from 68 student responses to 25 test items, comprising 20 multiple-choice and 5 essay questions, and analyzed through classical test theory. The findings showed that 28% of items were classified as easy, 52% as moderate, and 20% as difficult, with 60% demonstrating acceptable or good discrimination power and a KR-20 reliability coefficient of 0.72, indicating adequate internal consistency. While the results suggest that the test achieved a balanced level of difficulty and acceptable reliability, the presence of items with poor discrimination and extreme difficulty levels reveals weaknesses in test construction. The discussion underscores that systematic item analysis is essential to refine teacher-made assessments and align them with both curriculum standards and international benchmarks. The novelty of this study lies in its focus on teacher-constructed science tests in junior secondary schools in Indonesia, a context that remains underexplored in the literature. The implications of this research point to the need for enhanced teacher assessment literacy, institutional support, and continuous evaluation practices to improve the validity and reliability of classroom-based examinations.
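The three statistics reported above (difficulty index, discrimination index, and KR-20 reliability) are standard classical test theory quantities and can be computed directly from a scored response matrix. The sketch below is illustrative only: the function name and the toy data are hypothetical, the discrimination index uses the common upper-lower 27% method, and the study's own computational procedure is not specified in the abstract.

```python
import numpy as np

def item_analysis(responses):
    """Classical test theory item analysis on a 0/1 response matrix
    (rows = students, columns = items). Returns the difficulty index
    (proportion correct), the discrimination index (upper-lower 27%
    method), and the KR-20 reliability coefficient."""
    responses = np.asarray(responses, dtype=float)
    n_students, n_items = responses.shape
    totals = responses.sum(axis=1)

    # Difficulty index p: proportion of students answering each item correctly.
    difficulty = responses.mean(axis=0)

    # Discrimination: difference in item p-values between the top and
    # bottom 27% of students ranked by total score.
    k = max(1, int(round(0.27 * n_students)))
    order = np.argsort(totals)
    lower, upper = responses[order[:k]], responses[order[-k:]]
    discrimination = upper.mean(axis=0) - lower.mean(axis=0)

    # KR-20 = (n / (n - 1)) * (1 - sum(p * q) / variance of total scores),
    # where q = 1 - p; applicable to dichotomously scored items.
    p, q = difficulty, 1.0 - difficulty
    var_total = totals.var(ddof=1)
    kr20 = (n_items / (n_items - 1)) * (1.0 - (p * q).sum() / var_total)
    return difficulty, discrimination, kr20
```

Under the common rules of thumb used in such analyses, items with difficulty between roughly 0.30 and 0.70 are "moderate", and a discrimination index of 0.30 or above is considered acceptable; a KR-20 of 0.72 as reported here sits above the usual 0.70 threshold for adequate internal consistency.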



Published
2023-05-06
How to Cite
Nopita Sari, I., Putri Anggraini, M., Maryoni, R., & Walid, A. (2023). Item Analysis of Mid-Term Science Examination Questions in Junior Secondary Education: Evidence from Indonesia. ISEJ : Indonesian Science Education Journal, 4(2), 46-53. https://doi.org/10.62159/isej.v4i3.358
Section
Articles