Introduction: This paper seeks to quantify the reliability of the assessment of students’ answers to essay-type questions, in an attempt to define the role of such questions in University examinations.
Methods: The marks awarded for essay-type questions in three consecutive final undergraduate examinations in surgery were analyzed. Mean scores, 95% confidence intervals, and standard errors of the mean were calculated to characterize the distribution of the marks. The correlation between the marks awarded for the same answer by different markers was then analyzed to gauge the dependability of this method of testing.
Results: The marks awarded to 233 answer papers were available for analysis. The marks awarded by each pair of examiners for a student's answer to an individual question coincided on only 46.3% of occasions, but fell within ±5% of each other on 90.7% of occasions. The kappa index of agreement between markers was just 0.385, well short of the ideal of 1.0. The overall reliability of this type of examination, assessed by Cronbach's reliability coefficient, was 0.672.
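To make the two reliability measures concrete, the following is a minimal sketch of how Cohen's kappa (chance-corrected agreement between two markers) and Cronbach's alpha (internal consistency across questions) are computed. The data below are illustrative only, not the study's marks; function names and mark categories are assumptions for the example.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same category independently.
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

def cronbach_alpha(items):
    """Cronbach's alpha; `items` is one score list per question,
    aligned across the same students."""
    def var(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    k = len(items)
    sum_item_vars = sum(var(q) for q in items)
    totals = [sum(scores) for scores in zip(*items)]  # each student's total mark
    return k / (k - 1) * (1 - sum_item_vars / var(totals))

# Illustrative marks (hypothetical): two markers grading four answers into bands 1/2.
print(cohens_kappa([1, 1, 2, 2], [1, 1, 2, 2]))  # perfect agreement -> 1.0
# Two questions, three students: perfectly consistent items -> alpha of 1.0.
print(cronbach_alpha([[1, 2, 3], [1, 2, 3]]))
```

A kappa of 0.385, as reported above, would indicate agreement only moderately better than chance, while an alpha of 0.672 is commonly read as borderline-acceptable internal consistency.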
Conclusion: There was significant variation among markers in the evaluation of answers to essay-type questions. However, the overall test reliability was high enough to justify the continued use of this type of assessment as a supplement to other methods.