Journal of Education and Development in the Caribbean

A Raw Score Rater Measurement Model for Performance Assessment

Publication Date: December 2010

Rasch and other probabilistic models have several disadvantages that outweigh those of raw score classical test theory (CTT) models. Examiners, parents and students can be mystified by Rasch and other probabilistic models, and these models may be less relevant in formative or training contexts. For these reasons, a raw score model of rater severity and rater fit was constructed and applied to a data set of 8708 English Language proficiency candidates. This model is more intuitive and thus easier to explain to stakeholders, including examiners, as a preliminary quality control tool. The results are compared with those of the many-facet Rasch measurement (MFRM) model. For a data set in which all raters rate each essay, the rater severities from the two models correlate at .998, but the correlation falls to .630 for the data set with a significant amount of missing data (blind double-marking). When scores were adjusted for rater severity using the MFRM model, more than half of the candidates' scores differed by more than half a mark. The significance of this was seen in the corresponding grade adjustments: 7% of the candidates were re-graded, clearly showing that such differences can have important consequences for candidates whose scores lie near the cut-off boundary. The new CTT rater model yielded about half as many re-grade adjustments as the full MFRM rater severity model (3.3% compared with 6.5%).
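The abstract does not give the formula behind the raw score rater model, so the following is only an illustrative sketch of one plausible raw score severity measure, not the author's method. It assumes severity is the mean amount by which a rater's mark falls below the average mark awarded to the same essay by all raters who marked it. The function name `raw_score_severity` and the `(rater, essay, score)` input layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def raw_score_severity(ratings):
    """Estimate rater severity from raw marks (illustrative assumption only).

    ratings: iterable of (rater, essay, score) tuples.
    Returns {rater: severity}, where severity is the mean of
    (essay mean - rater's score) over the essays that rater marked.
    Positive values mean the rater marks lower than co-raters on average.
    """
    ratings = list(ratings)  # allow two passes over the data

    # Collect every score awarded to each essay.
    scores_by_essay = defaultdict(list)
    for _rater, essay, score in ratings:
        scores_by_essay[essay].append(score)
    essay_mean = {e: mean(s) for e, s in scores_by_essay.items()}

    # Record how far each rater sits below the essay mean.
    deviations = defaultdict(list)
    for rater, essay, score in ratings:
        deviations[rater].append(essay_mean[essay] - score)

    return {rater: mean(d) for rater, d in deviations.items()}


# Hypothetical double-marked data: rater A is consistently one mark
# below the essay mean, rater B one mark above.
example = [
    ("A", "essay1", 4.0), ("B", "essay1", 6.0),
    ("A", "essay2", 3.0), ("B", "essay2", 5.0),
]
severity = raw_score_severity(example)
```

Under this assumption, a severity-adjusted mark would be the raw mark plus the rater's severity. With sparse designs such as blind double-marking, each essay mean rests on only two marks, which is one way to see why raw score and MFRM severities could diverge when much of the rating matrix is missing.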
