When rater reliability is not enough: Teacher observation systems and a case for the generalizability study

Hill, H. C.; Charalambous, Charalambos Y.; Kraft, M. A.

doi:10.3102/0013189X12437203

Article

Date

2012

Author

Hill, H. C.
Charalambous, Charalambos Y.
Kraft, M. A.

Source

Educational Researcher

Pages

56-64

Abstract

In recent years, interest has grown in using classroom observation as a means to several ends, including teacher development, teacher evaluation, and impact evaluation of classroom-based interventions. Although education practitioners and researchers have developed numerous observational instruments for these purposes, many developers fail to specify important criteria regarding instrument use. In this article, the authors argue that for classroom observation to succeed in its aims, improved observational systems must be developed. These systems should include not only observational instruments but also scoring designs capable of producing reliable and cost-efficient scores and processes for rater recruitment, training, and certification. To illustrate how such a system might be developed and improved, the authors provide an empirical example that applies generalizability theory to data from a mathematics observational instrument. © 2012 AERA.