Simple item record

dc.contributor.author    Michaelides, Michalis P.    en
dc.contributor.author    Haertel, Edward H.    en
dc.contributor.author    California Univ., Los Angeles, Center for the Study of Evaluation    en
dc.creator    Michaelides, Michalis P.    en
dc.creator    Haertel, Edward H.    en
dc.creator    California Univ., Los Angeles, Center for the Study of Evaluation    en
dc.date.accessioned    2017-07-27T10:22:24Z
dc.date.available    2017-07-27T10:22:24Z
dc.date.issued    2004
dc.identifier.uri    https://gnosis.library.ucy.ac.cy/handle/7/37687
dc.description.abstract    There is variability in the estimation of an equating transformation because common-item parameters are obtained from responses of samples of examinees. The most commonly used standard error of equating quantifies this source of sampling error, which decreases as the sample size of examinees used to derive the transformation increases. By the same reasoning, the common items embedded in test forms are also sampled from a larger pool of items that could potentially serve as common items. Thus, there is additional error variance due to the sampling of common items. Currently, common items are treated as fixed; the conventional standard error of equating captures only the variance due to the sampling of examinees. In this study, a formula for quantifying the standard error due to the sampling of the common items is derived using the delta method and assuming that equating is carried out with the mean/sigma method. The analytic formula relies on the assumption of bivariate normality of the IRT difficulty parameter estimates. The derived standard error and a bootstrap approximation for the same quantity are calculated for a statewide assessment under both three- and one-parameter logistic IRT models; for the polytomous items, a graded response model is fitted. For the one-parameter logistic case, a small-sample bootstrap approximation to the standard error of equating due to the sampling of examinees is derived for comparison purposes. There was some discrepancy between the analytic and the bootstrap approximation of the error due to the sampling of common items. Examination of the assumption of bivariate normality of the difficulty parameter estimates showed that the assumption does not hold for the data set analyzed. For simulated data drawn from a population that was distributed as bivariate normal, the two methods for estimating the error gave nearly identical results, confirming the correctness of the analytic approximation. The comparison with the examinee-sampling standard error of equating revealed that the two sources of equating error were of about the same magnitude. In other words, the conventional standard error of the equating function reflects only about half of the equating error variation. Numerical results demonstrate that for individual examinee scores the two equating errors comprised only a small proportion of the total error variance; measurement error was the largest component in individual score variability. For group-level scores, though, the picture was different. Measurement error in score summaries shrinks as sample size increases. Examinee-sampling equating error also decreases as samples become larger. Error due to common-item sampling does not depend on the size of the examinee sample; it is affected by the number of common items used, so it could constitute the dominant source of error for summary scores. The random selection of common items should be acknowledged in the analysis of a test and the resulting error variance calculated for proper reporting of score accuracy.    en
dc.publisher    Center for Research on Evaluation, Standards, and Student Testing (CRESST)    en
dc.source    Center for Research on Evaluation, Standards, and Student Testing (CRESST)    en
dc.subject    Test items    en
dc.subject    Testing    en
dc.subject    Error patterns    en
dc.subject    Interrater reliability    en
dc.subject    Test reliability    en
dc.subject    Comparative analysis    en
dc.title    Sampling of Common Items: An Unrecognized Source of Error in Test Equating.    en
dc.type    info:eu-repo/semantics/report
dc.description.volume    636
dc.author.faculty    Σχολή Κοινωνικών Επιστημών και Επιστημών Αγωγής / Faculty of Social Sciences and Education
dc.author.department    Τμήμα Ψυχολογίας / Department of Psychology
dc.type.uhtype    Report    en
dc.description.notes    Accession Number: ED483403; Sponsoring Agency: Institute of Education Sciences (ED), Washington, DC; Acquisition Information: Center for the Study of Evaluation (CSE)/National Center for Research on Evaluation, Standards, and Student Testing (CRESST), 300 Charles E. Young Drive North, GSE&IS Bldg., 3rd Flr./Mailbox 951522, Los Angeles, CA 90095-1522. Tel: 310-206-1532.; Reference Count: 27; Journal Code: JAN2017; Level of Availability: Available online; Publication Type: Reports - Descriptive; Entry Date: 2005    en
dc.contributor.orcid    Michaelides, Michalis P. [0000-0001-6314-3680]
dc.gnosis.orcid    0000-0001-6314-3680
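
The mean/sigma linking step and the item-level bootstrap described in the abstract can be illustrated with a short sketch. This is a minimal, hypothetical example, not the report's code: the difficulty estimates, the function names (mean_sigma, bootstrap_se), and all numbers are invented for illustration, assuming dichotomous-item difficulty estimates are already available on both forms.

# Illustrative sketch (not the report's code): mean/sigma linking from
# common-item IRT difficulty estimates, with a bootstrap over the common
# items to approximate the equating error due to common-item sampling.
import numpy as np

rng = np.random.default_rng(0)

def mean_sigma(b_new, b_ref):
    # Linking constants mapping the new-form scale onto the reference
    # scale: theta_ref = A * theta_new + B.
    A = np.std(b_ref, ddof=1) / np.std(b_new, ddof=1)
    B = np.mean(b_ref) - A * np.mean(b_new)
    return A, B

def bootstrap_se(b_new, b_ref, theta=0.0, reps=2000):
    # SE of the equated value at `theta` from resampling common items
    # with replacement and recomputing the linking each time.
    n = len(b_new)
    equated = np.empty(reps)
    for r in range(reps):
        idx = rng.integers(0, n, size=n)
        A, B = mean_sigma(b_new[idx], b_ref[idx])
        equated[r] = A * theta + B
    return equated.std(ddof=1)

# Hypothetical difficulty estimates for 20 common items on two forms.
b_ref = rng.normal(0.0, 1.0, size=20)
b_new = (b_ref - 0.2) / 1.1 + rng.normal(0.0, 0.1, size=20)
A, B = mean_sigma(b_new, b_ref)
print(f"A = {A:.3f}, B = {B:.3f}")
print(f"common-item bootstrap SE at theta = 0: {bootstrap_se(b_new, b_ref):.3f}")

Resampling the common items, rather than the examinees, is what isolates the item-sampling component of equating error; a parallel bootstrap over examinee response vectors would approximate the conventional standard error for comparison.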


Files in this item


There are no files associated with this item.
