Show simple item record

dc.contributor.authorMichaelides, Michalis P.en
dc.contributor.authorNational Center for Research on Evaluation, Standards, and Student Testingen
dc.creatorMichaelides, Michalis P.en
dc.creatorNational Center for Research on Evaluation, Standards, and Student Testingen
dc.date.accessioned2017-07-27T10:22:25Z
dc.date.available2017-07-27T10:22:25Z
dc.date.issued2006
dc.identifier.urihttps://gnosis.library.ucy.ac.cy/handle/7/37688
dc.description.abstractConsistent behavior is a desirable characteristic that common items are expected to exhibit when administered to different groups. Findings from the literature have established that items do not always behave consistently: item indices and IRT item parameter estimates of the same items differ when obtained from different administrations. Content effects, such as discrepancies in instructional emphasis, and context effects, such as changes in the presentation, format, and positioning of an item, may result in differential item difficulty for different groups. When common items are differentially difficult for two groups, using them to generate an equating transformation is questionable. The delta-plot method is a simple, graphical procedure that identifies such items by examining their classical test theory difficulty values; after inspection, flagged items are likely to be dropped to non-common-item status. Two studies are described in this report. Study 1 investigates the influence of common items that behave inconsistently across two administrations on equated score summaries. Study 2 applies an alternative to the delta-plot method for flagging common items for differential behavior across administrations.

The first study examines the effects of retaining versus discarding the common items flagged as outliers by the delta-plot method on equated score summary statistics. For four statewide assessments administered in two consecutive years under the common-item nonequivalent groups design, the equating functions that transform the Year-2 to the Year-1 scale are estimated using four IRT equating methods (Stocking & Lord, Haebara, mean/sigma, mean/mean) under two IRT models: the three- and one-parameter logistic models for dichotomous items, with Samejima's (1969) graded response model for polytomous items. The changes in Year-2 equated mean scores, mean gains or declines from Year 1 to Year 2, and proportions above a cut-off point are examined when all the common items are used in the equating process versus when the delta-plot outliers are excluded from the common-item pool. Results under the four equating methods were more consistent when a one-parameter rather than a three-parameter logistic model was fitted. In two of the four assessments, the treatment of outlying common items had an impact on aggregate statistics: equated mean scores, mean gains, and proportions above a cut-off differed considerably. Factors such as the number of outlying items, their type (dichotomously or polytomously scored), their level of difficulty, the direction and amount of their change from Year 1 to Year 2, and the IRT model and equating transformation fitted to the data are discussed with regard to their influence on equated summary statistics.

The differential behavior of common items can be considered a special case of Differential Item Functioning (DIF): the two groups that respond to a common item can be regarded as the focal and reference groups, and their performance can be compared for DIF. Study 2 applies the Mantel-Haenszel statistic (Mantel & Haenszel, 1959), which is widely used for DIF analysis, to one statewide assessment administered to two consecutive annual cohorts of students. Sixty-nine common items, including nine polytomous items, are analyzed first with the delta-plot method and then with the Mantel-Haenszel procedure. A scheme for flagging dichotomous items for negligible, intermediate, or large DIF takes into account both the significance of the Mantel-Haenszel statistic and the effect size of the log-odds ratio; an alternative scheme developed specifically for polytomous items utilizes Mantel's chi-square statistic (Mantel, 1963) and the Standardized Mean Difference (e.g., Dorans & Schmitt, 1991/1993). The Mantel-Haenszel procedure flagged three common items, including one polytomous item, for intermediate DIF. The delta-plot method identified only two dichotomous items, one of which was flagged by both procedures. Assumptions are examined, and it is argued that the Mantel-Haenszel procedure is more appropriate for comparing the performance of two groups because it takes into account differences in the ability distributions of the two cohorts. The availability of schemes that classify items according to the amount of DIF they exhibit can inform the judgmental decision on how to deal with flagged items.

However, some caveats relating to test construction and implementation of the equating design are noted if the proposed procedures are to be applied effectively: the same common items, and an adequately large number of them, must be presented in corresponding forms across administrations. This is especially pertinent for assessments employing a matrix-sampling design, where the common items are spread among many forms. The following are appended: (1) ST Output--Equating Transformations; (2) Mantel-Haenszel Procedure and ETS Scheme for Flagging; and (3) SPSS Output for Principal Components of Responses to Common Items. (Contains 12 notes.)en
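To make the delta-plot logic in the abstract concrete, the following is a minimal Python sketch, not drawn from the report itself: classical p-values from the two administrations are converted to the ETS delta metric (delta = 13 + 4 * inverse-normal of (1 - p)), a principal-axis line is fitted to the paired delta values, and items lying far from that line are flagged as outliers. The function names and the 1.5 distance threshold are illustrative assumptions, not settings taken from the report.

    import numpy as np
    from scipy.stats import norm

    def delta_values(p):
        # ETS delta metric: delta = 13 + 4 * Phi^{-1}(1 - p),
        # where p is the classical proportion-correct difficulty.
        return 13.0 + 4.0 * norm.ppf(1.0 - np.asarray(p, dtype=float))

    def delta_plot_outliers(p_year1, p_year2, threshold=1.5):
        # Hypothetical helper: flag common items whose point lies far from
        # the principal axis of the (delta_1, delta_2) scatter.
        x, y = delta_values(p_year1), delta_values(p_year2)
        sx2, sy2 = x.var(ddof=1), y.var(ddof=1)
        sxy = np.cov(x, y, ddof=1)[0, 1]
        # Principal-axis (major-axis) slope and intercept
        b = (sy2 - sx2 + np.sqrt((sy2 - sx2) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
        a = y.mean() - b * x.mean()
        # Perpendicular distance of each item's point from the line y = b*x + a
        dist = np.abs(b * x - y + a) / np.sqrt(b ** 2 + 1.0)
        return np.where(dist > threshold)[0]

Items returned by such a routine would then be inspected judgmentally before being demoted from the common-item pool, as the abstract describes.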
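For the Mantel-Haenszel analysis of a dichotomous common item, here is a hedged sketch under the assumption that examinees from the two annual cohorts (reference and focal groups) have been matched on total-score strata; the constant -2.35 re-expresses the pooled log-odds ratio on the ETS delta scale (MH D-DIF), the effect-size metric behind the negligible/intermediate/large flagging scheme mentioned in the abstract. The function name and input layout are illustrative.

    import numpy as np

    def mh_ddif(n_right_ref, n_wrong_ref, n_right_foc, n_wrong_foc):
        # Inputs are per-stratum counts (one entry per matched score stratum k).
        # Mantel-Haenszel common odds ratio pooled over strata:
        #   alpha_MH = sum_k(A_k * D_k / N_k) / sum_k(B_k * C_k / N_k)
        # where A/B = reference-group right/wrong, C/D = focal-group right/wrong.
        A = np.asarray(n_right_ref, dtype=float)
        B = np.asarray(n_wrong_ref, dtype=float)
        C = np.asarray(n_right_foc, dtype=float)
        D = np.asarray(n_wrong_foc, dtype=float)
        N = A + B + C + D
        alpha_mh = np.sum(A * D / N) / np.sum(B * C / N)
        # ETS delta-scale effect size (MH D-DIF)
        return -2.35 * np.log(alpha_mh)

Under the ETS scheme the abstract refers to, absolute MH D-DIF values below 1.0 are commonly treated as negligible ("A") and values of 1.5 or more, when statistically significant, as large ("C"), with intermediate ("B") in between.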
dc.publisherNational Center for Research on Evaluation, Standards, and Student Testing (CRESST)en
dc.sourceNational Center for Research on Evaluation, Standards, and Student Testing (CRESST)en
dc.subjectEquated scoresen
dc.subjectTest itemsen
dc.subjectItem response theoryen
dc.subjectEvaluation methodsen
dc.subjectDifficulty levelen
dc.subjectContext effecten
dc.subjectComputationen
dc.subjectStudent evaluationen
dc.subjectMantel-Haenszel procedureen
dc.titleEffects of Misbehaving Common Items on Aggregate Scores and an Application of the Mantel-Haenszel Statistic in Test Equating.en
dc.typeinfo:eu-repo/semantics/report
dc.description.volume688
dc.author.facultyΣχολή Κοινωνικών Επιστημών και Επιστημών Αγωγής / Faculty of Social Sciences and Education
dc.author.departmentΤμήμα Ψυχολογίας / Department of Psychology
dc.type.uhtypeReporten
dc.description.notesAccession Number: ED492876; Sponsoring Agency: Office of Educational Research and Improvement (ED), Washington, DC.; Acquisition Information: National Center for Research on Evaluation, Standards, and Student Testing (CRESST). 300 Charles E Young Drive N, GSE&IS Building 3rd Floor, Mailbox 951522, Los Angeles, CA 90095-1522. Tel: 310-206-1532; Fax: 310-825-3883; Web site: http://www.cresst.org.; Reference Count: 74; Journal Code: JAN2017; Level of Availability: Available online; Publication Type: Reports - Research; Entry Date: 2006en
dc.contributor.orcidMichaelides, Michalis P. [0000-0001-6314-3680]
dc.gnosis.orcid0000-0001-6314-3680


Files in this item


There are no files associated with this item.
