Article details

測驗統計年刊

Title: A Comparison of Three Polytomous DIF Detection Methods
Volume/Issue: 18下 (volume 18, second issue)
Parallel title: 三種多元化計分題之試題差異性診斷法的比較
Authors: 吳莉安; 蔡蓉青
Pages: 001-021
Keywords: 試題差異性; 羅吉斯迴歸檢定; 差異試題及測驗功能檢定; differential item functioning; logistic regression procedure; differential functioning of items and tests procedure; TSCI
Publication date: 2010-12

Chinese abstract (translated)

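This simulation study compared the performance of three differential item functioning (DIF) detection methods under the graded response model: the logistic regression procedure, the likelihood ratio test, and the differential functioning of items and tests procedure. The manipulated factors were sample size (two levels), the population distribution (two levels), and the proportion of DIF items in the test (four levels). One hundred replications were run under each of the sixteen combinations. The type I error rates of all three methods were generally consistent with the nominal 0.05 level. In terms of power, the likelihood ratio test performed best, the differential functioning of items and tests procedure came second, and the logistic regression procedure performed worst. On average, the power of the logistic regression procedure was below 0.4, and it was sensitive only to items with pronounced DIF.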

English abstract

A simulation study compared the performance of three procedures for detecting differential item functioning (DIF) under the graded response model: the logistic regression procedure (LogR), the likelihood ratio test (LRT), and the differential functioning of items and tests procedure (DFIT). The manipulated factors were sample size, differences in the ability distributions between the focal and reference groups, and four different percentages of DIF items in a test. For each of the sixteen combinations, 100 replications of DIF detection were simulated. All three DIF procedures adhered to the nominal type I error rate under most conditions. LRT was the most powerful of the three in all situations. DFIT was less powerful than LRT but was still useful for DIF detection, especially with groups of different ability distributions and a relatively large percentage of DIF items. LogR, with mean power below 0.4 in all conditions, appeared to be sensitive only to items with large DIF.
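
The design described in the abstract can be pictured with a minimal Python sketch. It is an illustration only, not the authors' code: the factor level labels are hypothetical placeholders (the abstract gives only the number of levels), and the function names `grm_category_probs` and `rejection_rate` are assumptions introduced here. The first function computes category probabilities under the standard logistic form of the graded response model; the second turns a list of per-replication p-values into an empirical rejection rate, which is read as type I error for a DIF-free item and as power for a DIF item.

```python
import itertools
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under the graded response model (logistic form).

    theta: latent ability; a: item discrimination;
    thresholds: increasing boundaries b_1 < ... < b_{K-1}.
    Returns a list of K probabilities that sum to 1.
    """
    # Cumulative probabilities P(X >= k), bracketed by 1 (lowest) and 0 (beyond highest).
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    # Each category probability is the difference of adjacent cumulative probabilities.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


# Hypothetical level labels: the abstract states only 2 x 2 x 4 = 16 combinations.
sample_sizes = ["N_small", "N_large"]
ability_dists = ["equal", "different"]
dif_percentages = ["p1", "p2", "p3", "p4"]
conditions = list(itertools.product(sample_sizes, ability_dists, dif_percentages))


def rejection_rate(p_values, alpha=0.05):
    """Share of replications in which an item is flagged as DIF at level alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)
```

In a full study, each of the sixteen entries in `conditions` would be simulated 100 times, and `rejection_rate` would be applied to the 100 p-values each procedure produces for an item.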

Related literature