Technical information
Four scoring methods were used to estimate examinees' abilities in this CAT simulation study. The first scoring method was maximum likelihood (ML) estimation. The goal of ML is to find the estimate of θ that maximizes the likelihood of the observed response pattern given the items administered. ML fails, however, when an examinee's response pattern consists of all 0s or all 1s: for such a nonmixed pattern the likelihood function is monotonically increasing (or decreasing), like the item response curve, so no finite maximum exists. This problem has been addressed by combining ML with other estimation methods. The second method was weighted likelihood estimation (WLE). The bias of the ML estimator approaches zero as the test length n becomes large; in applied testing circumstances, however, n is not arbitrarily large, so the bias does not vanish asymptotically. To correct the bias of the ML estimator, Warm proposed the WLE method, which adjusts the first derivative of the log-likelihood [10]. The third method was modal a posteriori (MAP) estimation. This method involves estimating the value of θ that maximizes the posterior distribution of the response pattern given a prior distribution. Iterative procedures such as Newton-Raphson are commonly used to locate the maximum of the posterior. The fourth method was expected a posteriori (EAP) estimation. The EAP method involves finding the expected value of the posterior by using quadrature weights corresponding to the prior distribution. If a normal prior is used, the weights equal the areas under the normal distribution contained between the quadrature points [11].
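The EAP computation described above can be sketched in Python; the Rasch response function, quadrature grid, and item difficulties below are illustrative assumptions for exposition, not the KMLE calibration:

```python
import numpy as np

def rasch_prob(theta, b):
    """P(correct) under the Rasch model for item difficulty b."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def eap_estimate(responses, b, n_quad=41):
    """EAP ability estimate with a standard-normal prior.

    Quadrature weights here are the normal density on an equally
    spaced grid, a common approximation of the area under the
    normal curve between adjacent quadrature points.
    """
    theta_q = np.linspace(-4.0, 4.0, n_quad)      # quadrature points
    prior_w = np.exp(-0.5 * theta_q**2)           # proportional to N(0,1) density
    prior_w /= prior_w.sum()

    # Likelihood of the observed response pattern at each quadrature point
    p = rasch_prob(theta_q[:, None], b[None, :])  # shape (n_quad, n_items)
    like = np.prod(np.where(responses, p, 1.0 - p), axis=1)

    post = like * prior_w                         # unnormalized posterior
    post /= post.sum()
    return float(np.sum(theta_q * post))          # posterior mean

b = np.array([-1.0, 0.0, 1.0])                    # illustrative item difficulties
print(eap_estimate(np.array([1, 1, 0]), b))
```

Note that, unlike ML, the EAP estimate remains finite even for nonmixed (all-0 or all-1) response patterns, because the prior pulls the posterior mean toward its center.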
This CAT study evaluated 6 item selection methods. The first was the maximum Fisher information (MFI) method, which selects the item that provides the maximum Fisher information at the current ability estimate θ̂ [12]. Fisher information quantifies the measurement precision available at a given θ̂, so the item providing maximum information at the current θ̂ best measures the current ability during CAT administration. The second method was maximum likelihood weighted information (MLWI) [5], which weights Fisher information by the likelihood function to take into account uncertainty about θ̂. The third method was maximum posterior weighted information (MPWI) [6], which weights the information function by the posterior distribution; the MPWI method therefore selects the next item that provides the most posterior-weighted information. The fourth method was maximum expected information (MEI), which examines the observed information at each predicted θ̂ under both a correct and an incorrect response to a candidate item, and selects the next item that provides the maximum expected information. The fifth method was minimum expected posterior variance (MEPV), which selects the item that minimizes the posterior variance of θ when administered [13]: after the expected posterior variance over the possible responses is calculated for each remaining item, the MEPV method selects the next item with the smallest expected posterior variance. The sixth method was Kullback-Leibler (K-L) information, which provides global information as a candidate takes an item [14]; the K-L method selects the next item that provides the greatest discrimination between the current θ̂ and other possible values of θ as an item is administered.
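As an illustration of the MFI rule, the following Python sketch assumes Rasch items, for which the information function is P(1 − P) and peaks where the item difficulty is closest to θ̂; the item bank and function names are hypothetical:

```python
import numpy as np

def rasch_info(theta, b):
    """Fisher information of Rasch items at ability theta: P(1 - P)."""
    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    return p * (1.0 - p)

def select_mfi(theta_hat, b, administered):
    """Index of the unadministered item with maximum Fisher
    information at the current ability estimate."""
    info = rasch_info(theta_hat, b)
    info[list(administered)] = -np.inf   # mask items already given
    return int(np.argmax(info))

b = np.array([-2.0, -0.5, 0.1, 1.5])     # illustrative difficulties
print(select_mfi(0.0, b, administered={0}))  # prints 2: difficulty 0.1 is closest to 0
```

The other selection rules differ only in how the information criterion is weighted (by the likelihood, by the posterior, or by expectations over the possible responses) before the argmax is taken.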
The CAT was terminated at a cut score (-1.96) with a variable-length set of items selected from a pool of 360 KMLE items. The CAT continued until the candidate's cognitive ability was deemed significantly above or below the passing cut score (95% confidence interval), which was based on the 2014 standard setting of the KMLE [15], or until the candidate completed the maximum number of items (50).
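The variable-length termination rule can be sketched as follows; the function name and arguments are illustrative, not the catR implementation:

```python
def should_stop(theta_hat, se, n_items, cut=-1.96, z=1.96, max_items=50):
    """Variable-length stopping rule: stop when the 95% confidence
    interval around the ability estimate lies entirely above or
    entirely below the cut score, or when the maximum test length
    is reached."""
    ci_clears_cut = (theta_hat - z * se > cut) or (theta_hat + z * se < cut)
    return ci_clears_cut or n_items >= max_items

print(should_stop(0.5, 0.3, 20))    # CI [ -0.09, 1.09 ] is above the cut: stop
print(should_stop(-1.9, 0.5, 20))   # CI straddles the cut: continue testing
```

A candidate near the cut score therefore tends to receive more items than one whose ability is clearly above or below it, which is the source of the efficiency gain of variable-length CAT.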
The DETECT value was used to examine the extent to which the KMLE has a multidimensional simple structure [16]. Exploratory and confirmatory DETECT analyses can be conducted using the 'sirt' package in R [8]. The confirmatory DETECT value was less than 0.1 when the 8 content areas were assumed to be 8 dimensions of the KMLE. As a result, content balancing in the CAT could treat the KMLE as having 8 dimensions.
Statistics
In order to assess how well the true θ is recovered by CAT, several statistics have been proposed in the CAT literature. A statistic commonly used in the CAT literature is bias, which is defined as:

Bias = (1/N) Σ (θ̂_i − θ_i), summed over i = 1, …, N,

where N is the number of examinees in the study, i indexes each individual, θ_i is examinee i's true ability, and θ̂_i is the CAT estimate. Bias is thus the mean of the estimation errors across examinees in the simulation study.
The root mean square error (RMSE) is computed by averaging the squared estimation errors across examinees and then taking the square root of the result, and it has the advantage of being on the same scale as θ. It is defined as:

RMSE = √[(1/N) Σ (θ̂_i − θ_i)²], summed over i = 1, …, N.
The correlation between the true and estimated θ was also computed to evaluate the recovery of the true θ by CAT. Finally, the efficiency of the CAT was evaluated by averaging the number of items administered under each condition.
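Under these definitions, the three recovery statistics can be computed as in the following illustrative Python sketch (the true and estimated abilities below are made-up example values):

```python
import numpy as np

def recovery_stats(theta_true, theta_hat):
    """Bias, RMSE, and correlation between true and estimated abilities."""
    err = theta_hat - theta_true
    bias = float(np.mean(err))                 # mean estimation error
    rmse = float(np.sqrt(np.mean(err ** 2)))   # root mean squared error
    corr = float(np.corrcoef(theta_true, theta_hat)[0, 1])
    return bias, rmse, corr

theta_true = np.array([-1.0, 0.0, 1.0, 2.0])   # illustrative true abilities
theta_hat = np.array([-0.8, 0.1, 0.9, 2.2])    # illustrative CAT estimates
print(recovery_stats(theta_true, theta_hat))
```

A positive bias indicates systematic overestimation of θ, while RMSE also captures the random variability of the estimates around the true values.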
A sample of the R code is shown below.
R code [Rasch model, EAP scoring, MFI item selection case]
require(ltm)
require(irtoys)
require(catR)

setwd("H:/CAT_simulation_2018/Analysis")

# Read the response data and the item information
responses <- read.table("data2017.txt", header = FALSE)
items <- read.table("medical_items_2017.csv", header = TRUE, sep = ",")
res <- as.matrix(responses[, -1])

# Calibrate the item pool under the Rasch model
p.rasch <- est(res, model = "1PL", rasch = TRUE, engine = "ltm")
B.Rasch <- p.rasch$est
rasch <- cbind(B.Rasch, items[, 2])

# EAP scoring to generate abilities for the simulation
theta.eap.est <- eap(res, B.Rasch, qu = normal.qu())
EMT <- rasch
theta.gen <- theta.eap.est[, 1]

# Item bank: a, b, c, d parameters plus content-area group
Item_Para <- data.frame(cbind(rasch[, 1:3], 1, rasch[, 4]))
colnames(Item_Para) <- list("a", "b", "c", "d", "group")
Item_Para$control[Item_Para$group == "1"] <- "A"
Item_Para$control[Item_Para$group == "2"] <- "B"
Item_Para$control[Item_Para$group == "3"] <- "C"
Item_Para$control[Item_Para$group == "4"] <- "D"
Item_Para$control[Item_Para$group == "5"] <- "E"
Item_Para$control[Item_Para$group == "6"] <- "F"
Item_Para$control[Item_Para$group == "7"] <- "G"
Item_Para$control[Item_Para$group == "8"] <- "H"

########## CAT Constraint ###########################
start1 <- list(seed = NA, nrItems = 5, theta = 0, startSelect = "MFI")
test1 <- list(method = "ML", itemSelect = "MFI")
final2 <- list(method = "EAP")
stop1 <- list(rule = c("classification", "length"), thr = c(-1.96, 50), alpha = 0.001)
cbList <- list(names = c("A", "B", "C", "D", "E", "F", "G", "H"),
               props = c(0.125, 0.125, 0.125, 0.069, 0.42, 0.056, 0.56, 0.017))

# Item bank matrix of a, b, c, d parameters for simulateRespondents()
Params <- as.matrix(Item_Para[, c("a", "b", "c", "d")])

res1 <- simulateRespondents(theta.gen, Params, responsesMatrix = res,
                            start = start1, test = test1, stop = stop1,
                            final = final2, cbControl = cbList, save.output = TRUE,
                            output = c("H:/CAT_simulation_2018/Analysis/", "out", "csv"))
cbind(res1$bias, res1$RMSE, res1$correlation, res1$testLength)