## Introduction

## Overview of computerized adaptive testing

### Brief history of computerized adaptive testing

*θ*), it shows how well a test measures candidates at each value of *θ*. Therefore, the TIF is an index of local precision at the test level and is useful for ensuring desirable exam objectivity and for developing a test instrument that satisfies a target information function. In order to provide each candidate with a precise *θ* estimate, the target information function should be high and constant across *θ*. However, a classical test with a fixed set of items will have low but equal precision across all *θ* [3]. Consequently, for a classical test to have high and equal precision across all examinees, it would require a very large number of items. In contrast to classical tests, CAT can yield a high and equal degree of precision for all candidates, and requires fewer items than a classical test to reach a high level of precision [3].
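The link between test information and precision can be illustrated with a short sketch. It assumes the 3-parameter logistic model, the usual item information formula for that model, and a scaling constant D = 1.702; the item parameters and the function names (`p_3pl`, `item_information`, `tif`, `standard_error`) are hypothetical and are not taken from the source.

```python
import math

D = 1.702  # assumed scaling constant that aligns the logistic and normal ogive metrics


def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3-parameter logistic model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def item_information(theta, a, b, c):
    """Item information for the 3-parameter logistic model at ability theta."""
    p = p_3pl(theta, a, b, c)
    q = 1.0 - p
    return (D * a) ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2


def tif(theta, items):
    """Test information function: the sum of the item information values."""
    return sum(item_information(theta, a, b, c) for a, b, c in items)


def standard_error(theta, items):
    """Approximate standard error of the theta estimate, 1 / sqrt(TIF)."""
    return 1.0 / math.sqrt(tif(theta, items))


# Hypothetical fixed form: (a, b, c) for each item.
fixed_form = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (0.8, 1.0, 0.2)]

for theta in (-2.0, 0.0, 2.0):
    print(theta, round(tif(theta, fixed_form), 3),
          round(standard_error(theta, fixed_form), 3))
```

Because the standard error of the *θ* estimate is approximately the reciprocal of the square root of the TIF, a high and flat information function corresponds to a small and uniform standard error across *θ*.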

### Item response theory

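The 3-parameter normal ogive model is

$$P(u_{ij} = 1 \mid a_i, b_i, c_i, \theta_j) = c_i + (1 - c_i)\int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-t^{2}/2}\, dt,$$

where $P(u_{ij} = 1 \mid a_i, b_i, c_i, \theta_j)$ is the probability of getting an item $i$ correct given the person parameter and the item parameters ($a_i$, $b_i$, $c_i$); $\theta_j$ is the latent trait parameter (ability) of a person $j$; $b_i$ is the item difficulty parameter for an item $i$; $a_i$ is the item discrimination parameter for an item $i$; $c_i$ is the guessing parameter for an item $i$; and $z$ is the standard normal deviate, $a_i(\theta_j - b_i)$. The 2-parameter normal ogive model is a special case of the 3-parameter model, with the $c_i$ parameter removed:

$$P(u_{ij} = 1 \mid a_i, b_i, \theta_j) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-t^{2}/2}\, dt.$$

The 1-parameter normal ogive model further fixes the $a_i$ parameter at a single value:

$$P(u_{ij} = 1 \mid b_i, \theta_j) = \int_{-\infty}^{a(\theta_j - b_i)} \frac{1}{\sqrt{2\pi}}\, e^{-t^{2}/2}\, dt.$$

In the logistic models, $P(\theta)$, $a_i$, $b_i$, and $\theta_j$ have essentially the same interpretations as in the normal ogive model. The discrepancy in the values of $P(\theta)$ between the normal ogive models and the logistic models is less than 0.01 for all values of $\theta$ [16].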
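The size of the discrepancy between the normal ogive and logistic curves can be checked numerically. The sketch below assumes that the logistic curve carries the commonly used scaling constant D = 1.702, which is not stated in the text above; the function names and the grid of deviates are likewise illustrative choices.

```python
import math

D = 1.702  # assumed scaling constant; not stated in the text above


def normal_ogive(z):
    """Standard normal cumulative distribution function at deviate z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic(z):
    """Logistic curve evaluated at the rescaled deviate D * z."""
    return 1.0 / (1.0 + math.exp(-D * z))


def max_discrepancy(lo=-6.0, hi=6.0, steps=12001):
    """Largest absolute gap between the two curves on an evenly spaced grid."""
    width = (hi - lo) / (steps - 1)
    grid = (lo + k * width for k in range(steps))
    return max(abs(normal_ogive(z) - logistic(z)) for z in grid)


print(f"maximum discrepancy on the grid: {max_discrepancy():.4f}")
```

Here `z` stands for the deviate *a*(*θ* − *b*), so the bound applies to any item once its own *a* and *b* values are substituted.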

The *c* parameter, referred to as the guessing parameter, represents the probability of answering an item correctly regardless of an examinee’s level of *θ*. Thus, an examinee at a very low level of *θ* answers item *i* correctly with a probability close to *c*. Examinees at a low level of *θ* are affected by the *c* parameter because, given difficult items, they would randomly guess the correct answer more often than those at a higher level of *θ*. The parameter *b* is usually considered an index of item difficulty. It represents the point on the *θ* scale at which an examinee has a 50% chance of answering item *i* correctly when *c* is equal to zero [16]. Although the *b* parameter theoretically ranges from −∞ to ∞, *b* values between −2.0 and 2.0 include more than 95% of all cases in the standard normal distribution. Items with values of *b* near −2.0 are very easy items, and those with *b* values near 2.0 are very difficult items. The item discrimination parameter *a* is proportional to the slope of *P*(*θ*) at the point *θ* = *b*. Although the range of *a* is theoretically from −∞ to ∞, negatively discriminating items are ignored for operational purposes. Thus, the usual *a* value ranges from zero to ∞, with a practical upper limit of about 3.0. A high value of *a* indicates a steep IRF, whereas a low value indicates a flat IRF.
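These interpretations of *c*, *b*, and *a* can be verified on a single item. The following is a minimal sketch that assumes the 3-parameter logistic form with scaling constant D = 1.702; the item parameters and the helper names `irf` and `slope` are hypothetical.

```python
import math

D = 1.702  # assumed scaling constant


def irf(theta, a, b, c=0.0):
    """Item response function under the 3-parameter logistic model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def slope(theta, a, b, c=0.0, h=1e-5):
    """Numerical slope of the IRF at theta (central difference)."""
    return (irf(theta + h, a, b, c) - irf(theta - h, a, b, c)) / (2.0 * h)


a, b, c = 1.5, 0.5, 0.2  # hypothetical item parameters

# With c = 0, the probability at theta = b is one half.
print(irf(b, a, b))
# With guessing, the probability at theta = b rises to (1 + c) / 2.
print(irf(b, a, b, c), (1.0 + c) / 2.0)
# Far below b, the probability approaches the lower asymptote c.
print(irf(-8.0, a, b, c))
# The slope at theta = b is proportional to a: D * a * (1 - c) / 4.
print(slope(b, a, b, c), D * a * (1.0 - c) / 4.0)
# Share of a standard normal distribution lying between -2.0 and 2.0.
share_within_2 = math.erf(2.0 / math.sqrt(2.0))
print(share_within_2)
```

Doubling *a* doubles the slope at *θ* = *b*, which is why a high *a* value produces a steep IRF.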

The 2-parameter logistic model (2PLM) is a special case of the 3-parameter logistic model in which the *c* parameter is zero. The 1-parameter logistic model is, in turn, a special case of the 2PLM where all items have the unit value of *a* and *c* has a value of zero. The Rasch model is the simplest form of the unidimensional IRT model, as the discrimination parameters are fixed at the value of 1 across all items [17]. These IRT models have been applied in CAT for several decades.
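The nesting of these models can be made explicit by defining each simpler model as a constrained call to the more general one. This is only a sketch: it works on the logistic metric without a scaling constant (an assumption made here so that the last function matches the usual Rasch form), and the function names are hypothetical.

```python
import math


def p_3pl(theta, a, b, c):
    """3-parameter logistic model (logistic metric, no scaling constant)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def p_2pl(theta, a, b):
    """2-parameter model: the 3-parameter model with c fixed at zero."""
    return p_3pl(theta, a, b, 0.0)


def p_rasch(theta, b):
    """Rasch model: the 2-parameter model with a fixed at 1 for every item."""
    return p_2pl(theta, 1.0, b)


# A person one unit above the item difficulty under each model.
print(p_3pl(1.0, 1.0, 0.0, 0.25), p_2pl(1.0, 1.0, 0.0), p_rasch(1.0, 0.0))
```

Writing the models this way keeps a single response function in the scoring code, with the choice of model reduced to which parameters are held fixed.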

### Unidimensional computerized adaptive testing

#### Item bank

#### Starting item

#### Item selection rule

#### Scoring procedure

#### Termination criterion

### Computerized adaptive testing management

### Content balancing

### Item analyses

#### Preliminary item analysis

#### Pretest-item calibration

#### Test/item monitoring

### Standard setting

### Practice analysis

### Item bank updates

The mean/mean method uses the mean of the *a*-parameter estimates for the slope of the linear transformation, and the mean of the *b*-parameter estimates for the intercept of the linear transformation [42]. The mean/sigma method uses the means and standard deviations of the *a*- and *b*-parameter estimates from the common items [43]. The ICC method finds the linking coefficient by using the sum of the squared differences between the ICCs for each item given a particular ability [44]. The test characteristic curve method uses the squared difference between the test characteristic curves for a given ability [45].

Before new licensing/certification exam items are administered as operational test items, each must first be administered as a pilot item for the purpose of collecting item information. A pretest pool is constructed and published with each version of the exam. The number of items in the pretest pool is determined by test development experts and psychometricians. The items for the pretest pools are selected from the group of items that have been approved for pretest use. In CAT administration, pretest items are selected at random from the pool of pretest items. Each pretest item must be administered to a sufficiently representative sample in order to collect information on the performance of the item. Pretest items are incorporated into the operational test so that candidates cannot recognize the difference between operational and pretest items. In order to reduce the effects of warm-up or fatigue, pretest items are not administered at the beginning or the end of CAT; they are administered after several operational items have been assigned. Finally, pretest items are selected as operational items in a CAT item bank through pretest item calibration [34].
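The two moment-based linking methods described above (the slope taken from the mean *a* estimates, often called the mean/mean method, and the mean/sigma method) can be sketched in a few lines. The sketch assumes a linear transformation of the form θ_base = A·θ_new + B, so that *b* estimates transform as A·*b* + B and *a* estimates as *a*/A. It also follows the common formulation in which the mean/sigma slope is computed from the *b* estimates only. All function names and parameter values are hypothetical.

```python
from statistics import mean, pstdev


def mean_mean(a_new, b_new, a_base, b_base):
    """Slope from the mean a estimates, intercept from the mean b estimates."""
    A = mean(a_new) / mean(a_base)
    B = mean(b_base) - A * mean(b_new)
    return A, B


def mean_sigma(b_new, b_base):
    """Slope from the standard deviations of b, intercept from the means of b."""
    A = pstdev(b_base) / pstdev(b_new)
    B = mean(b_base) - A * mean(b_new)
    return A, B


def rescale(a_new, b_new, A, B):
    """Place new-form item parameters on the base scale."""
    return [a / A for a in a_new], [A * b + B for b in b_new]


# Hypothetical common items on the base scale.
a_base = [0.8, 1.0, 1.4]
b_base = [-1.0, 0.2, 1.1]

# The same items expressed on a new scale that differs by A = 1.2 and B = 0.5.
A_true, B_true = 1.2, 0.5
a_new = [a * A_true for a in a_base]
b_new = [(b - B_true) / A_true for b in b_base]

print(mean_mean(a_new, b_new, a_base, b_base))
print(mean_sigma(b_new, b_base))
```

With error-free parameters both methods recover the same coefficients; with real estimates they generally differ, which is one motivation for the characteristic curve methods cited above.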