INTRODUCTION
The need for critical thinking in the field of nursing has recently been emphasized, resulting in a proliferation of pertinent studies [1,2]. The Korea Institute of Curriculum and Evaluation defines critical thinking as thinking intended to grasp the logical structure and meaning of texts in order to make the best judgments concerning concepts, criteria, contexts, and methods, and thereby to decide whether to accept certain opinions or to conduct certain acts [3]. However, the measurements currently used to evaluate general critical thinking skills or dispositions do not adequately assess these skills in the context of the problems faced in clinical practice. Furthermore, critical thinking skills depend on the specific conditions and context of the field or time period. Existing studies are limited in that they examine general critical thinking skills using instruments that fail to account for the clinical context. Although critical thinking is a key objective within nursing education and practice, few standardized instruments have been developed to measure critical thinking levels specifically for nursing. Thus, there is a need to look beyond a purely theoretical understanding of critical thinking and to examine the application of critical thinking processes in a more appropriate context. Simply put, an instrument is needed that can measure critical thinking skills while accounting for specific geographical, cultural, and clinical contexts.
As a result of this need for a more refined instrument, Shin et al. [4] developed a 30-item clinical critical thinking skills (CCTS) test and subsequently assessed its item difficulty, discriminant validity, internal reliability, content validity, and criterion-related validity. However, the internal reliability was somewhat low (Cronbach’s α=0.55), possibly due to respondent fatigue resulting from the time required to answer all 30 items (approximately 50 minutes). If so, the reliability of the tool might be enhanced through analysis of the item response alternatives. Therefore, this study reevaluated the CCTS to create a revised measure with fewer items and then assessed the reliability and validity of the revised instrument.
RESULTS
Items 1, 8, 9, and 28 showed low difficulty parameters of -2.0 or lower. Twelve items showed appropriate or high levels of discrimination (discrimination parameter of 0.2 or higher) [7]. The 16 items with low levels of discrimination were reviewed for deletion or revision. The discrimination parameters and item content were considered together, and items 2, 16, 17, 22, 23, 24, 27, and 29 were excluded from the final version. Items 1 and 9 (with difficulty parameters of -3.0 or lower) were also reviewed for deletion or revision, as both showed correct-answer percentages exceeding 90%. Their content, the constructs they measured, and their relationships with the other items were therefore reviewed. Following this assessment, item 9 was excluded from the test. Although item 1 was identified as too easy, its content addressed aging and the health of the elderly, which is highly applicable to clinical situations. Likewise, item 6 was judged to be important for measuring the abilities of interpretation and analysis using contextual circumstances in clinical situations, and so both items were retained. Meanwhile, items 20 and 21, both included when the instrument was initially developed in 2012, were judged to depend on nursing knowledge and thus were excluded from the revised instrument. The calculated levels of difficulty and discrimination of the 28 items are shown in Table 1.
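For readers unfamiliar with these parameters, the following minimal sketch illustrates how the screening thresholds above operate, assuming the two-parameter logistic (2PL) model implied by the reported difficulty (b) and discrimination (a) parameters; the paper does not specify its estimation software, and the parameter values below are hypothetical, chosen only to illustrate the cutoffs (b ≤ -2.0 is too easy; a < 0.2 discriminates poorly).

```python
import numpy as np

def icc_2pl(theta, a, b):
    """Two-parameter logistic (2PL) item characteristic curve:
    P(correct answer | ability theta), given discrimination a and difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Hypothetical parameters mirroring the screening rules above: an item
# with b <= -2.0 is answered correctly even by low-ability subjects
# (too easy), and an item with a < 0.2 barely separates low- from
# high-ability subjects (poor discrimination).
theta = np.linspace(-3.0, 3.0, 7)
print(np.round(icc_2pl(theta, a=1.0, b=-2.5), 2))  # easy item: high P(correct) everywhere
print(np.round(icc_2pl(theta, a=0.1, b=0.0), 2))   # nearly flat curve: weak discrimination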
This instrument evaluated subjects ranging from low to high critical thinking ability and showed maximum test information at the point where the subjects’ ability parameter equaled -1.0. However, it did not provide sufficient information for subjects with a critical thinking ability of 1.0 or higher. The test information function of the CCTS is shown in Fig. 1.
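As a sketch of how such a function is obtained, again assuming the 2PL model, the test information at an ability level θ is the sum of the Fisher information contributed by each item; the (a, b) pairs below are hypothetical values clustered around b = -1.0 so that the output mirrors the reported pattern (a peak near θ = -1.0 and little information for θ ≥ 1.0).

```python
import numpy as np

def item_information_2pl(theta, a, b):
    """Fisher information of a 2PL item: I(theta) = a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

def test_information(theta, item_params):
    """Test information: the sum of item informations at ability theta."""
    return sum(item_information_2pl(theta, a, b) for a, b in item_params)

# Hypothetical (a, b) pairs clustered around b = -1.0.
item_params = [(0.8, -1.2), (1.0, -1.0), (0.9, -0.8), (0.6, -1.5), (0.7, 0.0)]
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"theta={theta:+.1f}  I={test_information(theta, item_params):.3f}")
```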
Nine items were excluded through the item response theory (IRT) analysis. The correlations between the item scores and the total score for the 19 retained items are shown in Table 2. Of these 19 items, 18 (all except item 1) showed correlations with the total score exceeding 0.3, all significant at P<0.001. Item 1 showed a lower item-total correlation than the other items; however, its discrimination level was appropriate, and the content analysis indicated that it should be retained. Cronbach’s α for the instrument was 0.622, and the α values with individual items deleted ranged from 0.572 to 0.623. The same 22 subjects were asked to respond to the instrument again after an interval of two weeks, and the correlation between the scores at the two time points was significant (r=0.662, P=0.001). The degree of agreement between the item developer’s intention and the expert judgments was calculated as a percentage for the 19 items (Table 2). All items showed agreement levels of 50% or higher. Item 7 was originally developed as an analysis item, but five experts judged it to be an inference item, so it was reclassified accordingly.
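The two reliability statistics reported here follow standard definitions, sketched below for a 0/1 scored response matrix; the response data are a random placeholder (real values would come from the administered CCTS), and the item-total correlation is shown in its plain form, since the paper does not state whether a corrected (item-deleted) total was used.

```python
import numpy as np

def cronbach_alpha(x):
    """Cronbach's alpha for an (n_subjects x n_items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = x.shape[1]
    return k / (k - 1) * (1 - x.var(axis=0, ddof=1).sum() / x.sum(axis=1).var(ddof=1))

def item_total_correlation(x, i):
    """Pearson correlation between item i and the total score."""
    return np.corrcoef(x[:, i], x.sum(axis=1))[0, 1]

# Placeholder 0/1 response matrix (100 subjects x 19 items).
rng = np.random.default_rng(0)
x = (rng.random((100, 19)) > 0.4).astype(float)
print(round(cronbach_alpha(x), 3))
print(round(item_total_correlation(x, 0), 3))
```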
Data on the thinking processes through which item judgments were made were collected through interviews. Most items scored at least 1.5 points, and the item scores were generally satisfactory, with an overall average of 1.75 points. This indicates that subjects described their responses as the test developer intended. In addition, when asked, “Was there any item you could not answer because you lacked knowledge or prior learning?”, all students answered that there was no such item. The instrument was thus verified as measuring thinking processes rather than knowledge.
Confirmatory factor analyses were conducted in order to validate a model of the test instrument measuring four factors: ‘interpretation,’ ‘analysis,’ ‘inference,’ and ‘evaluation.’ The individual factors and the items measuring them are shown in Table 3. The confirmatory factor analysis of the 19 items and four factors showed excellent fit: chi-square=77.763 (df=69, P=0.219), comparative fit index=0.949, normed fit index=0.954, and root mean square error of approximation=0.021. These values satisfy the conventional thresholds: chi-square P>0.05, comparative fit index and normed fit index of 0.90 or higher, and root mean square error of approximation of 0.06 or lower.
DISCUSSION
This study revised the existing 30-item CCTS instrument for clinical critical thinking ability into a 19-item measure and reported the process of instrument validation. This instrument is the first to measure critical thinking ability in the area of nursing in Korea. Unlike typical psychological measurements, evidence for the validity of the cognitive response processes of the test instrument was established, and a new approach to expert content validation was attempted. First, the results of the response-process validation differed from the levels reported by subjects during the interviews; therefore, further exploration of both difficulty and discrimination levels is considered necessary.
This study showed maximum test information at the point where the subjects’ ability parameter was -1.0. However, the results did not provide sufficient information for subjects with critical thinking abilities exceeding 1.0, so the instrument reported in this study is of limited use for subjects with excellent critical thinking ability. However, since the instrument has the advantage of identifying the critical thinking abilities necessary for medical personnel, this may be a strength when it is used with this demographic.
Although items with positive correlation coefficients may be interpreted as measuring the same constructs that the test is intended to measure [8], this interpretation is generally considered applicable only to items with correlation coefficients exceeding 0.30. The correlations between the item scores and the total test score (with the exception of item 1) satisfied both criteria. This means that clinical critical thinking ability may be measured through the individual items. In this study, after the number of items was reduced to 19, primarily by selecting items with high levels of discrimination and reorganizing the items, the reliability of the instrument improved to 0.622. In addition, since respondent fatigue presumably decreased and concentration improved after the reduction in the number of items [9], test-retest reliability showed a high, statistically significant correlation (r=0.662).
Whereas existing methods of verifying content validity provide information on the constructs to which items belong and evaluate the suitability of items for those constructs and content, in this study the rates of agreement between the item developer’s intention and expert judgments were calculated by having experts evaluate the construct measured by each item. However, because of the nature of critical thinking, the subareas of critical thinking (interpretation, analysis, inference, and evaluation) do not act independently but interact in order to judge given situations more accurately and to generate solutions to problems. Therefore, it is difficult to develop items that independently measure the different subareas of critical thinking skills. In this study, when the agreement between the constructs to which items belonged and the constructs identified by the experts was evaluated, most items showed agreement rates in excess of 50%. These rates are within acceptable limits [10].
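As an illustration of this agreement calculation, the sketch below computes the percentage of experts whose classification matches the developer’s intended construct; the panel of seven experts is hypothetical, since the paper does not state the panel size.

```python
from collections import Counter

def agreement_rate(intended, expert_labels):
    """Percentage of experts whose construct classification matches the
    developer's intended construct for an item."""
    return 100.0 * Counter(expert_labels)[intended] / len(expert_labels)

# Hypothetical panel of seven experts: item 7 was intended as
# 'analysis', but most experts judged it 'inference', so it was
# reclassified (as described in the Results).
labels_item7 = ['inference'] * 5 + ['analysis'] * 2
print(f"{agreement_rate('analysis', labels_item7):.1f}%")   # intended label: 28.6%
print(f"{agreement_rate('inference', labels_item7):.1f}%")  # consensus label: 71.4%
```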
This research is the first among nursing studies to present evidence for response-process validity. In particular, since this test is a cognitive evaluation instrument, how items are interpreted and accepted by test subjects is important [5]. Subjects’ critical thinking processes were evaluated through selection-type (multiple-choice) items, and response processes were analyzed in order to assess whether these items were well designed. Critical thinking processes are composite processes and can be evaluated through both multiple-choice measures and open-ended tests. Since constructed-response items generally induce complex thinking processes, while multiple-choice items typically induce low-level cognitive processes, constructed-response items are able to measure cognitive processes more directly [7]. However, well-made multiple-choice measures can also be useful in evaluating critical thinking ability [7,11], because judgment ability can be measured by presenting situations through item scenarios and having subjects select the best response among the alternatives presented for the specific situation [7]. Therefore, in this study, response processes were evaluated to determine whether the revised instrument was suitable for measuring critical thinking skills; this was determined by assessing whether subjects arrived at their responses using the critical thinking skills intended by the developer. The high average score of 1.75 supported the conclusion that the items were suitable for measuring critical thinking skills. These results are similar to the response levels of high-achieving students reported in a previous study [5], in which response-process data were analyzed using a similar method. This lends further support to the validity of the revised test instrument for evaluating critical thinking ability.
Finally, confirmatory factor analyses were conducted to assess construct validity; the comparative fit index, normed fit index, and root mean square error of approximation all met their thresholds, indicating that the collected data supported the factor model of the test. The four factors were named ‘finding the evidence and cause and evaluating,’ ‘interpreting and inferring the meanings,’ ‘inferring and evaluating the relationship,’ and ‘finding the best solution through inference and evaluation.’ These differ from the original theoretical subareas (interpretation, analysis, inference, and evaluation). Considering that the reliabilities of the individual subscales of the most widely used instruments for measuring critical thinking ability are unstable, ranging from 0.21 to 0.51 and from 0.17 to 0.74, respectively [11], subscale-level construct validity may generally be deemed weak. It appears that subcategories such as interpretation, analysis, inference, and evaluation are applied in a mutually complementary manner rather than independently.
In conclusion, based on the IRT analysis, the revised 19-item version of the CCTS showed relatively low levels of item difficulty and appropriate or high levels of discrimination. The revised CCTS enables more convenient measurement of critical thinking skills than the 30-item CCTS [5] owing to its improved reliability and validity. The levels of difficulty and discrimination of the revised CCTS-19 should be further verified through retesting and analysis so that it can be used to assess clinical critical thinking skills.