Validation of Task-Specific Rating Scale for Open Balloon Catheter Arterial Embolectomy: An Assessor-Blinded Quasi-Experimental Pilot Study

Objective: To develop and validate a task-specific rating scale (TSRS) by comparing it with the Global Rating Scale (GRS) for the evaluation of brachial artery embolectomy (BAE). Methods: Participants were divided into expert and novice groups and were oriented on a locally developed simulator model. The following day, each participant independently performed an embolectomy procedure, which was graded by two independent assessors using the GRS and TSRS. Validity was evaluated using Pearson’s correlation coefficient (r), reliability by the intraclass correlation coefficient (ICC), and agreement by Bland–Altman plots. A p-value <0.05 was considered significant. Results: Thirty-two participants were enrolled in this study. The overall TSRS was found to be a valid assessment tool (r=0.82; 95% confidence interval [CI]: 0.66, 0.91; p<0.001). Domain-specific analyses showed a moderate positive association between all domains (p<0.05), except for instrument handling (r=0.09; 95% CI: −0.27, 0.42; p=0.642). The ICC for overall scores showed excellent reliability for both instruments, GRS and TSRS, with values of 0.97 and 0.92, respectively. Conclusion: The TSRS was found to be a valid and reliable assessment tool for BAE; however, for some domains, such as instrument handling and time and motion, it has limited reliability.


Introduction
The technical ability of a clinician is one of the most important components of surgical competency. There is an unquestionable need to standardize the evaluation process of residents and fellows during training to ensure robust measurements of efficiency and competence. 1-3) It has been found that the subjective assessment of technical skills through direct task observation in the operating room, without a structured objective scale, has poor interobserver reliability. 1) Various surgical evaluation instruments have been developed to achieve this goal. Ample literature now exists recommending the use of the Global Rating Scale (GRS), which could be used to assess a trainee's surgical skills on agreed-upon criteria, to provide an objective and reproducible assessment. 1,3,4) However, the GRS is generic and lacks the ability to assess the specific aspects of diverse surgical procedures performed in various subspecialties, 2) particularly vascular procedures. Although many procedure-specific checklists and the Objective Structured Assessment of Technical Skills (OSATS) have been developed to evaluate vascular surgery procedures, most of them are either in the process of validation or not yet validated. Combining the GRS and procedure-specific checklists by including additional safety and efficacy items has been found to be a more effective and reliable assessment for surgical dexterity. 5) Vascular surgery is a technically demanding specialty where the scrupulous assessment of technical skills is very important. Brachial artery embolectomy (BAE) is an emergency procedure, and patients often present in the middle of the night to the emergency room, providing few opportunities to train and assess residents and fellows involved in care. Although the incidence of upper limb amputation after BAE is low, 6) it is associated with significant emotional, social, and financial consequences. 7) Therefore, optimal assessment and credentialing are of vital importance to enable residents, fellows, and junior faculty to perform embolectomies safely and independently.
The integration of a validated procedure-specific evaluation checklist can help in the more uniform and reproducible assessment of technical and cognitive skills of trainees. The Intercollegiate Surgical Curriculum Program (ISCP) has developed some procedure-based assessment checklists, including embolectomy, 8) but these are not validated. Considering the lack of availability of a validated checklist for BAE, we aimed to develop and validate a procedure-specific assessment checklist for balloon catheter BAE for surgical trainees. The primary objective of this study was to develop and validate a task-specific rating scale (TSRS) by comparing the TSRS with the GRS to evaluate the procedural steps of BAE. The secondary objective was to estimate criterion cut-off points of the TSRS against overall GRS binary scores to declare trainees as successful candidates.

Materials and Methods
An assessor-blinded quasi-experimental study was conducted at the Section of Vascular Surgery, Department of Surgery, Aga Khan University Hospital, Karachi. Ethical approval was obtained from the institutional ethical review committee before initiation of the study.

Development and finalization of assessment tools
Two assessment tools were concurrently used to assess the technical competency of trainees in performing BAE.
1. A TSRS was developed using the Delphi technique by a group comprising three senior vascular surgeons (content experts IN, ZS, ZR), one research specialist (SH), and two medical educationists (QR, AS). The Textbook of Vascular Surgery 9) was also used. During focus group discussions, the group used the procedure-based assessment checklist developed by the ISCP to develop the TSRS. The group also used the GRS domains as a reference (Appendix 1) and added items to make a comprehensive assessment of the trainee, ensuring that the new TSRS checklist covered the correct steps of the procedure in the correct sequence. The seven GRS domains were further expanded in an itemized fashion to ensure that all key steps of BAE were captured; thus, a 26-item TSRS checklist was generated for this procedure (Appendix 2). As in the GRS, completion of the task and efficiency were assessed for each item on a 5-point Likert scale, giving a maximum total score of 130 (26×5) (Appendix 2).
2. The GRS is a seven-item validated tool based on the OSATS for evaluating the performance of surgical skills. It was used concurrently with the TSRS checklist to assess the technical skills of the study participants, 3,10) with a maximum total score of 35 (7×5) (Appendix 1).

Simulator
A locally developed simulator was used to teach and assess the arterial embolectomy procedure (Fig. 1). It consisted of a rubber arm (lab model; made in the USA), plastic tubes, 8-cm-long, 6-mm polytetrafluoroethylene grafts, artificial blood (normal saline with red color), and clots (made of cellulose). This cost-effective simulator was tested before the workshop to ensure its smooth and efficient functioning. The approximate cost of developing the simulator was USD 50. Before each procedure, the model was reassembled with new clotting material and refilled with artificial blood.

Assessor training
A week before the planned date of the workshop, the primary investigators conducted a 1-day orientation session for the assessors on both the GRS and TSRS tools. The assessors (ZS, AB) had at least 10 years of experience of independent practice in vascular surgery. The assessors were educated about the use of the GRS and TSRS as evaluation tools. Orientation on the video recording and the role of surgical assistants was emphasized during these sessions. All assessors were requested to rate at least four pre-recorded procedural videos to assess inter-rater reliability, which fell within an excellent range (intraclass correlation coefficient [ICC] >85%).
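The inter-rater reliability check described above can be sketched as follows. The paper does not state which ICC form was used, so this sketch assumes a two-way random-effects, absolute-agreement, single-rater ICC, often written ICC(2,1). The function name `icc_2_1` and the example scores are hypothetical and are not study data.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per rated video and one column per assessor.

    Assumption: two-way random effects, absolute agreement, single rater.
    """
    n = len(ratings)       # number of rated videos (subjects)
    k = len(ratings[0])    # number of assessors (raters)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares from the two-way ANOVA decomposition
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical GRS totals for four pre-recorded videos, two assessors each
videos = [[24, 25], [30, 29], [18, 20], [33, 33]]
print(f"ICC(2,1) = {icc_2_1(videos):.2f}")
```

Other ICC forms (for example, consistency rather than absolute agreement, or average-rater versions) use the same mean squares with a different denominator and can give noticeably different values, so the form should be reported alongside the coefficient.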

Participants
Senior residents from general surgery (year three onwards) and fellows of vascular surgery were invited to participate in the workshop (novice group). Vascular surgeons certified by the national/international board and who had at least 3 years of experience performing independent arterial embolectomy (expert group) were also invited.

Assessment technique
Informed consent was obtained from each participant to enroll in the study. On day 1, an orientation video of the simulation model and the steps of arterial embolectomy was presented to all participants in an interactive session. Participants were then allowed to practice the procedure on the simulator. During these drills, the assigned group facilitators provided formative feedback to the participants. During this session, each participant from both groups was given adequate time (1 h) to practice and become familiar with the simulator. On day 2, participants were asked to perform an embolectomy independently on the same model with the help of an assistant, a trained vascular surgery technician who was instructed to help only when asked by the participant. Twenty minutes were assigned to each participant to accomplish the given task. Blinded video recording was performed for each trainee while performing the procedure. The performance of trainees was graded at a later stage by two independent assessors, who used the GRS and TSRS concomitantly on the pre-recorded videos. Assessors were kept blinded by placing unique identity codes on the participants' gloves, so that the codes were easily visible to the assessors without revealing the participants' identities.

Results
A total of 32 participants, 22 novices and 10 experts, successfully completed the workshop. The novice group consisted of three vascular surgery fellows and 19 senior general surgery residents. The maximum scores achieved on the GRS and TSRS were 23.92±6.02 and 116.27±6.34, respectively. Pearson's correlation coefficients (r) and [...] The TSRS cut point estimated on the ROC curve was 118, which corresponds to 65% of the overall GRS scores and can be used to discriminate participants who performed well from those who need further improvement (Fig. 2). Ninety percent of the experts (n=9/10) achieved the desired 65% score, compared with only 50% of the novices (p=0.050).
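The cut-point estimation described above can be illustrated with a short sketch. The paper does not state which criterion was used to select the cut point on the ROC curve, so this sketch assumes Youden's J (sensitivity + specificity − 1). The function name `youden_cut_point` and all scores below are hypothetical and are not study data.

```python
def youden_cut_point(scores, passed):
    """Return (threshold, sensitivity, specificity) that maximizes Youden's J.

    A participant is classified as successful when score >= threshold.
    Assumption: Youden's J is the selection criterion (not stated in the paper).
    """
    n_pos = sum(1 for p in passed if p)
    n_neg = len(passed) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one successful and one unsuccessful case")

    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, p in zip(scores, passed) if p and s >= t)
        tn = sum(1 for s, p in zip(scores, passed) if not p and s < t)
        sens = tp / n_pos
        spec = tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1], best[2], best[3]


GRS_MAX = 35
pass_mark = 0.65 * GRS_MAX  # 65% of the maximum GRS score, i.e. 22.75

# Hypothetical paired totals for six participants
grs = [18, 20, 22, 24, 27, 30]
tsrs = [100, 105, 110, 118, 120, 125]

passed = [g >= pass_mark for g in grs]  # binary GRS outcome
cut, sens, spec = youden_cut_point(tsrs, passed)
print(cut, sens, spec)
```

With a sample as small as the one in this study, several thresholds can have similar J values, so the chosen cut point should be read as an estimate rather than a fixed standard.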
Both the GRS and mean TSRS scores could discriminate the expert group from the novice group in overall scores (GRS: 27.8±3.13 and 22.16±6.24; p=0.011; and TSRS: 32.45±1.08 and 30.34±1.73; p=0.001). Neither the GRS nor the TSRS could differentiate the groups in instrument handling and the use of an assistant. Like the GRS, the TSRS also could not detect a difference between the groups in the respect for tissue and time and motion domains (Table 3). Experience in vascular surgery (months) and the number of embolectomies performed were significantly higher in the expert group than in the novice group (95.4±58.0 vs. 3.64±3.89; and 97.0±47.8 vs. 2.64±47.8, respectively; p<0.001 for both comparisons).

Discussion
This pilot study aimed to develop and validate a task-specific checklist to evaluate BAE procedures against GRS scores. To the best of our knowledge, this is the first study that has validated the TSRS for BAE for trainees. The overall TSRS was found to be a valid and reliable tool to evaluate the procedural steps of BAE. However, there was low reliability for some domains, such as time and motion and instrument handling. This warrants further modification of this pilot-tested TSRS checklist to improve its validity and reliability.
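The confidence interval reported in the abstract for the overall validity coefficient (r=0.82; 95% CI: 0.66, 0.91; n=32) can be approximately reproduced with the Fisher z-transformation. The paper does not state how the interval was computed, so the method below is an assumption, and the function names are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for r via the Fisher z-transformation (assumed method)."""
    z = math.atanh(r)            # transform r to the z scale
    se = 1 / math.sqrt(n - 3)    # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Overall TSRS vs. GRS correlation as reported in the abstract
lo, hi = fisher_ci(0.82, 32)
print(round(lo, 2), round(hi, 2))  # approximately 0.66 and 0.91
```

Because the published r values are rounded to two decimals, intervals recomputed this way may differ from the reported ones in the last digit.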

* The average of the two raters was analyzed.

Fig. 2 Receiver operating characteristic (ROC) curve analysis showing the task-specific rating score cut points against 65% of the Global Rating Scale scores.

The GRS has been widely used to assess different surgical procedures; despite being a validated tool, it has the inherent limitations of being generic and lacking specificity for various procedures. 2) This is one of the reasons that raised the need to develop procedure-specific evaluation checklists. There is a debate about the superiority of the GRS over procedure-specific validated checklists. Ilgen et al. 12) collated the results of 45 studies in a meta-analysis and reported that the GRS is better able to differentiate expert performance and is more reliable than dichotomous checklists. However, expert consensus favors using a combination of the GRS and validated procedure-specific checklists as the gold standard for procedure-specific assessments. 13,14) The GRS overestimates respect for tissue and quality of products, whereas the TSRS does so for all other domains. We assumed a difference of <15% as an acceptable range; the overall and all task-specific domain estimates fell within this range, except for knotting and suturing, which exceeded it. However, further modification is needed to make this difference even smaller for a robust assessment.
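The agreement analysis behind the <15% criterion can be sketched as a Bland–Altman calculation. Because the two instruments have different maximum scores (35 for the GRS and 130 for the TSRS), this sketch assumes that both are first converted to a percentage of their maximum and that the 15% criterion applies to the mean difference in percentage points. The paper does not spell out these steps. The function name and the scores are hypothetical.

```python
import statistics

GRS_MAX, TSRS_MAX = 35, 130


def bland_altman_percent(grs, tsrs):
    """Return (bias, lower limit, upper limit) of agreement in percentage points.

    Assumption: scores are rescaled to percent of the instrument maximum,
    and differences are taken as TSRS minus GRS.
    """
    grs_pct = [100 * g / GRS_MAX for g in grs]
    tsrs_pct = [100 * t / TSRS_MAX for t in tsrs]
    diffs = [t - g for g, t in zip(grs_pct, tsrs_pct)]
    bias = statistics.mean(diffs)   # mean difference between instruments
    sd = statistics.stdev(diffs)    # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired totals for five participants
grs = [20, 24, 27, 30, 33]
tsrs = [85, 96, 104, 112, 121]

bias, lower, upper = bland_altman_percent(grs, tsrs)
print(f"bias={bias:.1f}, limits=({lower:.1f}, {upper:.1f})")
print("within assumed 15-point range:", abs(bias) < 15)
```

A positive bias here means the TSRS gives the higher percentage score on average. The same calculation can be repeated per domain using the domain maxima.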
Our study demonstrated that both the GRS and the TSRS can differentiate between expert and novice performance. The overall mean scores differed significantly between the two groups on both checklists. This is intuitive, as the higher mean score of the expert group can be explained by their greater experience in the field. Despite this significant difference in overall mean scores, neither the GRS nor the TSRS could differentiate the groups in the instrument handling and use of assistant domains. This raises the hypothesis that these checklists are inherently unable to assess these constructs adequately. The TSRS, in addition, could not adequately differentiate the groups in two further domains, respect for tissue and time and motion. This may be because either the expert group generally lacks expertise in these two domains or the checklist needs further modification for a more robust evaluation of them.
This study demonstrated acceptable validity and reliability of the TSRS; nonetheless, the question regarding its robust utility remains unanswered. It is still unclear who will use this checklist, under what circumstances, and for what purposes. Similar issues with procedure-specific checklists have also been reported. 13) For intermediate-stake assessments, such as end-of-course tests, it is recommended that reliability coefficients should range between 0.80 and 0.89. 15) Our overall reliability coefficient for the TSRS fell within this range (r=0.85), which supports restricting the use of this checklist to intermediate-stake assessments. We propose that this TSRS should be used in workplace-based assessments, such as direct observations of procedural skills for general surgery and vascular surgery trainees. This checklist can also be used to evaluate an individual's performance of the procedural steps in workshops or courses. High-stake examinations, such as national or international specialty exit examinations, should avoid using this rating scale. Besides its utility in assessment, this checklist can be a valuable tool and reference in the education and training of surgical trainees in simulation labs, wet labs, or operating rooms. Although using this checklist seems promising as a way to ultimately improve outcomes in patients undergoing BAE, no definite conclusion can be drawn at this moment because of the lack of available evidence.
Given the paucity of literature on such checklists and this being the first attempt to validate a task-specific checklist for BAE, we could not compare our results with other published evidence. Our results should be interpreted with caution because of the following limitations. There was only a small number of participants, especially in the expert group, and we could not assess stratified cut points. This is because of the evolving nature of vascular surgery as a specialty and because very few trained vascular surgery faculty were available for this study. 16) The findings of this study also showed the inability of the TSRS to adequately assess and differentiate some domains, such as instrument handling, use of assistant, and respect for tissue. Studies have shown that the use of nonbinary checklists or the Likert scale for assessments and reliability analyses using the ICC has inherent flaws, 17,18) and this applies to our study as well. These limitations caution against the use of this checklist in high-stake assessments. Further modification of the existing checklist to incorporate items in the abovementioned domains is certainly desirable.

Conclusion
Overall, the TSRS was found to be a valid and reliable assessment tool for BAE; however, for some domains, such as instrument handling and time and motion, it has limited reliability, whereas for knotting and suturing, it showed a limited agreement between the two instruments. We could not prove the true educational impact of using the TSRS in acquiring expertise in embolectomy. Further work is warranted for the modification of this checklist by recruiting a larger number of participants from multiple centers.