Multicenter Validation to Evaluate the Diagnostic Performance of Deep Learning-based Lung Nodule Detection System in Chest CT
- Nov. 2020
- by Hyunho Park et al.
PURPOSE
To evaluate the robustness of the diagnostic performance of a deep learning-based lung nodule detection system (DLS) against reference standards established at multiple institutions.
METHOD AND MATERIALS
We retrospectively collected 633 cases from three institutions (A: 225, B: 203, C: 205). CT slice thickness and scanner vendor varied across institutions (slice thickness: A, 0.6-1 mm; B, 3 mm; C, 2.5 mm). Nine thoracic radiologists (three from each institution) independently read the CT scans and recorded the location and the long- and short-axis diameters of each nodule; the reference standard was then established by consensus of the three radiologists at each institution. The diagnostic performance of the DLS was assessed against this reference standard. The DLS consisted of 2.5D and 3D CNNs trained on the LIDC-IDRI dataset.
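As a rough illustration of how a per-lesion assessment against a reference standard can be scored, the sketch below counts detected nodules and false-positive candidates per scan. The abstract does not state the hit criterion, so the rule used here (a candidate counts as a hit if its center lies within half the long-axis diameter of a reference nodule) is an assumption, as are the function names and the toy coordinates.

```python
import math

def match_candidates(nodules, candidates):
    """Score one scan.

    nodules:    list of (x, y, z, long_axis_mm) from the reference standard
    candidates: list of (x, y, z) DLS outputs, in mm
    Hit rule (assumed, not from the abstract): candidate center within
    half the long-axis diameter of a reference nodule center.
    """
    hit = [False] * len(nodules)
    false_positives = 0
    for c in candidates:
        matched = False
        for i, (x, y, z, d) in enumerate(nodules):
            if math.dist(c, (x, y, z)) <= d / 2:
                hit[i] = True
                matched = True
        if not matched:
            false_positives += 1
    return sum(hit), false_positives

def per_lesion_summary(scans):
    """Return (per-lesion sensitivity, false positives per scan)."""
    total_nodules = total_hits = total_fp = 0
    for nodules, candidates in scans:
        hits, fp = match_candidates(nodules, candidates)
        total_nodules += len(nodules)
        total_hits += hits
        total_fp += fp
    return total_hits / total_nodules, total_fp / len(scans)

# Toy data (hypothetical): three scans, two reference nodules in total.
toy_scans = [
    ([(10, 10, 10, 8)], [(11, 10, 10), (50, 50, 50)]),  # one hit, one FP
    ([], [(0, 0, 0)]),                                  # nodule-free scan, one FP
    ([(20, 20, 20, 6)], []),                            # missed nodule
]
sensitivity, fpps = per_lesion_summary(toy_scans)
```

Note that scans without nodules still contribute to the FPPS denominator, which is why FPPS is reported per scan rather than per nodule.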
RESULTS
The reference standard included 372 subjects without nodules and 261 subjects with nodules. Subjects with nodules had 1.54±0.83 nodules on average. The mean nodule size was 8.66±4.33 mm in long-axis diameter and 6.42±3.06 mm in short-axis diameter.
The DLS detected 4.17±3.15 nodule candidates per case. In the per-lesion analysis, sensitivity was 0.840 (95% CI: 0.801-0.875) with 2.6 false positives per scan (FPPS) overall, and 0.902 (0.839-0.947) with 3.6 FPPS, 0.737 (0.653-0.809) with 2.3 FPPS, and 0.881 (0.815-0.931) with 2.4 FPPS at institutions A, B, and C, respectively. The difference in per-lesion sensitivity among institutions was statistically significant (p<0.001). In the per-patient analysis, sensitivity, specificity, PPV, and NPV were 0.912 (0.871-0.943), 0.903 (0.869-0.931), 0.869 (0.829-0.909), and 0.936 (0.911-0.961), respectively.
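The per-patient figures can be checked with a short calculation. The sketch below uses a 2x2 table back-calculated from the reported rates and the split of 261 subjects with nodules and 372 without (TP=238, FN=23, TN=336, FP=36). These counts are not stated in the abstract and should be read as an assumption; the function name is likewise hypothetical.

```python
def per_patient_metrics(tp, fn, tn, fp):
    """Standard per-patient diagnostic metrics from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # detected among subjects with nodules
        "specificity": tn / (tn + fp),  # cleared among subjects without nodules
        "ppv": tp / (tp + fp),          # true positives among positive calls
        "npv": tn / (tn + fn),          # true negatives among negative calls
    }

# Counts inferred from the reported rates (assumption, not from the source):
# 238 + 23 = 261 subjects with nodules, 336 + 36 = 372 without.
metrics = per_patient_metrics(tp=238, fn=23, tn=336, fp=36)
rounded = {name: round(value, 3) for name, value in metrics.items()}
```

With these counts, the rounded values reproduce the reported 0.912, 0.903, 0.869, and 0.936.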
CONCLUSION
The DLS showed reliable but variable performance in detecting lung nodules on chest CT from three institutions with different vendors and image acquisition protocols.
This study presents an example of the thorough validation process needed before a DLS is introduced into clinical practice, from establishing reference standards to evaluating diagnostic performance.