Added Value of Deep Learning–based Detection System for Multiple Major Findings on Chest Radiographs: A Randomized Crossover Study
- Mar. 2021
- by Jinkyeong Sung et. al.
Using a deep learning–based detection system, observers, including thoracic radiologists, improved detection and localization of major abnormal findings on chest radiographs with reduced reading time, regardless of experience level.
Previous studies assessing the effects of computer-aided detection on observer performance in the reading of chest radiographs used a sequential reading design that may have biased the results because of reading order or recall bias.
To compare observer performance in detecting and localizing major abnormal findings including nodules, consolidation, interstitial opacity, pleural effusion, and pneumothorax on chest radiographs without versus with deep learning–based detection (DLD) system assistance in a randomized crossover design.
Materials and Methods
This study included retrospectively collected normal and abnormal chest radiographs between January 2016 and December 2017 (https://cris.nih.go.kr/; registration no. KCT0004147). The radiographs were randomized into two groups, and six observers, including thoracic radiologists, interpreted each radiograph without and with use of a commercially available DLD system by using a crossover design with a washout period. Jackknife alternative free-response receiver operating characteristic (JAFROC) figure of merit (FOM), area under the receiver operating characteristic curve (AUC), sensitivity, specificity, false-positive findings per image, and reading times of observers with and without the DLD system were compared by using McNemar and paired t tests.
A total of 114 normal (mean patient age ± standard deviation, 51 years ± 11; 58 men) and 114 abnormal (mean patient age, 60 years ± 15; 75 men) chest radiographs were evaluated. The radiographs were randomized to two groups: group A (n = 114) and group B (n = 114). Use of the DLD system improved the observers’ JAFROC FOM (from 0.90 to 0.95, P = .002), AUC (from 0.93 to 0.98, P = .002), per-lesion sensitivity (from 83% [822 of 990 lesions] to 89.1% [882 of 990 lesions], P = .009), per-image sensitivity (from 80% [548 of 684 radiographs] to 89% [608 of 684 radiographs], P = .009), and specificity (from 89.3% [611 of 684 radiographs] to 96.6% [661 of 684 radiographs], P = .01) and reduced the reading time (from 10–65 seconds to 6–27 seconds, P < .001). The DLD system alone outperformed the pooled observers (JAFROC FOM: 0.96 vs 0.90, respectively, P = .007; AUC: 0.98 vs 0.93, P = .003).
Observers including thoracic radiologists showed improved performance in the detection and localization of major abnormal findings on chest radiographs and reduced reading time with use of a deep learning–based detection system.
Jinkyeong Sung, Sohee Park, Sang Min Lee, Woong Bae, Beomhee Park, Eunkyung Jung, Joon Beom Seo, and Kyu-Hwan Jung