Journal of Korean Medical Science | 2018-08-08
A Novel Fundus Image Reading Tool for Efficient Generation of a Multi-dimensional categorical Image Database for Machine Learning Algorithm Training
AbstractBackground We described a novel multi-step retinal fundus image reading system for providing high-quality large data for machine learning algorithms, and assessed the grader variability in the large-scale dataset generated with this system. Methods A 5-step retinal fundus image reading tool was developed that rates image quality, presence of abnormality, findings with location information, diagnoses, and clinical significance. Each image was evaluated by 3 different graders. Agreements among graders for each decision were evaluated. Results The 234,242 readings of 79,458 images were collected from 55 licensed ophthalmologists during 6 months. The 34,364 images were graded as abnormal by at-least one rater. Of these, all three raters agreed in 46.6% in abnormality, while 69.9% of the images were rated as abnormal by two or more raters. Agreement rate of at-least two raters on a certain finding was 26.7%–65.2%, and complete agreement rate of all-three raters was 5.7%–43.3%. As for diagnoses, agreement of at-least two raters was 35.6%–65.6%, and complete agreement rate was 11.0%–40.0%. Agreement of findings and diagnoses were higher when restricted to images with prior complete agreement on abnormality. Retinal/glaucoma specialists showed higher agreements on findings and diagnoses of their corresponding subspecialties. Conclusion This novel reading tool for retinal fundus images generated a large-scale dataset with high level of information, which can be utilized in future development of machine learning-based algorithms for automated identification of abnormal conditions and clinical decision supporting system. These results emphasize the importance of addressing grader variability in algorithm developments.
Journal of American Heart Association | 2018-06-26
An Algorithm Based on Deep Learning for Predicting In‐Hospital Cardiac Arrest
AbstractIn‐hospital cardiac arrest is a major burden to public health, which affects patient safety. Although traditional track‐and‐trigger systems are used to predict cardiac arrest early, they have limitations, with low sensitivity and high false‐alarm rates. We propose a deep learning–based early warning system that shows higher performance than the existing track‐and‐trigger systems. This retrospective cohort study reviewed patients who were admitted to 2 hospitals from June 2010 to July 2017. A total of 52 131 patients were included. Specifically, a recurrent neural network was trained using data from June 2010 to January 2017. The result was tested using the data from February to July 2017. The primary outcome was cardiac arrest, and the secondary outcome was death without attempted resuscitation. As comparative measures, we used the area under the receiver operating characteristic curve (AUROC), the area under the precision–recall curve (AUPRC), and the net reclassification index. Furthermore, we evaluated sensitivity while varying the number of alarms. The deep learning–based early warning system (AUROC: 0.850; AUPRC: 0.044) significantly outperformed a modified early warning score (AUROC: 0.603; AUPRC: 0.003), a random forest algorithm (AUROC: 0.780; AUPRC: 0.014), and logistic regression (AUROC: 0.613; AUPRC: 0.007). Furthermore, the deep learning–based early warning system reduced the number of alarms by 82.2%, 13.5%, and 42.1% compared with the modified early warning system, random forest, and logistic regression, respectively, at the same sensitivity. An algorithm based on deep learning had high sensitivity and a low false‐alarm rate for detection of patients with cardiac arrest in the multicenter study.
Journal of Digital Imaging | 2018-06-12
Laterality Classification of Fundus Images using Interpretable Deep Neural Networks
AbstractIn this paper, we aimed to understand and analyze the outputs of a convolutional neural network model that classifies the laterality of fundus images. Our model not only automatizes the classification process, which results in reducing the labors of clinicians, but also highlights the key regions in the image and evaluates the uncertainty for the decision with proper analytic tools. Our model was trained and tested with 25,911 fundus images (43.4% of macula-centered images and 28.3% each of superior and nasal retinal fundus images). Also, activation maps were generated to mark important regions in the image for the classification. Then, uncertainties were quantified to support explanations as to why certain images were incorrectly classified under the proposed model. Our model achieved a mean training accuracy of 99%, which is comparable to the performance of clinicians. Strong activations were detected at the location of optic disc and retinal blood vessels around the disc, which matches to the regions that clinicians attend when deciding the laterality. Uncertainty analysis discovered that misclassified images tend to accompany with high prediction uncertainties and are likely ungradable. We believe that visualization of informative regions and the estimation of uncertainty, along with presentation of the prediction result, would enhance the interpretability of neural network models in a way that clinicians can be benefitted from using the automatic classification system.
Journal of Digital Imaging
Comparison of Shallow and Deep Learning Methods on Classifying the Regional Pattern of Diffuse Lung Disease
AbstractThis study aimed to compare shallow and deep learning of classifying the patterns of interstitial lung diseases (ILDs). Using high-resolution computed tomography images, two experienced radiologists marked 1200 regions of interest (ROIs), in which 600 ROIs were each acquired using a GE or Siemens scanner and each group of 600 ROIs consisted of 100 ROIs for subregions that included normal and five regional pulmonary disease patterns (ground-glass opacity, consolidation, reticular opacity, emphysema, and honeycombing). We employed the convolution neural network (CNN) with six learnable layers that consisted of four convolution layers and two fully connected layers. The classification results were compared with the results classified by a shallow learning of a support vector machine (SVM). The CNN classifier showed significantly better performance for accuracy compared with that of the SVM classifier by 6–9%. As the convolution layer increases, the classification accuracy of the CNN showed better performance from 81.27 to 95.12%. Especially in the cases showing pathological ambiguity such as between normal and emphysema cases or between honeycombing and reticular opacity cases, the increment of the convolution layer greatly drops the misclassification rate between each case. Conclusively, the CNN classifier showed significantly greater accuracy than the SVM classifier, and the results implied structural characteristics that are inherent to the specific ILD patterns.
Hanyang Medical Reviews