Clinical Validation of a Deep Learning-Based Hybrid (Greulich-Pyle and Modified Tanner-Whitehouse) Method for Bone Age Assessment
Affiliations
- 1 Department of Radiology, Korea University Anam Hospital, Seoul, Korea
- 2 Department of Pediatrics, Korea University Anam Hospital, Seoul, Korea
- 3 Department of Pediatrics, Myongji Hospital, Goyang, Korea
- 4 Crescom, Seongnam, Korea
- 5 Department of Radiology, Korea University Guro Hospital, Seoul, Korea
- 6 Department of Radiology, Korea University Ansan Hospital, Ansan, Korea
Abstract
Objective
To evaluate the accuracy and clinical efficacy of a hybrid Greulich-Pyle (GP) and modified Tanner-Whitehouse (TW) artificial intelligence (AI) model for bone age assessment.
Materials and Methods
A deep learning-based model was trained on an open dataset covering multiple ethnicities. A total of 102 hand radiographs (51 male and 51 female; mean age ± standard deviation, 10.95 ± 2.37 years) from a single institution were selected for external validation. Three human experts performed bone age assessments based on the GP atlas to establish a reference standard. Two study radiologists performed bone age assessments with and without AI model assistance in two separate sessions, and the reading time was recorded. The performance of the AI software was assessed by calculating the mean absolute difference between the AI-calculated bone age and the reference standard. Reading times with and without AI were compared using a paired t test. Furthermore, the inter-observer reliability of the two study radiologists’ bone age assessments was assessed using intraclass correlation coefficients (ICCs) and compared between reading with and without AI.
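The Methods name three statistics without giving their formulas. The following is a minimal Python sketch of how they could be computed, not the authors' analysis code. It assumes a normal-approximation 95% CI for the mean absolute difference and the two-way random-effects, absolute-agreement, single-rater ICC (Shrout and Fleiss ICC(2,1)); the abstract does not state which CI method or ICC form was used. The function names are hypothetical. The paired t test is reduced to its t statistic; a p-value additionally requires the t distribution (for example via `scipy.stats.ttest_rel`).

```python
import math
from statistics import mean, stdev


def mean_absolute_difference(ai, ref, z=1.96):
    """Mean |AI - reference| with a CI (ASSUMPTION: normal approximation, z = 1.96)."""
    if len(ai) != len(ref) or len(ai) < 2:
        raise ValueError("need at least two paired readings")
    diffs = [abs(a - r) for a, r in zip(ai, ref)]
    m = mean(diffs)
    half_width = z * stdev(diffs) / math.sqrt(len(diffs))
    return m, (m - half_width, m + half_width)


def paired_t_statistic(x, y):
    """t statistic and degrees of freedom of a paired t test (no p-value)."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return mean(d) / (stdev(d) / math.sqrt(n)), n - 1


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ASSUMPTION: this ICC form is illustrative; the abstract does not specify it.
    ratings: one row per subject (radiograph), one column per rater.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Partition the total sum of squares into subjects, raters, and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)              # between-subject mean square
    msc = ss_cols / (k - 1)              # between-rater mean square
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


if __name__ == "__main__":
    # Hypothetical toy values in years, not study data.
    ai = [10.2, 11.8, 9.5, 13.1]
    ref = [10.5, 11.5, 9.0, 13.0]
    print(mean_absolute_difference(ai, ref))
    print(paired_t_statistic([54.0, 60.0, 50.0], [35.0, 40.0, 31.0]))
    print(icc_2_1([[10.0, 10.5], [12.0, 11.5], [9.0, 9.0], [13.5, 13.0]]))
```

Because the absolute-agreement form penalizes a constant offset between readers, two radiologists who differ by a fixed amount receive an ICC below 1; a consistency-form ICC would not penalize that offset.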
Results
The bone ages assessed by the experts and the AI model were not significantly different (11.39 ± 2.74 years and 11.35 ± 2.76 years, respectively; p = 0.31). The mean absolute difference between the automated AI assessment and the reference standard was 0.39 years (95% confidence interval, 0.33–0.45 years). With AI model assistance, the mean reading time of the two study radiologists decreased from 54.29 to 35.37 seconds (p < 0.001). The ICC between the two study radiologists increased slightly with AI model assistance (from 0.945 to 0.990).
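The relative reduction in mean reading time follows directly from the two reported means. This is our own arithmetic on the stated values, not a figure reported in the abstract:

$$
\frac{54.29 - 35.37}{54.29} = \frac{18.92}{54.29} \approx 0.348,
$$

that is, a reduction of roughly 34.8% in mean reading time per radiograph with AI assistance.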
Conclusion
The proposed AI model assessed bone age accurately. Furthermore, the model appeared to enhance clinical efficacy by reducing reading time and improving inter-observer reliability.