Ewha Med J. 2017 Jan;40(1):9-16. doi: 10.12771/emj.2017.40.1.9.

Statistical Methods: Reliability Assessment and Method Comparison

Affiliations
  • Clinical Trial Center, Ewha Womans University Mokdong Hospital, Seoul, Korea. kkong@ewha.ac.kr

Abstract

The reliability of clinical measurements is critical to medical research and clinical practice. Newly proposed methods are assessed in terms of their reliability, which includes repeatability and intra- and interobserver reproducibility. In general, new methods that provide repeatable and reproducible results are compared with established methods in clinical use. This paper describes common statistical methods for assessing reliability and agreement between methods, including the intraclass correlation coefficient, coefficient of variation, Bland-Altman plot, limits of agreement, percent agreement, and the kappa statistic. These methods are more appropriate for estimating reliability than hypothesis testing or simple correlation methods. However, some reliability measures, especially unscaled ones, do not clearly define the acceptable level of error in actual size and units. The Bland-Altman plot is more useful for method comparison studies because it assesses the relationship between the differences and the magnitude of paired measurements, the bias (as the mean difference), and the degree of agreement (as the limits of agreement) between two methods or conditions (e.g., observers). Caution is needed when handling heteroscedasticity of the differences between two measurements, when using the means of repeated measurements by each method in method comparison studies, and when comparing reliability between different studies. Additionally, independence in the measuring processes, the combined use of different forms of estimates, clear descriptions of the calculations used to produce indices, and clinical acceptability should be emphasized when assessing reliability and in method comparison studies.
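As an illustration of two of the indices named above, the following is a minimal Python sketch and not code from the paper. The function names `bland_altman` and `percent_agreement_and_kappa` are hypothetical. It assumes the conventional definitions: bias is the mean of the paired differences, the 95% limits of agreement are the bias plus or minus 1.96 times the sample standard deviation of the differences (which presumes roughly normally distributed differences), and kappa is Cohen's kappa for two raters.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Assumes one measurement per subject per method and approximately
    normally distributed differences (hence the 1.96 multiplier).
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)        # mean difference between the two methods
    sd = stdev(diffs)         # sample SD of the differences (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def percent_agreement_and_kappa(rater_1, rater_2):
    """Return (observed agreement, Cohen's kappa) for two raters' categories."""
    if len(rater_1) != len(rater_2) or not rater_1:
        raise ValueError("need equally long, non-empty rating lists")
    n = len(rater_1)
    categories = set(rater_1) | set(rater_2)
    # Proportion of subjects on which the two raters agree.
    p_observed = sum(x == y for x, y in zip(rater_1, rater_2)) / n
    # Agreement expected by chance, from each rater's marginal proportions.
    p_expected = sum(
        (rater_1.count(c) / n) * (rater_2.count(c) / n) for c in categories
    )
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return p_observed, kappa


if __name__ == "__main__":
    # Invented example data, for illustration only.
    bias, lower, upper = bland_altman([10.0, 12.0, 14.0, 16.0],
                                      [9.0, 12.0, 13.0, 17.0])
    print(f"bias={bias:.3f}, limits of agreement=({lower:.3f}, {upper:.3f})")
    p_o, kappa = percent_agreement_and_kappa(
        ["pos", "pos", "neg", "neg", "pos", "neg", "neg", "neg"],
        ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "neg"],
    )
    print(f"percent agreement={p_o:.1%}, kappa={kappa:.3f}")
```

The sketch does not address the cautions raised in the abstract: with heteroscedastic differences or repeated measurements per subject, these simple formulas would need to be adapted.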

Keywords

Validation studies; Reliability; Reproducibility of results; Agreement; Method comparison

MeSH Terms

Bias (Epidemiology)
Methods*
Reproducibility of Results

Figures

  • Fig. 1 Intraclass correlation coefficient and Pearson's correlation coefficient as indices for intra- or interobserver reliability. ICC, intraclass correlation coefficient; correlation coefficient, Pearson's correlation coefficient.

  • Fig. 2 Graphical presentation of agreement. A case where the greater magnitude of measurements has the greater difference.

  • Fig. 3 Graphical presentation of agreement. A case where an increase in the variability of the differences is based on an increase in the magnitude of measurements.

  • Fig. 4 Measurements of pulmonary nodule size using two radiological methods (shown is a Bland-Altman plot).


Cited by  4 articles

Development and validation of prediction equations for the assessment of muscle or fat mass using anthropometric measurements, serum creatinine level, and lifestyle factors among Korean adults
Gyeongsil Lee, Jooyoung Chang, Seung-sik Hwang, Joung Sik Son, Sang Min Park
Nutr Res Pract. 2021;15(1):95-105.    doi: 10.4162/nrp.2021.15.1.95.

Repeatability and Reproducibility of Tear Meniscus Evaluations Using Two Different Spectral Domain-optical Coherence Tomography
Jin Ha Kim, Kyu Ryong Choi, Roo Min Jun, Kyung Eun Han
J Korean Ophthalmol Soc. 2019;60(10):929-934.    doi: 10.3341/jkos.2019.60.10.929.

Cross-cultural Adaptation and Validation of the eHealth Literacy Scale in Korea
Sun Ju Chang, Eunjin Yang, Hyunju Ryu, Hee Jung Kim, Ju Young Yoon
Korean J Adult Nurs. 2018;30(5):504-515.    doi: 10.7475/kjan.2018.30.5.504.

Comparison of the Utility of dnaJ and 16S rDNA Sequences for Identification of Clinical Isolates of Vibrio Species
In-Sun Choi, Dae Soo Moon, Geon Park, Seong-Ho Kang, Choon-Mee Kim, Young-Joon Ahn, Dong-Min Kim, Na Ra Yun, Dong Hoon Lim, Sung Heui Shin, Joong-Ki Kook, Young-Hyo Chang, Sook-Jin Jang
Lab Med Online. 2018;8(1):7-14.    doi: 10.3343/lmo.2018.8.1.7.

