J Korean Soc Med Inform.  2009 Jun;15(2):175-189. 10.4258/jksmi.2009.15.2.175.

Basic Concepts and Principles of Data Mining in Clinical Practice

Affiliations
  • 1 The Catholic University of Korea College of Nursing, Korea.
  • 2 Department of Biomedical Informatics, School of Medicine, Ajou University, Korea. veritas@ajou.ac.kr
  • 3 Institute for u-health Information Research, Ajou University Medical Center, Korea.

Abstract

Many hospitals have recently adopted clinical data warehouses (CDWs) alongside electronic medical records. These new hospital information systems inevitably generate very large amounts of clinical data that may be useful for further analysis. However, the electronic clinical data in a CDW are usually byproducts of clinical practice rather than products of research. They therefore contain inconsistent and sometimes erroneous information, and may lack the specific context of the clinical situations in which they were recorded. Data miners come from various academic backgrounds, such as electronics, informatics, statistics, biomedicine, and public health. If the complex circumstances surrounding clinical data are not well understood, investigators performing data mining in clinical fields may have difficulty assessing the information they encounter. Here, we introduce some basic concepts and principles of data mining in clinical fields, including legal and ethical considerations as well as technical concerns.

Keywords

Clinical Data Mining; Machine Learning

MeSH Terms

Machine Learning
Data Mining
Electronic Health Records
Electronics
Electrons
Hospital Information Systems
Humans
Informatics
Public Health
Research Personnel

Figures

  • Figure 1 An example of a descriptive summary of selected features

  • Figure 2 Data quality and outlier summary

  • Figure 3 An example topology of an artificial neural network model

  • Figure 4 An example of a decision tree model

  • Figure 5 An example of a Bayesian network model


Cited by 5 articles

Classification and Sequential Pattern Analysis for Improving Managerial Efficiency and Providing Better Medical Service in Public Healthcare Centers
Keunho Choi, Sukhoon Chung, Hyunsill Rhee, Yongmoo Suh
Healthc Inform Res. 2010;16(2):67-76. doi: 10.4258/hir.2010.16.2.67.

Diagnostic Analysis of Patients with Essential Hypertension Using Association Rule Mining
A Mi Shin, In Hee Lee, Gyeong Ho Lee, Hee Joon Park, Hyung Seop Park, Kyung Il Yoon, Jung Jeung Lee, Yoon Nyun Kim
Healthc Inform Res. 2010;16(2):77-81. doi: 10.4258/hir.2010.16.2.77.

Association Rules to Identify Complications of Cerebral Infarction in Patients with Atrial Fibrillation
Sun-Ju Jung, Chang-Sik Son, Min-Soo Kim, Dae-Joon Kim, Hyoung-Seob Park, Yoon-Nyun Kim
Healthc Inform Res. 2013;19(1):25-32. doi: 10.4258/hir.2013.19.1.25.

Predictors of Medication Adherence in Elderly Patients with Chronic Diseases Using Support Vector Machine Models
Soo Kyoung Lee, Bo-Yeong Kang, Hong-Gee Kim, Youn-Jung Son
Healthc Inform Res. 2013;19(1):33-41. doi: 10.4258/hir.2013.19.1.33.

A clinical research strategy using longitudinal observational data in the post-electronic health records era
Rae Woong Park
J Korean Med Assoc. 2012;55(8):711-719. doi: 10.5124/jkma.2012.55.8.711.

