Ann Lab Med. 2025 Jan;45(1):12-21. doi: 10.3343/alm.2024.0323.

Laboratory Data as a Potential Source of Bias in Healthcare Artificial Intelligence and Machine Learning Models

Affiliations
  • Department of Pathology, UT Southwestern Medical Center, Dallas, TX, USA

Abstract

Artificial intelligence (AI) and machine learning (ML) are anticipated to transform the practice of medicine. Because laboratory results are one of the largest sources of digital data in healthcare, they can strongly influence AI and ML algorithms that require large sets of healthcare data for training. Bias embedded in AI and ML models can not only have disastrous consequences for the quality of care but also perpetuate and exacerbate health disparities. A lack of test harmonization, where harmonization is defined as the ability to produce comparable results and the same interpretation irrespective of the method or instrument platform used to produce the result, may introduce aggregation bias into algorithms, with potential adverse outcomes for patients. Limited interoperability of laboratory results at the technical, syntactic, semantic, and organizational levels is another source of embedded bias that limits the accuracy and generalizability of algorithmic models. Population-specific issues, such as inadequate representation in clinical trials and inaccurate race attribution, not only affect the interpretation of laboratory results but may also perpetuate erroneous conclusions drawn from AI and ML models in the healthcare literature.
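To make the aggregation-bias argument concrete, the following is a minimal sketch under stated assumptions; it is not the authors' method. It assumes two hypothetical sites measuring the same eight patients, with site B's hypothetical albumin assay reading a constant +3 g/L higher, and a single hypothetical decision limit of 30 g/L shared across sites. All values, names (`measure`, `low_fraction`, `harmonize`), the threshold, and the bias are invented for illustration. Pooling the unharmonized results makes identical patients look different by site; removing the known offset restores agreement.

```python
"""Toy illustration of aggregation bias from unharmonized lab results.

All numbers are hypothetical: the same eight patients are measured at two
sites, and site B's (made-up) assay reads a constant +3 g/L higher.
"""
from dataclasses import dataclass

LOW_ALBUMIN_THRESHOLD = 30.0  # hypothetical shared decision limit, g/L
SITE_B_BIAS = 3.0             # hypothetical constant assay bias, g/L


@dataclass(frozen=True)
class Result:
    site: str
    value: float  # reported albumin, g/L


def measure(true_values: list[float], site: str, bias: float) -> list[Result]:
    """Simulate reporting: each site adds its own systematic bias."""
    return [Result(site, v + bias) for v in true_values]


def low_fraction(results: list[Result]) -> float:
    """Fraction of results flagged as low by the single shared threshold."""
    flagged = sum(1 for r in results if r.value < LOW_ALBUMIN_THRESHOLD)
    return flagged / len(results)


def harmonize(results: list[Result], offsets: dict[str, float]) -> list[Result]:
    """Subtract a per-site offset (assumes the bias is constant and known)."""
    return [Result(r.site, r.value - offsets.get(r.site, 0.0)) for r in results]


if __name__ == "__main__":
    true_albumin = [26.0, 28.0, 29.0, 30.0, 31.0, 32.0, 34.0, 36.0]
    site_a = measure(true_albumin, "A", 0.0)
    site_b = measure(true_albumin, "B", SITE_B_BIAS)

    # Identical patients, different apparent prevalence of low albumin.
    print(low_fraction(site_a), low_fraction(site_b))  # 0.375 0.125

    # After removing the known offset, both sites agree again.
    pooled = harmonize(site_a + site_b, {"B": SITE_B_BIAS})
    print(low_fraction([r for r in pooled if r.site == "B"]))  # 0.375
```

A model trained on the pooled, unharmonized data would see site B patients as less often hypoalbuminemic even though the underlying patients are identical. The constant-offset correction is the simplest possible assumption; real inter-method differences need not be constant across the measuring range.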

Keywords

Aggregation bias; Artificial intelligence; Clinical pathology; Diagnostic error; Health information interoperability; Logical Observation Identifiers Names and Codes; Machine learning; SNOMED CT
