Healthc Inform Res. 2020 Oct;26(4):303-310. doi:10.4258/hir.2020.26.4.303.

Building a Lung and Ovarian Cancer Data Warehouse

Affiliations
  • 1Department of Computer Science, New Jersey Institute of Technology, Newark, New Jersey, USA
  • 2General Department of Larissa, University of Thessaly, Larissa, Greece

Abstract


Objectives
Despite the collection of vast amounts of data by the healthcare sector, effective decision-making in medical practice is still challenging. Data warehousing technology can be applied for the collection and management of clinical data from various sources to provide meaningful insights for physicians and administrators. Cancer data are extremely complicated and massive; hence, a clinical data warehouse system can provide insights into prevention, diagnosis and treatment processes through the use of online analytical processing tools for the analysis of multi-dimensional data at different granularity levels.
Methods
In this study, a clinical data warehouse was developed for lung and ovarian cancer data, which were kindly provided by the United States National Cancer Institute. The data were imported in specific formats and cleaned to remove errors and redundancies. SQL Server Integration Services (SSIS) was used for the extract-transform-load (ETL) process.
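The cleaning step described above can be illustrated with a minimal sketch. The study used SSIS; the example below substitutes plain Python and an in-memory SQLite database purely for illustration. The column names (`plco_id`, `age`, `cig_per_day`), the staging table `patient_stage`, and all sample values are assumptions, not taken from the study's data.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract: column names and values are invented for illustration.
RAW_CSV = """plco_id,age,cig_per_day
A-001, 61 ,20
A-001,61,20
A-002,,15
,70,10
A-003,abc,5
"""

def to_int(value):
    """Return the value as an int, or None if it is blank or not a whole number."""
    value = (value or "").strip()
    return int(value) if value.isdigit() else None

def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim whitespace, drop rows without a patient ID,
    null out non-numeric values, and remove exact duplicates."""
    seen, clean = set(), []
    for row in rows:
        plco_id = (row["plco_id"] or "").strip()
        if not plco_id:
            continue  # a record without a patient key cannot be linked
        record = (plco_id, to_int(row["age"]), to_int(row["cig_per_day"]))
        if record not in seen:  # redundancy check on the cleaned record
            seen.add(record)
            clean.append(record)
    return clean

def load(records, conn):
    """Load: write cleaned records to a staging table and return the row count."""
    conn.execute(
        "CREATE TABLE patient_stage (plco_id TEXT, age INTEGER, cig_per_day INTEGER)"
    )
    conn.executemany("INSERT INTO patient_stage VALUES (?, ?, ?)", records)
    return conn.execute("SELECT COUNT(*) FROM patient_stage").fetchone()[0]

conn = sqlite3.connect(":memory:")
cleaned = transform(extract(RAW_CSV))
loaded = load(cleaned, conn)
```

Here the duplicate row and the row lacking an identifier are dropped, and unparseable ages are stored as NULL rather than guessed. An actual SSIS package would express the same steps as data-flow components.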
Results
The design of the clinical data warehouse responds efficiently to all types of queries by adopting the fact constellation schema model. Various online analytical processing queries can be expressed using the proposed approach.
Conclusions
The model answered complex queries successfully, and online analytical processing cubes facilitate data analysis by exposing details at multiple levels.
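To make the fact constellation idea concrete, the following is a minimal sketch in Python with SQLite, not the authors' schema: two fact tables (one per cancer type) share the same dimension tables, so a single query can span both. All table names, column names, and sample rows are hypothetical; only the PLCO_ID identifier and the "non-curative" treatment label appear in the article's query captions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Shared dimensions (hypothetical names and rows).
CREATE TABLE dim_patient (patient_key INTEGER PRIMARY KEY, plco_id TEXT, age INTEGER);
CREATE TABLE dim_treatment (treatment_key INTEGER PRIMARY KEY, treatment_type TEXT);

-- Two fact tables referencing the same dimensions: a fact constellation.
CREATE TABLE fact_lung (
    patient_key INTEGER REFERENCES dim_patient(patient_key),
    treatment_key INTEGER REFERENCES dim_treatment(treatment_key),
    complications INTEGER);
CREATE TABLE fact_ovarian (
    patient_key INTEGER REFERENCES dim_patient(patient_key),
    treatment_key INTEGER REFERENCES dim_treatment(treatment_key),
    complications INTEGER);

INSERT INTO dim_patient VALUES (1,'P-1',58),(2,'P-2',63),(3,'P-3',67),(4,'P-4',71);
INSERT INTO dim_treatment VALUES (1,'curative'),(2,'non-curative');
INSERT INTO fact_lung VALUES (1,1,2),(2,2,1),(3,2,0);
INSERT INTO fact_ovarian VALUES (2,2,3),(4,1,1);
""")

# Compare total complications across the two fact tables.
by_cancer = conn.execute("""
    SELECT 'lung' AS cancer, SUM(complications) AS total FROM fact_lung
    UNION ALL
    SELECT 'ovarian', SUM(complications) FROM fact_ovarian
    ORDER BY cancer
""").fetchall()

# Patients with non-curative treatment recorded in both fact tables.
both_non_curative = conn.execute("""
    SELECT p.plco_id
    FROM dim_patient p
    JOIN fact_lung l ON l.patient_key = p.patient_key
    JOIN fact_ovarian o ON o.patient_key = p.patient_key
    JOIN dim_treatment tl ON tl.treatment_key = l.treatment_key
    JOIN dim_treatment tov ON tov.treatment_key = o.treatment_key
    WHERE tl.treatment_type = 'non-curative'
      AND tov.treatment_type = 'non-curative'
    ORDER BY p.plco_id
""").fetchall()

# Roll-up: lung complications aggregated from exact age to age decade.
by_decade = conn.execute("""
    SELECT (p.age / 10) * 10 AS decade, SUM(l.complications)
    FROM fact_lung l JOIN dim_patient p ON p.patient_key = l.patient_key
    GROUP BY decade ORDER BY decade
""").fetchall()
```

Because both fact tables join to the same `dim_patient` and `dim_treatment` rows, cross-cancer comparisons need no data duplication. The last query shows a change of granularity of the kind an OLAP cube provides, grouping by decade instead of exact age.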

Keywords

Data Warehousing, Lung Cancer, Ovarian Cancer, Data Analytics

Figures

  • Query 1 What is the distribution of patients with nodules and biopsies by number of cigarettes smoked per day?

  • Query 2 What is the age distribution of patients who had pneumothorax (a collection of air in the pleural cavity) as a complication and were treated with chemotherapy?

  • Query 3 Compare the number of complications in ovarian and lung cancer patients.

  • Query 4 List the PLCO_ID numbers and names of patients who received “non-curative” treatment for both lung and ovarian cancer.

  • Figure 1 Proposed project architecture. Adapted from Sheta and Eldeen [16].

  • Figure 2 Star schema for medical records.

  • Figure 3 Snowflake schema for medical records.

  • Figure 4 Lung and ovarian cancer clinical data warehouse fact constellation schema model.


References

1. Garani G, Atay CE. Encountering incomplete temporal information in clinical data warehouses. Int J Appl Res Public Health Manag. 2020; 5(1):32–48.
2. Kallmeyer V, Venkat K. Beyond e-health: health and information technology converge. Siliconindia. 2002; 6(4):42.
3. The Global Cancer Observatory [Internet]. Lyon, France: International Agency for Research on Cancer;c2020. [cited 2020 Sep 10]. Available from: https://gco.iarc.fr/
4. Ferlay J, Parkin DM, Steliarova-Foucher E. Estimates of cancer incidence and mortality in Europe in 2008. Eur J Cancer. 2010; 46(4):765–81.
5. Miele S, Shockley R. Analytics: the real-world use of big data. Somers (NY): IBM Global Business Services;2013.
6. Siegel R, Ma J, Zou Z, Jemal A. Cancer statistics, 2014. CA Cancer J Clin. 2014; 64(1):9–29.
7. Atay CE, Garani G. Maintaining dimension’s history in data warehouses effectively. Int J Data Wareh Min. 2019; 15(3):46–62.
8. Arous EJ, McDade TP, Smith JK, Ng SC, Sullivan ME, Zottola RJ, et al. Electronic medical record: research tool for pancreatic cancer? J Surg Res. 2014; 187(2):466–70.
9. Bellaachia A, Guven E. Predicting breast cancer survivability using data mining techniques. In: Proceedings of the 6th SIAM International Conference on Data Mining: Scientific Data Mining; 2006 Apr 20–22; Bethesda, MD.
10. Gorgionne GA, Gangopadhyah A, Adya M. A decision technology system to advance the diagnosis and treatment of breast cancer. Managing healthcare information systems with web-enabled technologies. Hershey (PA): IGI Global;2000. p. 141–50.
11. Krishnaiah V, Narsimha G, Chandra NS. Diagnosis of lung cancer prediction system using data mining classification techniques. Int J Comput Sci Inf Technol. 2013; 4(1):39–45.
12. Wah TY, Sim OS. Development of a data warehouse for lymphoma cancer diagnosis and treatment decision support. WSEAS Trans Inf Sci Appl. 2009; 6(3):530–43.
13. Zubi ZS, Saad RA. Improves treatment programs of lung cancer using data mining techniques. J Softw Eng Appl. 2014; 7(2):42749.
14. Abidi SS, Abidi SR. A case for supplementing evidence based medicine with inductive clinical knowledge: towards a technology-enriched integrated clinical evidence system. In: Proceedings 14th IEEE Symposium on Computer-Based Medical Systems (CBMS); 2001 Jul 26–27; Bethesda, MD. p. 5–10.
15. Wu R, Peters W, Morgan MW. The next generation of clinical decision support: linking evidence to best practice. J Healthc Inf Manag. 2002; 16(4):50–5.
16. Sheta OE, Eldeen AN. Evaluating a healthcare data warehouse for cancer diseases. IRACST Int J Comput Sci Inf Technol Secur. 2013; 3(3):237–41.
17. Ramachandran P, Girija N, Bhuvaneswari T. Early detection and prevention of cancer using data mining techniques. Int J Comput Appl. 2014; 97(13):48–53.
18. Inmon WH. Building the data warehouse. 2nd ed. New York (NY): John Wiley & Sons;1996.
19. Kimball R, Ross M. The data warehouse toolkit. 2nd ed. New York (NY): John Wiley & Sons;2002.