Korean J Leg Med. 2014 May;38(2):59-65. doi: 10.7580/kjlm.2014.38.2.59.

Searching for Appropriate Statistical Parameters for Validation of Mitochondrial DNA Database

Affiliations
  • 1Forensic DNA Division, National Forensic Service, Wonju-si, Gangwon, Korea.
  • 2Department of Forensic Medicine, Seoul National University College of Medicine, Seoul, Korea. sdlee@snu.ac.kr
  • 3Institute of Forensic Science, Seoul National University College of Medicine, Seoul, Korea.

Abstract

Recently, studies on mitochondrial DNA (mtDNA) have increased rapidly. Conventional parameters, such as the diversity index and pairwise comparison, are used to interpret and validate autosomal DNA data. However, because mtDNA has a different transmission pattern, the use of these parameters to validate data from mitochondrial DNA databases (mtDNA DBs) needs to be verified. This study was done to verify the use of these conventional parameters and to test the "coverage concept" as a new parameter. Although the mtDNA DB is not very large, it is necessary to check how the parameters change with DB size. For this, we artificially rearranged a Korean DB into several smaller sub-DBs of variable sizes. The results show that neither the diversity of nucleotide variations nor the number of different haplotypes varied as the DB size increased. However, the "coverage" changed considerably, increasing from 0.113 in a DB of 100 people to 0.260 in a DB of 653 people. Additionally, using the "coverage concept", we predicted how the total number of haplotypes changed with sub-DB size and compared the prediction with the final result. In conclusion, "coverage", in addition to conventional statistical parameters, can be used to check the usability of an mtDNA DB. Finally, we tried to predict the total number of mtDNA haplotypes in Korea using the "saturation concept".
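The abstract does not spell out how "coverage" is computed. A common choice, and the one assumed here, is Good's sample coverage, 1 - f1/n, where f1 is the number of haplotypes observed exactly once among n people. Dividing the number of distinct haplotypes D by this coverage gives the simplest coverage-based estimate of the total number of haplotypes (the Chao-Lee estimator without its coefficient-of-variation correction). The sketch below computes these, together with Nei's haplotype diversity as a conventional parameter, for nested sub-DBs cut from one shuffled DB. The nested-prefix resampling scheme, the function names, and the toy data are hypothetical illustrations, not the authors' procedure.

```python
import random
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two individuals")
    counts = Counter(haplotypes)
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_p2)


def sample_coverage(haplotypes):
    """Good's sample coverage (assumed definition): 1 - f1/n, f1 = haplotypes seen exactly once."""
    n = len(haplotypes)
    f1 = sum(1 for c in Counter(haplotypes).values() if c == 1)
    return 1 - f1 / n


def coverage_based_total(haplotypes):
    """Simplest coverage-based estimate of the total number of haplotypes, D / C.

    This is the Chao-Lee estimator with its coefficient-of-variation term dropped.
    """
    d = len(set(haplotypes))
    c = sample_coverage(haplotypes)
    if c == 0:
        raise ValueError("every haplotype is a singleton, so coverage is zero")
    return d / c


def nested_sub_dbs(haplotypes, sizes, seed=0):
    """Hypothetical resampling: shuffle the DB once, then score nested prefixes of each size.

    Returns (size, distinct haplotypes, haplotype diversity, coverage) per sub-DB.
    """
    shuffled = list(haplotypes)
    random.Random(seed).shuffle(shuffled)
    rows = []
    for size in sorted(sizes):
        sub = shuffled[:size]
        rows.append((size, len(set(sub)), haplotype_diversity(sub), sample_coverage(sub)))
    return rows


if __name__ == "__main__":
    # Hypothetical toy DB: each label stands for one person's control-region haplotype.
    toy_db = ["H1", "H1", "H2", "H3", "H3", "H3", "H4", "H5"]
    for size, distinct, h, c in nested_sub_dbs(toy_db, [4, 8]):
        print(f"n={size}  distinct={distinct}  h={h:.3f}  coverage={c:.3f}")
    print(f"coverage-based total: {coverage_based_total(toy_db):.1f}")
```

Taking nested prefixes of a single shuffle mimics a DB that grows by adding people, so coverage can be tracked as n increases; drawing independent random samples per size would be an equally reasonable alternative.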

Keywords

mtDNA DB; Statistical parameter; Coverage; Phylogeny; Saturation curve

MeSH Terms

DNA
DNA, Mitochondrial*
Haplotypes
Korea
Phylogeny

Figure

  • Fig. 1. Saturation curves for expanded sample sizes. a. Expanded up to 10,000 people; b. Expanded up to 100,000 people. When the number of haplotypes that could be observed was examined for group sizes increased up to 10,000 and 100,000, the final expected number of haplotypes exceeded 4,500. The shaded region of each graph marks the upper and lower confidence limits.

  • Fig. 2. Simulated saturation curve derived from the number of observed haplotypes. The graph was obtained with CurveExpert Professional version 1.6.5. The fit converged to a tolerance of 1e-6 in 5 iterations; no weighting was used.
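Figs. 1 and 2 describe fitting a saturation curve to observed haplotype counts and extrapolating it to 10,000 and 100,000 people. The captions do not name the fitted model, so the sketch below assumes a Michaelis-Menten-type curve H(n) = a*n/(b + n), whose asymptote a plays the role of the projected total number of haplotypes. The captions describe an iterative fit (a convergence tolerance and iteration count); this sketch instead uses a closed-form least-squares fit on the linearized form 1/H = (b/a)(1/n) + 1/a, which needs no solver but weights small samples more heavily, and it does not compute the confidence band shown in Fig. 1. The model, the function names, and the parameter values used to generate toy counts are all hypothetical.

```python
def fit_saturation(sizes, observed):
    """Fit H(n) = a*n / (b + n) by least squares on 1/H = (b/a) * (1/n) + 1/a.

    Returns (a, b): a is the asymptote (projected total number of haplotypes),
    b is the sample size at which half of a is reached.
    """
    if len(sizes) != len(observed) or len(sizes) < 2:
        raise ValueError("need at least two (size, haplotype count) pairs")
    xs = [1.0 / n for n in sizes]
    zs = [1.0 / h for h in observed]
    mx = sum(xs) / len(xs)
    mz = sum(zs) / len(zs)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sample sizes must not all be equal")
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    slope = sxz / sxx
    intercept = mz - slope * mx
    if intercept <= 0:
        raise ValueError("counts show no saturation under this model")
    a = 1.0 / intercept
    b = slope * a
    return a, b


def predict_haplotypes(a, b, n):
    """Expected number of distinct haplotypes among n people under the fitted curve."""
    return a * n / (b + n)


if __name__ == "__main__":
    # Hypothetical curve (a=5000, b=3000), used only to generate illustrative counts.
    sizes = [100, 200, 400, 653]
    counts = [5000 * n / (3000 + n) for n in sizes]
    a, b = fit_saturation(sizes, counts)
    for n in (10_000, 100_000):
        print(f"n={n:>7}: about {predict_haplotypes(a, b, n):.0f} haplotypes (asymptote {a:.0f})")
```

With real sub-DB counts, a nonlinear least-squares fit on the untransformed curve would be closer to the iterative procedure the captions describe.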


Copyright © 2024 by Korean Association of Medical Journal Editors. All rights reserved.     E-mail: koreamed@kamje.or.kr