The absolute and relative accuracy of haplotype inferral methods and a consensus approach to haplotype inferral. S. Orzack (1), D. Gusfield (2), V.P. Stanton, Jr. (3). 1) Fresh Pond Research Institute, Cambridge, MA; 2) Department of Computer Science, University of California, Davis, California 95616; 3) Variagenics, Inc., Cambridge, MA 02139.
Theoretical and practical considerations suggest that haplotype analysis will yield a more complete causal understanding of complex traits, such as many human diseases, because such traits are likely to result, in part, from many genetic determinants. Several important algorithms for haplotype inferral have been developed, but there has been little assessment of their performance on real data. We describe our analyses of the accuracy of currently available computer algorithms applied to genotypic data at the human ApoE locus from a random sample of individuals. The data set comprises 80 individuals typed at 9 SNP sites, and the haplotypes of all individuals were determined experimentally. The algorithms studied include those based on the expectation-maximization (EM) approach and those based on the rule-based approach. We compared the haplotype frequency distributions predicted by the various algorithms; the algorithms differed significantly both in these distributions and in their success at predicting the list of haplotypes actually present. We also assessed absolute accuracy by comparing each individual's predicted haplotype pair with the pair determined by direct molecular analysis. The algorithms differed significantly in their success at predicting haplotype pairs, and most predicted fewer than 80 percent of the pairs correctly. We conclude that present algorithms cannot by themselves serve as accurate haplotype predictors. Accordingly, we describe consensus methods that divide a data set into genotypes whose haplotypes need experimental determination and genotypes that do not, because their algorithmic inferral has a very high probability of being correct. In this way, one can hope to identify correctly the haplotypes of all individuals in a sample. The consensus method is based on a rule-based algorithm that can generate alternative haplotype pairs for a given individual.
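For readers unfamiliar with the EM approach, the following is a minimal sketch of generic EM haplotype frequency estimation. It is not any of the specific programs evaluated in the study. It assumes biallelic SNPs coded 0/1/2 (copies of allele 1), Hardy-Weinberg equilibrium, and full enumeration of the haplotype pairs compatible with each genotype. Full enumeration is feasible at this scale, since 9 sites give at most 2^8 = 256 pairs per genotype. The function names and toy genotypes are hypothetical.

```python
from collections import Counter
from itertools import product


def compatible_pairs(genotype):
    """List every unordered haplotype pair consistent with an unphased genotype.

    genotype[i] counts copies of allele 1 at SNP i (0, 1 or 2); haplotypes are
    tuples of 0/1. k heterozygous sites give 2**(k-1) pairs (1 if k == 0).
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        # Homozygous sites are fixed (0 -> 0, 2 -> 1); het sites are overwritten.
        h1 = [g // 2 for g in genotype]
        h2 = list(h1)
        for site, b in zip(het, bits):
            h1[site], h2[site] = b, 1 - b
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def em_haplotype_frequencies(genotypes, max_iter=500, tol=1e-10):
    """Estimate haplotype frequencies by EM, assuming Hardy-Weinberg equilibrium."""
    candidates = [compatible_pairs(g) for g in genotypes]
    haplotypes = sorted({h for pairs in candidates for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform start
    n_chromosomes = 2 * len(genotypes)
    for _ in range(max_iter):
        counts = Counter()
        for pairs in candidates:
            # E-step: weight each compatible pair by its HWE probability
            # (2*p*q for distinct haplotypes, p**2 for a homozygote).
            weights = [freq[a] * freq[b] * (1 if a == b else 2) for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: new frequency = expected haplotype count / number of chromosomes.
        new_freq = {h: counts[h] / n_chromosomes for h in haplotypes}
        change = max(abs(new_freq[h] - freq[h]) for h in haplotypes)
        freq = new_freq
        if change < tol:
            break
    return freq


def most_likely_pair(genotype, freq):
    """Resolve one genotype to its highest-probability pair under freq."""
    return max(compatible_pairs(genotype),
               key=lambda p: freq[p[0]] * freq[p[1]] * (1 if p[0] == p[1] else 2))


# Toy example (not the ApoE data): the double heterozygote (1, 1) is resolved
# using the haplotypes carried by the homozygous individuals.
toy = [(0, 0), (0, 0), (2, 2), (2, 2), (1, 1)]
freqs = em_haplotype_frequencies(toy)
print(most_likely_pair((1, 1), freqs))  # ((0, 0), (1, 1))
```

In this sketch, EM produces a frequency distribution, and individual haplotype pairs are read off it afterwards. This is one reason EM-style and rule-based methods can disagree on both frequency distributions and individual pairs.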
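The absolute accuracy measure described above compares each predicted pair with the experimentally determined pair. The list comparison asks which real haplotypes a method recovers. Both can be computed as in the short sketch below. The sketch assumes predictions and observations are stored as dictionaries mapping an individual's ID to a pair of 0/1 haplotype tuples. The function names and the toy data are hypothetical and are not taken from the study.

```python
def unordered(pair):
    """Canonical form of a haplotype pair, so (a, b) and (b, a) compare equal."""
    a, b = pair
    return tuple(sorted((tuple(a), tuple(b))))


def pair_accuracy(predicted, observed):
    """Fraction of individuals whose predicted pair matches the observed pair."""
    if set(predicted) != set(observed):
        raise ValueError("predicted and observed must cover the same individuals")
    correct = sum(unordered(predicted[i]) == unordered(observed[i]) for i in observed)
    return correct / len(observed)


def compare_haplotype_lists(predicted, observed):
    """Which real haplotypes were found or missed, and which predictions are spurious."""
    pred = {tuple(h) for pair in predicted.values() for h in pair}
    obs = {tuple(h) for pair in observed.values() for h in pair}
    return {
        "found": sorted(pred & obs),
        "missed": sorted(obs - pred),
        "spurious": sorted(pred - obs),
    }


# Toy data: individual 0 is phased incorrectly, while 1 and 2 are correct
# (pair order does not matter).
observed = {0: ((0, 0), (1, 1)), 1: ((0, 1), (0, 1)), 2: ((0, 0), (0, 1))}
predicted = {0: ((0, 1), (1, 0)), 1: ((0, 1), (0, 1)), 2: ((0, 1), (0, 0))}
print(pair_accuracy(predicted, observed))  # 0.6666666666666666
print(compare_haplotype_lists(predicted, observed))
# {'found': [(0, 0), (0, 1)], 'missed': [(1, 1)], 'spurious': [(1, 0)]}
```

A single phasing error costs one individual in `pair_accuracy`, but it can both hide a real haplotype and introduce a spurious one in the list comparison. The two measures can therefore rank methods differently.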
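The abstract does not spell out the consensus rule. The sketch below shows one plausible reading. It runs a rule-based procedure in the spirit of Clark's algorithm, whose result depends on the order in which individuals are processed, under many random orders. Any individual whose inferred pair is not identical across all runs is then flagged for experimental determination. Several details are hypothetical choices rather than the authors' published method: the function names, the 0/1/2 genotype coding, the number of runs, the tie-breaking by sorted haplotype order, and the unanimity criterion.

```python
import random
from collections import Counter


def complement(genotype, hap):
    """Haplotype that combines with hap to give genotype, or None if incompatible."""
    other = []
    for g, a in zip(genotype, hap):
        b = g - a  # allele count at this site minus hap's allele
        if b not in (0, 1):
            return None
        other.append(b)
    return tuple(other)


def rule_based_inference(genotypes, order):
    """Clark-style resolution; the result can depend on the processing order."""
    known, resolved = set(), {}
    # Seed with individuals heterozygous at no more than one site (unambiguous).
    for i, g in enumerate(genotypes):
        if sum(x == 1 for x in g) <= 1:
            h1 = tuple(x // 2 for x in g)
            h2 = complement(g, h1)
            resolved[i] = tuple(sorted((h1, h2)))
            known.update((h1, h2))
    progress = True
    while progress:
        progress = False
        for i in order:
            if i in resolved:
                continue
            for h in sorted(known):
                other = complement(genotypes[i], h)
                if other is not None:
                    # Accept the first compatible known haplotype; its
                    # complement becomes a new known haplotype.
                    resolved[i] = tuple(sorted((h, other)))
                    known.add(other)
                    progress = True
                    break
    return resolved


def consensus_split(genotypes, n_runs=100, seed=0):
    """Split individuals into consistently resolved ones and ones needing lab work."""
    rng = random.Random(seed)
    calls = [Counter() for _ in genotypes]
    for _ in range(n_runs):
        order = list(range(len(genotypes)))
        rng.shuffle(order)
        for i, pair in rule_based_inference(genotypes, order).items():
            calls[i][pair] += 1
    confident, needs_lab = {}, []
    for i, c in enumerate(calls):
        # Hypothetical rule: trust a call only if every run gave the same pair.
        if len(c) == 1 and sum(c.values()) == n_runs:
            confident[i] = next(iter(c))
        else:
            needs_lab.append(i)
    return confident, needs_lab


# Toy data: individual 2 resolves differently depending on whether
# individual 3 is processed before it.
toy = [(2, 0, 0), (0, 2, 0), (1, 1, 1), (1, 0, 1)]
confident, needs_lab = consensus_split(toy)
print(needs_lab)  # [2]
```

Requiring unanimity is deliberately conservative. It sends more genotypes to the laboratory but raises the probability that the remaining algorithmic calls are correct, which matches the goal stated above. A looser rule, such as majority agreement, would trade some of that reliability for less experimental work.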
(Research supported by a National Science Foundation Award to D.G.).