A common framework for model-based or model-free, twopoint or multipoint, linkage and/or linkage disequilibrium analysis of complex traits. H.H.H. Goring1, J. Ott1,2, J.D. Terwilliger1,3. 1) Columbia U., NY; 2) Rockefeller U., NY; 3) NY State Psychiatric Inst., NY.
In linkage and/or linkage disequilibrium (LD) analysis, one computes the joint probability of a set of trait phenotypes (Ph) and a set of observed marker genotypes (GM) in one's data. In conventional model-based analysis, this probability is computed as P(Ph,GM) = P(Ph|GM)P(GM) = Σ_GD P(Ph|GD)P(GD|GM)P(GM), where the sum Σ_GD runs over all possible trait-locus genotype combinations (GD) for all individuals in the dataset. P(Ph|GD) is a function of the mode of inheritance, while P(GD|GM) is a function of linkage and LD. By contrast, in model-free analysis methods, one computes this probability as P(GM,Ph) = P(GM|Ph)P(Ph). One can estimate P(GM|Ph) without assuming anything about the genotype-phenotype relationships, provided that the ascertained samples all have the same pedigree and phenotype structure (Ph) (e.g. affected sib-pairs, trios, or singletons). Though one does not model the genotype-phenotype relationships explicitly, the likelihood certainly depends on them, since P(GM|Ph) = Σ_GD P(GM|GD)P(GD|Ph). P(GM|GD) is a function of linkage and LD, and P(GD|Ph) is a function of how well the trait phenotypes predict the trait-locus genotypes. Since it is possible to find markers arbitrarily close to any trait locus, thereby increasing P(GM|GD), the power of a study is dominated by P(GD|Ph), and the ascertainment and study design can thus be more important than the choice of statistical analysis method. We show how model-free analysis of linkage and/or LD can be performed by using deterministically assigned pseudomarker genotypes. In contrast to most conventional model-free methods, the pseudomarker analogs can be applied to different data structures jointly, thus using the total data more efficiently. The pseudomarker methods also have better statistical properties and are more powerful than their conventional counterparts. We further show that twopoint analysis and multipoint analysis using complex-valued recombination fractions are algebraically isomorphic. This allows multipoint analysis, either model-based or model-free using pseudomarkers, to be performed with the same degree of robustness to trait-locus genotype errors as twopoint analysis.
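The equivalence of the model-based and model-free factorizations described above can be checked numerically on a toy case. The following is a minimal sketch, not the authors' implementation: it assumes a single unrelated individual (so only LD, not linkage, enters), a biallelic trait locus and a biallelic marker, random mating, and that the phenotype depends on the marker only through the trait-locus genotype. The haplotype frequencies, penetrances, and all function names are hypothetical values and names chosen for illustration.

```python
from itertools import product

# Hypothetical haplotype frequencies keyed by (trait allele, marker allele).
# LD is present because these are not products of the allele frequencies.
HAP_FREQ = {("D", "1"): 0.08, ("D", "2"): 0.02,
            ("d", "1"): 0.32, ("d", "2"): 0.58}

# Hypothetical penetrances P(affected | number of D alleles).
PENETRANCE = {0: 0.01, 1: 0.10, 2: 0.50}


def joint_genotype_probs():
    """Return P(GD, GM) for one unrelated individual under random mating."""
    joint = {}
    for (h1, f1), (h2, f2) in product(HAP_FREQ.items(), repeat=2):
        gd = (h1[0] == "D") + (h2[0] == "D")  # number of D alleles
        gm = (h1[1] == "1") + (h2[1] == "1")  # number of marker-1 alleles
        joint[(gd, gm)] = joint.get((gd, gm), 0.0) + f1 * f2
    return joint


def p_pheno_given_gd(affected, gd):
    """P(Ph | GD): the mode-of-inheritance term."""
    return PENETRANCE[gd] if affected else 1.0 - PENETRANCE[gd]


def model_based(affected, gm, joint):
    """Sum over GD of P(Ph|GD) P(GD|GM) P(GM)."""
    p_gm = sum(p for (_, m), p in joint.items() if m == gm)
    return sum(
        p_pheno_given_gd(affected, d) * (joint[(d, gm)] / p_gm) * p_gm
        for d in range(3)
    )


def model_free(affected, gm, joint):
    """P(GM|Ph) P(Ph), with P(GM|Ph) = sum over GD of P(GM|GD) P(GD|Ph)."""
    p_gd = {d: sum(p for (dd, _), p in joint.items() if dd == d)
            for d in range(3)}
    p_ph = sum(p_pheno_given_gd(affected, d) * p_gd[d] for d in range(3))
    p_gm_given_ph = sum(
        (joint[(d, gm)] / p_gd[d])                            # P(GM|GD)
        * (p_pheno_given_gd(affected, d) * p_gd[d] / p_ph)    # P(GD|Ph)
        for d in range(3)
    )
    return p_gm_given_ph * p_ph


if __name__ == "__main__":
    joint = joint_genotype_probs()
    for affected, gm in product((True, False), range(3)):
        mb = model_based(affected, gm, joint)
        mf = model_free(affected, gm, joint)
        print(f"affected={affected} GM={gm} "
              f"model-based={mb:.6f} model-free={mf:.6f}")
```

Both functions return the same joint probability for every phenotype and marker genotype, which illustrates that the two approaches factor one likelihood in different orders. In the model-free form, the P(GD|Ph) term is where the penetrances enter, even though a model-free analysis never specifies them.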