Computational methods for evaluating clone sampling efficiency in normalized cDNA libraries. J. Hoh1, D. Gordon2, T. Lints2, T.C. Matise2, J. Ott2. 1) Columbia University, New York, NY; 2) Rockefeller University, New York, NY.
Large-scale gene expression analysis using cDNA microarrays promises to yield new insights into biological properties at the molecular level. Such microarrays can be constructed from normalized cDNA libraries derived from tissues of particular interest (TOI) (Library I), or from more complex libraries in which a substantial proportion of all genes expressed by the organism are represented (Library II). Library I provides a more efficient means of sampling the genes expressed in the TOI, whereas Library II is valuable to investigators with broader interests. Here, we develop quantitative methods to assess the relative efficiency of sampling from these two libraries.
To perform this analysis, we address the following two problems: 1) How many clones from the tissue-specific normalized library (I) must be sampled in order to acquire a specified proportion of the genes expressed in the TOI? 2) How many clones from the more complex normalized library (II) must be sampled in order to acquire a specified proportion of the genes expressed in the TOI and, in so doing, what proportion of the genes expressed in other tissues (non-TOI) would be acquired? For each of problems 1 and 2, we have developed analytical expressions for the probability of acquiring all genes expressed in the TOI for a given sample size (the solution is reminiscent of, but not identical to, the well-known coupon collecting problem in combinatorics). Computer simulation will be applied to investigate more general questions. We will assume both perfect normalization (each gene in the library has the same frequency) and more realistic normalizations in which the frequency of each gene falls within, for example, a 10-fold or 100-fold range.
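The two problems can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not give their analytical expressions, so the closed form used here is the standard inclusion-exclusion result for drawing n clones with replacement from N equally frequent genes and asking whether all m TOI genes appear at least once, P = sum over j = 0..m of (-1)^j C(m, j) (1 - j/N)^n. The function names, the convention that genes 0..m-1 are the TOI genes, and the log-uniform spread of frequencies across the fold range are all assumptions made for illustration.

```python
import math
import random


def prob_all_targets(n_clones, n_library_genes, n_target_genes):
    """Probability that every target (TOI) gene is drawn at least once.

    Assumes perfect normalization (all genes equally frequent) and
    sampling with replacement. Uses inclusion-exclusion over the set of
    target genes that are missed. Intended for small illustrative sizes,
    since the alternating sum loses precision when m is large.
    """
    N, m = n_library_genes, n_target_genes
    return sum(
        (-1) ** j * math.comb(m, j) * (1 - j / N) ** n_clones
        for j in range(m + 1)
    )


def simulate_fraction(n_clones, n_library_genes, n_target_genes,
                      fold_range=1.0, rng=None):
    """Simulate one sampling run; return (TOI fraction, non-TOI fraction).

    Hypothetical model: genes 0..n_target_genes-1 are the TOI genes, and
    relative gene frequencies are log-uniform between 1 and fold_range
    (fold_range=1.0 reproduces perfect normalization).
    """
    rng = rng or random.Random()
    weights = [fold_range ** rng.random() for _ in range(n_library_genes)]
    draws = rng.choices(range(n_library_genes), weights=weights, k=n_clones)
    seen = set(draws)
    targets_hit = sum(1 for g in seen if g < n_target_genes)
    n_others = n_library_genes - n_target_genes
    others_hit = len(seen) - targets_hit
    other_fraction = others_hit / n_others if n_others else 0.0
    return targets_hit / n_target_genes, other_fraction
```

Under these assumptions, Library I corresponds to n_target_genes equal to n_library_genes (the classic coupon collecting case), and Library II to n_target_genes smaller than n_library_genes. For example, prob_all_targets(2, 2, 2) evaluates to 0.5, and averaging simulate_fraction over many runs with fold_range set to 10 or 100 gives a rough estimate of how imperfect normalization affects the number of clones needed.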