2007 Volume 19 Pages 95-106
With the launch of the international HapMap project, the haplotype inference problem has attracted a great deal of attention in the computational biology community recently. In this paper, we study the question of how to efficiently infer haplotypes from genotypes of individuals related by a pedigree without mating loops, assuming that the hereditary process was free of mutations (i. e. the Mendelian law of inheritance) and recombinants. We model the haplotype inference problem as a system of linear equations as in [10] and present an (optimal) linear-time (i. e. O (mn) time) algorithm to generate a particular solution to the haplotype inference problem, where m is the number of loci (or markers) in a genotype and n is the number of individuals in the pedigree. Moreover, the algorithm also provides a general solution in O (mn2) time, which is optimal because the size of a general solution could be as large as Θ (mn2). The key ingredients of our construction are (i) a fast consistency checking procedure for the system of linear equations introduced in [10] based on a careful investigation of the relationship between the equations (ii) a novel linear-time method for solving linear equations without invoking the Gaussian elimination method. Although such a fast method for solving equations is not known for general systems of linear equations, we take advantage of the underlying loop-free pedigree graph and some special properties of the linear equations.