TIMBER - personalized computationally-efficient filtering of GERMLINE-discovered putative IBD segments. M. Barber1, K. Noto1, Y. Wang1, R. E. Curtis2, J. M. Granka1, J. K. Byrnes1, N. M. Myres2, C. A. Ball1, K. G. Chahine2 1) AncestryDNA, San Francisco, CA; 2) AncestryDNA, Provo, UT.

   Discovering regions of pairs of genomes that are identical by descent (IBD) is an important part of many genetic analyses. For large samples (100K people and beyond), the challenge is to make IBD discovery both computationally feasible and accurate. GERMLINE is computationally efficient (it uses a hashing approach followed by a separate extension algorithm), but on its own it is less accurate than other approaches. RefinedIBD is accurate, as it evaluates all putative IBD segments with a haplotype model. However, RefinedIBD is not computationally efficient, even though it uses the GERMLINE algorithm to discover the putative IBD segments. TIMBER is an algorithm that decides whether a putative IBD segment discovered by GERMLINE has sufficient evidence for IBD to be retained. TIMBER uses the GERMLINE-discovered putative IBD segments themselves as the basis for filtering those same segments. Within GERMLINE, the genome is split into non-overlapping windows, and TIMBER calculates a weight for each of these windows. Each weight reflects the relative level of evidence for IBD that the window contributes to a GERMLINE-discovered putative IBD segment. TIMBER's main action is to down-weight windows of the genome that show an excessively high number of putative IBD segments. TIMBER is made possible by the large number of putative IBD segments discovered when GERMLINE is run on a large data set (over 100K people). Both real and simulated data, drawn from a data set of over 100K people, provide evidence that TIMBER is a useful IBD filter. While TIMBER is not as accurate as a method such as RefinedIBD, it provides a significant improvement in accuracy over running GERMLINE on its own, and it is computationally efficient for a large data set.
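The window-weighting idea can be illustrated with a minimal sketch. The abstract does not state TIMBER's actual weight formula, scoring rule, or retention threshold, so everything below is an assumption made for illustration: weights are computed per person (suggested only by "personalized" in the title), a window's weight is 1 unless its match count exceeds the median count over that person's matched windows, in which case it is scaled down in proportion to the excess, and a segment is retained when the sum of its window weights reaches a hypothetical `min_score`. The function names and the `(start_window, end_window)` segment representation are likewise hypothetical.

```python
from collections import Counter
from statistics import median


def window_weights(segments, n_windows):
    """Per-window weights for ONE person's putative IBD segments.

    segments: list of (start_window, end_window) pairs, inclusive.
    ASSUMPTION: the weight rule below is illustrative, not TIMBER's formula.
    """
    # Count how many putative segments cover each window.
    counts = Counter()
    for start, end in segments:
        for w in range(start, end + 1):
            counts[w] += 1
    per_window = [counts.get(w, 0) for w in range(n_windows)]

    # "Typical" coverage: median over windows that have at least one match.
    nonzero = [c for c in per_window if c > 0]
    typical = median(nonzero) if nonzero else 1

    # Down-weight only the windows with more matches than typical.
    return [1.0 if c <= typical else typical / c for c in per_window]


def segment_score(segment, weights):
    """Sum of window weights across the segment (hypothetical score)."""
    start, end = segment
    return sum(weights[start:end + 1])


def filter_segments(segments, n_windows, min_score):
    """Keep segments whose weighted score reaches min_score (hypothetical)."""
    weights = window_weights(segments, n_windows)
    return [s for s in segments if segment_score(s, weights) >= min_score]


# Toy input: one 4-window segment in a quiet region, plus ten 2-window
# segments piled onto the same two windows.
toy_segments = [(0, 3)] + [(4, 5)] * 10
kept = filter_segments(toy_segments, n_windows=6, min_score=2.0)
```

In this toy input, the two heavily matched windows receive a small weight, so the short segments stacked on them score below the threshold, while the segment in the quiet region keeps full weight. Deriving the weights from the same segments that are being filtered mirrors the abstract's description, and it is also why a large number of putative segments is needed for the per-window counts to be informative.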
