Prioritizing Sequence Variants Using Statistical Evidence: Not All Measures are Alike. W. Li1,2, L. Strug2,1, D. Pal3 1) Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada; 2) Program in Child Health Evaluative Sciences, Research Institute, Hospital for Sick Children, Toronto, Ontario, Canada; 3) Department of Clinical Neurosciences, Institute of Psychiatry, King's College London, London, United Kingdom.
Genetic association studies of sequence variants require prioritization for follow-up. Statistical evidence weighs heavily in this prioritization, and standard statistical methods rank individual rare variants by their p-values from Fisher's exact test (Pexact) or the chi-square test with continuity correction (Pcwc). However, not all measures of statistical evidence yield the same rankings, and there has been debate over whether p-values adequately measure statistical evidence. For rare variants whose disease-variant distributions can be summarized in 2x2 tables, we propose ranking variants by the ratio of the conditional likelihood evaluated at the maximum conditional likelihood estimate (MCLE) of the odds ratio to the conditional likelihood evaluated at an odds ratio of one (maxLRc). In the special but common case where a 2x2 table contains at least one empty cell, the MCLE may not exist as a finite value; however, we show analytically that the maxLRc is always well defined and equals the inverse of the hypergeometric probability of the observed data. Using sequence data from a study of speech disorder in Rolandic epilepsy (10 cases, 17 controls), we show that the rankings produced by the maxLRc, Pexact and Pcwc can differ substantially for the same set of variants. Through simulation studies, we show that the maxLRc yields better rankings than p-values on several metrics: (1) the maxLRc rankings correlate more strongly with the true rankings, defined by ordering the underlying effect sizes; (2) when K variants are selected for follow-up, the maxLRc selects more truly associated variants; and (3) truly associated variants are, on average, ranked higher by the maxLRc than by Pexact or Pcwc. The maxLRc uses only the information in the observed data, whereas p-values also incorporate the probability of more extreme data that could have been observed but were not.
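One way to see why the maxLRc stays finite when a cell is empty is sketched below, in notation introduced here for illustration (not necessarily the authors' derivation). Let a be the number of carrier cases, b non-carrier cases, c carrier controls and d non-carrier controls, with margins n_1 = a + b, n_0 = c + d, m_1 = a + c, and let ψ be the odds ratio. Conditioning on all margins gives Fisher's noncentral hypergeometric likelihood, whose value at ψ = 1 is the ordinary hypergeometric probability P_H(a):

```latex
L_c(\psi)=\frac{\binom{n_1}{a}\binom{n_0}{m_1-a}\,\psi^{a}}
               {\sum_{u=\ell}^{h}\binom{n_1}{u}\binom{n_0}{m_1-u}\,\psi^{u}},
\qquad \ell=\max(0,\,m_1-n_0),\quad h=\min(n_1,\,m_1),

\mathrm{maxLR}_c=\frac{\sup_{\psi>0}L_c(\psi)}{L_c(1)},\qquad L_c(1)=P_H(a),
\qquad
\frac{d}{d\psi}\log L_c(\psi)=\frac{a-\mathrm{E}_{\psi}[U]}{\psi}.

\text{If } a=h:\quad
L_c(\psi)=\Bigl[\sum_{u=\ell}^{h}
  \frac{\binom{n_1}{u}\binom{n_0}{m_1-u}}{\binom{n_1}{h}\binom{n_0}{m_1-h}}\,
  \psi^{u-h}\Bigr]^{-1}\xrightarrow[\psi\to\infty]{}1
\quad\Longrightarrow\quad
\mathrm{maxLR}_c=\frac{1}{P_H(a)}.
```

An empty cell corresponds exactly to a = h (b = 0 or c = 0) or a = ℓ (a = 0 or d = 0). When a = h, the score a - E_ψ[U] is positive for every finite ψ, so L_c increases toward its supremum of 1 without attaining it; no finite MCLE exists, yet the supremum, and hence the maxLRc, is well defined. The case a = ℓ is symmetric, with ψ → 0.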
The maxLRc, a likelihood ratio, is a measure of statistical evidence as defined in the evidential statistical paradigm, as opposed to the frequentist or Bayesian paradigms. Theoretical developments show that it has good operating characteristics as a measure of evidence, even in small samples. Our findings suggest that the maxLRc outperforms p-value-based prioritization of rare variants. It is straightforward to implement and extends to the prioritization of common variants and to data with other configurations.
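To illustrate how the three measures could be computed and compared, here is a minimal, standard-library-only Python sketch. The function names (`max_lrc`, `p_exact`, `p_cwc`) and the table layout (a, b, c, d) = (carrier cases, non-carrier cases, carrier controls, non-carrier controls) are assumptions for illustration, not the authors' software. The MCLE is found here by bisection on the conditional score equation E_ψ[U] = a, which is one convenient choice; the two-sided Fisher p-value sums all tables no more probable than the observed one, and Pcwc uses Yates' continuity correction. The four tables are invented for illustration and are not data from the Rolandic epilepsy study.

```python
from math import comb, erfc, exp, log, sqrt


def _support(n1, n0, m1):
    """Possible values of the carrier-case count given all margins."""
    return range(max(0, m1 - n0), min(n1, m1) + 1)


def _log_terms(n1, n0, m1, theta):
    """Log weights log[C(n1,u) C(n0,m1-u) psi^u] with theta = log(psi)."""
    us = list(_support(n1, n0, m1))
    return us, [log(comb(n1, u)) + log(comb(n0, m1 - u)) + u * theta for u in us]


def cond_loglik(a, n1, n0, m1, theta):
    """Log conditional (noncentral hypergeometric) likelihood at psi = exp(theta)."""
    us, terms = _log_terms(n1, n0, m1, theta)
    top = max(terms)
    log_norm = top + log(sum(exp(t - top) for t in terms))  # stable log-sum-exp
    return terms[us.index(a)] - log_norm


def _mean_u(n1, n0, m1, theta):
    """Conditional expectation E_psi[U]; increasing in theta."""
    us, terms = _log_terms(n1, n0, m1, theta)
    top = max(terms)
    w = [exp(t - top) for t in terms]
    return sum(u * wi for u, wi in zip(us, w)) / sum(w)


def max_lrc(a, b, c, d):
    """Conditional likelihood at the MCLE divided by its value at odds ratio 1."""
    n1, n0, m1 = a + b, c + d, a + c
    sup = list(_support(n1, n0, m1))
    log_null = cond_loglik(a, n1, n0, m1, 0.0)
    if a in (sup[0], sup[-1]):
        # Empty cell: sup of L_c is 1 (approached as psi -> 0 or infinity),
        # so the ratio is 1 / hypergeometric probability of the observed table.
        return exp(-log_null)
    # Interior table: bracket the root of E_psi[U] = a, then bisect on log(psi).
    lo, hi = -1.0, 1.0
    while _mean_u(n1, n0, m1, lo) > a:
        lo *= 2
    while _mean_u(n1, n0, m1, hi) < a:
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if _mean_u(n1, n0, m1, mid) < a:
            lo = mid
        else:
            hi = mid
    return exp(cond_loglik(a, n1, n0, m1, (lo + hi) / 2) - log_null)


def p_exact(a, b, c, d):
    """Two-sided Fisher's exact p-value (sum of tables no more probable)."""
    n1, n0, m1 = a + b, c + d, a + c
    probs = [exp(cond_loglik(u, n1, n0, m1, 0.0)) for u in _support(n1, n0, m1)]
    p_obs = exp(cond_loglik(a, n1, n0, m1, 0.0))
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def p_cwc(a, b, c, d):
    """Chi-square p-value (1 df) with Yates' continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0
    stat = n * max(0.0, abs(a * d - b * c) - n / 2) ** 2 / denom
    return erfc(sqrt(stat / 2))  # P(chi2_1 > stat)


# Hypothetical tables for 10 cases and 17 controls (illustration only).
variants = {"v1": (3, 7, 0, 17), "v2": (4, 6, 2, 15),
            "v3": (2, 8, 0, 17), "v4": (5, 5, 4, 13)}

for name, score in [("maxLRc", lambda t: -max_lrc(*t)),  # larger LR ranks first
                    ("Pexact", lambda t: p_exact(*t)),   # smaller p ranks first
                    ("Pcwc", lambda t: p_cwc(*t))]:
    print(name, sorted(variants, key=lambda v: score(variants[v])))
```

Working on the log-odds scale keeps the search unconstrained, and because the conditional log-likelihood is concave in log ψ, the root of the score equation is its unique maximizer for tables without empty cells.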