Posted By: HGG Advances
Each month, the editors of Human Genetics and Genomics Advances interview an early-career researcher who has published work in the journal. This month we check in with Charleston Chiang (@CharlestonCWKC) to discuss his paper “Inverted genomic regions between reference genome builds in humans impact imputation accuracy and decrease the power of association testing”.
HGGA: What motivated you to start working on this project?
CC: I would say the project started with the first author of the paper, Xin Sheng, who is a staff scientist here in the Center for Genetic Epidemiology at USC. It was really her relentless tenacity in tracking down a quality control anomaly that would have been so easy to dismiss. When experimenting with the TOPMed imputation server, Xin observed seemingly nuanced allele mismatches and strand issues when comparing imputation results between GRCh37 and GRCh38. The LiftOver error on the TOPMed imputation server is just another manifestation of the same underlying issue. We were searching for an answer that could explain all of these anomalies, which eventually led to the approach presented in the paper for easily detecting the affected SNV sites. This underlying issue, as we eventually came to realize, impacts a small proportion of the genome (~0.2%). Even some of our co-authors suggested just dropping the SNVs from analysis. It was really Xin’s persistence that made the manuscript a reality.
HGGA: What about this paper/project most excites you?
CC: The most exciting part of this project is the experience of digging really deeply into an anomaly that started out so puzzling. We were able to put all the pieces together, figure out the root cause and work with dedicated colleagues and students who care about making things right even when it only accounts for a small portion of the genome.
HGGA: What do you hope is the impact of this work for the human genetics community?
CC: As we enter the third decade of GWAS, we have become less focused on quality control and less interested in ensuring that genetic studies are performed optimally. QC steps are now often buried in the Supplemental Methods and are barely mentioned in talks and presentations. We hope that our work reinvigorates the care one needs to take when handling genetic data and when designing genetic studies. If half of science is asking the right question, the other half ought to be rigorous quality control.
HGGA: What are some of the biggest challenges you’ve faced as a young scientist?
CC: The biggest challenge I have faced as an early-career investigator is identifying and retaining interested students and postdocs. Currently, this appears to be more of an issue at the postdoc level than at the student level. In many ways the freedom afforded by academia is difficult to obtain elsewhere. This particular paper may be an example of that too – what drove us was ultimately curiosity, an anomaly that bugged us. We could have been told to just ignore the problem and move on, but the freedom we have to dig really deeply into this anomaly brings a feeling of reward that cannot be obtained if we are just executing a protocol.
HGGA: And for fun, what is one of the most fascinating things in genetics you’ve learned about in the past year or so?
CC: As a geneticist, one of the most fascinating recent developments in genetics for me has been the ability to start inferring genome-wide genealogical trees that connect everyone in a population. Only a few years ago, we could infer these trees for just tens of individuals. Now we can do it for tens of thousands. The inference methods are still not quite perfect, and errors in estimates are not always properly accounted for, but in theory, these trees would record all there is to know about the population’s history in terms of demographic events, mutations, recombinations, etc. This opens up the possibility of many downstream statistical and population genetic applications, some of which we have worked on (e.g. Fan, Mancuso and Chiang, AJHG 2022) and where others are innovating as well. I have enjoyed learning of these developments over the last year or so. It is also in applying these methods to empirical analyses that we have come to realize the need to carefully curate genetic data, as downstream inferences can be sensitive to arbitrary choices made during QC.