The Impact of GRCh38 on Clinical Sequencing.
D. M. Church, J. Harris, G. Bartha, M. Pratt, A. Patwardhan, S. Chervitz, S. Kirk, M. Clark, S. Garcia, J. West, R. Chen.
Personalis, Inc., Menlo Park, CA.
Our ability to analyze and interpret individual human genomes relies upon comparison to a high-quality reference assembly, the guidepost against which we identify variants and interpret sequence data. GRCh37 has been the reference assembly for over four years. During this time a number of large-scale projects, such as the 1000 Genomes Project, ENCODE and GO-ESP, have placed detailed annotation in the context of GRCh37. Additionally, this assembly has been a workhorse in the clinical sequencing arena and is still the standard in most clinical testing labs doing genome-wide testing. In December of 2013, the Genome Reference Consortium (GRC) released an updated version of the reference assembly called GRCh38. The new assembly adds several megabases of sequence not present in GRCh37, corrects numerous misassembled regions, provides better representation of several hundred medically relevant genes and adds previously unrepresented genes. Despite the marked improvement of this assembly, clinical labs face numerous challenges in transitioning to GRCh38. Annotation content needs to be added to GRCh38 before a lab can consider re-validating clinical protocols using the new assembly. Additionally, new methods need to be developed along the entire analysis pipeline to take advantage of the full assembly, which now includes over 170 regions with alternate sequence representations. To address these challenges, we have continued work on our variant calling and annotation pipeline. Employing a stepwise approach, we are initially developing a pipeline to take advantage of the FIX patches that the GRC releases on a quarterly basis. This allows us to start investigating some GRCh38 sequence in the larger context of GRCh37 annotation and positions us to take advantage of GRCh38 FIX patches when they are released. Preliminary data show that these sequences improve alignments both within and outside of the patch regions, as off-target alignments are reduced globally.
We are also investigating approaches that will allow us to use the alternate loci that are released as part of the full assembly. Lastly, we are transitioning annotation content from GRCh37 to both the FIX patches and GRCh38. This process has identified regions of the new assembly that are completely devoid of biological information and has uncovered sets of data that need to be re-evaluated in light of the new assembly.
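As a rough illustration of what transitioning annotation between assemblies involves, the sketch below lifts annotated intervals from an old coordinate system to a new one using a list of ungapped aligned blocks, and flags intervals that cannot be mapped cleanly so they can be re-evaluated. It is not the pipeline described in this abstract: the `Block` format, the function names and the toy coordinates are all assumptions made for the example, and real assembly-to-assembly alignments also involve strand changes and multiple sequences that are ignored here.

```python
from bisect import bisect_right
from typing import Dict, List, NamedTuple, Optional, Tuple


class Block(NamedTuple):
    """One ungapped aligned block between two assemblies (hypothetical format)."""
    old_start: int  # 0-based start on the old assembly
    new_start: int  # 0-based start on the new assembly
    length: int     # number of aligned bases


def lift_interval(blocks: List[Block], start: int, end: int) -> Optional[Tuple[int, int]]:
    """Map the half-open interval [start, end) to the new assembly.

    `blocks` must be sorted by old_start and non-overlapping. Returns None
    unless the whole interval lies inside a single aligned block.
    """
    if start >= end:
        raise ValueError("interval must be non-empty")
    starts = [b.old_start for b in blocks]
    i = bisect_right(starts, start) - 1  # last block starting at or before `start`
    if i < 0:
        return None
    block = blocks[i]
    if end > block.old_start + block.length:
        return None  # interval runs past the block (gap or rearrangement)
    shift = block.new_start - block.old_start
    return (start + shift, end + shift)


def transfer_annotations(blocks: List[Block], annotations: Dict[str, Tuple[int, int]]):
    """Split annotations into those lifted cleanly and those needing review."""
    lifted, needs_review = {}, []
    for name, (start, end) in annotations.items():
        mapped = lift_interval(blocks, start, end)
        if mapped is None:
            needs_review.append(name)
        else:
            lifted[name] = mapped
    return lifted, needs_review


# Toy data: 30 bases were removed between old positions 100 and 150.
blocks = [Block(0, 0, 100), Block(150, 120, 50)]
annotations = {"featureA": (10, 20), "featureB": (160, 170), "featureC": (90, 110)}
lifted, needs_review = transfer_annotations(blocks, annotations)
```

In this toy case `featureA` and `featureB` map cleanly, while `featureC` straddles the edge of an aligned block and is reported for manual re-evaluation, which mirrors the kind of data set the transition process is described as uncovering.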
You may contact the first author (during and after the meeting) at