ASHG recently submitted a response to the NOT-OD-22-029 Request for Information (RFI) from the National Institutes of Health (NIH) pertaining to its Genomic Data Sharing Policy (GDS policy). The GDS policy regulates the expectations and procedures for researchers who generate genomic data as part of their NIH-funded research, as well as for researchers who wish to gain access to individual-level genomic information housed in NIH databases. The NIH is exploring changes to this policy to account for technology and research advances since the policy was first established in 2014, and has released this RFI to receive public comments on specific aspects of the GDS policy.
- De-identification. The risks and benefits of expanding de-identification options, including adding the expert determination described at HIPAA 45 CFR 164.514(b)(1) (the HIPAA Privacy Rule) as an acceptable method for de-identification under the GDS Policy, and whether other de-identification strategies exist that may be acceptable in lieu of HIPAA standards.
The American Society of Human Genetics (ASHG) is a society of more than 8,000 genetics professional members, with the mission to advance human genetics and genomics in science, health and society through excellence in research, education, and advocacy. Many of our members conduct NIH-funded human genome research and are therefore subject to the Genomic Data Sharing Policy (GDSP).
Broad data sharing is fundamental to the advancement of genomic sciences. At the same time, given that human genome information is personal and sensitive, it is important that data is shared in a way that preserves the privacy of research participants. ASHG welcomes the NIH exploring how the GDSP might be improved to achieve this.
We believe that the requirement to remove the HIPAA identifiers from genomic data submitted to the NIH limits the potential to draw correlations between genotype, phenotype, and environmental information. We therefore see significant benefit in the GDSP allowing flexibility in how data are de-identified, provided that individuals’ privacy can be preserved.
- Use of potentially identifiable information. The circumstances under which submission of data elements considered potentially identifiable to repositories under the GDS Policy would be acceptable, any additional protections (including for security) that would be warranted, and whether there is certain potentially identifiable information that would not be acceptable to submit.
In general, ASHG believes removal of HIPAA identifiers is important to protect research participant privacy. However, the inclusion of select HIPAA identifiers may enable further scientific insight without significantly increasing the risk of re-identification. We therefore think there is value in providing alternatives to de-identification policies while recognizing the need to consider the greater risks of re-identification for individuals within particular populations. Some HIPAA identifiers, such as zip codes, age, and other dates related to an individual, could provide valuable data for examining biological and environmental correlations. For example, access to ages and dates of symptom onset, diagnoses, or treatments would be highly beneficial for research on age-related diseases/phenotypes or longitudinal studies of disease etiology. Additionally, access to environmental identifiers would allow researchers to compare potential sociological or environmental factors in association with genotypes and diagnoses to better elucidate gene-environment interactions.
To protect the privacy of research participants, we urge additional caution regarding inclusion of HIPAA identifiers in datasets in certain circumstances:
- Where individuals are at a greater risk of re-identification due to their geographic location, their zip codes should not be retained in de-identified GDS databases. Geographic properties that pose a higher risk of re-identification include zip codes in regions with low population density and zip codes within or near tribal reservations/land. We recommend implementing a minimum population threshold for the inclusion of an individual’s zip code. We also recommend exclusion of zip codes that overlap with tribal geographic jurisdictions.
- Where the datasets include individuals with rare diseases, we advise additional caution and consideration of the re-identification risks associated with harboring rare genetic variants and unusual clinical records.
- Where ages and dates of diagnosis/treatment are included in datasets, Date of Birth (DOB) should not be retained as this could significantly increase the risk of re-identification.
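As an illustration only, the zip code and Date of Birth cautions above could be applied to a single dataset record roughly as follows. This is a minimal Python sketch under stated assumptions: the `Record` fields, the `apply_cautions` function, the lookup tables passed to it, and the 20,000 threshold are all hypothetical placeholders, not values or methods proposed in this response or defined by the GDS Policy. The rare-disease caution calls for case-by-case judgment and is not modeled here.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Set

# Placeholder value: this response recommends a threshold but does not name one.
MIN_ZIP_POPULATION = 20_000


@dataclass(frozen=True)
class Record:
    """Hypothetical subset of fields attached to a genomic data submission."""
    zip_code: Optional[str]
    age_at_diagnosis: Optional[int]
    date_of_birth: Optional[str]


def apply_cautions(
    record: Record,
    zip_population: Dict[str, int],
    tribal_zips: Set[str],
    min_population: int = MIN_ZIP_POPULATION,
) -> Record:
    """Return a copy of the record with higher-risk identifiers suppressed."""
    zip_code = record.zip_code
    if zip_code is not None:
        # A zip code with no known population is treated as too small to keep.
        population = zip_population.get(zip_code, 0)
        if population < min_population or zip_code in tribal_zips:
            zip_code = None

    # When an age is retained, the date of birth is dropped.
    date_of_birth = record.date_of_birth
    if record.age_at_diagnosis is not None:
        date_of_birth = None

    return replace(record, zip_code=zip_code, date_of_birth=date_of_birth)
```

In this sketch a zip code is kept only when its population meets the threshold and it does not appear in the tribal-overlap set. How that set and the population table would be compiled is left open.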
- Data linkage. Whether the GDS Policy should permit data linkage between datasets that meet GDS Policy expectations (e.g., data obtained with consent for research use and de-identification), and whether the GDS Policy should support such linkages to datasets that do not meet all GDS Policy expectations (e.g., data may have come from a clinical setting, may not have been collected with consent, may retain certain potentially identifiable information). Feedback is also requested on risks and benefits to any such approaches.
ASHG supports broader data linkage and sharing, as data linkage has proven highly beneficial for human genetics and genomics. However, implementation of data linkage should not pose any additional risks to participant privacy or re-identification, and methods of data linkage should ensure privacy of the individual.
- Consent for data linkage. Whether data linkage should be addressed when obtaining consent for sharing and future use of data under the GDS Policy, as well as in IRB consideration of risks associated with submission of data to NIH genomic data repositories. And if so, how to ensure such consent is meaningful.
ASHG agrees that informed consent must address the potential for future data sharing under the GDS Policy and must disclose the risks associated with inter-repository sharing. To ensure this disclosure is meaningful and understood by the individual, it should be clearly outlined in the consent along with the methods of protection in place to prevent re-identification.
- Harmonizing GDS and DMS Policies. Any aspect of the approach to harmonize GDS and DMS Policies and Plans described in the Notice, including for non-human genomic data.
ASHG supports the harmonization of GDS and DMS policies, which would alleviate the administrative burden on researchers who currently have to comply with two separate policies. We recommend that genomic data sharing plans be submitted in line with the DMS Policy requirement, that is, as a component of the initial funding application. Such a modification in the policy would highlight the importance of the management and sharing of genomic data alongside all other aspects of study design.
- Types of research covered by the GDS Policy.
- Whether there are other types of research and/or data beyond the current scope of the GDS Policy that should be considered sensitive or warrant the type of protections afforded by the GDS Policy (e.g., with consent for future use and to be shared broadly, as well as IRB review of risks associated with submitting data to NIH), even when data are de-identified.
- Whether small scale studies (e.g., studies of fewer than 100 participants) and those involving other data types (e.g., microbiomic, proteomic) should be covered under the GDS Policy, and if training and development awards (e.g., F, K, and T awards) should be covered by the GDS Policy (“Implementation of the NIH Genomic Data Sharing Policy for NIH Grant Applications and Awards,” NOT-OD-14-111).
- Whether NIH-funded research that generates large-scale genomic data but where NIH’s funding does not directly support the sequencing itself should be covered by the GDS Policy.
ASHG affirms that the sharing of non-genomic –omics data (e.g., transcriptome, proteome, microbiome) does not pose the same kind of re-identification risk as the sharing of genomic information. Given the state of current science, we do not believe that these data types warrant the same level of protection at this time. However, we would welcome sharing of additional data types that present a more comprehensive profile of phenotypic involvement, and do not see other types of –omics data as posing additional identifiability risks. This position should be revisited if technology advances and re-identification from these data types becomes a risk.