Understanding the context and origin of genomic datasets is critical for discovery, appropriate reuse, and giving credit to data generators. Yet basic provenance information (who created a dataset, under which project, with what funding, and where it has been published) is often incomplete or inconsistent across repositories.
This working session will:
- map the current landscape of provenance standards (W3C PROV, FAIR principles, domain-specific approaches);
- identify concrete gaps where GA4GH could add value;
- examine how Beacon summaries and other Discovery products could expose provenance information;
- develop GA4GH recommendations for applying existing standards to genomic data sharing.
We are seeking input from data producers, repository managers, and researchers who need to answer questions such as "What project generated this cohort?", "Where was this dataset collected?", "What publications describe or use this data?", "Who funded this work?", and "How large is this resource?" The goal is not to create new standards, but to provide clear guidance on using existing provenance frameworks effectively in genomic contexts.