Understanding the context and origin of genomic datasets is critical for discovery, appropriate reuse, and giving credit to data generators. Yet basic provenance information (who created a dataset, under what project, with what funding, and where it has been published) is often incomplete or inconsistent across repositories.
This working session will:
- map the current landscape of provenance standards (W3C PROV, FAIR principles, domain-specific approaches);
- identify concrete gaps where GA4GH could add value;
- examine how Beacon summaries and other Discovery products could expose provenance information;
- develop GA4GH recommendations for applying existing standards to genomic data sharing.
We are seeking input from data producers, repository managers, and researchers who need to answer questions such as "What project generated this cohort?", "Where was this dataset collected?", "What publications describe or use this data?", "Who funded this work?", and "How large is this resource?" The goal is not to create new standards but to provide clear guidance on using existing provenance frameworks effectively in genomic contexts.