Name
Provenance metadata for genomic datasets
Date & Time
Thursday, April 16, 2026, 4:00 PM - 5:30 PM
Description

Understanding the context and origin of genomic datasets is critical for discovery, appropriate reuse, and crediting data generators. Yet basic provenance information (who created a dataset, under what project, with what funding, and where it has been published) is often incomplete or inconsistent across repositories.

This working session will:

  • map the current landscape of provenance standards (W3C PROV, FAIR principles, domain-specific approaches);
  • identify concrete gaps where GA4GH could add value;
  • examine how Beacon summaries and other Discovery products could expose provenance information;
  • develop GA4GH recommendations for applying existing standards to genomic data sharing.

We are seeking input from data producers, repository managers, and researchers who need to answer questions such as "What project generated this cohort?", "Where was this dataset collected?", "What publications describe or use this data?", "Who funded this work?", and "How large is this resource?" The goal is not to create new standards but to provide clear guidance on using existing provenance frameworks effectively in genomic contexts.

Session topic(s)
Discovery
Session format(s)
Working session: collaborative work toward a specific goal or consensus; Group Discussion: particular topic to gather feedback, perspectives, or ideas
Suggested level of familiarity
Level 2: Have a basic understanding to follow the conversation and contribute thoughtfully