Sponsor Details: GA4GH 13th Plenary

Poster Number

45

Poster Title

Haplotypes and Human Diversity in Proteomics

Authors

Jakub Vašíček (1,2), Dafni Skiadopoulou (1,2,3), Ksenia G. Kuznetsova (1,2), Pål R Njølstad (1,4), Stefan Johansson (1,5), Stefan Bruckner (6), Lukas Käll (7), Marc Vaudel (1,2,3)

1 - Department of Clinical Science, University of Bergen, Bergen, Norway
2 - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
3 - Department of Genetics and Bioinformatics, Health Data and Digitalization, Norwegian Institute of Public Health, Oslo, Norway
4 - Children and Youth Clinic, Haukeland University Hospital, Bergen, Norway
5 - Department of Medical Genetics, Haukeland University Hospital, Bergen, Norway
6 - Institute for Visual and Analytic Computing, University of Rostock, Rostock, Germany
7 - Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH - Royal Institute of Technology, Stockholm, Sweden

Abstract

Genomic research has long benefited from using diverse population panels, which increase the statistical power of association studies for participants from admixed populations. However, mass spectrometry-based proteomic workflows often project all data onto a single set of reference protein sequences. Consequently, we obscure a portion of the proteome and risk introducing a bias against populations with a different haplotypic structure.

Alleles co-occurring in the protein-coding regions of the same gene produce a unique protein sequence, called a protein haplotype. These haplotypes are present in biological samples and are detectable by mass spectrometry. We have demonstrated that thousands of amino acid substitutions can be discovered in a single sample, sometimes with several alleles in linkage disequilibrium falling within the same peptide after tryptic digestion of the protein. We have recently released ProHap, a bioinformatic pipeline for building protein sequence databases from panels of phased genotypes.
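To make the notion of a protein haplotype concrete, the following is a minimal sketch and not ProHap's actual implementation. It assumes a made-up toy reference sequence, two hypothetical co-occurring substitutions, and a simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages). The function names apply_substitutions and tryptic_digest are illustrative only.

```python
import re


def apply_substitutions(reference, substitutions):
    """Apply amino acid substitutions to a reference protein sequence.

    Each substitution is (position, ref_residue, alt_residue), with a
    1-based position. Returns the resulting protein haplotype sequence.
    """
    residues = list(reference)
    for position, ref, alt in substitutions:
        if residues[position - 1] != ref:
            raise ValueError(f"Expected {ref} at position {position}")
        residues[position - 1] = alt
    return "".join(residues)


def tryptic_digest(sequence):
    """Simplified in-silico trypsin digestion.

    Cleaves after K or R, except when the next residue is P.
    Missed cleavages are not modelled.
    """
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


# Toy reference sequence and substitutions (hypothetical, for illustration).
reference = "MSTAKGLVEDQRHYPK"
cooccurring_alleles = [(8, "V", "I"), (10, "D", "E")]

haplotype = apply_substitutions(reference, cooccurring_alleles)

# Peptides that exist in the haplotype but not in the reference digest.
reference_peptides = set(tryptic_digest(reference))
variant_peptides = [p for p in tryptic_digest(haplotype)
                    if p not in reference_peptides]

print(haplotype)
print(variant_peptides)
```

In this toy case both substitutions fall within one tryptic peptide, so a search against reference sequences alone would have no matching entry for that peptide.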

Initially, we generated proteomic databases from the 1000 Genomes Project and showed that participants of the African superpopulation diverge from the reference proteome more than others, while all the included ancestry groups show notable differences from the reference proteome. ProHap can also be run on public as well as local reference panels, with great flexibility in terms of types of genetic variants and haplotype frequency. This provides a great opportunity to translate the benefits of genomic data sharing into the field of proteomics, empowering researchers to tailor their proteomic studies to specific populations.

Finally, to provide rapid insight into the complexity of such proteogenomic datasets, we have developed ProHap Explorer, a web-based visual interface that maps identified peptides to genes, haplotypes, and spliced transcripts. ProHap Explorer allows researchers to browse the influence of common haplotypes on any gene, and to view the coverage of encoded proteoforms in mass spectrometry data.
