PGS Catalog - About the Polygenic Score Catalog

About the PGS Catalog

This page contains information regarding the PGS Catalog Project.

What is a Polygenic Score?

A polygenic score (PGS) aggregates the effects of many genetic variants into a single number which predicts genetic predisposition for a phenotype. PGS are typically composed of hundreds-to-millions of genetic variants (usually SNPs) which are combined using a weighted sum of allele dosages multiplied by their corresponding effect sizes, as estimated from a relevant genome-wide association study (GWAS).
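The weighted sum described above can be illustrated with a minimal sketch. The variant IDs, weights, and dosages are made-up values, and `polygenic_score` is a hypothetical helper written for this example, not part of any PGS Catalog tool.

```python
def polygenic_score(weights, dosages):
    """Sum effect weight * effect-allele dosage over the variants in a score.

    weights: dict mapping variant ID -> effect weight (e.g. a GWAS beta)
    dosages: dict mapping variant ID -> effect-allele dosage (0 to 2)

    Variants absent from `dosages` are skipped. This is one simple
    convention (an assumption of this sketch); real pipelines may
    impute or report missing variants instead.
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


# Made-up example: three variants and one individual's dosages.
weights = {"rs1": 0.5, "rs2": -0.25, "rs3": 1.0}
dosages = {"rs1": 2, "rs2": 1, "rs3": 0}

score = polygenic_score(weights, dosages)
print(score)
```

Each variant contributes its weight once per copy of the effect allele, so a variant with dosage 0 adds nothing to the total.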

PGS nomenclature is heterogeneous: they can also be referred to as genetic scores or genomic scores, and as polygenic risk scores (PRS) or genomic risk scores (GRS) if they predict a discrete phenotype, such as a disease.

The PGS Catalog Project

The PGS Catalog is an open database of published polygenic scores (PGS). Each PGS in the Catalog is consistently annotated with relevant metadata, including scoring files (variants, effect alleles/weights), annotations of how the PGS was developed and applied, and evaluations of its predictive performance. See the PGS Catalog Data Description page for a complete description of the metadata captured for PGS, Samples, Performance Metrics, Traits, and Publications.

Citation

Development of the PGS Catalog is led by Samuel Lambert under the supervision of Michael Inouye (University of Cambridge & Baker Institute), in collaboration with Health Data Research UK (Laurent Gil) and the EBI Samples, Phenotypes and Ontologies team / NHGRI-EBI GWAS Catalog (Helen Parkinson, Aoife McMahon, Laura Harris).

The Catalog is under active development, and we continue to add new features and curate new data. If you use the Catalog or Calculator in your research, we ask that you cite our flagship publications below:

Samuel A. Lambert, Benjamin Wingfield, Joel T. Gibson, Laurent Gil, Santhi Ramachandran, Florent Yvon, Shirin Saverimuttu, Emily Tinsley, Elizabeth Lewis, Scott C. Ritchie, Jingqin Wu, Rodrigo Canovas, Aoife McMahon, Laura W. Harris, Helen Parkinson, Michael Inouye

Enhancing the Polygenic Score Catalog with tools for score calculation and ancestry normalization

Nature Genetics doi: 10.1038/s41588-024-01937-x (2024).

Samuel A. Lambert, Laurent Gil, Simon Jupp, Scott C. Ritchie, Yu Xu, Annalisa Buniello, Aoife McMahon, Gad Abraham, Michael Chapman, Helen Parkinson, John Danesh, Jacqueline A. L. MacArthur, Michael Inouye

The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation

Nature Genetics volume 53, pages 420–425 doi: 10.1038/s41588-021-00783-5 (2021).

Individual PGS obtained from the database should also be cited appropriately, and used in accordance with any licensing restrictions set by the authors (see our Terms of Use for more information).

All PGS Catalog Publications:

Lambert, Wingfield et al, 2024 Enhancing the Polygenic Score Catalog with tools for score calculation and ancestry normalization. Nature Genetics.
Sollis et al, 2023 The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Research 51(D1):D977-D985.
Lambert et al, 2021 The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nature Genetics volume 53, pages 420–425.

PGS Catalog Inclusion Criteria

For a publication's data to be included in the PGS Catalog it must contain one of the following:

A newly developed PGS. This includes the following information about the score and its predictive ability (evaluated on samples not used to develop the score):
- Variant information necessary to apply the PGS to new samples (variant rsID and/or genomic position, weights/effect sizes, effect allele, genome build).
- Information about how the PGS was developed (computational method, variant selection, relevant parameters).
- Descriptions of the samples used to develop the score (e.g. those used to discover the variant associations [GWAS samples, which can usually be extracted directly from the GWAS Catalog using GCST IDs], as well as other samples used to develop/train the PGS) and of the samples used for external evaluation.
- Establishment of the PGS' analytic validity, and a description of its predictive performance (e.g. effect sizes [beta, OR, HR, etc.], classification accuracy, proportion of the variance explained (R²), and any covariates evaluated in the PGS prediction).
An evaluation of a previously developed PGS. This would include the evaluation of PGS already present in the catalog (or eligible for inclusion), on samples not used for PGS development. The requirements for description would be the same as for the evaluation of a new PGS.
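The variant-level requirements listed above (rsID and/or genomic position, effect allele, weight, genome build) can be expressed as a simple completeness check. This is only an illustrative sketch: the dictionary keys (`genome_build`, `variants`, `rsid`, `chromosome`, `position`, `effect_allele`, `effect_weight`) are hypothetical names chosen for this example and are not the Catalog's scoring file format or its validator.

```python
# Hypothetical field names, chosen for illustration only.
REQUIRED_FIELDS = ("effect_allele", "effect_weight")


def missing_variant_fields(record):
    """Return the pieces of information a single variant record lacks."""
    missing = [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
    has_rsid = bool(record.get("rsid"))
    has_position = bool(record.get("chromosome")) and record.get("position") is not None
    # A variant must be identifiable by rsID and/or genomic position.
    if not (has_rsid or has_position):
        missing.append("rsid or chromosome+position")
    return missing


def check_score(score):
    """Return a list of problems that would prevent applying the score."""
    problems = []
    if not score.get("genome_build"):
        problems.append("genome_build")
    for i, record in enumerate(score.get("variants", [])):
        for item in missing_variant_fields(record):
            problems.append(f"variant {i}: {item}")
    return problems


complete = {
    "genome_build": "GRCh38",
    "variants": [{"rsid": "rs1", "effect_allele": "A", "effect_weight": 0.1}],
}
incomplete = {"variants": [{"chromosome": "1", "effect_allele": "A"}]}

print(check_score(complete))
print(check_score(incomplete))
```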

A complete description of the data captured for each PGS and publication can be found on the PGS Catalog Data Description page.

The PGS Catalog is based on data extracted from publications, as well as data deposited directly by authors. A weekly literature search against PubMed identifies peer-reviewed journal publications that meet the PGS Catalog eligibility criteria (detailed above). Literature search and triage are performed using LitSuggest, a machine-learning-assisted triage system developed at NCBI. Scores, samples, traits and performance metrics are extracted from PubMed-indexed journal publications. Authors are encouraged to submit their PGS and evaluations to us by e-mail for curation and inclusion in the PGS Catalog - we are developing a streamlined interface for submitting these data in the future.

Data Submission

If you have a PGS or publication that meets the Catalog's eligibility requirements, we invite you to submit your data by e-mail (pgs-info@ebi.ac.uk). To ensure speedy curation and inclusion in the Catalog, it would be helpful if you provided the following information about your study:

Source Publication. PubMed identifier and/or publication doi.
The Polygenic Score(s). If you've developed a new PGS we'll need the variant-level information required to calculate the score on new samples (see our Scoring Files documentation for more description). If an existing PGS was used it would be helpful if you provided its PGS ID, or links to the original publication.
- Once your scoring files are ready for submission, we encourage you to use our Scoring File Validator prior to sending them to us to accelerate the processing of your files.
A completed PGS Catalog Curation Template - Optional.
1. Download the current PGS Catalog Curation Template .xlsx on GoogleDocs - this template forms the basis of our curation pipeline and future PGS deposition framework.
2. Fill out the downloaded PGS Catalog curation template with your study metadata. We provide a set of PGS Catalog Curation Guidelines .docx with detailed instructions, and examples of how to record your data in the PGS Catalog template.
3. (Optional) Validate the filled out PGS Catalog curation template using the PGS Curation Template Validator.

Pre-publication submissions: The PGS Catalog also allows pre-publication submissions that authors may wish to embargo until publication. In this case the journal name can be provided, and a filled out curation template is required. Scores can then be assigned PGS Catalog IDs so that they may be added to the manuscript.

Missing PGS studies: You can also report/recommend studies for inclusion in the PGS Catalog using this form: Report missing PGS study. However, please send us the PGS by e-mail if you are the paper’s author and can share the variant-level score information.

PGS Catalog Software/Tools

All code developed for the PGS Catalog is publicly available on GitHub [PGSCatalog]. The following tools may be useful to the community:

pgsc_calc: a reproducible workflow to calculate both PGS Catalog and custom polygenic scores. The workflow automates PGS downloads from the Catalog, variant matching between scoring files and target genotyping samplesets, and the parallel calculation of multiple PGS. Genetic ancestry assignment and PGS normalisation methods are also supported.
- See the full documentation here.
- You can also find more information about the calculator in our recent webinar: Calculating polygenic scores with the Polygenic Score Catalog Calculator.
pgscatalog_utils: Python package providing a collection of useful tools for working with data from the PGS Catalog, such as downloading scoring files, combining multiple scoring files, or matching target variants against scoring files. More information on PyPI.
PGS Catalog Curation Template Validator: web tool for validating PGS Catalog curation templates.
PGS Catalog Scoring File Validator: web tool for validating PGS Catalog scoring files.
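To give a sense of what matching variants between a scoring file and a target sampleset involves, here is a deliberately simplified sketch. It is not the algorithm used by pgsc_calc or pgscatalog_utils: the function `match_variants`, the dictionary keys, and the example data are all hypothetical, and the sketch ignores complications such as strand flips, multiallelic sites, and differing genome builds.

```python
def match_variants(scoring, target):
    """Match scoring-file variants to target genotype variants.

    scoring: list of dicts with keys chrom, pos, effect_allele, other_allele
    target:  dict mapping (chrom, pos) -> (ref, alt) alleles in the target data

    Returns (matched, unmatched). `matched` holds (variant, effect_is_alt)
    pairs, where effect_is_alt records whether the effect allele is the
    target's alternate allele. Strand flips are NOT handled in this sketch.
    """
    matched, unmatched = [], []
    for v in scoring:
        alleles = target.get((v["chrom"], v["pos"]))
        pair = {v["effect_allele"], v["other_allele"]}
        # Require the same position and the same unordered pair of alleles.
        if alleles is None or set(alleles) != pair:
            unmatched.append(v)
            continue
        ref, alt = alleles
        matched.append((v, v["effect_allele"] == alt))
    return matched, unmatched


# Made-up example data.
scoring = [
    {"chrom": "1", "pos": 100, "effect_allele": "A", "other_allele": "G"},
    {"chrom": "1", "pos": 200, "effect_allele": "C", "other_allele": "T"},
    {"chrom": "2", "pos": 300, "effect_allele": "G", "other_allele": "A"},
]
target = {("1", 100): ("G", "A"), ("1", 200): ("C", "T"), ("2", 300): ("G", "C")}

matched, unmatched = match_variants(scoring, target)
print(len(matched), len(unmatched))
```

When the effect allele turns out to be the target's reference allele, the effect-allele dosage for a diploid genotype is 2 minus the alternate-allele dosage, which is why the sketch records the orientation of each match.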

Features Under Development

Including more PGS & developing a deposition interface.
We are actively curating more PGS for inclusion in the Catalog, responding to submissions from authors, and are committed to increasing the diversity of traits. We are also developing an interface so that authors can easily deposit PGS or PGS evaluations into the PGS Catalog using a standard template.

Feedback & Contact Information

To submit a PGS to the catalog, provide feedback, or ask questions please contact the PGS Catalog team at pgs-info@ebi.ac.uk.

Acknowledgements

We wish to acknowledge the help of the following people & teams for their support of the PGS Catalog:

PGS Catalog Team: Sam Lambert¹, Laurent Gil², Benjamin Wingfield³, Florent Yvon⁴, Joel Gibson¹,⁴, Aoife McMahon¹,⁵, Santhi Ramachandran⁵, Elizabeth Lewis⁵, Laura Harris⁵, Helen Parkinson⁶, Richard Houghton², Prof. John Danesh², Michael Inouye⁴

Previous Contributors: Emily Tinsley³, Shirin Saverimuttu³, Jackie MacArthur³, Simon Jupp³, James Hayhurst³, Trish Whetzel³, Michael Chapman², Jonathan Marten⁴, Petar Scepanovic⁴, Gad Abraham⁴

1: PGS Catalog Data Curators
2: Health Data Research UK, Cambridge
3: European Bioinformatics Institute
4: Inouye Lab
5: NHGRI-EBI GWAS Catalog Team
6: EMBL-EBI Samples Phenotypes and Ontologies Team

The PGS Catalog is delivered through a collaboration between EMBL-EBI and the University of Cambridge, and is funded by NHGRI (1U24HG012542-01), Health Data Research UK and the Baker Heart & Diabetes Institute.