Frequently Asked Questions

Application

Q. How do I apply for access to the datasets?

A. Fill out the SG10K_Pilot or SG10K_Health data access form below and submit the completed form to contact_npco@gis.a-star.edu.sg. Please ensure you have read and understood the data access policies before applying. All applications are subject to review and approval by the NPM Data Access Committee.

SG10K_Health data access policy

SG10K_Health data access form

SG10K pilot data access policy

SG10K pilot data access form


Q. How long is the application review process?

A. The NPM Data Access Committee will carefully review each application form. The average processing time is 4-6 weeks.


Q. How do I amend my approved application form?

A. You may amend your application and resubmit the application form to contact_npco@gis.a-star.edu.sg. All amendments are subject to the same review process by the NPM Data Access Committee.

Q. What research studies have been approved to use the SG10K datasets?

A. Follow the links to see the lists of approved research studies for the SG10K_Pilot and SG10K_Health datasets.

Datasets

Q. What is the SG10K_Pilot dataset?

A. The SG10K_Pilot dataset (EGA dataset EGAD00001005337) is the joint variant call set from 4,180 whole-genome sequences, deposited in the EGA database. All data have been pseudonymised and are therefore considered de-identified, as described in the paper. Two files are available for access: 1) the genotype data, arranged by chromosome in VCF format, and 2) a metadata file containing the self-reported ethnicity.

Reference
Wu et al. Large-scale whole-genome sequencing of three diverse Asian populations in Singapore. Cell. 2019
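
As an illustration of how the two files might be used together once access is granted, the sketch below counts, per self-reported ethnicity, the genotype calls that carry at least one alternate allele. The sample IDs, the toy records and the metadata column names (sample_id, ethnicity) are assumptions made for demonstration; the actual files may be organised differently.

```python
import csv
import io
from collections import Counter

# Toy stand-ins for the real files (hypothetical content and column names).
METADATA = "sample_id\tethnicity\nS1\tChinese\nS2\tMalay\nS3\tIndian\n"
VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n"
    "1\t1000\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0\t1/1\n"
    "1\t2000\t.\tC\tT\t.\tPASS\t.\tGT\t0/0\t./.\t0/1\n"
)


def read_ethnicity(metadata_text):
    """Map sample ID to self-reported ethnicity from a tab-separated table."""
    reader = csv.DictReader(io.StringIO(metadata_text), delimiter="\t")
    return {row["sample_id"]: row["ethnicity"] for row in reader}


def count_alt_carriers(vcf_text, ethnicity_by_sample):
    """Count, per ethnicity, genotype calls with at least one ALT allele."""
    counts = Counter()
    samples = []
    for line in vcf_text.splitlines():
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start at the 10th field
            continue
        for sample, call in zip(samples, fields[9:]):
            genotype = call.split(":")[0]  # GT comes first when present
            alleles = genotype.replace("|", "/").split("/")
            if any(allele not in ("0", ".") for allele in alleles):
                counts[ethnicity_by_sample.get(sample, "unknown")] += 1
    return counts


counts = count_alt_carriers(VCF, read_ethnicity(METADATA))
print(dict(counts))
```

For real per-chromosome VCF files, a dedicated VCF library would normally be preferable to hand-written parsing; the plain-text version here only keeps the example self-contained.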


Q. What is the SG10K_Health dataset?

A. The SG10K_Health data is a collection of integrated genomic and phenotypic data of 10,000 healthy and consented individuals of Chinese, Malay and Indian ethnicities. The SG10K_Health data is contributed by six cohorts in Singapore: (1) Multi-Ethnic Cohort (MEC) study, (2) Health for Life in Singapore (HELIOS) study, (3) Growing Up in Singapore Towards healthy Outcomes (GUSTO) study, (4) TTSH Personalised Medicine Normal Controls (TTSH) study, (5) Singapore Epidemiology of Eye Diseases (SEED) study and (6) Biobank/SingHEART, SingHealth Duke-NUS Institute of Precision Medicine (PRISM) study.

S/N | Dataset | Description
1 | SG10K_Health metadata | A metadata file containing the self-reported ethnicity, sex and other research phenotypic variables.
2 | SG10K_Health VCF (r5.3) | Whole-genome GATK joint variant calling of 9,770 individuals of Chinese, Indian and Malay ethnicities, containing 179,418,971 variants.
3 | SG10K_Health phased VCF for imputation (r5.5) | Phased VCF of SG10K_Health_r5.3 for the purpose of genome imputation.
4 | SG10K_Health DNA methylation array | Whole-genome DNA methylation on the Illumina Infinium MethylationEPIC array (850K).
5 | SG10K_Health Structural Variants (r1.4) | 73,035 structural variants derived from 5,487 SG10K_Health participants using Manta, MELT and SurVIndel (Tan et al., Nat Comms, 2024).

Reference
Wong et al. The Singapore National Precision Medicine Strategy. Nature Genetics. 2023

Q. What is the data platform to access the SG10K_Health dataset?

A. Approved researchers can access the SG10K_Health dataset via the RAPTOR data access platform for data analytics. Click on the link below to learn more about the RAPTOR platform.

Genomic web services

Q. How can I have access to the genomic web services?

A. Register an account with the SG10K_Health web portal. If you already have an account, you can apply for access to one or more genomic web services via the “Submit New Application - Genomic web services” function. To complete the web services application, you will need to provide a summary of how you will use the services in your research.

Q. How do the different genomic web services work?

CHORUS VARIANT BROWSER

The CHORUS variant browser provides access to the collection of variants found in the SG10K_Health dataset, which comprises 10,000 whole genomes. It builds heavily on the work of the MacArthur lab and the gnomAD team, borrowing the gnomAD back-end infrastructure (Spark/Hail for storing and manipulating sample-level genotype data, Elasticsearch for storing data aggregated by gender and ethnicity, exposed via GraphQL and a React web application) and extending the gnomAD variant browser UI and APIs.

The CHORUS variant browser supports the download of:
  1. SG10K_Health aggregated allele frequency.
  2. SG10K_Health aggregated structural variants (A Catalogue of Structural Variation across Ancestrally Diverse Asian Genomes).
HOW CHORUS VARIANT BROWSER WORKS:
  1. Users can search by gene name or ID, transcript ID, variant ID or genomic region.
  2. It displays variant allele frequencies aggregated by gender and ethnicity.
  3. It provides variant level metrics and functional annotations (synonymous/missense, HGVS nomenclature and SIFT/Polyphen scores).
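
To make the idea of aggregated allele frequencies concrete, the sketch below computes allele count (AC), allele number (AN) and allele frequency (AF = AC / AN) per gender and ethnicity group for a single variant. It is an illustration only, not the browser's implementation; the sample tuples and group labels are invented for the example.

```python
from collections import defaultdict

# Hypothetical per-sample diploid genotypes for one variant:
# 0 = reference allele, 1 = alternate allele, None = missing call.
SAMPLES = [
    ("female", "Chinese", (0, 1)),
    ("female", "Chinese", (0, 0)),
    ("male", "Malay", (1, 1)),
    ("male", "Indian", (None, None)),
]


def aggregate_allele_frequency(samples):
    """Return AC, AN and AF for each (gender, ethnicity) group."""
    totals = defaultdict(lambda: {"ac": 0, "an": 0})
    for gender, ethnicity, genotype in samples:
        called = [allele for allele in genotype if allele is not None]
        group = totals[(gender, ethnicity)]
        group["ac"] += sum(1 for allele in called if allele == 1)
        group["an"] += len(called)
    return {
        key: {**value, "af": value["ac"] / value["an"] if value["an"] else None}
        for key, value in totals.items()
    }


frequencies = aggregate_allele_frequency(SAMPLES)
for group, stats in frequencies.items():
    print(group, stats)
```

Only group-level totals are produced, which mirrors the aggregated form described above: the frequencies can be shared without exposing any individual genotype.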
CHORUS BEACON

The CHORUS beacon provides information about the catalogue of variants found in the SG10K_Health dataset, which comprises 10,000 whole genomes. Beacon is a standard for discovering genetic variants, developed by the Global Alliance for Genomics and Health. CHORUS Beacon uses GraphQL to query an Elasticsearch database in which the genomic data collections are stored, translating Beacon API version 1.1.0 RESTful queries into Elasticsearch queries via GraphQL.

HOW CHORUS BEACON WORKS:
  1. Users query a specific variant (chromosome, position, reference allele, alternate allele).
  2. The beacon responds “Yes” or “No” to indicate whether the variant is present in the dataset.
  3. Additional information, such as allele frequencies aggregated by gender and ethnicity, is displayed.
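
The request and response steps above can be sketched with the query parameters defined by the Beacon API v1 specification (referenceName, start, referenceBases, alternateBases, assemblyId, includeDatasetResponses). The base URL below is a placeholder and not the CHORUS endpoint, and the GRCh38 default is an assumption; check the service documentation for the real values.

```python
import json
from urllib.parse import urlencode

# Placeholder only: the real CHORUS Beacon address is not given in this FAQ.
BEACON_BASE_URL = "https://beacon.example.org"


def build_beacon_query(chromosome, position_1based, ref, alt, assembly="GRCh38"):
    """Build a Beacon v1 style GET URL for a single variant."""
    params = {
        "referenceName": chromosome,
        "start": position_1based - 1,  # Beacon v1 'start' is 0-based
        "referenceBases": ref,
        "alternateBases": alt,
        "assemblyId": assembly,  # assumed genome build
        "includeDatasetResponses": "HIT",  # ask for per-dataset details on hits
    }
    return f"{BEACON_BASE_URL}/query?{urlencode(params)}"


def variant_exists(response_text):
    """Interpret the 'exists' field of a Beacon v1 response as Yes/No."""
    return bool(json.loads(response_text).get("exists"))


url = build_beacon_query("1", 1000, "A", "G")
print(url)

# A hand-written example response, used here instead of a network call.
example_response = '{"beaconId": "example", "exists": true}'
print("Yes" if variant_exists(example_response) else "No")
```

The sketch stops short of sending the request, so it runs without network access; in practice the URL would be fetched with an HTTP client using whatever authentication the service requires.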
SNPDRUG3D

SNPdrug3D is a web application that lets users explore the effects of SNPs (single-nucleotide polymorphisms) on proteins at two levels, sequence and structure, with particular attention to drug binding. This enables both the identification of known and new SNPs with pharmacogenetic effects and the annotation of variants of unknown significance (VUS).

HOW SNPDRUG3D WORKS:
  1. Users can search for SNPs using 10 different categories including SNP coordinate, protein ID, gene name, drug name etc.
  2. On the sequence level, users can observe if a SNP falls in protein functional features and estimate its effect.
  3. On a structural level, users can observe if a SNP falls in a drug binding pocket and estimate its effect on the binding.
  4. For visualization, users can choose between the sequence feature viewer and the structure feature viewer. Selecting a SNP opens a protein viewer tab that shows the protein data associated with the SNP and visualizes it in the 3D viewer.
  5. In the structure feature viewer, users can see the selected SNP and its associated drug, and select among the different Protein Data Bank (PDB) structures associated with the current protein.
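
The sequence-level and structure-level checks described above amount to asking whether the residue affected by a SNP lies inside an annotated feature or a binding pocket. The sketch below shows that lookup on invented data; the feature names, residue numbers and pocket definition are hypothetical and do not come from SNPdrug3D.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    start: int  # first residue, 1-based inclusive
    end: int    # last residue, 1-based inclusive


# Hypothetical annotations for one protein.
FEATURES = [Feature("signal peptide", 1, 20), Feature("kinase domain", 50, 300)]
POCKET_RESIDUES = {102, 105, 233}  # residues lining a drug binding pocket


def annotate_residue(position, features, pocket_residues):
    """Report which features contain the residue and whether it is in the pocket."""
    hits = [f.name for f in features if f.start <= position <= f.end]
    return {"features": hits, "in_binding_pocket": position in pocket_residues}


print(annotate_residue(105, FEATURES, POCKET_RESIDUES))
print(annotate_residue(10, FEATURES, POCKET_RESIDUES))
```

A residue that falls in both a functional feature and a binding pocket, like position 105 in this toy example, would be a natural candidate for closer inspection in the 3D viewer.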
IMPUTATION SERVER

The Imputation Server allows researchers to estimate missing genotypes from haplotype data. Access is currently limited to members of the Agency for Science, Technology and Research (A*STAR), the National University of Singapore (NUS) and Nanyang Technological University (NTU), Singapore.

HOW IMPUTATION SERVER WORKS:
  1. The genotype imputation service uses multi-level parallelization on the supercomputers of the National Supercomputing Centre (NSCC) to provide fast GWAS genotype imputation.
  2. The server uses Minimac4 to generate imputed genomes and lets users download the results.
  3. Users can upload GWAS genotypes, select reference panels, choose a phasing method for unphased data, and select specific populations.
PRS WEB SERVICE

The Polygenic Risk Scores (PRS) web service is an intuitive tool for exploring PRS on the SG10K_Health cohort.

HOW PRS WEB SERVICE WORKS:
  1. Users can examine and visualize the distributions of the scores and their associations with the phenotypes available for the cohort.
  2. Association analyses between PRS and phenotypes are performed securely, without the need for access to individual-level data.
  3. Users can perform analyses on a subset of the cohort by specifying age, gender, and ethnicity inclusion criteria.
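
For readers unfamiliar with the underlying calculation, a polygenic risk score is conventionally a weighted sum of allele dosages, PRS = sum over variants of (effect weight x dosage). The sketch below applies that definition to a toy cohort and then filters by inclusion criteria, as in step 3. It illustrates the concept only and is not how the web service is implemented; the variant IDs, weights and participants are invented.

```python
# Hypothetical effect weights per variant.
WEIGHTS = {"var1": 0.5, "var2": -0.25, "var3": 1.0}

# Hypothetical participants with allele dosages (0, 1 or 2 copies).
COHORT = [
    {"id": "P1", "age": 45, "gender": "female", "ethnicity": "Chinese",
     "dosages": {"var1": 2, "var2": 1, "var3": 0}},
    {"id": "P2", "age": 62, "gender": "male", "ethnicity": "Malay",
     "dosages": {"var1": 0, "var2": 2, "var3": 1}},
    {"id": "P3", "age": 38, "gender": "female", "ethnicity": "Indian",
     "dosages": {"var1": 1, "var2": 0}},
]


def polygenic_risk_score(dosages, weights):
    """Weighted sum of dosages; variants missing for a person count as 0."""
    return sum(weight * dosages.get(variant, 0.0) for variant, weight in weights.items())


def select_subset(cohort, min_age=None, max_age=None, gender=None, ethnicity=None):
    """Keep participants matching every inclusion criterion that is set."""
    selected = []
    for person in cohort:
        if min_age is not None and person["age"] < min_age:
            continue
        if max_age is not None and person["age"] > max_age:
            continue
        if gender is not None and person["gender"] != gender:
            continue
        if ethnicity is not None and person["ethnicity"] != ethnicity:
            continue
        selected.append(person)
    return selected


for person in select_subset(COHORT, min_age=40):
    print(person["id"], polygenic_risk_score(person["dosages"], WEIGHTS))
```

Treating a missing variant as dosage 0 is a simplification chosen for brevity; real analyses usually impute or otherwise account for missing genotypes.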

 

Contact us

If you have any questions that we haven’t been able to answer, you may contact the SG10K_Health team at contact_npco@gis.a-star.edu.sg.