Frequently Asked Questions

Application

Q. How do I apply for access to the datasets?

A. First, register with the SG10K_Health web portal. If you already have a user account with us, you can apply to access the SG10K_Pilot or SG10K_Health datasets via the ‘Submit New Application’ function. Before completing an application form, please ensure you have read and understood the Terms and Conditions, Data Access Policies and Access Agreement. The NPM Data Access Committee will review the submitted application form and notify you of the outcome via the web portal.

SG10K_Health data access policy

SG10K_Health data access form

SG10K pilot data access policy

SG10K pilot data access form


Q. How long is the application review process?

A. The NPM Data Access Committee will carefully review each application form. The average processing time is 4-6 weeks.


Q. How do I amend my approved application form?

A. To amend an approved application form, you can sign in to your account and access your application form via ‘My Application’. Click on ‘Amend’ next to your approved application form to make the changes and submit the edited form.

Q. What research studies have been approved to use the SG10K datasets?

A. Click on the links to view the lists of approved research studies for the SG10K_Pilot and SG10K_Health datasets.

Datasets

Q. What is the SG10K_Pilot dataset?

A. The SG10K_Pilot dataset refers to EGAD00001005337, the joint variant calling of 4,180 whole-genome sequencing samples deposited in the EGA database. All datasets have been pseudonymised and are therefore considered de-identified, as described in the paper. Two files are available for access: 1) the genotype data, arranged by chromosome in VCF format, and 2) a metadata file containing the self-reported ethnicity.


Q. What is the SG10K_Health dataset?

A. The SG10K_Health data is a collection of integrated genomic and phenotypic data of 10,000 healthy and consented individuals of Chinese, Malay and Indian ethnicities. The SG10K_Health data is contributed by six cohorts in Singapore: (1) Multi-Ethnic Cohort (MEC) study, (2) Health for Life in Singapore (HELIOS) study, (3) Growing Up in Singapore Towards healthy Outcomes (GUSTO) study, (4) TTSH Personalised Medicine Normal Controls (TTSH) study, (5) Singapore Epidemiology of Eye Diseases (SEED) study and (6) Biobank/SingHEART, SingHealth Duke-NUS Institute of Precision Medicine (PRISM) study.

Genomic web services

Q. How can I get access to the genomic web services?

A. Register an account with the SG10K_Health web portal. If you already have an account, you can apply to access one or more genomic web services via the ‘Submit New Application- Genomic web services’ function. To complete the web services application, you will need to provide a summary of how you will be using the services for your research.

Q. How do the different genomic web services work?

CHORUS VARIANT BROWSER

The CHORUS variant browser provides access to the collection of variants found in the SG10K_Health dataset, which comprises 10,000 whole genomes. It builds heavily on the work of the MacArthur lab and the gnomAD team, borrowing their back-end infrastructure (Spark/Hail for storing and manipulating sample-level genotype data, and Elasticsearch for storing data aggregated by gender and ethnicity, exposed via GraphQL and a React web application) and extending the gnomAD variant browser UI and APIs.

HOW CHORUS VARIANT BROWSER WORKS:
  1. Users can search by gene name or ID, transcript ID, variant ID or genomic region.
  2. It displays variant allele frequencies aggregated by gender and ethnicity.
  3. It provides variant-level metrics and functional annotations (synonymous/missense, HGVS nomenclature and SIFT/PolyPhen scores).
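
The aggregation in step 2 can be illustrated with a minimal Python sketch. It assumes that an allele frequency is the allele count (AC) divided by the allele number (AN), summed within each group. The record layout, the function name `aggregate_af` and all counts are hypothetical and made up for illustration; they are not SG10K_Health data and do not describe how the browser computes its values internally.

```python
from collections import defaultdict

# Hypothetical per-group counts for a single variant.
# Each record: (gender, ethnicity, allele_count, allele_number).
# The numbers are invented for illustration only.
records = [
    ("female", "Chinese", 12, 4000),
    ("male", "Chinese", 8, 4000),
    ("female", "Malay", 3, 1000),
    ("male", "Malay", 5, 1000),
]


def aggregate_af(records, key_index):
    """Sum AC and AN per group (gender or ethnicity), then return AC / AN."""
    totals = defaultdict(lambda: [0, 0])
    for rec in records:
        group = rec[key_index]
        totals[group][0] += rec[2]  # allele count
        totals[group][1] += rec[3]  # allele number
    # Guard against an empty group to avoid division by zero.
    return {g: (ac / an if an else 0.0) for g, (ac, an) in totals.items()}


af_by_gender = aggregate_af(records, 0)
af_by_ethnicity = aggregate_af(records, 1)
```

Summing counts before dividing, rather than averaging per-group frequencies, keeps groups of different sizes weighted correctly.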

CHORUS BEACON

The CHORUS beacon provides information about the catalogue of variants found in the SG10K_Health dataset, which comprises 10,000 whole genomes. Beacon is a standard for discovering genetic variants, developed by the Global Alliance for Genomics and Health. CHORUS Beacon uses GraphQL to query an Elasticsearch database in which the genomic data collections are stored, translating Beacon API version 1.1.0 RESTful queries into Elasticsearch queries via GraphQL.

HOW CHORUS BEACON WORKS:
  1. Users query a specific variant (chromosome, position, reference allele, alternate allele).
  2. The beacon responds ‘Yes’ or ‘No’ to indicate whether this variant is present in the dataset.
  3. Additional information, such as allele frequencies aggregated by gender and ethnicity, is displayed.
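
The request in step 1 and the answer in step 2 can be sketched in Python as follows. The parameter names (`referenceName`, `start`, `referenceBases`, `alternateBases`, `assemblyId`), the 0-based `start` coordinate and the `exists` response field follow the Beacon v1 specification as we understand it. The base URL is a placeholder, not the real CHORUS Beacon address, and the `GRCh38` default is an assumption; check the service for the assembly it actually uses.

```python
from urllib.parse import urlencode

# Placeholder address, NOT the real CHORUS Beacon endpoint.
BEACON_BASE_URL = "https://beacon.example.org"


def build_beacon_query_url(chromosome, position, ref, alt,
                           assembly_id="GRCh38", base_url=BEACON_BASE_URL):
    """Build a Beacon v1-style allele query URL from a 1-based position."""
    params = {
        "referenceName": chromosome,
        "start": position - 1,  # Beacon v1 coordinates are 0-based
        "referenceBases": ref,
        "alternateBases": alt,
        "assemblyId": assembly_id,  # assumed default, see note above
    }
    return f"{base_url}/query?{urlencode(params)}"


def variant_exists(response_json):
    """Interpret the 'Yes'/'No' answer from a parsed Beacon response."""
    return bool(response_json.get("exists"))


url = build_beacon_query_url("1", 1000, "A", "G")
```

Sending the request is left out on purpose, since access to the service requires an approved application.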

SNPDRUG3D

SNPdrug3D is a web application that lets users explore the effects of SNPs (single-nucleotide polymorphisms) at two protein levels, sequence and structure, with a particular focus on drug binding. This enables both the identification of known and new SNPs with pharmacogenetic effects and the annotation of variants of unknown significance (VUS).

HOW SNPDRUG3D WORKS:
  1. Users can search for SNPs using 10 different categories including SNP coordinate, protein ID, gene name, drug name etc.
  2. On the sequence level, users can observe if a SNP falls in protein functional features and estimate its effect.
  3. On a structural level, users can observe if a SNP falls in a drug binding pocket and estimate its effect on the binding.
  4. For visualization, users can choose between the sequence feature viewer and the structure feature viewer. Selecting a SNP opens a protein viewer tab that shows the protein data associated with the SNP and visualizes it in the 3D viewer.
  5. In the structure feature viewer, users can see the selected SNP and associated drug and select different protein drug binding (PDB) structures associated with the current protein.

IMPUTATION SERVER

The Imputation Server allows researchers to estimate missing genotypes from haplotype data. Access to the Imputation Server is currently limited to members of the Agency for Science, Technology and Research (A*STAR), National University of Singapore (NUS) and Nanyang Technological University (NTU), Singapore.

HOW THE IMPUTATION SERVER WORKS:
  1. The genotype imputation service uses multi-level parallelization on the supercomputers of the National Supercomputing Centre (NSCC) to provide fast GWAS genotype imputation.
  2. The server uses Minimac4 to generate imputed genomes and lets users download the results.
  3. Users can upload GWAS genotypes and select reference panels, phasing methods for unphased data, and specific populations.

PRS WEB SERVICE

The Polygenic Risk Scores (PRS) web service is an intuitive tool for exploring PRS on the SG10K_Health cohort.

HOW PRS WEB SERVICE WORKS:
  1. Users can examine and visualize distributions of the scores and their associations with available phenotypes on the cohort.
  2. Association analyses between PRS and phenotypes are performed securely, without requiring access to individual-level data.
  3. Analyses can be performed on a selection of published PRS or user-specified custom PRS.
  4. Users can perform analyses on a subset of the cohort by specifying age, gender, and ethnicity inclusion criteria.
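
As background for steps 3 and 4, a polygenic risk score is commonly computed as a weighted sum of effect-allele dosages. The Python sketch below shows that arithmetic together with a simple inclusion filter. The function names, variant IDs, weights and individuals are all hypothetical and invented for illustration. The web service runs its analyses on the server without exposing individual-level data, so this is not its implementation.

```python
def polygenic_risk_score(weights, dosages):
    """Return the weighted sum of effect-allele dosages.

    weights: variant ID -> per-allele effect size (made-up values).
    dosages: variant ID -> effect-allele dosage (0, 1 or 2) for one person.
    Variants missing from `dosages` are skipped.
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


def select_subset(individuals, min_age=None, max_age=None,
                  gender=None, ethnicity=None):
    """Keep individuals that satisfy every inclusion criterion that is set."""
    selected = []
    for person in individuals:
        if min_age is not None and person["age"] < min_age:
            continue
        if max_age is not None and person["age"] > max_age:
            continue
        if gender is not None and person["gender"] != gender:
            continue
        if ethnicity is not None and person["ethnicity"] != ethnicity:
            continue
        selected.append(person)
    return selected


# Made-up weights and individuals, for illustration only.
weights = {"variant_1": 0.5, "variant_2": -0.25, "variant_3": 0.125}
individuals = [
    {"id": "P1", "age": 35, "gender": "female", "ethnicity": "Chinese",
     "dosages": {"variant_1": 2, "variant_2": 1, "variant_3": 0}},
    {"id": "P2", "age": 62, "gender": "male", "ethnicity": "Malay",
     "dosages": {"variant_1": 1, "variant_2": 0, "variant_3": 2}},
    {"id": "P3", "age": 48, "gender": "female", "ethnicity": "Indian",
     "dosages": {"variant_1": 0, "variant_2": 2}},
]

subset = select_subset(individuals, min_age=40, gender="female")
scores = {p["id"]: polygenic_risk_score(weights, p["dosages"])
          for p in individuals}
```

Criteria left as `None` are ignored, so the same function covers any combination of age, gender and ethnicity filters.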

 

Contact us

If you have any questions that we haven’t been able to answer, you may contact the SG10K_Health team at contact_npco@gis.a-star.edu.sg.