Genomics research data is data generated by human genomics research. It may come from academic research or from research in industry, for example clinical trials in the pharmaceutical industry.

Sharing human genomics data raises challenges similar to those of sharing health data,[1] and the handling, management and sharing of the data must be in accordance with patient consent.

Data formats

Standardisation of annotation vocabularies is important for sharing data in an interoperable fashion. The Gene Ontology is an open standard that enables linking and access to different data sources via a shared vocabulary. Data annotated with an OWL ontology such as the Gene Ontology and marked up using RDF can be queried in a unified way, effectively providing a standardised, machine-readable API.
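
The idea of querying differently sourced data through one shared vocabulary can be sketched in a few lines of plain Python. This is only an illustration, not an RDF or OWL implementation: the triples, the "annotated_with" predicate, the "human:" and "mouse:" prefixes and the match helper are all assumptions made for the example, and the annotations are illustrative rather than curated data. Only the Gene Ontology term identifiers themselves are real.

```python
# Minimal sketch: (subject, predicate, object) triples from two hypothetical
# sources, both annotated with the same Gene Ontology term identifiers.
GO_APOPTOSIS = "GO:0006915"   # Gene Ontology term "apoptotic process"
GO_DNA_REPAIR = "GO:0006281"  # Gene Ontology term "DNA repair"

# Illustrative annotations only; not taken from a curated database.
source_a = [
    ("human:TP53", "annotated_with", GO_APOPTOSIS),
    ("human:BRCA1", "annotated_with", GO_DNA_REPAIR),
]
source_b = [
    ("mouse:Trp53", "annotated_with", GO_APOPTOSIS),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    pattern = (subject, predicate, obj)
    return [
        triple
        for triple in triples
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]


# Because both sources use the same term identifier, one query covers both.
merged = source_a + source_b
apoptosis_genes = sorted(
    subject
    for subject, _, _ in match(merged, predicate="annotated_with", obj=GO_APOPTOSIS)
)
print(apoptosis_genes)
```

In practice an RDF library and a query language such as SPARQL would take the place of the list and the match helper, but the principle is the same: the shared term identifier is what makes the two sources joinable.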

Examples of the identifiers and naming schemes used in genomics data include:[2]

  • HGNC gene symbols for naming and identifying human genes
  • MGI gene symbols for naming and identifying mouse genes
  • Microarray probe identifiers
  • ORF identifiers in yeast
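
Because these schemes come from different naming authorities, a bare identifier can be ambiguous once data sets are combined. One possible approach, sketched below, is to store the scheme alongside the value. The GeneId class and the namespace labels "HGNC", "MGI" and "SGD" are hypothetical names chosen for this example. The regular expression is an assumption covering only the usual systematic names of nuclear yeast ORFs (for example YAL001C); it does not handle mitochondrial ORFs.

```python
import re
from dataclasses import dataclass

# Assumed pattern for nuclear yeast ORF systematic names: "Y", chromosome
# letter A-P, arm L or R, three digits, strand W or C, optional suffix "-A".
YEAST_ORF = re.compile(r"Y[A-P][LR]\d{3}[WC](-[A-Z])?")


@dataclass(frozen=True)
class GeneId:
    """A gene identifier tagged with the (hypothetical) scheme label."""
    namespace: str
    value: str


def is_yeast_orf(name: str) -> bool:
    """Check whether a string looks like a nuclear yeast ORF systematic name."""
    return YEAST_ORF.fullmatch(name) is not None


ids = [
    GeneId("HGNC", "TP53"),    # human gene symbol
    GeneId("MGI", "Trp53"),    # mouse gene symbol
    GeneId("SGD", "YAL001C"),  # yeast ORF systematic name
]

# Group identifier values by scheme so that symbols from different
# organisms are never compared directly.
by_namespace = {}
for gene_id in ids:
    by_namespace.setdefault(gene_id.namespace, []).append(gene_id.value)

print(by_namespace)
print(is_yeast_orf("YAL001C"), is_yeast_orf("TP53"))
```

Keeping the scheme explicit means that a lookup for a human symbol cannot accidentally match a similarly spelled symbol from another organism.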

Data repositories

A number of genomic data resources already exist, and some larger-scale projects are planned for the next 3 to 5 years. Many of the existing projects are not publicly available; access is granted only on request to academics or researchers. Examples of current genomics projects include People of the British Isles and the International HapMap Project. Some larger-scale projects are being developed by governments in the expectation that large genomic databases will be valuable resources for health science and pharmaceuticals. Such projects are planned in the UK (the 100,000 Genomes Project) and in Qatar (the Qatar Genome Project).[3] As these large-scale projects progress, issues of data sharing will become more pressing.

