Monday, August 29, 2011

Data: The Ensembl Project opens up genome data

The Ensembl project presents the human genome along with the genetic information for many other vertebrate species. The human genome itself is very large, and mapping it originally took many years and millions of dollars. Ensembl provides access to the genomes of over 50 different species.

The data can be downloaded in a variety of formats, including several specialized genome formats and the widely used MySQL database format. Besides being browsable on the web, the data can also be fetched piecemeal through the site's API. The API can only be used through a Perl library downloadable from the site.
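As a rough illustration of working with one of the downloadable formats, here is a minimal sketch that reads FASTA, a plain-text sequence format that Ensembl offers among its downloads. It is not Ensembl code or part of its API. It assumes each record starts with a `>` header line whose first whitespace-separated token is used as the record ID, followed by one or more (possibly wrapped) sequence lines. The function name `parse_fasta` and the sample records are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA record ID to its full sequence.

    Hypothetical helper: the ID is assumed to be the first
    whitespace-separated token after '>' on the header line.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header: store the previous record first.
            if current_id is not None:
                records[current_id] = "".join(chunks)
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            chunks = []
        elif current_id is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if current_id is not None:
        records[current_id] = "".join(chunks)  # store the last record
    return records


# Hypothetical sample data, not real Ensembl records.
sample = """>seq_a hypothetical record
ACGTACGT
ACGT
>seq_b
TTGA
""".splitlines()
print(parse_fasta(sample))
# {'seq_a': 'ACGTACGTACGT', 'seq_b': 'TTGA'}
```

If a downloaded file is gzip-compressed, it could be read with `gzip.open(path, "rt")` and the resulting file object passed straight to `parse_fasta`, since iterating over it yields lines. This keeps the whole file in memory, which is fine for a sketch but may not suit very large genome files.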

One interesting feature of the site is that genes are searchable. For example, searching the human genome for "red hair" returns a list of genes responsible for red pigmentation in human hair, eyes, and skin.

The Ensembl project was started in 1999 with the goal of providing researchers with genome data in annotated form and combining annotations from multiple researchers.