Friday, August 26, 2011

Data: The PubMed Database


PubMed is an online repository of over twenty million citations from the biomedical literature and related fields, drawn from hundreds of journals and databases. The research topics range broadly, from biomedical research and computational applications in biology and chemistry to population dynamics studies and chemical synthesis, and represent a wide range of interests across the scientific community.
Originally created by the National Library of Medicine (NLM) through a grant from the National Institutes of Health for quick government access to current publications, this database has quickly grown to serve a large number of interested parties, from public interest groups, government entities, and educational co-operatives to private ventures like pharmaceutical and bioengineering companies. This vast store of biological and scientific literature represents the collective knowledge of a field developed over decades by these same groups. Though the citations are publicly accessible, rights are held by the publishing journals, so only abstracts and older articles are freely available. Even so, with such a powerful tool at our fingertips, researchers can extend their knowledge quickly and efficiently by searching for work on specific and related scientific endeavors.
But beyond even that, there has been a prolonged and increasing attempt to discover new scientific knowledge by automatically and intelligently searching through data-rich repositories like PubMed and finding connections in the data that had not previously been elucidated. This effort, often called data mining, is facilitated by PubMed's robust and varied search and access system. The most basic element of information retrieval within PubMed is the Medical Subject Heading, more commonly referred to as a MeSH term. MeSH is the NLM's controlled vocabulary thesaurus, used for indexing the articles found on PubMed. Using these MeSH terms, one can find relevant articles by searching the database for articles tagged with them. This can be done either through the web-based search interface, which is more intricate, refined, and rigorous than most people are used to, or by downloading subsets of the PubMed database (or at least the abstracts) and searching through them with additional software tools, like EndNote, keyed to the MeSH terms.
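To make the two routes a little more concrete, here is a minimal Python sketch. It is only an illustration under some assumptions that are not in the post above: it uses PubMed's `[MeSH Terms]` search field tag and the NCBI E-utilities ESearch endpoint for the online route, and a hypothetical list of already-downloaded records (each a dict with a `mesh` list) for the offline route. The function names and sample records are made up for the example, and the code only builds the query URL; it does not send any request.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (an assumption of this sketch, not named in the post).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_mesh_query(terms):
    """Join MeSH headings into one PubMed query; every heading must match."""
    return " AND ".join(f'"{term}"[MeSH Terms]' for term in terms)


def esearch_url(terms, retmax=20):
    """Build (but do not fetch) an ESearch URL for the given MeSH headings."""
    params = {"db": "pubmed", "term": build_mesh_query(terms), "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def filter_by_mesh(records, terms):
    """Keep downloaded records tagged with all of the given MeSH headings."""
    wanted = {term.lower() for term in terms}
    return [
        record
        for record in records
        if wanted <= {heading.lower() for heading in record["mesh"]}
    ]


# Hypothetical, hand-written records standing in for a downloaded subset.
sample_records = [
    {"id": "A", "mesh": ["Neoplasms", "Mice", "Gene Expression"]},
    {"id": "B", "mesh": ["Neoplasms", "Humans"]},
    {"id": "C", "mesh": ["Mice", "Behavior, Animal"]},
]

if __name__ == "__main__":
    print(build_mesh_query(["Neoplasms", "Mice"]))
    print(esearch_url(["Neoplasms", "Mice"]))
    print([r["id"] for r in filter_by_mesh(sample_records, ["Neoplasms", "Mice"])])
```

Requiring every heading to match (a logical AND) keeps the result set narrow; swapping the join string for `" OR "`, or the subset test for an intersection test, would broaden the search instead.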