Monday, August 29, 2011

Data: DBpedia dataset

Over the years, Wikipedia has grown into a major source of knowledge on the web. Alongside its article text, Wikipedia holds structured information in the form of infobox templates, categorization information, images, geo-coordinates, and links to external web pages. DBpedia extracts this structured information and builds a huge data set describing over 3.64 million "things" and over half a billion facts. The DBpedia data set is stored in RDF, the Resource Description Framework, in which data is expressed as triples. Each triple has a subject-predicate-object structure. The SPARQL query language is used to query this data.
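
To make the triple structure concrete, here is a minimal Python sketch that stores a few triples as tuples and matches them against a pattern, which is roughly what a single SPARQL triple pattern does. The `match` helper and the sample identifiers (`dbr:Berlin`, `dbo:country`, and so on) are illustrative assumptions written in a prefixed style, not data copied from an actual DBpedia dump.

```python
# Toy RDF-style triples: (subject, predicate, object).
# The identifiers are illustrative, not taken from a real DBpedia dump.
TRIPLES = [
    ("dbr:Berlin", "rdf:type", "dbo:City"),
    ("dbr:Berlin", "dbo:country", "dbr:Germany"),
    ("dbr:Paris", "rdf:type", "dbo:City"),
    ("dbr:Paris", "dbo:country", "dbr:France"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# "Which subjects are cities?" corresponds to the pattern (?s, rdf:type, dbo:City).
cities = [s for s, _, _ in match(TRIPLES, p="rdf:type", o="dbo:City")]
print(cities)
```

Leaving a position as `None` plays the role of a SPARQL variable: every triple that agrees on the fixed positions is returned.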

As for the scale of DBpedia, the knowledge base describes over 3.64 million things, including 416,000 persons, 526,000 places, 106,000 music albums, 60,000 films, 169,000 organizations, 183,000 species, and 5,400 diseases, in over 97 different languages. Altogether, the DBpedia knowledge base consists of over 1.2 billion triples. This is the status of DBpedia 3.6 as of July 2011. The DBpedia project provides a faceted search interface for querying this enormous data set. Facets act as filters, so a user interested in a specific article can narrow the results down to it. One can even combine seemingly unrelated facets, and DBpedia, thanks to its vast structured information, still returns information pertinent to all of them.
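
The idea of facets acting as filters can be sketched in a few lines of Python. This is only an illustration of the general technique, not how DBpedia's faceted search is implemented; the records, field names, and the helpers `apply_facets` and `facet_counts` are all hypothetical.

```python
from collections import Counter

# Hypothetical records standing in for DBpedia resources; the field
# names and values are made up for illustration.
ITEMS = [
    {"name": "Film A", "type": "Film", "country": "France", "year": 1999},
    {"name": "Film B", "type": "Film", "country": "Germany", "year": 1999},
    {"name": "Album C", "type": "MusicAlbum", "country": "France", "year": 1999},
    {"name": "Film D", "type": "Film", "country": "France", "year": 2005},
]


def apply_facets(items, **facets):
    """Keep only the items whose fields equal every selected facet value."""
    return [
        item for item in items
        if all(item.get(field) == value for field, value in facets.items())
    ]


def facet_counts(items, field):
    """Count how many items fall under each value of one facet."""
    return Counter(item[field] for item in items)


# Selecting two facets narrows the result set step by step.
french_films = apply_facets(ITEMS, type="Film", country="France")
print([item["name"] for item in french_films])
print(facet_counts(french_films, "year"))
```

Each additional facet shrinks the result set, and the per-facet counts are what a faceted interface typically shows next to each remaining filter option.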

This is a snapshot of a faceted search:

This is a snapshot of a SPARQL query:
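
As a rough sketch of the kind of query involved, the Python snippet below builds a small SPARQL query and encodes it as a request URL. It assumes the public DBpedia endpoint lives at `https://dbpedia.org/sparql` and accepts a `format` parameter for JSON results; the query itself (ten resources typed as `dbo:Film`) and the helper `build_request_url` are illustrative choices, not the query from the original snapshot.

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint (assumed to be reachable at this address).
ENDPOINT = "https://dbpedia.org/sparql"

# Ask for ten resources typed as films in the DBpedia ontology.
QUERY = """\
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?film WHERE {
  ?film a dbo:Film .
}
LIMIT 10"""


def build_request_url(query, endpoint=ENDPOINT):
    """Encode the query as a GET request URL for a SPARQL endpoint."""
    # "query" is the parameter name defined by the SPARQL protocol;
    # "format" is an assumption about what this endpoint accepts.
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)


url = build_request_url(QUERY)
print(url)
```

The snippet only constructs the URL and makes no network call. Opening the printed URL in a browser, or fetching it with `urllib.request.urlopen`, should return the matching resources if the endpoint is available.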