Monday, August 29, 2011

Data: - A Data Repository

Recently I had a chance to participate in an event held by a group of startups working in the web mining area. One of the startups that made an impression on me was  It's a startup that provides tabulated data to consumers. It hosts a repository of tabulated data on the web, with a large number of datasets organized by topic. These topics range from wine to websites, and datasets can also be searched by keyword.

We can browse these datasets to see how many rows they have and which fields are recorded in each table. We can also perform other operations, such as sorting on a particular column, removing columns we do not need, and customizing the dataset in many similar ways. Once the customization is complete, we can download the customized dataset as comma-separated values (CSV), or we can use a URL to query the dataset and get a JSON reply. This data can then be parsed and used to generate different visualizations. Right now the data is available for free. All these features make it a really good data source.
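As a rough illustration of the "parse it and visualize it" step, here is a minimal Python sketch that reads a CSV export and counts rows per value of one column, which is the kind of aggregate a simple bar chart needs. The sample data and the column names (`name`, `region`, `price`) are hypothetical, made up for this example; it only assumes the export's first line holds the column headers.

```python
import csv
import io
from collections import Counter


def count_by_field(csv_text: str, field: str) -> Counter:
    """Count how many rows share each value of `field` in a CSV export.

    Assumes the first line of the export holds the column headers.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[field] for row in reader)


# Hypothetical export of a wine dataset; the columns are invented for illustration.
sample_csv = """name,region,price
Wine A,Napa,25
Wine B,Bordeaux,40
Wine C,Napa,18
"""

counts = count_by_field(sample_csv, "region")
for value, n in counts.most_common():
    # A crude text "bar chart": one '#' per matching row.
    print(f"{value:10} {'#' * n}")
```

The same counts could be handed to any charting library instead of printed as text.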

As I mentioned,  is a startup, so there is still a lot of work to be done on this product. I personally felt that the JSON replies should include the column headers; that would make it much easier to work with this data and build mashups. However, since it has a simple interface, offers a good number of datasets, and provides access to the data in different formats, it is a good place to look for interesting datasets.
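To show why missing column headers matter, here is a minimal sketch of the extra step a client has to take when a JSON reply carries only rows of values. The reply shape (`{"data": [[...], ...]}`) and the column names are assumptions for illustration, not the service's actual format; the client must supply the column list itself and keep it in the same order as the values in each row.

```python
import json


def rows_to_records(json_reply: str, columns: list) -> list:
    """Turn a header-less JSON reply into a list of dicts keyed by column name.

    Assumes (hypothetically) the reply looks like {"data": [[v1, v2, ...], ...]}
    and that `columns` lists the field names in the same order as each row.
    """
    payload = json.loads(json_reply)
    records = []
    for row in payload["data"]:
        # Guard against a column list that does not match the row width.
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} values, got {len(row)}")
        records.append(dict(zip(columns, row)))
    return records


# Hypothetical reply with rows but no headers.
reply = '{"data": [["Wine A", "Napa", 25], ["Wine B", "Bordeaux", 40]]}'
records = rows_to_records(reply, ["name", "region", "price"])
print(records[1]["region"])
```

If the reply included the headers, the `columns` argument, and the risk of getting its order wrong, would go away.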