Saturday, August 27, 2011

Data : Stock Quotes

Eoddata provides stock market data downloads in a variety of formats.

Downloads can be done by exchange or by exchange group.
It also offers historical data downloads covering the last 5, 10, 15 or 20 years.

The download formats include spreadsheets, CSV and other delimited text.

Stock data can also be downloaded by listing symbols.


Viz: Puzzle - This Shell by The Gamits

While searching for an interesting HTML5 visualization example, I came across this one. The goal is to join the pieces of a running video and solve the puzzle before the song ends. If you manage to complete the puzzle, you get a free download of the song. I already got mine! yay!!!

'This Shell' is a song by The Gamits, a three-piece punk rock band from Denver. The puzzle was created by Legwork Studio, which designs websites with intuitive interfaces, and it is built in HTML5. The most interesting thing about it is that the video pieces keep playing in sync both while they are apart and while they are being dragged. This may not be the most useful visualization, but it is interesting for sure.

The visualization:
Legwork Studio Website:

Tool : Highcharts JS

Highcharts JS is a JavaScript charting library that supports line, spline, area, areaspline, column, bar, pie and scatter chart types. It also supports live charts and combination charts.

Other features that make this tool a very good option are compatibility with multiple browsers, tooltips, and the ability to zoom in and out of charts.

The API is written in JavaScript and allows for easy customization. Highcharts is developed by Highslide Software.
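
To give a feel for that customization, here is a minimal sketch of an options object for a combination chart with zooming and tooltips turned on. The function name buildChartOptions, the container id and the sample numbers are all made up for illustration; the option keys (chart.renderTo, chart.zoomType, tooltip.enabled, series[].type) are ones I believe Highcharts accepts, but check them against the API reference for the version you use.

```javascript
// Build a Highcharts options object for a combination chart
// (one column series plus one spline series on the same axes).
// buildChartOptions is a hypothetical helper, not part of Highcharts.
function buildChartOptions(containerId, categories, volumes, closes) {
  return {
    chart: { renderTo: containerId, zoomType: "x" }, // drag along x to zoom
    title: { text: "Sample combination chart" },
    xAxis: { categories: categories },
    tooltip: { enabled: true },
    series: [
      { type: "column", name: "Volume", data: volumes },
      { type: "spline", name: "Close", data: closes }
    ]
  };
}

// Made-up sample data, for illustration only.
const options = buildChartOptions(
  "container",
  ["Mon", "Tue", "Wed"],
  [12, 18, 9],
  [10.5, 11.2, 10.9]
);

// In a browser page with Highcharts loaded, the chart would then be
// created with something like:
//   var chart = new Highcharts.Chart(options);
```

Keeping the options in a plain object like this means the chart definition can be built and inspected without a browser, and only the last step needs the library itself.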

Some examples created using Highcharts can be found here.

Data: Internet Movie Database (IMDb)

The database I have taken as an example is the Internet Movie Database, popularly known as IMDb. As the name indicates, it is a database of movies and everything related to them, and this huge database is available online. Its entries range from movies and their cast and crew to TV shows and the characters featured in those movies and shows.

IMDb started as a hobby project, was later acquired, and has since grown into a tremendously large database with millions of users registering every month. It supports a range of features. The most popular is movie ratings: registered users rate the movies, these ratings are linked to the list data, and ratings are available for every movie in the database. A special feature lists the entire cast and crew involved in every episode of a TV show; adding it nearly doubled the number of titles in the database. As mentioned earlier, you can also search for a character from a movie or TV series and get details such as who played the character and the character's popular quotes. IMDb also provides filmographies for a huge number of people involved with movies and television shows. Another popular feature is the IMDb Top 250 movie list. Its ranking is based on a formula that yields a Bayesian posterior mean. The formula is as follows:
Ranking = (Rating * no. of votes + mean vote * m) / (no. of votes + m)
where m = minimum votes required to be listed in the Top 250 (currently 3000).
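
To see what the formula does in practice, here is a small sketch in JavaScript. The function name weightedRating, the mean vote of 7.0 and the two sample movies are assumptions made up for illustration; only the formula itself and m = 3000 come from the text above.

```javascript
// Weighted rating from the formula above:
// (rating * votes + meanVote * minVotes) / (votes + minVotes)
function weightedRating(rating, votes, meanVote, minVotes) {
  return (rating * votes + meanVote * minVotes) / (votes + minVotes);
}

const m = 3000;       // minimum votes, as stated above
const meanVote = 7.0; // assumed mean vote across all movies (hypothetical)

// Two hypothetical movies, both rated 9.0 by their voters.
const fewVotes = weightedRating(9.0, 3000, meanVote, m);
const manyVotes = weightedRating(9.0, 297000, meanVote, m);

console.log(fewVotes.toFixed(2), manyVotes.toFixed(2)); // prints "8.00 8.98"
```

A movie with only the minimum number of votes is pulled halfway towards the mean vote, while a movie with many votes keeps a ranking close to its own rating.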
The website is Perl based. The data can be downloaded as compressed text files and extracted using CLI (Command Line Interface) tools. A Java-based GUI is available to search and display the information. The database also covers movies in other languages and items related to those movies. For easy access to the dataset, a Python package called IMDbPY was introduced.
The downsides of this database website are that it does not provide an API for automated queries, and that the exact process used to compute the rankings is not disclosed, which leads to contradictory rankings for a few movies.
IMDb is now also available as a mobile application.
The link for viewing this online database is :

Tool: Dipity

Dipity is a free, interactive digital timeline tool. It lets you use text, links, pictures and video in your personal timelines.

This tool is intended for anyone who uses the internet. You can customize the look of an embedded timeline by adding as much data as you like, in any form. Users can create, share, embed and collaborate on interactive, visually engaging timelines that integrate video, audio, images, text, links, social media, location and timestamps. It also makes public timelines searchable, which can increase traffic and user engagement on your website.

Dipity allows users to gather real-time sources from social media, search engines and RSS, and converts them into a highly interactive visualization. It combines the power of multimedia and social media content with timestamps, geolocation and real-time updates. You can zoom in and out of the timeline, from hours to years. It makes historical and present data highly visual and attractive.

Data: UNdata - A World of Information

The United Nations Statistics Division (UNSD) of the Department of Economic and Social Affairs (DESA) introduced a web-based data access system to bring UN statistical databases within easy reach of users through a single entry point. Its objectives are to provide free access to global statistics and to educate people about the value of statistics for evidence-based policy and decision-making. It also assists statistical offices in several countries in strengthening their data distribution and record-keeping capabilities. The database consists of a variety of statistical resources.
Areas of statistics include: Economic, Demographic, Health, Environment, Education, Employment, National Accounts, Human Development.
Sources include: UN Statistics Division, WHO, UNESCO, World Bank.
UNdata provides two views for accessing and querying data: a record view and a table presentation. Users can also sort through the various databases by looking at their metadata, which is especially useful if you are looking for historical records. You can filter databases by topic. Each result has a "Download" link to download the data as an .xls sheet or XML document, a "Preview" link to preview the database and an "Explore" link to access the database. The UNdata wiki provides links to the sources' homepages and includes information about the methodology by which the data sets are collected.
There are many ways to extract and use UNdata. A Geospatial Librarian's World offers some tips on how to process data downloaded from UNdata. In many cases, when users download data they get multiple records for each country: one record for each year for each data point. To bring this data into a geographic information system, the data set needs to be re-arranged so that there is only one record per country, with one column per year. The first UNdata API project is a community effort to make this data mashable and reusable in a variety of ways. The service uses a straightforward REST API hosted on Google's Java App Engine and makes UNdata sets easily queryable from any application. The second UNdata API project is made available via Microsoft's cloud computing service called Dallas.
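
The re-arrangement described above (from one record per country per year to one record per country with a column per year) can be sketched in a few lines of JavaScript. The function name pivotByCountry, the field names country, year and value, and the sample rows are all assumptions for illustration; real UNdata downloads will have their own column names.

```javascript
// Pivot long-format records (one row per country per year)
// into wide format (one row per country, one property per year).
// Field names are assumed; adapt them to the downloaded file.
function pivotByCountry(records) {
  const byCountry = new Map();
  for (const { country, year, value } of records) {
    if (!byCountry.has(country)) {
      byCountry.set(country, { country: country });
    }
    byCountry.get(country)[year] = value;
  }
  return Array.from(byCountry.values());
}

// Made-up sample rows, for illustration only.
const longRows = [
  { country: "Country A", year: 2009, value: 1.5 },
  { country: "Country A", year: 2010, value: 1.7 },
  { country: "Country B", year: 2009, value: 3.2 },
  { country: "Country B", year: 2010, value: 3.1 }
];

const wideRows = pivotByCountry(longRows);
console.log(wideRows);
```

Each resulting object has the country name plus one property per year, which is the shape a GIS join on country expects.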
UNdata can be viewed here

Tool: d3.js (Data-Driven Documents)

D3.js is a JavaScript library for visualization on websites and in web applications. It takes JSON or DOM objects (like any other JavaScript) as input for generating visualizations. The author of this tool is Mike Bostock. The tool simply manipulates the document object model based on data, exploiting the full capabilities of HTML5 and CSS3. It uses SVG graphics with JavaScript very efficiently, which lets it render large datasets extremely fast, with animations and interactions. It is divided into modules, so a web application can stay light by including only the modules it needs.

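Sample: This is sample code showing usage of the tool.
// Create the SVG element inside the page element with id "chart".
var svg = d3.select("#chart")
    .append("svg")
    .attr("width", 400)
    .attr("height", 600)
    .attr("class", "PiYG");

// Draw one circle per data point. "data" is assumed to be an array of
// [x, y] pairs defined elsewhere; it is not part of the original sample.
svg.selectAll("circle")
    .data(data)
  .enter().append("circle")
    .attr("transform", function(d) { return "translate(" + d + ")"; })
    .attr("r", 2);
The code above creates the SVG element for the visualization, then draws circles whose positions depend on the data.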

Reference :

Tool: Google Fusion Tables

This is a visualization tool that can turn data into a chart or a map. The tool supports various file formats for upload. The visualization options available are table, map, heatmap, line chart, bar graph, pie chart, scatter plot, timeline, storyline and motion (animation over time).
Fusion Tables supports editing of data as well as table functions such as joins, filters and sorting. The data uploaded to Fusion Tables can be made private or public, and viewers of the data can comment on it. Fusion Tables can also map multiple polygons with variations in color based on the underlying data. Yet another handy feature is the "templating" export, which generates a JSON file from data in other formats. The tool supports Geographic Information Systems (GIS) and can geocode addresses automatically.
As an overall review, this tool is suitable for beginners who are comfortable analysing data in a web browser. It is apt for people who want to analyse data without any coding.
The tool needs to be accessed with a valid Google ID. It also has some examples that users have shared publicly.
The example illustrated was generated using Fusion Tables. It shows the goals scored by various teams at the 2010 World Cup. There is another option to depict the goals by receiving team. We can also export data visualised as a map to a Keyhole Markup Language (KML) file.

Friday, August 26, 2011

Data: America's Children: Key National Indicators of Well-Being

The America's Children: Key National Indicators of Well-Being report provides a summary of national indicators of children's well-being and monitors changes in these indicators.

The report identifies seven domains that characterize the well-being of a child: family and social environment, economic circumstances, health care, physical environment and safety, behavior, education, and health. These domains are interrelated and can have a great effect on well-being.

The report provides tables for these seven domains, along with graphs that show how the indicators have changed over the years.

Here is a snapshot of a table with details of childcare.

The tables can be viewed here: tables
The tables contain available data from 1950-2010.