Saturday, August 27, 2011

Data: Internet Movie Database (IMDb)

The database I have taken as an example is the Internet Movie Database, popularly known as IMDb. As the name indicates, it is a database of movies and movie-related material, and it is available online. Its items range from movies and their cast and crew information to TV shows and the characters featured in those movies and shows.


This database, which started as just a hobby, was later acquired by Amazon.com and has now grown into a huge database with millions of users registering every month. It supports various features. The most popular is movie ratings: registered users rate the movies, and these ratings are linked with the list data, so a rating is available for every movie in the database. Another feature lists the entire cast and crew involved in every episode of a TV show. This feature nearly doubled the number of titles in the database. As mentioned earlier, the database also lets us search for a character from a movie or TV series and returns the character's details, such as who played the character and the character's popular quotes. It also provides filmographies for a large number of people involved with movies and television shows.

One more popular feature is the IMDb Top 250 movie list. The ranking used to select these movies is based on a formula that gives a Bayesian posterior mean as its result. The formula is as follows:
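Ranking = (Rating * no. of votes + mean vote * m) / (no. of votes + m)
where:
Rating = average rating of the movie,
no. of votes = number of votes the movie has received,
mean vote = mean vote across the whole list,
m = minimum votes required to be listed in the Top 250 (currently 3000).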
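
To see how the formula behaves, here is a small Python sketch of it. The function name weighted_rating and the sample numbers (a mean vote of 7.0, a rating of 9.0) are my own illustrative assumptions, not values from IMDb; only m = 3000 comes from the text above.

```python
def weighted_rating(rating, votes, mean_vote, min_votes=3000):
    """Bayesian-style weighted rating, as in the formula above.

    rating    -- average rating of the movie
    votes     -- number of votes the movie has received
    mean_vote -- mean vote across the whole list (assumed value in the demo)
    min_votes -- m, minimum votes required to be listed (3000 in the text)
    """
    if votes < 0 or min_votes < 0 or votes + min_votes == 0:
        raise ValueError("vote counts must be non-negative and not both zero")
    return (rating * votes + mean_vote * min_votes) / (votes + min_votes)


if __name__ == "__main__":
    # Hypothetical movie: rated 9.0 by exactly 3000 voters, list mean 7.0.
    # With votes == m, the result lies halfway between 9.0 and 7.0.
    print(weighted_rating(9.0, 3000, 7.0))  # 8.0
```

A movie with very few votes is pulled towards the mean vote, while a movie with many more votes than m stays close to its own average rating.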
The website is Perl-based. The data can be downloaded as compressed text files and extracted using CLI (Command Line Interface) tools. A Java-based GUI can be used to search and display the information. The database also covers movies in other languages and items related to those movies. For easy access to the dataset, a Python package called IMDbPY was introduced.
The drawbacks of this database website are that it does not provide an API for automated queries and that the exact process used to compute the rankings is not disclosed, which leads to contradictory rankings for a few movies.
IMDb is now also available as a mobile application.
The online database can be viewed at http://www.imdb.com/.
