Monday, August 29, 2011

Data: IMDb (Internet Movie Database)

IMDb is an online database of information related to movies, television shows, actors, production crew personnel, video games, and fictional characters featured in visual entertainment media (Wikipedia).
It was acquired by Amazon in 1998.

The data includes:
  • information type for movies: title, admissions, countries broadcasted, languages, filming dates, genres, keywords, locations filmed, plot, production dates, quotes, rating, votes, runtimes, top 250 rank, bottom 10 rank, book
  • information type for person: character, birth date, birth notes, death date, death notes, height, interviews, mini biography, nick names, spouse, trivia, where now, awards (can be accessed by using the API described below)
  • type of person: actor, actress, cinematographer, composer, costume designer, director, editor, guest, miscellaneous crew, producer, production designer, writer
  • cast information (person-movie-person role relationship)
  • company's name (e.g. Warner Bros., MTV ...)
  • company's type: production, special effect, distributor, miscellaneous
  • company and movie relationship
  • cast type (whether a person is cast or crew)
  • type of movie or tv: movie, tv mini series, tv movie, tv series, video game, video movie
  • linkage between the movies: edited from, featured in, followed by, referenced in, remade of, spoofed in, version of, etc.
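The person-movie-role relationship listed above can be pictured as a small set of records. The following is only a rough sketch for illustration: the class names, field names, and the `roles_in_title` helper are assumptions made up here, not the actual IMDb schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    person_id: int
    name: str


@dataclass(frozen=True)
class Title:
    title_id: int
    name: str
    kind: str  # e.g. "movie", "tv series", "video game"


@dataclass(frozen=True)
class CastEntry:
    """One person-movie-role link (hypothetical layout)."""
    person_id: int
    title_id: int
    role: str            # type of person: "actor", "director", ...
    character: str = ""  # character name, if the role has one


def roles_in_title(cast, title_id):
    """Return {person_id: [roles]} for a single title."""
    result = {}
    for entry in cast:
        if entry.title_id == title_id:
            result.setdefault(entry.person_id, []).append(entry.role)
    return result
```

Keeping the role on the link rather than on the person lets one person appear in the same title as, say, both actor and director.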

The value of the data becomes apparent once you can extract the information that you want to analyze and visualize. For example, it would be useful to find a common location where several recent Oscar-winning movies were filmed. It would also be fun to find the movies in which real-life spouses were cast together.
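Both questions reduce to simple set operations once the data has been extracted. Here is a minimal sketch, assuming the filming locations, cast lists, and spouse pairs are already loaded into plain Python dictionaries and lists. The function names and the input layout are hypothetical, chosen only for this example.

```python
from collections import Counter
from itertools import combinations


def common_locations(locations_by_movie, min_movies=2):
    """Return locations used by at least `min_movies` of the given movies."""
    counts = Counter()
    for locations in locations_by_movie.values():
        counts.update(set(locations))  # count each location once per movie
    return sorted(loc for loc, n in counts.items() if n >= min_movies)


def movies_with_spouses(cast_by_movie, spouse_pairs):
    """Return {movie: [(person_a, person_b), ...]} for couples cast together."""
    couples = {frozenset(pair) for pair in spouse_pairs}
    found = {}
    for movie, cast in cast_by_movie.items():
        for a, b in combinations(sorted(set(cast)), 2):
            if frozenset((a, b)) in couples:
                found.setdefault(movie, []).append((a, b))
    return found


if __name__ == "__main__":
    # Placeholder data, not real IMDb records.
    locations = {
        "Movie A": ["City 1", "City 2"],
        "Movie B": ["City 2", "City 3"],
    }
    print(common_locations(locations))
    casts = {"Movie A": ["Person 1", "Person 2", "Person 3"]}
    print(movies_with_spouses(casts, [("Person 2", "Person 1")]))
```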

The data can be downloaded as plain text files, and there are tools to extract the information (e.g. a Python package called IMDbPY).
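For a rough idea of what reading such a plain text file by hand might look like, here is a small sketch. It assumes, purely for illustration, a ratings-style line made of whitespace-separated fields (a vote distribution string, a vote count, a rating, then the title). The real files may be laid out differently, so check them before relying on this. The function name `parse_rating_line` is made up for this example.

```python
def parse_rating_line(line):
    """Parse one line of an assumed 'distribution votes rating title' layout.

    Returns a dict, or None when the line does not match (headers, blanks).
    """
    parts = line.split(None, 3)  # the title may itself contain spaces
    if len(parts) < 4:
        return None
    distribution, votes, rating, title = parts
    try:
        return {
            "distribution": distribution,
            "votes": int(votes),
            "rating": float(rating),
            "title": title.strip(),
        }
    except ValueError:
        # Non-numeric fields: most likely a header or comment line.
        return None
```

A dedicated tool such as IMDbPY is likely the more robust choice in practice, since it already knows the details of the file formats.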

Anyone who loves movies or TV series and wants to contribute to the website can be an author of the data.
The owner of IMDb is Amazon, and the copyright information is available on the IMDb website.