Thursday, August 16, 2012

Find: Twitter revs up mTurk with Clockwork Raven

Looks handy!

Crowdsourced data analysis with Clockwork Raven

Today, we’re excited to open source Clockwork Raven, a web application that allows users to easily submit data to Mechanical Turk for manual review and then analyze that data. Clockwork Raven steps in to do what algorithms cannot: it sends your data analysis tasks to real people and gets fast, cheap and accurate results. We use Clockwork Raven to gather tens of thousands of judgments from Mechanical Turk users every week.


We’re huge fans of human evaluation at Twitter and how it can aid data analysis. In the past, we’ve used systems like Mechanical Turk and CrowdFlower, as well as an internal system where we train dedicated reviewers and have them come in to our offices. However, as we scaled up our usage of human evaluation, we needed a better system. This is why we built Clockwork Raven and designed it with several important goals in mind:

Wednesday, August 15, 2012

Viz: Distant Shape - 10 Years of Daring Fireball

Cassovary: A Big Graph-Processing Library

We are open sourcing Cassovary, a big graph-processing library for the Java Virtual Machine (JVM) written in Scala. Cassovary is designed from the ground up to efficiently handle graphs with billions of edges. It comes with some common node and graph data structures and traversal algorithms. A typical usage is to do large-scale graph mining and analysis.
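As a rough illustration of the kind of node storage and traversal the paragraph above describes, here is a minimal Python sketch. It is not Cassovary's API, and the excerpt does not say how Cassovary lays out its graphs; the flat-array (compressed sparse row) layout and the names `build_csr` and `bfs_order` are assumptions chosen for illustration.

```python
from collections import deque


def build_csr(num_nodes, edges):
    """Pack directed edges into two flat lists (compressed sparse row form).

    Hypothetical helper: the out-neighbors of node n end up in
    targets[offsets[n]:offsets[n + 1]].
    """
    offsets = [0] * (num_nodes + 1)
    for src, _ in edges:
        offsets[src + 1] += 1          # count out-edges per source node
    for i in range(num_nodes):
        offsets[i + 1] += offsets[i]   # prefix sum turns counts into offsets
    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()       # next free slot for each source node
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def bfs_order(offsets, targets, start):
    """Return the nodes reachable from start in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for i in range(offsets[node], offsets[node + 1]):
            nxt = targets[i]
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


# Tiny made-up graph with 4 nodes and 5 directed edges.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 0)]
offsets, targets = build_csr(4, edges)
print(bfs_order(offsets, targets, 0))
```

Storing all neighbor lists in one flat array avoids a separate object per node, which is one common way to keep per-edge overhead low on very large graphs.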

At Twitter, Cassovary forms the bottom layer of a stack that we use to power many of our graph-based features, including “Who to Follow” and “Similar to.” We also use it for relevance in Twitter Search and the algorithms that determine which Promoted Products users will see. Over time, we hope to bring more non-proprietary logic from some of those product features into Cassovary.

Please use, fork, and contribute to Cassovary if you can. If you have any questions, ask on the mailing list or file issues on GitHub. Also, follow @cassovary for updates.

-Pankaj Gupta (@pankaj)

Viz: Visualizing Hadoop with HDFS-DU

Visualizing Hadoop with HDFS-DU

We are a heavy adopter of Apache Hadoop with a large set of data that resides in its clusters, so it’s important for us to understand how these resources are utilized. At our July Hack Week, we experimented with developing HDFS-DU to provide us with an interactive visualization of the underlying Hadoop Distributed File System (HDFS). The project aims to monitor different snapshots of the entire HDFS in an interactive way, showing the size of the folders and the rate at which the size changes. It can also identify efficient and inefficient file storage and highlight the nodes in the file system where this is happening.

HDFS-DU provides the following in a web user interface:

  • A TreeMap visualization where each node is a folder in HDFS. The area of each node can be relative to the size or number of descendants.
  • A tree visualization showing the topology of the file system

HDFS-DU is built using the following front-end technologies:

  • D3.js: for tree visualization
  • JavaScript InfoVis Toolkit: for TreeMap visualization

Details

Below is a screenshot of the HDFS-DU user interface (directory names scrubbed). The user interface is made up of two linked visualizations. The left visualization is a TreeMap and shows parent-child relationships through containment. The right visualization is a tree layout, which displays two levels of depth from the currently selected node in the file system. The tree visualization displays extra information for each node on hover.

You can drill down on the TreeMap by clicking on a node; this has the same effect as clicking on a node in the tree. There are two possible layouts for the TreeMap. The default one encodes file size in the area of each node. The second one encodes the number of descendants in the area of each node. In the second view it's interesting to spot nodes where storage is inefficient.
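The two TreeMap encodings described above (total size and number of descendants per folder) can be sketched in a few lines of Python. This is not HDFS-DU's code: the nested-dict input, the names `summarize` and `small_file_folders`, the choice to count only files as descendants, and the reading of "inefficient" as "many small files" are all assumptions made for illustration.

```python
def summarize(tree, path="/"):
    """Return {folder_path: (total_bytes, file_count)} for a nested dict.

    Assumed input shape: a dict value is a subfolder, an int value is a
    file size in bytes.
    """
    stats = {}

    def walk(node, node_path):
        total_bytes = 0
        file_count = 0
        for name, child in node.items():
            if isinstance(child, dict):
                child_path = node_path.rstrip("/") + "/" + name
                child_bytes, child_files = walk(child, child_path)
                total_bytes += child_bytes
                file_count += child_files
            else:
                total_bytes += child
                file_count += 1
        stats[node_path] = (total_bytes, file_count)
        return total_bytes, file_count

    walk(tree, path)
    return stats


def small_file_folders(stats, max_avg_bytes):
    """List folders whose average file size is below max_avg_bytes."""
    return sorted(
        folder
        for folder, (size, count) in stats.items()
        if count > 0 and size / count < max_avg_bytes
    )


# Made-up directory tree; sizes are in bytes.
tree = {
    "logs": {"a.log": 10, "b.log": 20, "c.log": 30},
    "warehouse": {"part-0": 5000, "part-1": 7000},
}
stats = summarize(tree)
print(stats["/"])
print(small_file_folders(stats, max_avg_bytes=100))
```

A folder that looks large in the descendant-count layout but small in the size layout has a low average file size, which is what `small_file_folders` flags here.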

Find: visualizing the evolution of the web

Behind the scenes: visualizing the evolution of the web

This guest post is by Sergio Alvarez, Vizzuality, and Deroy Peraza, Hyperakt, in collaboration with Min Li Chan, Chrome Team.

At Google I/O this year, we launched a new version of The Evolution of the Web, a project visualizing the history and pace of innovation in web technologies and browsers. The Evolution of the Web traces how web technologies have evolved in the last two decades and highlights the web community’s continuous efforts to improve the web platform and enable developers to create new generations of immersive web experiences. In collaboration with the Google Chrome team, the team at Hyperakt designed the interactive visualization while Vizzuality built it using HTML5, SVG, and CSS3.

The visualization included 43 web technology "strands" across 7 browser timelines to represent major developments on the web platform. On hover or tap, each strand is highlighted to reveal intersections that tell the story of when browser support was implemented for each new web technology. To provide additional context, we developed a secondary visualization to illustrate the growth of Internet users and traffic.

Sunday, August 12, 2012

Viz: Tracking athletes' Twitter mentions over the Olympics

Tracking athletes' Twitter mentions over the Olympics

The New York Times is back with another excellent infographic about the 2012 Olympics in London, this time showing Twitter activity on athletes' accounts. The graphic visualizes the number of mentions per 1,000 followers that 140 verified accounts have received over the Games so far, homing in on when different athletes' mindshare peaked on Twitter. So, who won? Malaysian track cyclist Azizulhasni Awang (@AzizulAWANG) looks to have received the most mentions per 1,000 followers (2,308) after his public apology for failing to obtain any medals. Michael Phelps' mentions, meanwhile, were dwarfed by his follower count of over 1,000,000.
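The normalization behind that ranking is simple arithmetic, sketched below in Python. The function name `mentions_per_thousand` and the input counts are hypothetical and are not figures from the graphic; they only show why an account with a huge following can rank below a much smaller one.

```python
def mentions_per_thousand(mentions, followers):
    """Mentions normalized per 1,000 followers."""
    if followers <= 0:
        raise ValueError("followers must be positive")
    return mentions * 1000 / followers


# Hypothetical counts, not data from the New York Times graphic.
small_account = mentions_per_thousand(mentions=4_000, followers=2_000)
large_account = mentions_per_thousand(mentions=500_000, followers=1_000_000)
print(small_account, large_account)
```

In this made-up case the large account receives far more mentions in absolute terms, yet its rate per 1,000 followers is a quarter of the small account's.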