Tuesday, September 6, 2011

Data: SNAP Library

Stanford Network Analysis Package (SNAP) is a graph mining library in C++ that also provides datasets of massive networks and graphs. Many applications today, such as websites, search engines and social networks, produce enormous amounts of data that can be represented as graphs. SNAP presents this data in a way that is easy to use, analyze and manipulate. It is a continually updated resource for research on large social and information networks, and its data is collected by crawling and scraping various websites.

It supports different kinds of networks: undirected, directed, bipartite, multigraphs and so on. These graphs can represent different kinds of interactions. For example, in a social network the nodes are people and the edges represent the interactions between them. In a road network the nodes are intersections and the edges are the roads that connect them. In a web graph the nodes are web pages and the edges are the hyperlinks between them.
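To make the directed versus undirected distinction concrete, here is a minimal sketch in plain Python. It does not use SNAP itself; the helper `build_adjacency` and the toy node names are made up for illustration.

```python
def build_adjacency(edges, directed):
    """Return {node: set of neighbours} for a list of (u, v) pairs.

    Hypothetical helper for illustration; this is not part of SNAP.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set())      # make sure v appears even with no out-edges
        if not directed:
            adj[v].add(u)             # undirected: store the edge both ways
    return adj

# Web graph: a hyperlink points one way, so the graph is directed.
web = build_adjacency([("page_a", "page_b"), ("page_b", "page_c")], directed=True)

# Road network: a road can be travelled both ways, so the graph is undirected.
roads = build_adjacency([("x1", "x2")], directed=False)
```

In the directed case `page_c` has no outgoing links, while in the undirected case each intersection lists the other as a neighbour.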

For one of my projects last semester, I used the Amazon product co-purchase dataset. Each node represents a product, and each edge connects two products that were purchased together. So, if product 'i' is frequently purchased with product 'j', the graph contains an edge from 'i' to 'j'. Apart from the graph data itself, the SNAP library also provides network statistics such as the number of nodes, edges and triangles, the sizes of the strongly and weakly connected components, the average clustering coefficient and other details that help in understanding the scale of the network.
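The statistics mentioned above can be sketched in a few lines of plain Python. This is a minimal illustration, not SNAP's own implementation. It assumes the edge list is a text file with `#` comment lines followed by whitespace-separated node-id pairs, and the sample data is made up. Triangles, clustering and weakly connected components are computed on the undirected version of the graph, and nodes with fewer than two neighbours are counted as having a clustering coefficient of zero (other conventions exist).

```python
from itertools import combinations

def parse_edge_list(text):
    """Parse 'src dst' pairs, skipping blank lines and '#' comments."""
    edges = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, dst = line.split()[:2]
        edges.append((int(src), int(dst)))
    return edges

def undirected_neighbors(edges):
    """Ignore edge direction and self-loops: {node: set of neighbours}."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set())
        nbrs.setdefault(v, set())
        if u != v:
            nbrs[u].add(v)
            nbrs[v].add(u)
    return nbrs

def count_triangles(nbrs):
    """Each triangle is seen once from each of its three corners."""
    seen = sum(1 for u in nbrs
               for v, w in combinations(nbrs[u], 2) if w in nbrs[v])
    return seen // 3

def average_clustering(nbrs):
    """Mean local clustering coefficient over all nodes."""
    if not nbrs:
        return 0.0
    total = 0.0
    for ns in nbrs.values():
        k = len(ns)
        if k < 2:
            continue  # assumption: such nodes contribute 0
        links = sum(1 for v, w in combinations(ns, 2) if w in nbrs[v])
        total += 2.0 * links / (k * (k - 1))
    return total / len(nbrs)

def weakly_connected_components(nbrs):
    """Count components with an iterative depth-first search."""
    seen, count = set(), 0
    for start in nbrs:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            u = stack.pop()
            for v in nbrs[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
    return count

# Toy co-purchase data, made up for illustration.
SAMPLE = """# FromNodeId ToNodeId
0 1
1 2
2 0
2 3
4 5
"""

edges = parse_edge_list(SAMPLE)
nbrs = undirected_neighbors(edges)
print("nodes:", len(nbrs))
print("edges:", len(edges))
print("triangles:", count_triangles(nbrs))
print("average clustering:", round(average_clustering(nbrs), 4))
print("weakly connected components:", weakly_connected_components(nbrs))
```

The neighbour-pair loops are fine for a toy graph but far too slow for networks of SNAP's size, which is exactly why a dedicated library is useful.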

I think the SNAP library would be helpful for anyone who wants to mine, analyze and visualize such enormous graphs to find patterns and extract information from them, since more and more of our applications produce data at this scale.
