Stanford Large Network Dataset Collection

The Stanford Large Network Dataset Collection was published as part of the Stanford Network Analysis Project (SNAP). It consists of a wide-ranging collection of large network datasets. The aim of SNAP is to provide a general-purpose network analysis and graph mining library.

The SNAP Collection contains datasets from various domains such as:

  • Social networks : online social networks, edges represent interactions between people
  • Networks with ground-truth communities : ground-truth network communities in social and information networks
  • Communication networks : email communication networks with edges representing communication
  • Citation networks : nodes represent papers, edges represent citations
  • Collaboration networks : nodes represent scientists, edges represent collaborations (co-authoring a paper)
  • Web graphs : nodes represent webpages and edges are hyperlinks
  • Amazon networks : nodes represent products and edges link commonly co-purchased products
  • Internet networks : nodes represent computers and edges represent communication
  • Road networks : nodes represent intersections and edges represent roads connecting the intersections
  • Autonomous systems : graphs of the internet
  • Signed networks : networks with positive and negative edges (friend/foe, trust/distrust)
  • Location-based online social networks : social networks with geographic check-ins
  • Wikipedia networks and metadata : Talk, editing and voting data from Wikipedia
  • Twitter and Memetracker : Memetracker phrases, links and 467 million Tweets
  • Online communities : Data from online communities such as Reddit and Flickr
  • Online reviews : Data from online review systems such as BeerAdvocate and Amazon
  • Information cascades : …
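Many of the graphs in the collection are distributed as plain-text edge lists (often gzip-compressed), where lines starting with `#` are header comments and every other line holds a pair of node IDs. Below is a minimal Python sketch for loading such a file into an adjacency map, assuming that layout. The function name `load_edge_list` is hypothetical, and any extra columns (such as the sign in signed networks) are simply ignored.

```python
import gzip
from collections import defaultdict


def load_edge_list(path):
    """Load a SNAP-style edge list into a dict mapping node -> set of neighbors.

    Assumes '#' lines are comments and other lines are whitespace-separated
    'source target [...]' records; columns after the first two are ignored.
    Files ending in .gz are decompressed on the fly.
    """
    opener = gzip.open if path.endswith(".gz") else open
    adjacency = defaultdict(set)
    with opener(path, "rt") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and header comments
            src, dst = line.split()[:2]
            adjacency[int(src)].add(int(dst))
    return adjacency
```

For example, `load_edge_list("edges.txt.gz")` (a hypothetical local file) returns the outgoing neighbors of each node. For undirected datasets you may want to add each edge in both directions, depending on whether the file already lists both.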

It is definitely worth having a look at the different datasets. Which datasets are you missing? What would you like to see added to the collection? Leave a comment below!

(via Hacker News)