Absolutely fascinating paper on community formation in blogspace, by Ravi Kumar, Prabhakar Raghavan, Jasmine Novak, and Andrew Tomkins, called On the Bursty Evolution of Blogspace. (Free ACM account required -- it's so worth it, just for this article.)
The authors develop a method of measuring time-stamped link-space, so that blogspace can be mapped based not just on links, but on links by date. This lets them track the formation of communities, each defined here as a dense cluster of weblogs all pointing back and forth to one another.
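To make "links by date" concrete, here is a minimal sketch of the idea, not the authors' code. The link tuples, the blog names, and the function links_per_month are all made up for illustration; the only thing taken from the paper is the notion of counting links inside a group of blogs per time interval.

```python
from collections import Counter
from datetime import date

# Hypothetical time-stamped links: (source blog, target blog, date of the linking post).
links = [
    ("a", "b", date(2002, 1, 3)),
    ("b", "a", date(2002, 1, 4)),
    ("b", "c", date(2002, 1, 5)),
    ("c", "a", date(2002, 1, 9)),
    ("a", "d", date(2002, 6, 1)),
]

def links_per_month(links, members):
    """Count links whose source and target are both in `members`, bucketed by (year, month)."""
    counts = Counter()
    for src, dst, when in links:
        if src in members and dst in members:
            counts[(when.year, when.month)] += 1
    return counts

community = {"a", "b", "c"}  # assumed community, chosen by hand for this example
monthly = links_per_month(links, community)
print(sorted(monthly.items()))
```

A month whose count sits well above its neighbours is the kind of burst the paper is after. How to decide what counts as "well above" is the hard part, and this sketch does not attempt it.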
Using this method, they put some meat on the bones of what everyone knows:
Within a community of interacting bloggers, a
given topic may become the subject of intense debate for
a period of time, then fade away. These bursts of activity
are typified by heightened hyperlinking amongst the blogs
involved -- within a time interval.
They then go on to identify several examples of communities coalescing in a brief period of time around a set of posts -- WannaBeGirl's blog poetry in 2000, or Dawn's Funniest/Sexiest Blogger poll from 2002. (Unsurprisingly, both examples used posts about other people to get those people's attention.)
They outline their method for crawling and analysing blogspace while looking for these burst-forming communities, and the algorithm looks like a useful feature for ongoing exploration of blogspace. (Paging David Sifry. David Sifry to the white courtesy telephone...) They also segment blogs by in-bound links:
...pages linked-to by an enormous number of other pages
are too well-known for the type of communities we seek to
discover; so, we summarily remove all pages that contain
more than a certain number of in-links.
in order to differentiate between community participation and publishing (an argument I've been groping towards in Communities, Audiences and Scale, and Weblogs, Power Laws and Inequality, but the algorithms here are far more precise than my descriptions).
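The in-link cutoff is easy to sketch. The following is a rough illustration under my own assumptions, not the paper's implementation: the excerpt only says "a certain number", so the threshold max_in_links, the function name prune_popular, and the toy edge list are all hypothetical.

```python
from collections import Counter

def prune_popular(edges, max_in_links):
    """Drop every page with more than `max_in_links` distinct in-linking pages,
    along with every edge that touches such a page."""
    unique_edges = set(edges)  # count each (source, target) pair once
    in_links = Counter(dst for _, dst in unique_edges)
    too_popular = {page for page, n in in_links.items() if n > max_in_links}
    return [(s, d) for s, d in edges if s not in too_popular and d not in too_popular]

# Toy graph: "hub" is linked to by three pages, everyone else by one.
edges = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("a", "b"), ("b", "a"), ("hub", "c")]
kept = prune_popular(edges, max_in_links=2)  # threshold of 2 is arbitrary
print(kept)
```

With the hub gone, what is left is the back-and-forth linking between a and b, which is the kind of structure a community search wants to see.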
Finally, they analyze the changes in their data set overall, and come to two remarkable conclusions: first, 2001 really was the unusual year, with the link structure at both a macro and micro level taking a sharp jump in density.
Second, there is a core set of blogs that forms a Strongly Connected Cluster, and that cluster is growing rapidly:
But up to this point, blogspace is not
a coherent entity -- the overall size has grown but the interconnectedness
is not significant. At the start of 2001, the
largest component begins to grow in size relative to the rest
of the graph, and by the end of 2001 it contains about 3%
of all nodes. In 2002, however, a threshold behavior arises,
and the size of the component increases dramatically, to over
20% by the present day. This giant component still appears
to be expanding rapidly, doubling in size approximately every
three months. Clearly this growth cannot continue and
must plateau within two years.
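For anyone who wants to try the measurement in that passage on their own crawl, here is a small sketch of computing the share of nodes in the largest strongly connected component (the standard graph-theory term for a set of nodes that can all reach one another). It uses Kosaraju's algorithm, which is my choice and not necessarily what the authors used. The function name and toy graph are hypothetical, and the code assumes a non-empty node list that includes every edge endpoint.

```python
from collections import defaultdict

def largest_scc_fraction(nodes, edges):
    """Fraction of `nodes` in the largest strongly connected component (Kosaraju's algorithm)."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for s, d in edges:
        fwd[s].append(d)
        rev[d].append(s)

    # Pass 1: record nodes in order of DFS completion on the forward graph.
    seen, order = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(fwd[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(fwd[nxt])))
                    break
            else:
                # No unvisited neighbours left: this node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: walk the reversed graph in reverse completion order;
    # each walk collects exactly one strongly connected component.
    assigned, best = set(), 0
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        size, todo = 0, [start]
        while todo:
            node = todo.pop()
            size += 1
            for prev in rev[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    todo.append(prev)
        best = max(best, size)
    return best / len(nodes)

# Toy graph: a, b, c link in a cycle; d and e hang off it.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("e", "a")]
frac = largest_scc_fraction(nodes, edges)
print(frac)
```

On the quoted numbers, the plateau is forced by arithmetic: a component at 20% of all nodes that doubles every three months would be at 40% after three months, 80% after six, and past 100% before nine.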
Oh, and they prove that blogspace is not a random graph, and conclude that blogspace can better be analyzed as a set of inter-networking communities than as a set of stand-alone blogs.
It's too early to tell for sure, but this paper feels absolutely seminal. I know it's a pain to set up another online account, but do it anyway, and then go read the paper. (Thanks, Hylton)