Posted on 2006-07-01, 00:00. Authored by Mary McGlohon, Jure Leskovec, Christos Faloutsos, Matthew Hurst, Natalie Glance
Can we cluster blogs into types by considering their typical posting and linking behavior? How do blogs evolve over
time? In this work we answer these questions by providing
several sets of blog and post features that can help distinguish between blogs. The first two sets of features focus on
the topology of the cascades that the blogs are involved in,
and the last set of features focuses on the temporal evolution, using chaotic and fractal ideas. We also propose to use
PCA to reduce dimensionality, so that we can visualize the
resulting clouds of points.
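As a rough illustration of the PCA step, the sketch below projects a per-blog feature matrix onto its first two principal components so the blogs can be drawn as a 2-D cloud of points. It is a minimal sketch, not the authors' pipeline: the function name `pca_project`, the matrix `blog_features`, and the random placeholder data are all assumptions made for illustration, and real features would typically be rescaled before this step.

```python
import numpy as np

def pca_project(features, n_components=2):
    """Project rows of `features` onto the top principal components.

    `features` is a (num_blogs, num_features) array; both the name and
    the layout are assumptions for this sketch.
    """
    X = np.asarray(features, dtype=float)
    X = X - X.mean(axis=0)  # center each feature column
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T

# Placeholder data only: 100 hypothetical blogs with 6 features each.
rng = np.random.default_rng(0)
blog_features = rng.normal(size=(100, 6))
points = pca_project(blog_features)  # shape (100, 2), ready for a scatter plot
```

Each row of `points` is one blog; coloring the rows by a known label (for example, a topic tag) is one way to check whether the features separate blog types.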
We run all our proposed tools on the ICWSM dataset. Our
findings are that (a) topology features can help us distinguish blogs, such as ‘humor’ versus ‘conservative’ blogs; (b) the
temporal activity of blogs is very non-uniform and bursty; but
(c) surprisingly often, it is self-similar and thus can be compactly characterized by the so-called bias factor (the ‘80’ in
a recursive 80-20 distribution).
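To make the bias factor concrete, the following sketch builds a recursive 80-20 series: the total activity is split between the two halves of the time window in proportions b and 1 - b, and the same split is applied again inside each half. It then recovers b by measuring the share held by the heavier half of every sibling pair. The function names and the simple share-based estimator are assumptions for illustration and are not necessarily the estimator used in the paper; on noisy real data this estimator would tend to overstate the bias.

```python
import random

def b_model(total, bias, levels, rng=None):
    """Split `total` recursively into 2**levels time slots.

    At every split, a fraction `bias` goes to one half and 1 - bias to
    the other. If `rng` is given, the heavier side is chosen at random.
    """
    series = [float(total)]
    for _ in range(levels):
        next_series = []
        for volume in series:
            heavy, light = volume * bias, volume * (1.0 - bias)
            if rng is not None and rng.random() < 0.5:
                heavy, light = light, heavy
            next_series.extend((heavy, light))
        series = next_series
    return series

def estimate_bias(series):
    """Average share of the heavier sibling across all aggregation levels.

    Assumes len(series) is a power of two.
    """
    shares = []
    level = list(series)
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            left, right = level[i], level[i + 1]
            if left + right > 0:
                shares.append(max(left, right) / (left + right))
            parents.append(left + right)
        level = parents
    return sum(shares) / len(shares)

# Hypothetical example: 1000 posts spread over 64 time slots with b = 0.8.
series = b_model(1000, 0.8, levels=6, rng=random.Random(0))
estimated = estimate_bias(series)
```

A single number, b, therefore summarizes how bursty the series is: b = 0.5 gives perfectly uniform activity, and values closer to 1 give increasingly concentrated bursts.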