Posted on 2012-01-01, 00:00. Authored by Alessandro Rinaldo, Aarti Singh, Rebecca Nugent, Larry Wasserman
<p>High-density clusters can be characterized by the connected components of a level set <em>L(λ) = {x: p(x) > λ}</em> of the underlying probability density function <em>p</em> generating the data, at some appropriate level <em>λ ≥ 0</em>. The complete hierarchical clustering can be characterized by a cluster tree <em>T = ∪<sub>λ</sub>L(λ)</em>. In this paper, we study the behavior of a density level set estimate <em>L̂(λ)</em> and cluster tree estimate <em>T̂</em> based on a kernel density estimator with kernel bandwidth <em>h</em>. We define two notions of instability to measure the variability of <em>L̂(λ)</em> and <em>T̂</em> as a function of <em>h</em>, and investigate the theoretical properties of these instability measures.</p>
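<p>To make the level set estimate concrete, the following is a minimal one-dimensional sketch and not the procedure or the instability measures from the paper. It assumes a Gaussian kernel, a fixed evaluation grid, and a small synthetic two-cluster sample. The helper names <code>gaussian_kde_1d</code> and <code>level_set_components</code>, the level <code>lam</code>, and the bandwidth values are all hypothetical choices made for illustration. The sketch thresholds the kernel density estimate at <em>λ</em> and reports the connected components of the resulting set for several bandwidths <em>h</em>.</p>

```python
import numpy as np


def gaussian_kde_1d(data, grid, h):
    """Gaussian kernel density estimate with bandwidth h, evaluated on grid."""
    z = (grid[:, None] - data[None, :]) / h
    kernel_sum = np.exp(-0.5 * z ** 2).sum(axis=1)
    return kernel_sum / (len(data) * h * np.sqrt(2.0 * np.pi))


def level_set_components(density, grid, lam):
    """Intervals of the grid on which density > lam (connected components in 1D)."""
    mask = density > lam
    # Pad with False so that every run of True has a detectable start and end.
    padded = np.concatenate(([False], mask, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    starts = changes[0::2]          # first grid index inside each run
    ends = changes[1::2] - 1        # last grid index inside each run
    return [(float(grid[s]), float(grid[e])) for s, e in zip(starts, ends)]


# Illustrative data (an assumption): two evenly spaced clusters around -3 and 3.
data = np.concatenate([np.linspace(-4.0, -2.0, 50), np.linspace(2.0, 4.0, 50)])
grid = np.linspace(-10.0, 10.0, 2001)
lam = 0.05  # density level, chosen by hand for this example

components_by_h = {}
for h in (0.3, 1.0, 5.0):
    density = gaussian_kde_1d(data, grid, h)
    components_by_h[h] = level_set_components(density, grid, lam)
    print(f"h = {h}: {len(components_by_h[h])} component(s)")
```

<p>In this toy setting the small bandwidths separate the two clusters at level <em>λ</em>, while the largest bandwidth oversmooths them into a single component. This is only meant to illustrate why the estimate <em>L̂(λ)</em> depends on <em>h</em>.</p>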