Nathan D. Ratliff, J. Andrew Bagnell (2007)
We propose a novel variant of the conjugate gradient algorithm, Kernel Conjugate Gradient (KCG), designed to speed up learning for kernel machines with differentiable loss functions. This approach leads to a better-conditioned optimization problem during learning. We establish an upper bound on the number of iterations for KCG indicating that it should require fewer than the square root of the number of iterations that standard conjugate gradient requires. In practice, for various differentiable kernel learning problems, we find that KCG consistently and significantly outperforms existing techniques. The algorithm is simple to implement, requires no more computation per iteration than standard approaches, and is well motivated by Reproducing Kernel Hilbert Space (RKHS) theory. We further show that data-structure techniques recently used to speed up kernel machine approaches are well matched to the algorithm, reducing the dominant costs of training: function evaluation and RKHS inner product computation.
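To make the idea concrete, below is a minimal sketch of conjugate gradient carried out with the RKHS inner product ⟨u, v⟩_H = uᵀKv on the expansion coefficients, specialized to a regularized squared loss. The RBF kernel, the Polak-Ribiere update, and the exact line search are illustrative assumptions, not details taken from the abstract, which covers general differentiable losses.

```python
# A sketch of conjugate gradient in an RKHS, as one reading of the KCG idea.
# Assumptions (not stated in the abstract): RBF kernel, regularized squared
# loss, Polak-Ribiere direction update, exact line search for the quadratic.
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def kernel_cg(K, y, lam=1e-2, iters=50, tol=1e-8):
    """Minimize 0.5*||K a - y||^2 + 0.5*lam*a^T K a over coefficients a,
    taking inner products in the RKHS metric <u, v>_H = u^T K v rather
    than the Euclidean metric on the coefficient vector."""
    n = len(y)
    a = np.zeros(n)
    # Functional-gradient coefficients: grad R[f] = sum_i (f(x_i) - y_i) k(x_i, .) + lam*f
    g = (K @ a - y) + lam * a
    d = -g
    for _ in range(iters):
        Kd = K @ d
        # Exact line search along d for the quadratic objective
        denom = Kd @ Kd + lam * (d @ Kd)
        if denom <= tol:
            break
        t = -(g @ Kd) / denom
        a = a + t * d
        g_new = (K @ a - y) + lam * a
        # Stop when the RKHS norm of the functional gradient is small
        if g_new @ (K @ g_new) < tol:
            break
        # Polak-Ribiere coefficient computed with the RKHS inner product
        beta = (g_new @ (K @ (g_new - g))) / (g @ (K @ g))
        d = -g_new + max(beta, 0.0) * d
        g = g_new
    return a

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(200, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
    K = rbf_kernel(X, gamma=0.5)
    a = kernel_cg(K, y, lam=1e-2)
    print("training RMSE:", np.sqrt(np.mean((K @ a - y) ** 2)))
```

The only departure from standard coefficient-space conjugate gradient in this sketch is that every inner product is taken as uᵀKv; working in the RKHS metric in this way is the kind of change the abstract credits with producing a better-conditioned optimization problem.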