An Efficient Proximal-Gradient Method for Single and Multi-task Regression with Structured Sparsity
We consider the optimization problem of learning regression models with a mixed-norm penalty defined over overlapping groups to achieve structured sparsity. It has been previously shown that such a penalty can encode prior knowledge on the input or output structure to learn a structured-sparsity pattern in the regression parameters. However, because the parameters of overlapping groups are non-separable, developing an efficient optimization method has remained a challenge. An existing approach casts this problem as a second-order cone program (SOCP) and solves it with interior-point methods, which is computationally expensive even for problems of moderate size. In this paper, we propose an efficient proximal-gradient method that achieves a faster convergence rate and is far more efficient and scalable than solving the SOCP formulation. Our method exploits the structure of the non-smooth structured-sparsity-inducing norm, introduces a smooth approximation of it, and optimizes this approximation rather than the original objective directly. We demonstrate the efficiency and scalability of our method on simulated datasets and show that it can be successfully applied to a very large-scale dataset in genetic association analysis.
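To make the smoothing idea concrete, the following is a minimal, hypothetical Python/NumPy sketch rather than the paper's implementation: it smooths an overlapping-group l1/l2 penalty through its conjugate form (a Nesterov-style smoothing) and takes proximal-gradient steps on the smoothed loss plus a separable l1 term. All function names, parameter values, and the toy data are assumptions introduced only for illustration.

```python
import numpy as np

def smoothed_group_penalty_grad(beta, groups, weights, mu):
    """Gradient of a smoothed overlapping-group penalty
    Omega(beta) = sum_g w_g * ||beta_g||_2, where each group norm is
    replaced by max_{||a||_2 <= 1} a^T (w_g * beta_g) - (mu/2)*||a||^2;
    the maximizer is the projection of w_g * beta_g / mu onto the unit ball."""
    grad = np.zeros_like(beta)
    for g, w in zip(groups, weights):
        a = w * beta[g] / mu
        nrm = np.linalg.norm(a)
        if nrm > 1.0:
            a /= nrm                      # project onto the l2 unit ball
        grad[g] += w * a
    return grad

def spg_step(beta, X, y, groups, weights, lam, gamma, mu, step):
    """One proximal-gradient step on
    0.5*||y - X beta||^2 + lam * Omega_mu(beta) + gamma * ||beta||_1:
    a gradient step on the smooth part, then soft-thresholding for the l1 term."""
    grad = X.T @ (X @ beta - y) + lam * smoothed_group_penalty_grad(beta, groups, weights, mu)
    z = beta - step * grad
    return np.sign(z) * np.maximum(np.abs(z) - step * gamma, 0.0)

# Toy usage with three overlapping groups (hypothetical data and settings).
rng = np.random.default_rng(0)
X, y = rng.standard_normal((50, 10)), rng.standard_normal(50)
groups = [np.arange(0, 4), np.arange(3, 7), np.arange(6, 10)]
weights = [1.0, 1.0, 1.0]
beta = np.zeros(10)
for _ in range(500):
    beta = spg_step(beta, X, y, groups, weights, lam=1.0, gamma=0.1, mu=0.01, step=1e-3)
```

In this simplified sketch the step size is fixed and conservative; an accelerated scheme with a step size tied to the Lipschitz constant of the smoothed objective would converge faster, in line with the convergence-rate claim in the abstract.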