FlexiFaCT: Scalable Flexible Factorization of Coupled Tensors on Hadoop

Beutel, Alex; Kumar, Abhimanu; Papalexakis, Evangelos E.; Talukdar, Partha Pratim; Faloutsos, Christos; Xing, Eric P.

doi:10.1184/R1/6475637.v1

journal contribution

posted on 2014-04-01, 00:00 authored by Alex Beutel, Abhimanu Kumar, Evangelos E. Papalexakis, Partha Pratim Talukdar, Christos Faloutsos, Eric P Xing

Given multiple relational data sets that share some of their dimensions, how can we efficiently decompose the data into latent factors? Factorization of a single matrix or tensor has attracted much attention, for example in the Netflix challenge, where users rate movies. However, we often have additional side information, such as demographic data about the users in the Netflix example. Incorporating this additional information leads to the coupled factorization problem, which so far has been solved only for relatively small data sets.
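For orientation, one common way to write a coupled factorization objective is sketched below. The symbols are assumptions introduced here for illustration and do not come from the abstract: a three-mode tensor X and a side-information matrix Y share their first mode (for example, users), so both are fit with the same factor matrix U.

```latex
% Assumed notation: X is an I x J x K tensor, Y is an I x M matrix,
% and both share the first mode through the common factor U.
\min_{U,V,W,A}\;
  \Bigl\| \mathcal{X} - \sum_{r=1}^{R} u_r \circ v_r \circ w_r \Bigr\|_F^2
  + \bigl\| Y - U A^{\top} \bigr\|_F^2,
\qquad
U \in \mathbb{R}^{I \times R},\;
V \in \mathbb{R}^{J \times R},\;
W \in \mathbb{R}^{K \times R},\;
A \in \mathbb{R}^{M \times R}
```

Here u_r, v_r, and w_r denote the r-th columns of U, V, and W, and ∘ is the vector outer product. The paper's own formulation may differ in weighting, regularization, or notation.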

We provide a distributed, scalable method for decomposing matrices, tensors, and coupled data sets through stochastic gradient descent on a variety of objective functions. We offer the following contributions: (1) Versatility: our algorithm can perform matrix, tensor, and coupled factorization, with flexible objective functions including the Frobenius norm, the Frobenius norm with ℒ₁-induced sparsity, and non-negative factorization. (2) Scalability: FlexiFaCT scales to unprecedented sizes in both the data and the model, with up to billions of parameters, and runs on standard Hadoop. (3) Convergence proofs: we show that FlexiFaCT converges on this variety of objective functions, even with projections.
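To make the idea of stochastic gradient descent with optional projections concrete, here is a minimal single-machine sketch for plain matrix factorization. It is not the FlexiFaCT implementation and does not model the Hadoop distribution. The function names, parameters, and default values (`sgd_factorize`, `squared_error`, `lr`, `l1`, `nonneg`) are hypothetical choices made for this illustration.

```python
import numpy as np

def sgd_factorize(entries, shape, rank=2, lr=0.02, epochs=300,
                  l1=0.0, nonneg=False, seed=0):
    """Fit X ~ U @ V.T by SGD over observed (i, j, value) entries.

    Illustrative sketch only: l1 > 0 applies a soft-threshold step
    (sparsity), nonneg=True projects factors onto the non-negative orthant.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = shape
    # Small positive initialization (an arbitrary choice for this sketch).
    U = rng.uniform(0.1, 0.5, size=(n_rows, rank))
    V = rng.uniform(0.1, 0.5, size=(n_cols, rank))
    entries = list(entries)
    for _ in range(epochs):
        # Visit the observed entries in a fresh random order each epoch.
        for idx in rng.permutation(len(entries)):
            i, j, x = entries[idx]
            u, v = U[i].copy(), V[j].copy()
            err = x - u @ v
            # Gradient step on the squared error of this one entry
            # (the constant factor 2 is absorbed into lr).
            U[i] = u + lr * err * v
            V[j] = v + lr * err * u
            for M, k in ((U, i), (V, j)):
                if l1 > 0.0:
                    # Soft-thresholding: proximal step for the L1 penalty.
                    M[k] = np.sign(M[k]) * np.maximum(np.abs(M[k]) - lr * l1, 0.0)
                if nonneg:
                    # Projection onto the non-negative orthant.
                    M[k] = np.maximum(M[k], 0.0)
    return U, V

def squared_error(entries, U, V):
    """Sum of squared residuals over the observed entries."""
    return sum((x - U[i] @ V[j]) ** 2 for i, j, x in entries)
```

A possible usage, assuming a small list of `(row, column, value)` triples: call `U, V = sgd_factorize(entries, shape=(3, 4), nonneg=True)` and compare `squared_error(entries, U, V)` against the error of the initial factors. Extending the per-entry update to a third factor matrix gives a tensor version, and sharing `U` between two data sets gives a coupled version.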

Date

2014-04-01

Keywords

Machine Learning

Licence

In Copyright
