Carnegie Mellon University

FlexiFaCT: Scalable Flexible Factorization of Coupled Tensors on Hadoop

Journal contribution
Posted on 2014-04-01. Authored by Alex Beutel, Abhimanu Kumar, Evangelos E. Papalexakis, Partha Pratim Talukdar, Christos Faloutsos, Eric P. Xing

Given multiple relational data sets that share some of their dimensions, how can we efficiently decompose the data into latent factors? Factorization of a single matrix or tensor has attracted much attention, for example in the Netflix challenge, where users rate movies. However, we often have additional side information, such as demographic data about the users in the Netflix example. Incorporating this additional information leads to the coupled factorization problem, which so far has been solved only for relatively small data sets.
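
For orientation, one common way to write a coupled factorization objective is shown below. It is a sketch in assumed notation rather than necessarily the paper's own formulation: a three-mode tensor (users by movies by a hypothetical third mode) is coupled with a side-information matrix (users by demographic features) through a shared user factor matrix A. The symbols A, B, C, D and the rank R are illustrative assumptions.

```latex
% Assumed notation: \mathcal{X} is the tensor, Y the side-information matrix.
% a_r, b_r, c_r are the r-th columns of A, B, C; \circ is the outer product.
% The user factor matrix A appears in both terms, which couples the two fits.
\min_{A,B,C,D}\;
  \Bigl\| \mathcal{X} - \sum_{r=1}^{R} a_r \circ b_r \circ c_r \Bigr\|_F^2
  \;+\;
  \bigl\| Y - A D^{\top} \bigr\|_F^2
```

Because A appears in both terms, the side information constrains the same user factors that explain the ratings.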

We provide a distributed, scalable method for decomposing matrices, tensors, and coupled data sets through stochastic gradient descent on a variety of objective functions. We offer the following contributions: (1) Versatility: Our algorithm can perform matrix, tensor, and coupled factorization, with flexible objective functions including the Frobenius norm, the Frobenius norm with ℓ1-induced sparsity, and non-negative factorization. (2) Scalability: FlexiFaCT scales to unprecedented sizes in both the data and the model, with up to billions of parameters, and runs on standard Hadoop. (3) Convergence: We prove that FlexiFaCT converges on this variety of objective functions, even with projections.
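
To make the ingredients above concrete, here is a minimal single-machine sketch of stochastic gradient descent on two matrices that share their row factors, with an optional soft-threshold step for ℓ1 sparsity and an optional projection onto non-negative values. It is not the FlexiFaCT implementation: it involves no Hadoop or data partitioning, handles matrices only, and every name in it (`sgd_coupled_factorize`, `squared_error`, the parameters and their defaults) is a hypothetical choice made for illustration.

```python
import numpy as np


def squared_error(U, F, obs):
    """Sum of squared residuals over observed (row, col, value) triples."""
    return sum((U[i] @ F[j] - v) ** 2 for i, j, v in obs)


def sgd_coupled_factorize(X_obs, Y_obs, n_rows, n_cols_x, n_cols_y,
                          rank=2, step=0.05, epochs=200,
                          l1=0.0, nonneg=False, seed=0):
    """Fit X ~ U V^T and Y ~ U W^T by SGD; U is shared (the coupling).

    X_obs and Y_obs are lists of (row, col, value) triples.
    All names and defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    U = rng.uniform(0.1, 0.5, (n_rows, rank))
    V = rng.uniform(0.1, 0.5, (n_cols_x, rank))
    W = rng.uniform(0.1, 0.5, (n_cols_y, rank))

    def project(row):
        # Soft-threshold (proximal step for an l1 penalty), if requested.
        if l1 > 0:
            row = np.sign(row) * np.maximum(np.abs(row) - step * l1, 0.0)
        # Projection onto the non-negative orthant, if requested.
        if nonneg:
            row = np.maximum(row, 0.0)
        return row

    # Pool the observations of both matrices; each sample remembers
    # which column-factor matrix (V or W) it belongs to.
    samples = [(i, j, v, V) for i, j, v in X_obs]
    samples += [(i, j, v, W) for i, j, v in Y_obs]

    for _ in range(epochs):
        for idx in rng.permutation(len(samples)):
            i, j, v, F = samples[idx]
            err = U[i] @ F[j] - v
            # Gradients of 0.5 * err^2, both taken at the current point.
            grad_u = err * F[j]
            grad_f = err * U[i]
            U[i] = project(U[i] - step * grad_u)
            F[j] = project(F[j] - step * grad_f)  # updates V or W in place
    return U, V, W
```

Each update touches only one row of the shared factor and one row of a column factor, so updates on samples that share no rows or columns do not interfere with each other. That locality is the general property that makes this style of SGD a candidate for distribution.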

History

Publisher Statement

Copyright © SIAM

Date

2014-04-01
