Spectral Regularization Algorithms for Learning Large Incomplete Matrices
Rahul Mazumder
Department of Statistics, Stanford University

Trevor Hastie
Statistics Department and Department of Health Research and Policy, Stanford University

Robert Tibshirani
Department of Health Research and Policy and Statistics Department, Stanford University

Submitted for publication July 30, 2009

Abstract

We use convex relaxation techniques to provide a sequence of regularized low-rank solutions for large-scale matrix completion problems. Using the nuclear norm as a regularizer, we provide a simple and very efficient convex algorithm for minimizing the reconstruction error subject to a bound on the nuclear norm. Our algorithm Soft-Impute iteratively replaces the missing elements with those obtained from a soft-thresholded SVD. With warm starts this allows us to efficiently compute an entire regularization path of solutions on a grid of values of the regularization parameter. The computationally intensive part of our algorithm is in computing a low-rank SVD of a dense matrix. Exploiting the problem structure, we show that this task can be performed with a complexity linear in the matrix dimensions. Our semidefinite-programming algorithm is readily scalable to large matrices: for example, it can obtain a rank-80 approximation of a $10^6 \times 10^6$ incomplete matrix with $10^5$ observed entries in 2.5 hours, and can fit a rank-40 approximation to the full Netflix training set in 6.6 hours. Our methods show very good performance in both training and test error when compared to other competitive state-of-the-art techniques.
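To make the iteration concrete, here is a minimal NumPy sketch of the Soft-Impute idea described above: fill the missing entries from the current estimate, take an SVD, soft-threshold the singular values, and repeat until the estimate stabilizes. It uses a dense full SVD for clarity rather than the structured low-rank SVD the paper exploits, and the function and argument names are our own, not the paper's.

```python
import numpy as np

def soft_impute(X, observed, lam, max_iter=100, tol=1e-4):
    """Sketch of the Soft-Impute iteration (names ours, not the paper's).

    X        : (m, n) array; entries where observed is False are ignored
    observed : (m, n) boolean mask of observed entries
    lam      : soft-threshold level applied to the singular values
    """
    Z = np.zeros_like(X, dtype=float)
    for _ in range(max_iter):
        # Replace the missing elements of X with the current estimate.
        filled = np.where(observed, X, Z)
        # Soft-thresholded SVD: shrink the singular values toward zero by lam.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - lam, 0.0)
        Z_new = (U * s) @ Vt
        # Stop when successive estimates agree to relative tolerance tol.
        if np.linalg.norm(Z_new - Z) <= tol * max(np.linalg.norm(Z), 1.0):
            return Z_new
        Z = Z_new
    return Z
```

Run over a decreasing grid of lam values, reusing the previous solution as the warm start, this traces out the regularization path the abstract refers to.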

1. Introduction
In many applications measured data can be represented in a matrix $X_{m\times n}$, for which only a relatively small number of entries are observed. The problem is to “complete” the matrix based on the observed entries, and has been dubbed the matrix completion problem [CCS08, CR08, RFP07, CT09, KOM09, RS05]. The “Netflix” competition (e.g. [SN07]) is a popular example, where the data is the basis for a recommender system. The rows correspond to viewers and the columns to movies, with the entry $X_{ij}$ being the rating $\in \{1, \ldots, 5\}$ given by viewer $i$ to movie $j$. There are 480K viewers and 18K movies, and hence 8.6 billion ($8.6 \times 10^9$) potential entries. However, on average each viewer rates about 200 movies, so only 1.2% or $10^8$ entries are observed. The task is to predict the ratings that viewers would give to movies they have not yet rated.
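As a quick check of the figures just quoted:

$$
480{,}000 \times 18{,}000 = 8.64 \times 10^9 \ \text{potential entries}, \qquad
480{,}000 \times 200 = 9.6 \times 10^7 \approx 10^8 \ \text{observed}, \qquad
\frac{10^8}{8.64 \times 10^9} \approx 1.2\%.
$$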

These problems can be phrased as learning an unknown parameter (a matrix $Z_{m\times n}$) with very high dimensionality, based on very few observations. In order for such inference to be meaningful, we assume that the parameter $Z$ lies on a much lower dimensional manifold. In this paper, as is relevant in many real-life applications, we assume that $Z$ can be well represented by a matrix of low rank, i.e. $Z \approx V_{m\times k} G_{k\times n}$, where $k \ll \min(n, m)$. In the recommender-system example, low-rank structure suggests that movies can be grouped into a small number of “genres”, with $G_{\ell j}$ the relative score for movie $j$ in genre $\ell$. Viewer $i$, on the other hand, has an affinity $V_{i\ell}$ for genre $\ell$, and hence the modeled score for viewer $i$ on movie $j$ is the sum $\sum_{\ell=1}^{k} V_{i\ell} G_{\ell j}$ of genre affinities times genre scores. Typically we view the observed entries in $X$ as the corresponding entries from $Z$ contaminated with noise. Recently [CR08, CT09, KOM09] showed theoretically that, under certain assumptions on the entries of the matrix, the locations, and the proportion of unobserved entries, the true underlying matrix can be recovered to very high accuracy. [SAJ05] studied generalization error bounds for learning low-rank matrices.

For a matrix $X_{m\times n}$ let $\Omega \subset \{1, \ldots, m\} \times \{1, \ldots, n\}$ denote the indices of observed entries. We consider the following optimization problem:

$$
\begin{aligned}
\text{minimize} \quad & \operatorname{rank}(Z) \\
\text{subject to} \quad & \sum_{(i,j)\in\Omega} (X_{ij} - Z_{ij})^2 \le \delta,
\end{aligned}
\qquad (1)
$$

where $\delta \ge 0$ is a regularization parameter controlling the tolerance in training error.
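The rank objective in (1) is non-convex and combinatorially hard to optimize directly. The abstract describes the convex relaxation the paper pursues: use the nuclear norm, the sum of the singular values of $Z$, in place of the rank, and minimize the reconstruction error subject to a bound on it. As an orienting sketch (the paper states its precise formulations later), that reads

$$
\begin{aligned}
\text{minimize} \quad & \sum_{(i,j)\in\Omega} (X_{ij} - Z_{ij})^2 \\
\text{subject to} \quad & \|Z\|_* \le \tau,
\end{aligned}
$$

where $\|Z\|_* = \sum_i \sigma_i(Z)$ and the bound $\tau \ge 0$ plays the role of a regularization parameter.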