Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems
Transcript of Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems
![Page 1: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/1.jpg)
Presentation slides: Takuya Makino
Saturday, March 23, 13
![Page 2: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/2.jpg)
Paper being presented
• Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems (ICDM 2012)
• Hsiang-Fu Yu, Cho-Jui Hsieh, Si Si, and Inderjit Dhillon
• Winner of the Best Paper award
![Page 3: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/3.jpg)
Motivation
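• Matrix factorization is an effective technique for recommender systems when the rating matrix has missing entries
• Web-scale data calls for a way of computing matrix factorizations that is efficient and easy to parallelize and distribute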
![Page 4: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/4.jpg)
The matrix factorization problem
![Page 5: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/5.jpg)
The matrix factorization problem
Observed rating of item j by user i
![Page 6: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/6.jpg)
The matrix factorization problem
Inner product of the features of user i and the features of item j in a k-dimensional feature space (rank-k matrix factorization, k < m, k < n)
Observed rating of item j by user i
![Page 7: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/7.jpg)
The matrix factorization problem
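L2 regularization
Inner product of the features of user i and the features of item j in a k-dimensional feature space (rank-k matrix factorization, k < m, k < n)
||・||_{F} is the Frobenius norm; its square is the sum of the squares of all entries of the matrix
Observed rating of item j by user i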
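Written out, the objective that these annotations describe would look like the following. This is a reconstruction from the annotations and from the notation used on later slides, not a copy of the slide image: Ω is the set of observed (i, j) pairs, w_i and h_j are the i-th row of W and the j-th row of H, and λ is an assumed name for the regularization weight.

$$
\min_{W \in \mathbb{R}^{m \times k},\; H \in \mathbb{R}^{n \times k}} \sum_{(i,j) \in \Omega} \left(A_{ij} - w_i^{T} h_j\right)^2 + \lambda \left( \|W\|_F^2 + \|H\|_F^2 \right) \tag{1}
$$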
![Page 8: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/8.jpg)
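In other words
• Find the W and H that minimize the error so that A, including its unobserved entries, can be estimated by the approximation WH^T (while shrinking the weights of features that do not help the estimate toward 0)
• The problem is unconstrained, so W and H are found with a numerical method such as Stochastic Gradient Descent (SGD); note that (1) is convex in W for fixed H and in H for fixed W, but not jointly convex in (W, H)
• The proof of this per-factor convexity of (1) is skipped (see the T村 book)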
![Page 9: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/9.jpg)
Coordinate Descent
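• A method that updates one (or more) variables at a time while treating all other variables as constants
• What is the objective function when viewed as a function of a single variable?
• In what order should the variables be updated?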
![Page 10: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/10.jpg)
Coordinate Descent
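• A method that updates one (or more) variables at a time while treating all other variables as constants
• What is the objective function when viewed as a function of a single variable?
• In what order should the variables be updated? Choosing this order well turns out to reduce the computational cost!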
![Page 11: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/11.jpg)
What is the objective function for a single variable?
(4) is the objective function obtained by writing w_{it} as z
![Page 12: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/12.jpg)
What is the objective function for a single variable?
(4) is the objective function obtained by writing w_{it} as z
It is simply (1) with the w_{it} term inside the inner product replaced by z
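Following that description, (4) can be sketched as below. It keeps only the terms of (1) that involve w_{it}: the squared errors over Ω_i (the items rated by user i) and the regularization on z. The symbol λ for the regularization weight is an assumption, since the slide image is not available here.

$$
\min_{z} f(z) = \sum_{j \in \Omega_i} \left(A_{ij} - \left(w_i^{T} h_j - w_{it} h_{jt}\right) - z\, h_{jt}\right)^2 + \lambda z^2 \tag{4}
$$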
![Page 13: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/13.jpg)
Solving (4)
![Page 14: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/14.jpg)
Solving (4)
The solution z* is obtained by setting f'(z) = 0
Computing z* naively costs O(|Ω_i|k), because the inner product Σ_{t=1}^{k} w_{it} h_{jt} is needed for every j in Ω_i
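As a worked step, assuming (4) is the squared error over Ω_i plus λz² (with λ an assumed name for the regularization weight), setting the derivative to zero gives a closed form for z*:

$$
f'(z) = -2 \sum_{j \in \Omega_i} \left(A_{ij} - w_i^{T} h_j + w_{it} h_{jt} - z\, h_{jt}\right) h_{jt} + 2 \lambda z = 0
$$

$$
z^{*} = \frac{\sum_{j \in \Omega_i} \left(A_{ij} - w_i^{T} h_j + w_{it} h_{jt}\right) h_{jt}}{\lambda + \sum_{j \in \Omega_i} h_{jt}^2}
$$

Each of the |Ω_i| terms in the numerator contains the k-term inner product w_i^T h_j, which is where the O(|Ω_i|k) cost comes from.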
![Page 15: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/15.jpg)
residual matrix R
We do not want to recompute Σ_{t=1}^{k} w_{it} h_{jt} every time, so we keep R in memory
![Page 16: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/16.jpg)
Parameter update
Σ_{t=1}^{k} w_{it} h_{jt} is already stored here (inside R)
h_{jt} can be updated in the same way
The cost drops from O(|Ω_i|k) to O(|Ω_i|)
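The single-variable update with a maintained residual can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: it assumes R_ij = A_ij − w_i·h_j on observed entries, and the names `residual`, `update_w`, `rows`, and `lam` are hypothetical.

```python
def residual(A, W, H):
    """R_ij = A_ij - w_i . h_j, stored for observed (i, j) pairs only."""
    return {(i, j): a - sum(w * h for w, h in zip(W[i], H[j]))
            for (i, j), a in A.items()}


def update_w(rows, W, H, R, i, t, lam):
    """Update the single variable W[i][t] in O(|Omega_i|) using R."""
    js = rows[i]  # Omega_i: items rated by user i
    # R_ij + w_it * h_jt puts the current t-th term back into the residual.
    num = sum((R[i, j] + W[i][t] * H[j][t]) * H[j][t] for j in js)
    den = lam + sum(H[j][t] ** 2 for j in js)
    z = num / den
    for j in js:  # keep R consistent with the new value of W[i][t]
        R[i, j] -= (z - W[i][t]) * H[j][t]
    W[i][t] = z
    return z


# Toy data (made up for illustration): 3 users, 2 items, k = 2.
A = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0, (2, 1): 1.0}
rows = {0: [0, 1], 1: [0], 2: [1]}
W = [[0.5, 0.1], [0.2, 0.4], [0.3, 0.3]]
H = [[0.6, 0.2], [0.1, 0.7]]
R = residual(A, W, H)
z = update_w(rows, W, H, R, i=0, t=0, lam=0.1)
```

No k-term inner product appears inside `update_w`; the only full inner products are in the one-time call to `residual`.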
![Page 17: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/17.jpg)
Making the updates efficient
• Keeping the residual matrix R reduces the computation time from O(|Ω|k) to O(|Ω|)
• This part is not the proposed method
![Page 18: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/18.jpg)
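In what order should the variables be updated?
• Item/User-wise Update
• Feature-wise Update
[Figure: two diagrams of a factor matrix, one per update order, with rows indexed by i or j from 1 to m or n and columns indexed by t from 1 to k]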
![Page 19: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/19.jpg)
Item/User-wise Update
![Page 20: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/20.jpg)
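Feature-wise Update
Changing the viewpoint, regard A as (approximately) a sum of k matrix products
Each term is the m×n matrix contributed by the t-th feature: the product of an m×1 matrix and a 1×n matrix is an m×n matrix
The proposed method solves for these terms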
![Page 21: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/21.jpg)
Subproblem for finding u and v
Letting R̂_{ij} = R_{ij} + w̄_{ti} h̄_{tj}, (15)
can be rewritten as (16)
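For reference, the rank-one view and the two forms of the subproblem can be written out as below. This is a reconstruction under stated assumptions: w̄_t and h̄_t denote the t-th columns of W and H, R is the residual on observed entries, and λ is an assumed name for the regularization weight.

$$
A \approx W H^{T} = \sum_{t=1}^{k} \bar{w}_t \bar{h}_t^{T}
$$

$$
\min_{u \in \mathbb{R}^{m},\, v \in \mathbb{R}^{n}} \sum_{(i,j) \in \Omega} \left(R_{ij} + \bar{w}_{ti} \bar{h}_{tj} - u_i v_j\right)^2 + \lambda \left(\|u\|^2 + \|v\|^2\right) \tag{15}
$$

$$
\min_{u \in \mathbb{R}^{m},\, v \in \mathbb{R}^{n}} \sum_{(i,j) \in \Omega} \left(\hat{R}_{ij} - u_i v_j\right)^2 + \lambda \left(\|u\|^2 + \|v\|^2\right) \tag{16}
$$

With v fixed, each coordinate of u then has the same kind of closed form as the single-variable update:

$$
u_i \leftarrow \frac{\sum_{j \in \Omega_i} \hat{R}_{ij} v_j}{\lambda + \sum_{j \in \Omega_i} v_j^2}
$$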
![Page 22: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/22.jpg)
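What is so good about the feature-wise update?
R̂_{ij} = R_{ij} + w̄_{ti} h̄_{tj} = A_{ij} − Σ_{t'=1}^{k} w_{it'} h_{jt'} + w̄_{ti} h̄_{tj}
Since w_{it} = w̄_{ti} and h_{jt} = h̄_{tj}, the terms for the current t (underlined on the slide) cancel out
In other words, R̂ does not need to be recomputed every time u_i or v_j is updated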
![Page 23: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/23.jpg)
Feature-wise Update
For a single subproblem, the cost of computing R is O(1/T) of the cost of updating the variables over T CCD iterations
![Page 24: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/24.jpg)
Feature-wise Update
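For a single subproblem, the cost of computing R is O(1/T) of the cost of updating the variables over T CCD iterations
Running T CCD iterations per subproblem is O((1 + 1) / (1 + 1/T)) = O(2T / (T + 1)) times faster than running only one CCD iteration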
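Putting the feature-wise pieces together, a CCD++-style loop might look like the sketch below. It is a small pure-Python illustration under assumptions, not the paper's code: W starts at zero and H at random values, `lam`, `outer`, `T`, and `seed` are hypothetical parameter names, and the toy ratings are made up.

```python
import random


def objective(A, W, H, lam):
    """Squared error on observed entries plus L2 regularization."""
    err = sum((a - sum(w * h for w, h in zip(W[i], H[j]))) ** 2
              for (i, j), a in A.items())
    reg = sum(x * x for row in W for x in row) + sum(x * x for row in H for x in row)
    return err + lam * reg


def ccdpp(A, m, n, k, lam=0.1, outer=10, T=3, seed=0):
    """A: dict (i, j) -> rating for observed entries. Returns W (m x k), H (n x k)."""
    rng = random.Random(seed)
    rows = {i: [] for i in range(m)}
    cols = {j: [] for j in range(n)}
    for (i, j) in A:
        rows[i].append(j)
        cols[j].append(i)
    W = [[0.0] * k for _ in range(m)]
    H = [[rng.random() for _ in range(k)] for _ in range(n)]
    R = dict(A)  # W = 0, so the residual starts out equal to A
    for _ in range(outer):
        for t in range(k):  # one rank-one subproblem per feature
            u = [W[i][t] for i in range(m)]
            v = [H[j][t] for j in range(n)]
            # R_hat is built once per subproblem, not once per variable.
            R_hat = {(i, j): r + u[i] * v[j] for (i, j), r in R.items()}
            for _ in range(T):  # T inner CCD iterations
                for i in range(m):
                    num = sum(R_hat[i, j] * v[j] for j in rows[i])
                    u[i] = num / (lam + sum(v[j] ** 2 for j in rows[i]))
                for j in range(n):
                    num = sum(R_hat[i, j] * u[i] for i in cols[j])
                    v[j] = num / (lam + sum(u[i] ** 2 for i in cols[j]))
            for i in range(m):
                W[i][t] = u[i]
            for j in range(n):
                H[j][t] = v[j]
            R = {(i, j): r - u[i] * v[j] for (i, j), r in R_hat.items()}
    return W, H


A = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0, (1, 2): 1.0, (2, 1): 1.0, (2, 2): 5.0}
W, H = ccdpp(A, m=3, n=3, k=2)
final_objective = objective(A, W, H, lam=0.1)
```

The two dictionary comprehensions that touch R run once per subproblem, while the inner loop runs T times, which is the cost ratio the slide refers to.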
![Page 25: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/25.jpg)
![Page 26: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/26.jpg)
![Page 27: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/27.jpg)
Split the vector into p small vectors
[Figure: a vector divided into blocks 1, 2, ..., r, ..., p]
![Page 28: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/28.jpg)
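Split the vector into p small vectors and update them in parallel
In (16), each u_i is independent of the other entries of u
[Figure: a vector divided into blocks 1, 2, ..., r, ..., p]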
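Because each u_i depends only on its own row of R̂ and on v, the blocks can be handed to separate workers. The sketch below shows that structure with Python's standard `concurrent.futures`; the function names and the toy data are hypothetical, and threads are used only to show the partitioning (in CPython they would not speed up this pure-Python arithmetic, so a real implementation would more likely use native threads or separate processes).

```python
from concurrent.futures import ThreadPoolExecutor


def update_block(block, R_hat_rows, v, lam):
    """Closed-form update of u_i for each row index i in one block."""
    out = {}
    for i in block:
        entries = R_hat_rows[i]  # list of (j, R_hat_ij) for j in Omega_i
        num = sum(r * v[j] for j, r in entries)
        den = lam + sum(v[j] ** 2 for j, _ in entries)
        out[i] = num / den
    return out


def parallel_update_u(R_hat_rows, v, lam, p):
    """Split the m entries of u into at most p contiguous blocks and update them concurrently."""
    m = len(R_hat_rows)
    size = max(1, -(-m // p))  # ceil(m / p)
    blocks = [range(s, min(s + size, m)) for s in range(0, m, size)]
    u = [0.0] * m
    with ThreadPoolExecutor(max_workers=p) as pool:
        results = pool.map(lambda b: update_block(b, R_hat_rows, v, lam), blocks)
        for part in results:  # blocks are disjoint, so writes never collide
            for i, value in part.items():
                u[i] = value
    return u


# Toy data (made up): 4 rows, 3 columns, observed R_hat entries per row.
R_hat_rows = [[(0, 5.0), (1, 3.0)], [(0, 4.0)], [(1, 1.0), (2, 2.0)], [(2, 0.5)]]
v = [1.0, 0.5, 2.0]
u = parallel_update_u(R_hat_rows, v, lam=0.1, p=2)
```

The result does not depend on p, since no block reads a value that another block writes.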
![Page 29: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/29.jpg)
![Page 30: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/30.jpg)
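Related work
• Alternating Least Squares (ALS)
Repeat: fix H and solve for W, then fix W and solve for H
Easy to parallelize, but computationally expensive
• Stochastic Gradient Descent (SGD)
Computationally cheap, but hard to parallelize
Convergence depends on the learning rate, and performance depends on the order in which variables are updated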
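To make the SGD remarks concrete, here is a minimal sketch of a plain, serial SGD for the same objective. It is an assumption-laden illustration rather than any of the parallel SGD variants from the literature: the fixed learning rate `eta`, the shuffling scheme, and all names and data are hypothetical.

```python
import random


def sgd_mf(A, m, n, k, lam=0.01, eta=0.02, epochs=300, seed=0):
    """Plain SGD over observed entries. A: dict (i, j) -> rating."""
    rng = random.Random(seed)
    W = [[0.1 * rng.random() for _ in range(k)] for _ in range(m)]
    H = [[0.1 * rng.random() for _ in range(k)] for _ in range(n)]
    entries = list(A.items())
    for _ in range(epochs):
        rng.shuffle(entries)  # the visiting order influences the result
        for (i, j), a in entries:
            err = a - sum(W[i][t] * H[j][t] for t in range(k))
            for t in range(k):
                w, h = W[i][t], H[j][t]
                # Both updates use the values from before this step.
                W[i][t] += eta * (err * h - lam * w)
                H[j][t] += eta * (err * w - lam * h)
    return W, H


def squared_error(A, W, H):
    return sum((a - sum(w * h for w, h in zip(W[i], H[j]))) ** 2
               for (i, j), a in A.items())


A = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0, (1, 2): 1.0, (2, 1): 1.0, (2, 2): 5.0}
W, H = sgd_mf(A, m=3, n=3, k=2)
final_error = squared_error(A, W, H)
```

Each step reads and writes one row of W and one row of H, so two steps that share a user or an item cannot simply run at the same time, and `eta` has to be tuned by hand.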
![Page 31: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/31.jpg)
![Page 32: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/32.jpg)
![Page 33: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/33.jpg)
![Page 34: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](https://reader034.fdocument.pub/reader034/viewer/2022042714/55721427497959fc0b93e4db/html5/thumbnails/34.jpg)
Conclusions
For a matrix A with missing entries, CCD++ (Feature-wise Update) needs less computation than existing methods and is easy to parallelize in both multi-core and distributed environments