Post on 22-Dec-2015
Transfer Learning for Link Prediction
Qiang Yang (楊強), Hong Kong University of Science and Technology
Hong Kong, China
http://www.cse.ust.hk/~qyang
Transfer Learning?
People often transfer knowledge to novel situations: Chess → Checkers, C++ → Java, Physics → Computer Science.
Transfer Learning: the ability of a system to recognize and apply knowledge and skills learned in previous tasks to novel tasks (or new domains).
But direct application will not work.

Machine Learning assumes that training and future (test) data follow the same distribution and lie in the same feature space.

When features and distributions are different, performance collapses: e.g., a (+/-) classifier with 90% training accuracy drops to 10% test accuracy.
Cross-Domain Sentiment Classification

- Train a classifier on DVD reviews, test on DVD reviews: 82.55% accuracy
- Train a classifier on DVD reviews, test on Electronics reviews: 71.85% accuracy
When distributions are different

Wireless sensor networks: readings differ across time periods (night time vs. day time), devices (Device 1 vs. Device 2), or space.
When features are different

Heterogeneous transfer: the domains have different feature spaces. For example, training on text while future data are images:

- Apples: "The apple is the pomaceous fruit of the apple tree, species Malus domestica in the rose family Rosaceae ..."
- Bananas: "Banana is the common name for a type of fruit and also the herbaceous plants of the genus Musa which produce this commonly eaten fruit ..."
Transfer Learning: Source Domains

[Figure: source domains feed auxiliary knowledge into the learner that maps input to output]
An overview of the various settings of transfer learning:

Training data: labeled/unlabeled in the source domain (源數據, source data); labeled/unlabeled in the target domain (目標數據, target data)
Test data: unlabeled, in the target domain

- Inductive Transfer Learning: labeled data are available in the target domain
  - Case 1: labeled data are also available in the source domain; source and target tasks are learnt simultaneously → Multi-task Learning
  - Case 2: no labeled data in the source domain → Self-taught Learning
- Transductive Transfer Learning: labeled data are available only in the source domain
  - Assumption: different domains but a single task → Domain Adaptation
  - Assumption: single domain and single task → Sample Selection Bias / Covariate Shift
- Unsupervised Transfer Learning: no labeled data in either the source or the target domain
Talk Outline

- Transfer Learning: a quick introduction
- Link prediction and collaborative filtering problems
- Transfer Learning for the sparsity problem: Codebook method, CST method, Collective Link Prediction method
- Conclusions and future work
Network Data

- Social networks: Facebook, Twitter, Live Messenger, etc.
- Information networks: the Web (1.0 with hyperlinks, 2.0 with tags), citation networks, Wikipedia, etc.
- Biological networks: gene networks, etc.
- Other networks
A Real-World Study [Leskovec-Horvitz WWW '08]

- Who talks to whom on MSN Messenger
- Network: 240M nodes, 1.3 billion edges
- Conclusions: the average path length is 6.6, and 90% of nodes are reachable in fewer than 8 steps
Network Structures: Links

- Global network structures: small world, long-tail distributions
- Local network structures: relational learning (relation == link), link prediction (e.g., recommendation)
- Group (cluster) structures
Global Network Structures

- Small world: six degrees of separation [Milgram, 1960s]
- Clustering and communities
- Evolutionary patterns
- Long-tail distributions: the basis of personalized recommendation (e.g., Amazon), but also a source of sparseness, a key challenge for recommendation and relational learning
Local Network Structures

Link prediction, a form of Statistical Relational Learning:

- Object classification: predict the category of an object based on its attributes and links (Is this person a student?)
- Link classification: predict the type of a link (Are they co-authors?)
- Link existence: predict whether a link exists or not (Does he like this book?)
- Link cardinality estimation: predict the number of links of a node

Reference: An Introduction to Statistical Relational Learning by Taskar and Getoor.
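Link-existence prediction is often illustrated with a simple neighborhood heuristic. Below is a minimal sketch (the toy graph and scoring function are invented for illustration, not from the talk) that scores every unlinked pair by its number of common neighbors:

```python
# A minimal sketch of link-existence prediction via the
# common-neighbors heuristic on a toy (invented) undirected graph.
from itertools import combinations

# Adjacency dict: each node maps to its set of neighbors.
graph = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

def common_neighbor_score(u, v):
    """Number of neighbors shared by u and v."""
    return len(graph[u] & graph[v])

# Score every currently unlinked pair; higher scores suggest likelier links.
scores = {
    (u, v): common_neighbor_score(u, v)
    for u, v in combinations(sorted(graph), 2)
    if v not in graph[u]
}
print(scores)
```

Here ("a", "d") scores highest because a and d share two neighbors; real link predictors replace this score with richer relational features.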
Example: Collaborative Filtering (Ideal Setting)

Training data: a dense rating matrix (density >= 2.0%) is used to fit a CF model.
Test data: a rating matrix whose missing entries ("?") must be predicted.
Result: RMSE = 0.8721.

[Figure: a small user-item rating matrix, mostly filled with ratings 1-5 and a few "?" cells]
Example: Collaborative Filtering (Realistic Setting)

Training data: a sparse rating matrix (density <= 0.6%) is used to fit a CF model.
Test data: a rating matrix whose missing entries ("?") must be predicted.
Result: RMSE = 0.9748, noticeably worse than the 0.8721 of the ideal dense setting.

[Figure: the same matrix with only a handful of observed ratings]
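The gap between the two settings can be reproduced in miniature. The sketch below (all data and hyperparameters invented) fits a basic matrix-factorization CF model under a dense and then a sparse observation mask and compares held-out RMSE:

```python
# Illustrative sketch: how training-matrix sparsity affects the test
# RMSE of a simple matrix-factorization CF model. Toy data only.
import numpy as np

rng = np.random.default_rng(0)

def factorize(R, train_mask, d=3, lr=0.01, reg=0.1, iters=2000):
    """Gradient descent on the observed entries of R (toy CF model)."""
    n, m = R.shape
    U = 0.1 * rng.standard_normal((n, d))
    V = 0.1 * rng.standard_normal((m, d))
    for _ in range(iters):
        E = train_mask * (R - U @ V.T)   # error on observed entries only
        U += lr * (E @ V - reg * U)
        V += lr * (E.T @ U - reg * V)
    return U @ V.T

# Ground-truth low-rank "ratings" (invented data).
R = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 15))
test_mask = rng.random(R.shape) < 0.2          # held-out entries

def rmse(pred):
    diff = (pred - R)[test_mask]
    return float(np.sqrt(np.mean(diff ** 2)))

results = {}
for density in (0.5, 0.05):                    # dense vs. sparse training
    train_mask = (rng.random(R.shape) < density) & ~test_mask
    results[density] = rmse(factorize(R, train_mask))
    print(f"train density {density:.2f}: test RMSE {results[density]:.3f}")
```

With only ~5% of entries observed the factors are badly underdetermined, so the sparse run's test RMSE is clearly worse, mirroring the 0.8721 vs. 0.9748 contrast above.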
Codebook Transfer
Bin Li, Qiang Yang, Xiangyang Xue. Can Movies and Books Collaborate? Cross-Domain
Collaborative Filtering for Sparsity Reduction. In Proceedings of the Twenty-First International Joint
Conference on Artificial Intelligence (IJCAI '09),
Pasadena, CA, USA, July 11-17, 2009.
Limitations of Codebook Transfer

- Same rating range: source and target data must have the same range of ratings, e.g. [1, 5]
- Homogeneous dimensions: user and item dimensions must be similar

In reality, the range of ratings can be 0/1 or [1, 5], and the user and item dimensions may be very different.
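To make the codebook idea concrete, here is a minimal sketch. The cluster assignments are hand-picked for illustration; CBT itself learns them by co-clustering the auxiliary matrix.

```python
# Sketch of the codebook idea behind CBT: compress an auxiliary rating
# matrix into cluster-level rating patterns, then read predictions for a
# target user/item off the codebook. Data and clusters are invented.
import numpy as np

aux = np.array([[5, 5, 1, 1],
                [5, 4, 1, 2],
                [1, 1, 5, 5],
                [2, 1, 4, 5]], dtype=float)

user_cluster = np.array([0, 0, 1, 1])   # assumed co-clustering result
item_cluster = np.array([0, 0, 1, 1])

# Codebook B: mean rating of each (user-cluster, item-cluster) block.
k, l = 2, 2
B = np.zeros((k, l))
for a in range(k):
    for b in range(l):
        block = aux[np.ix_(user_cluster == a, item_cluster == b)]
        B[a, b] = block.mean()

# Transfer: map a target user/item to its best-matching clusters and
# fill the missing rating from the codebook.
target_user_cluster, target_item_cluster = 1, 0
print(B[target_user_cluster, target_item_cluster])
```

The limitations listed above follow directly: the codebook entries live in the auxiliary rating scale, and expanding them assumes the target's user/item cluster structure matches the source's.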
Coordinate System Transfer
Weike Pan, Evan Xiang, Nathan Liu and Qiang Yang. Transfer Learning in Collaborative Filtering for
Sparsity Reduction. In Proceedings of the 24th AAAI Conference on
Artificial Intelligence (AAAI-10). Atlanta, Georgia, USA. July 11-15, 2010.
Types of User Feedback

- Explicit feedback expresses preferences explicitly: ratings 1-5, with "?" marking a missing rating
- Implicit feedback expresses preferences implicitly: 1 for observed browsing/purchasing, 0 for unobserved

Target domain R: explicit ratings.
Auxiliary domain (two or more data sources): implicit feedback whose entry for (u, j) indicates that user u has implicitly browsed/purchased item j.
Problem Formulation

- One target domain: a rating matrix R with n users and m items.
- One auxiliary domain with two data sources:
  - user side: an implicit-feedback matrix sharing the n common users with R;
  - item side: an implicit-feedback matrix sharing the m common items with R.
- Goal: predict the missing values in R.
Our Solution: Coordinate System Transfer

Step 1: Coordinate System Construction. Step 2: Coordinate System Adaptation.

Step 1: Coordinate System Construction

Apply sparse SVD to the auxiliary data, keeping the top d singular vectors; the resulting matrices give the top-d principal coordinates (user-side and item-side coordinate systems U0 and V0).

Definition (Coordinate System): a coordinate system is a matrix with columns of orthonormal bases (principal coordinates), where the columns are ordered descendingly according to their corresponding eigenvalues.
Coordinate System Transfer

Step 2: Coordinate System Adaptation

Adapt the coordinate systems discovered in the auxiliary domain to the target domain. The effect from the auxiliary domain enters in two ways:

- Initialization: take the auxiliary coordinate systems (U0, V0) as the seed model in the target domain
- Regularization: penalize the target factors for drifting from the seed, via a term of the form (rho/2)(||U - U0||^2 + ||V - V0||^2)
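A small sketch of the two steps follows, with toy data and a plain gradient solver. The names U0/V0, rho and d are illustrative; the actual CST method additionally keeps the factors orthonormal and uses an inner linking matrix.

```python
# Sketch of CST's two steps on invented data:
#   Step 1: SVD of a dense auxiliary matrix gives coordinates U0, V0.
#   Step 2: the target factorization is initialized at (U0, V0) and
#           regularized toward them while fitting observed entries.
import numpy as np

rng = np.random.default_rng(1)
d, rho, lr = 2, 0.5, 0.01

aux = rng.standard_normal((10, 8))     # dense auxiliary data (toy)
R = rng.standard_normal((10, 8))       # target ratings (toy)
mask = rng.random(R.shape) < 0.3       # observed target entries

# Step 1: coordinate system construction via SVD.
U0, _, V0t = np.linalg.svd(aux, full_matrices=False)
U0, V0 = U0[:, :d], V0t[:d, :].T

# Step 2: adaptation -- gradient descent on
#   0.5*||mask * (R - U V^T)||^2 + (rho/2)(||U - U0||^2 + ||V - V0||^2)
U, V = U0.copy(), V0.copy()
for _ in range(500):
    E = mask * (R - U @ V.T)
    U += lr * (E @ V - rho * (U - U0))
    V += lr * (E.T @ U - rho * (V - V0))

pred = U @ V.T   # predictions for the missing target entries
```

Starting from the seed, the fit to the observed target entries improves while the regularizer keeps the factors near the auxiliary coordinates.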
Data Sets and Evaluation Metrics

Data sets: extracted from MovieLens and Netflix.

Mean Absolute Error (MAE) and Root Mean Square Error (RMSE):

MAE = (1/|T|) Σ |r − r̂|
RMSE = sqrt( (1/|T|) Σ (r − r̂)² )

where r and r̂ are the true and predicted ratings, respectively, and |T| is the number of test ratings.
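The two metrics, computed as a quick sketch on invented ratings:

```python
# MAE and RMSE on a toy set of true vs. predicted ratings.
import numpy as np

true = np.array([4.0, 3.0, 5.0, 2.0])
pred = np.array([3.5, 3.0, 4.0, 2.5])

mae = float(np.mean(np.abs(true - pred)))           # (1/|T|) sum |r - r^|
rmse = float(np.sqrt(np.mean((true - pred) ** 2)))  # sqrt of mean squared error
print(mae, rmse)
```

RMSE penalizes large errors more heavily than MAE, which is why the two metrics are usually reported together.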
Experimental Results
Baselines and Parameter Settings

Baselines:
- Average Filling (AF)
- Latent Matrix Factorization (LFM) (Bell and Koren, ICDM '07)
- Collective Matrix Factorization (CMF) (Singh and Gordon, KDD '08)
- OptSpace (Keshavan, Montanari and Oh, NIPS '10)

Parameter settings:
- Tradeoff parameters: LFM { 0.001, 0.01, 0.1, 1, 10, 100 }; CMF { 0.1, 0.5, 1, 5, 10 }; CST:
- Latent dimensions: { 5, 10, 15 }

Average results over 10 random trials are reported.
Results (1/2)

Observations:
- CST performs significantly better than all baselines (t-test, 99% confidence) at all sparsity levels
- The transfer-learning methods (CST, CMF) beat the two non-transfer-learning methods (AF, LFM)
Results (2/2)

Observations:
- CST consistently improves as d increases, showing that the auxiliary knowledge helps avoid overfitting
- The improvement of CST over OptSpace is larger when fewer ratings are observed, demonstrating CST's effectiveness in alleviating data sparsity
Related Work (Transfer Learning)

- Cross-domain collaborative filtering
- Domain adaptation methods: one-sided; two-sided
Limitations of CST and CBT

- Users behave differently in different domains
- Rating bias: users tend to rate items that they like

Our Solution: Collective Link Prediction (CLP)

- Jointly learn multiple domains together
- Learn the similarity of different domains: consistency between domains indicates similarity
Bin Cao, Nathan Liu and Qiang Yang. Transfer Learning for Collective Link
Prediction in Multiple Heterogeneous Domains.
In Proceedings of 27th International Conference on Machine Learning (ICML 2010), Haifa, Israel. June 2010.
Recommendation with Heterogeneous Domains

In many recommendation systems, items span multiple, heterogeneous domains: eBay, Google Reader, douban, etc.
Example: Collective Link Prediction

Transfer learning trains across different rating domains with different user behaviors:

- a sparse training movie-rating matrix (the target, with only a handful of observed ratings);
- a dense auxiliary book-rating matrix;
- a dense auxiliary movie-tagging matrix (binary 0/1 entries).

Predicting the missing entries of the test rating matrix yields RMSE = 0.8855.
A Nonparametric Bayesian Approach

- Based on Gaussian process models
- The key part is the kernel, which models the relations among users as well as among tasks

Making Predictions

- Similar to a memory-based approach: the mean of the prediction weights observed ratings by the similarity between tasks and the similarity between items
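A rough sketch (not the paper's exact model) of such a GP predictive mean: the joint kernel is taken as a Kronecker product of a task-similarity matrix and an item-similarity matrix, and the mean is the usual k*ᵀ(K + σ²I)⁻¹y. All similarity values and ratings are invented.

```python
# GP-style predictive mean with a Kronecker kernel over (task, item)
# pairs. Toy similarities and ratings; illustrative only.
import numpy as np

K_task = np.array([[1.0, 0.7],
                   [0.7, 1.0]])          # similarity between 2 domains
K_item = np.array([[1.0, 0.5, 0.2],
                   [0.5, 1.0, 0.4],
                   [0.2, 0.4, 1.0]])     # similarity between 3 items

K = np.kron(K_task, K_item)              # joint kernel over (task, item)
y = np.array([5.0, 3.0, 1.0, 4.0, 2.0, 1.0])   # observed ratings
noise = 0.1

# Predictive mean for the point (task 0, item 2):
alpha = np.linalg.solve(K + noise * np.eye(6), y)
k_star = np.kron(K_task[0], K_item[2])   # kernel row for (task 0, item 2)
mean = float(k_star @ alpha)
print(round(mean, 3))
```

Like a memory-based recommender, the prediction is a similarity-weighted combination of observed ratings, but here the weights come from a learned kernel over both tasks and items.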