Post on 22-Dec-2015
Transfer Learning for Link Prediction
Qiang Yang (楊強), Hong Kong University of Science and Technology
Hong Kong, China
http://www.cse.ust.hk/~qyang
Transfer Learning?
People often transfer knowledge to novel situations: Chess → Checkers, C++ → Java, Physics → Computer Science.
Transfer Learning: the ability of a system to recognize and apply knowledge and skills learned in previous tasks to novel tasks (or new domains).
But direct application will not work.

Machine Learning assumes that training and future (test) data follow the same distribution and lie in the same feature space.

When features and distributions are different, performance collapses: e.g., a (+/-) classifier with 90% training accuracy drops to 10% test accuracy.
Cross-Domain Sentiment Classification

- Train a classifier on DVD reviews, test on DVD reviews: 82.55% accuracy
- Train a classifier on DVD reviews, test on Electronics reviews: 71.85% accuracy
When distributions are different

Wireless sensor networks: readings differ across time periods (night time vs. day time), devices (Device 1 vs. Device 2), or space.
When features are different

Heterogeneous transfer: the domains have different feature spaces. For example, training on text while future data are images:

- Apples: "The apple is the pomaceous fruit of the apple tree, species Malus domestica in the rose family Rosaceae ..."
- Bananas: "Banana is the common name for a type of fruit and also the herbaceous plants of the genus Musa which produce this commonly eaten fruit ..."
Transfer Learning: Source Domains

[Figure: source domains feed auxiliary knowledge into the learner that maps input to output]
An overview of the various settings of transfer learning:

Training data: labeled/unlabeled in the source domain (源數據, source data); labeled/unlabeled in the target domain (目標數據, target data)
Test data: unlabeled, in the target domain

- Inductive Transfer Learning: labeled data are available in the target domain
  - Case 1: labeled data are also available in the source domain; source and target tasks are learnt simultaneously → Multi-task Learning
  - Case 2: no labeled data in the source domain → Self-taught Learning
- Transductive Transfer Learning: labeled data are available only in the source domain
  - Assumption: different domains but a single task → Domain Adaptation
  - Assumption: single domain and single task → Sample Selection Bias / Covariate Shift
- Unsupervised Transfer Learning: no labeled data in either the source or the target domain
Talk Outline

- Transfer Learning: a quick introduction
- Link prediction and collaborative filtering problems
- Transfer Learning for the sparsity problem: Codebook method, CST method, Collective Link Prediction method
- Conclusions and future work
Network Data

- Social networks: Facebook, Twitter, Live Messenger, etc.
- Information networks: the Web (1.0 with hyperlinks, 2.0 with tags), citation networks, Wikipedia, etc.
- Biological networks: gene networks, etc.
- Other networks
A Real-World Study [Leskovec-Horvitz WWW '08]

- Who talks to whom on MSN Messenger
- Network: 240M nodes, 1.3 billion edges
- Conclusions: the average path length is 6.6, and 90% of nodes are reachable in fewer than 8 steps
Network Structures: Links

- Global network structures: small world, long-tail distributions
- Local network structures: relational learning (relation == link), link prediction (e.g., recommendation)
- Group (cluster) structures
Global Network Structures

- Small world: six degrees of separation [Milgram, 1960s]
- Clustering and communities
- Evolutionary patterns
- Long-tail distributions: the basis of personalized recommendation (e.g., Amazon), but also a source of sparseness, a key challenge for recommendation and relational learning
Local Network Structures

Link prediction, a form of Statistical Relational Learning:

- Object classification: predict the category of an object based on its attributes and links (Is this person a student?)
- Link classification: predict the type of a link (Are they co-authors?)
- Link existence: predict whether a link exists or not (Does he like this book?)
- Link cardinality estimation: predict the number of links of a node

Reference: An Introduction to Statistical Relational Learning by Taskar and Getoor.
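Link-existence prediction is often illustrated with a simple neighborhood heuristic. Below is a minimal sketch (the toy graph and scoring function are invented for illustration, not from the talk) that scores every unlinked pair by its number of common neighbors:

```python
# A minimal sketch of link-existence prediction via the
# common-neighbors heuristic on a toy (invented) undirected graph.
from itertools import combinations

# Adjacency dict: each node maps to its set of neighbors.
graph = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

def common_neighbor_score(u, v):
    """Number of neighbors shared by u and v."""
    return len(graph[u] & graph[v])

# Score every currently unlinked pair; higher scores suggest likelier links.
scores = {
    (u, v): common_neighbor_score(u, v)
    for u, v in combinations(sorted(graph), 2)
    if v not in graph[u]
}
print(scores)
```

Here ("a", "d") scores highest because a and d share two neighbors; real link predictors replace this score with richer relational features.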
Example: Collaborative Filtering (Ideal Setting)

Training data: a dense rating matrix (density >= 2.0%) is used to fit a CF model.
Test data: a rating matrix whose missing entries ("?") must be predicted.
Result: RMSE = 0.8721.

[Figure: a small user-item rating matrix, mostly filled with ratings 1-5 and a few "?" cells]
Example: Collaborative Filtering (Realistic Setting)

Training data: a sparse rating matrix (density <= 0.6%) is used to fit a CF model.
Test data: a rating matrix whose missing entries ("?") must be predicted.
Result: RMSE = 0.9748, noticeably worse than the 0.8721 of the ideal dense setting.

[Figure: the same matrix with only a handful of observed ratings]
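The gap between the two settings can be reproduced in miniature. The sketch below (all data and hyperparameters invented) fits a basic matrix-factorization CF model under a dense and then a sparse observation mask and compares held-out RMSE:

```python
# Illustrative sketch: how training-matrix sparsity affects the test
# RMSE of a simple matrix-factorization CF model. Toy data only.
import numpy as np

rng = np.random.default_rng(0)

def factorize(R, train_mask, d=3, lr=0.01, reg=0.1, iters=2000):
    """Gradient descent on the observed entries of R (toy CF model)."""
    n, m = R.shape
    U = 0.1 * rng.standard_normal((n, d))
    V = 0.1 * rng.standard_normal((m, d))
    for _ in range(iters):
        E = train_mask * (R - U @ V.T)   # error on observed entries only
        U += lr * (E @ V - reg * U)
        V += lr * (E.T @ U - reg * V)
    return U @ V.T

# Ground-truth low-rank "ratings" (invented data).
R = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 15))
test_mask = rng.random(R.shape) < 0.2          # held-out entries

def rmse(pred):
    diff = (pred - R)[test_mask]
    return float(np.sqrt(np.mean(diff ** 2)))

results = {}
for density in (0.5, 0.05):                    # dense vs. sparse training
    train_mask = (rng.random(R.shape) < density) & ~test_mask
    results[density] = rmse(factorize(R, train_mask))
    print(f"train density {density:.2f}: test RMSE {results[density]:.3f}")
```

With only ~5% of entries observed the factors are badly underdetermined, so the sparse run's test RMSE is clearly worse, mirroring the 0.8721 vs. 0.9748 contrast above.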
Codebook Transfer
Bin Li, Qiang Yang, Xiangyang Xue. Can Movies and Books Collaborate? Cross-Domain
Collaborative Filtering for Sparsity Reduction. In Proceedings of the Twenty-First International Joint
Conference on Artificial Intelligence (IJCAI '09),
Pasadena, CA, USA, July 11-17, 2009.
Limitations of Codebook Transfer

- Same rating range: source and target data must have the same range of ratings, e.g. [1, 5]
- Homogeneous dimensions: user and item dimensions must be similar

In reality, the range of ratings can be 0/1 or [1, 5], and the user and item dimensions may be very different.
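To make the codebook idea concrete, here is a minimal sketch. The cluster assignments are hand-picked for illustration; CBT itself learns them by co-clustering the auxiliary matrix.

```python
# Sketch of the codebook idea behind CBT: compress an auxiliary rating
# matrix into cluster-level rating patterns, then read predictions for a
# target user/item off the codebook. Data and clusters are invented.
import numpy as np

aux = np.array([[5, 5, 1, 1],
                [5, 4, 1, 2],
                [1, 1, 5, 5],
                [2, 1, 4, 5]], dtype=float)

user_cluster = np.array([0, 0, 1, 1])   # assumed co-clustering result
item_cluster = np.array([0, 0, 1, 1])

# Codebook B: mean rating of each (user-cluster, item-cluster) block.
k, l = 2, 2
B = np.zeros((k, l))
for a in range(k):
    for b in range(l):
        block = aux[np.ix_(user_cluster == a, item_cluster == b)]
        B[a, b] = block.mean()

# Transfer: map a target user/item to its best-matching clusters and
# fill the missing rating from the codebook.
target_user_cluster, target_item_cluster = 1, 0
print(B[target_user_cluster, target_item_cluster])
```

The limitations listed above follow directly: the codebook entries live in the auxiliary rating scale, and expanding them assumes the target's user/item cluster structure matches the source's.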
Coordinate System Transfer
Weike Pan, Evan Xiang, Nathan Liu and Qiang Yang. Transfer Learning in Collaborative Filtering for
Sparsity Reduction. In Proceedings of the 24th AAAI Conference on
Artificial Intelligence (AAAI-10). Atlanta, Georgia, USA. July 11-15, 2010.
Types of User Feedback

- Explicit feedback expresses preferences explicitly: ratings 1-5, with "?" marking a missing rating
- Implicit feedback expresses preferences implicitly: 1 for observed browsing/purchasing, 0 for unobserved

Target domain R: explicit ratings.
Auxiliary domain (two or more data sources): implicit feedback whose entry for (u, j) indicates that user u has implicitly browsed/purchased item j.
Problem Formulation

- One target domain: a rating matrix R with n users and m items.
- One auxiliary domain with two data sources:
  - user side: an implicit-feedback matrix sharing the n common users with R;
  - item side: an implicit-feedback matrix sharing the m common items with R.
- Goal: predict the missing values in R.
Our Solution: Coordinate System Transfer

Step 1: Coordinate System Construction. Step 2: Coordinate System Adaptation.

Step 1: Coordinate System Construction

Apply sparse SVD to the auxiliary data, keeping the top d singular vectors; the resulting matrices give the top-d principal coordinates (user-side and item-side coordinate systems U0 and V0).

Definition (Coordinate System): a coordinate system is a matrix with columns of orthonormal bases (principal coordinates), where the columns are ordered descendingly according to their corresponding eigenvalues.
Coordinate System Transfer

Step 2: Coordinate System Adaptation

Adapt the coordinate systems discovered in the auxiliary domain to the target domain. The effect from the auxiliary domain enters in two ways:

- Initialization: take the auxiliary coordinate systems (U0, V0) as the seed model in the target domain
- Regularization: penalize the target factors for drifting from the seed, via a term of the form (rho/2)(||U - U0||^2 + ||V - V0||^2)
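A small sketch of the two steps follows, with toy data and a plain gradient solver. The names U0/V0, rho and d are illustrative; the actual CST method additionally keeps the factors orthonormal and uses an inner linking matrix.

```python
# Sketch of CST's two steps on invented data:
#   Step 1: SVD of a dense auxiliary matrix gives coordinates U0, V0.
#   Step 2: the target factorization is initialized at (U0, V0) and
#           regularized toward them while fitting observed entries.
import numpy as np

rng = np.random.default_rng(1)
d, rho, lr = 2, 0.5, 0.01

aux = rng.standard_normal((10, 8))     # dense auxiliary data (toy)
R = rng.standard_normal((10, 8))       # target ratings (toy)
mask = rng.random(R.shape) < 0.3       # observed target entries

# Step 1: coordinate system construction via SVD.
U0, _, V0t = np.linalg.svd(aux, full_matrices=False)
U0, V0 = U0[:, :d], V0t[:d, :].T

# Step 2: adaptation -- gradient descent on
#   0.5*||mask * (R - U V^T)||^2 + (rho/2)(||U - U0||^2 + ||V - V0||^2)
U, V = U0.copy(), V0.copy()
for _ in range(500):
    E = mask * (R - U @ V.T)
    U += lr * (E @ V - rho * (U - U0))
    V += lr * (E.T @ U - rho * (V - V0))

pred = U @ V.T   # predictions for the missing target entries
```

Starting from the seed, the fit to the observed target entries improves while the regularizer keeps the factors near the auxiliary coordinates.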
Data Sets and Evaluation Metrics

Data sets: extracted from MovieLens and Netflix.

Mean Absolute Error (MAE) and Root Mean Square Error (RMSE):

MAE = (1/|T|) Σ |r − r̂|
RMSE = sqrt( (1/|T|) Σ (r − r̂)² )

where r and r̂ are the true and predicted ratings, respectively, and |T| is the number of test ratings.
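The two metrics, computed as a quick sketch on invented ratings:

```python
# MAE and RMSE on a toy set of true vs. predicted ratings.
import numpy as np

true = np.array([4.0, 3.0, 5.0, 2.0])
pred = np.array([3.5, 3.0, 4.0, 2.5])

mae = float(np.mean(np.abs(true - pred)))           # (1/|T|) sum |r - r^|
rmse = float(np.sqrt(np.mean((true - pred) ** 2)))  # sqrt of mean squared error
print(mae, rmse)
```

RMSE penalizes large errors more heavily than MAE, which is why the two metrics are usually reported together.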
Experimental Results
Baselines and Parameter Settings

Baselines:
- Average Filling (AF)
- Latent Matrix Factorization (LFM) (Bell and Koren, ICDM '07)
- Collective Matrix Factorization (CMF) (Singh and Gordon, KDD '08)
- OptSpace (Keshavan, Montanari and Oh, NIPS '10)

Parameter settings:
- Tradeoff parameters: LFM { 0.001, 0.01, 0.1, 1, 10, 100 }; CMF { 0.1, 0.5, 1, 5, 10 }; CST:
- Latent dimensions: { 5, 10, 15 }

Average results over 10 random trials are reported.
Results (1/2)

Observations:
- CST performs significantly better than all baselines (t-test, 99% confidence) at all sparsity levels
- The transfer-learning methods (CST, CMF) beat the two non-transfer-learning methods (AF, LFM)
Results (2/2)

Observations:
- CST consistently improves as d increases, showing that the auxiliary knowledge helps avoid overfitting
- The improvement of CST over OptSpace is larger when fewer ratings are observed, demonstrating CST's effectiveness in alleviating data sparsity
Related Work (Transfer Learning)

- Cross-domain collaborative filtering
- Domain adaptation methods: one-sided; two-sided
Limitations of CST and CBT

- Users behave differently in different domains
- Rating bias: users tend to rate items that they like

Our Solution: Collective Link Prediction (CLP)

- Jointly learn multiple domains together
- Learn the similarity of different domains: consistency between domains indicates similarity
Bin Cao, Nathan Liu and Qiang Yang. Transfer Learning for Collective Link
Prediction in Multiple Heterogeneous Domains.
In Proceedings of 27th International Conference on Machine Learning (ICML 2010), Haifa, Israel. June 2010.
Recommendation with Heterogeneous Domains

In many recommendation systems, items span multiple, heterogeneous domains: eBay, Google Reader, douban, etc.
Example: Collective Link Prediction

Transfer learning trains across different rating domains with different user behaviors:

- a sparse training movie-rating matrix (the target, with only a handful of observed ratings);
- a dense auxiliary book-rating matrix;
- a dense auxiliary movie-tagging matrix (binary 0/1 entries).

Predicting the missing entries of the test rating matrix yields RMSE = 0.8855.
A Nonparametric Bayesian Approach

- Based on Gaussian process models
- The key part is the kernel, which models the relations among users as well as among tasks

Making Predictions

- Similar to a memory-based approach: the mean of the prediction weights observed ratings by the similarity between tasks and the similarity between items
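A rough sketch (not the paper's exact model) of such a GP predictive mean: the joint kernel is taken as a Kronecker product of a task-similarity matrix and an item-similarity matrix, and the mean is the usual k*ᵀ(K + σ²I)⁻¹y. All similarity values and ratings are invented.

```python
# GP-style predictive mean with a Kronecker kernel over (task, item)
# pairs. Toy similarities and ratings; illustrative only.
import numpy as np

K_task = np.array([[1.0, 0.7],
                   [0.7, 1.0]])          # similarity between 2 domains
K_item = np.array([[1.0, 0.5, 0.2],
                   [0.5, 1.0, 0.4],
                   [0.2, 0.4, 1.0]])     # similarity between 3 items

K = np.kron(K_task, K_item)              # joint kernel over (task, item)
y = np.array([5.0, 3.0, 1.0, 4.0, 2.0, 1.0])   # observed ratings
noise = 0.1

# Predictive mean for the point (task 0, item 2):
alpha = np.linalg.solve(K + noise * np.eye(6), y)
k_star = np.kron(K_task[0], K_item[2])   # kernel row for (task 0, item 2)
mean = float(k_star @ alpha)
print(round(mean, 3))
```

Like a memory-based recommender, the prediction is a similarity-weighted combination of observed ratings, but here the weights come from a learned kernel over both tasks and items.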