Multi-Task Learning and Web Search Ranking


Transcript of Multi-Task Learning and Web Search Ranking

Page 1: Multi-Task Learning and Web Search Ranking


Multi-Task Learning and Web Search Ranking

Gordon Sun (孙国政), Yahoo! Inc

March 2007

Page 2: Multi-Task Learning and Web Search Ranking


Outline:

1. Brief Review: Machine Learning in web search ranking and Multi-Task learning.

2. MLR with Adaptive Target Value Transformation – each query is a task.

3. MLR for Multi-Languages – each language is a task.

4. MLR for Multi-query classes – each type of query is a task.

5. Future work and Challenges.

Page 3: Multi-Task Learning and Web Search Ranking


MLR (Machine Learning Ranking)

• General function estimation and risk minimization:
  Input: x = {x1, x2, …, xn}; output: y
  Training set: {yi, xi}, i = 1, …, n
  Goal: estimate the mapping function y = F(x) by minimizing the expected risk

    F* = argmin_F E_{y,x}[ L(y, F(x)) ]

• In MLR work:
  x = x(q, d) = {x1, x2, …, xn} --- ranking features
  y = judgment label, e.g. {P, E, G, F, B} mapped to {0, 1, 2, 3, 4}
  Loss function: L(y, F(x)) = (y - F(x))^2
  Algorithm: MLR with regression.
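
As a concrete illustration of this setup (not the production system described in the talk), the sketch below fits a squared-loss regressor to editorial grades and ranks documents by the predicted score. The choice of scikit-learn's GradientBoostingRegressor, the feature dimensions, and the grade-to-number mapping (B=0 … P=4, higher is better) are assumptions made for the example.

```python
# Minimal sketch of squared-loss regression ranking; data and model choice are illustrative.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical training set {(x_i, y_i)}: x_i = x(q, d) ranking features,
# y_i = editorial grade mapped to a number (assumed here: B=0, F=1, G=2, E=3, P=4).
X_train = np.random.rand(1000, 20)              # 1000 query-document pairs, 20 features
y_train = np.random.randint(0, 5, size=1000)

# F(x) is fit by minimizing the empirical squared loss (y - F(x))^2.
F = GradientBoostingRegressor(loss="squared_error", n_estimators=200, max_depth=4)
F.fit(X_train, y_train)

# At query time: score each candidate document for the query and sort by predicted grade.
X_candidates = np.random.rand(50, 20)           # feature vectors x(q, d) for one query
ranking = np.argsort(-F.predict(X_candidates))  # best-predicted documents first
```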

Page 4: Multi-Task Learning and Web Search Ranking


• Rank feature construction
  • Query features: query language, query word types (Latin, Kanji, …), …
  • Document features: page_quality, page_spam, page_rank, …
  • Query-document dependent features: text match scores in body, title, anchor text (TF/IDF, proximity), ...

• Evaluation metric – DCG (Discounted Cumulative Gain):

    DCG_n = Σ_{i=1..n} G_i / ln(1 + i)

  where the grades G_i are the grade values for {P, E, G, F, B}; DCG5 uses n = 5 and DCG10 uses n = 10 (NDCG is the normalized variant).
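
The helper below is a direct transcription of the DCG formula above; the example grade values are made up.

```python
# DCG_n = sum_{i=1..n} G_i / ln(1 + i), with G_i the grade value at rank i (best first).
import math

def dcg(grades, n=5):
    """grades: grade values of the ranked documents, best-first; n: cutoff (DCG5, DCG10, ...)."""
    return sum(g / math.log(1 + i) for i, g in enumerate(grades[:n], start=1))

print(dcg([4, 3, 3, 1, 0], n=5))   # DCG5 for one hypothetical result list
```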

Page 5: Multi-Task Learning and Web Search Ranking


Distribution of judgment grades

[Figure: distribution of judgment grades (Perfect, Excellent, Good, Fair, Bad) as a fraction of URLs (0 to 0.5), one curve per market: JP, CN, DE, UK, KR.]

Page 6: Multi-Task Learning and Web Search Ranking


• Multi-Task Learning

• Single-Task Learning (STL)
  - One prediction task (classification/regression): estimate a function from one training/testing set T = {yi, xi}, i = 1, …, n:

    F* = argmin_F E_{y,x}[ L(y, F(x)) ]

• Multi-Task Learning (MTL)
  - Multiple prediction tasks, each with its own training/testing set: Tk = {yki, xki}, k = 1, …, m, i = 1, …, nk
  - Goal is to solve multiple tasks together:
    - the tasks share the same input space (at least partially);
    - the tasks are related (in MLR, say, they share one mapping function).

Page 7: Multi-Task Learning and Web Search Ranking


• Multi-Task Learning: Intuition and Benefits

• Empirical intuition
  - Data from "related" tasks could help: it is equivalent to increasing the effective sample size.
  - Goal: share data and knowledge from task to task --- transfer learning.

• Benefits
  - When the number of training examples per task is limited.
  - When the number of tasks is large and a separate MLR per task cannot be maintained.
  - When it is difficult or expensive to obtain examples for some tasks.
  - It becomes possible to obtain meta-level knowledge.

Page 8: Multi-Task Learning and Web Search Ranking


• Multi-Task Learning: "relatedness" approaches

• Probabilistic modeling of task generation
  [Baxter '00], [Heskes '00], [Teh, Seeger, Jordan '05], [Zhang, Ghahramani, Yang '05]

• Latent variable correlations
  – Noise correlations [Greene '02]
  – Latent variable modeling [Zhang '06]

• Hidden common data structure and latent variables
  – Implicit structure (common kernels) [Evgeniou, Micchelli, Pontil '05]
  – Explicit structure (PCA) [Ando, Zhang '04]

• Transformation relatedness [Shai '05]

Page 9: Multi-Task Learning and Web Search Ranking


• Multi-Task Learning for MLR: different levels of relatedness

• Grouping data by query (each query could be one task):

    F_q* = argmin_F E_{d|q}[ L(y, F(x(q, d))) ]

• Grouping data by query language (each language is a task):

    F_lang* = argmin_F E_{(q,d) in lang}[ L(y, F(x(q, d))) ]

• Grouping data by query class (each query class is a task):

    F_class* = argmin_F E_{(q,d) in class}[ L(y, F(x(q, d))) ]

Page 10: Multi-Task Learning and Web Search Ranking


Outline:

1. Brief Review: Machine Learning in web search ranking and Multi-Task learning.

2. MLR with Adaptive Target Value Transformation – each query is a task.

3. MLR for Multi-Languages – each language is a task.

4. MLR for Multi-query classes – each type of query is a task.

5. Future work and Challenges.

Page 11: Multi-Task Learning and Web Search Ranking


Adaptive Target Value Transformation

• Intuition:
  - Rank features vary a lot from query to query.
  - Rank features vary a lot from sample to sample with the same label.
  - MLR is a ranking problem, but regression minimizes prediction error.

• Solution: adaptively adjust the training target values:

    F* = argmin_F E_q E_{d|q}[ L(g_q(y), F(x(q, d))) ]

  where a linear (monotonic) per-query transformation is required,

    g_q(y) = α_q · y + β_q

  (a nonlinear g() may not preserve the order of E(y|x)).
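
A minimal sketch of the per-query linear transformation g_q(y) = α_q·y + β_q, assuming labels are grouped by query id; the α_q, β_q values below are hypothetical (in the method they are learned jointly with F, see the optimization two slides later).

```python
# Apply g_q(y) = alpha_q * y + beta_q to every label of every query (monotonic if alpha_q > 0).
def transform_targets(labels_by_query, alpha, beta):
    """labels_by_query: {query_id: [y_1, y_2, ...]}; alpha, beta: {query_id: float}."""
    return {q: [alpha[q] * y + beta[q] for y in ys] for q, ys in labels_by_query.items()}

labels = {"q1": [4, 2, 0], "q2": [3, 3, 1]}
print(transform_targets(labels, alpha={"q1": 1.2, "q2": 0.8}, beta={"q1": -0.1, "q2": 0.3}))
```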

Page 12: Multi-Task Learning and Web Search Ranking


Adaptive Target Value Transformation

• Implementation: empirical risk minimization

    L(F, α, β) = Σ_{q=1..Q} Σ_{i=1..N_q} [ α_q · y_qi + β_q - F(x(q, d_qi)) ]^2 + λ_α ||α - 1||_p^p + λ_β ||β||_p^p

  where α = [α_1, α_2, …, α_Q], β = [β_1, β_2, …, β_Q], and 1 = [1, 1, …, 1].

• The linear transformation weights are regularized: λ_α and λ_β are the regularization parameters and || · ||_p is the p-norm.

• The solution is

    {F*, α*, β*} = argmin_{F ∈ H, α, β} L(F, α, β)

Page 13: Multi-Task Learning and Web Search Ranking


Adaptive Target Value Transformation

• Norm p = 2 solution: for each (λ_α, λ_β):

  1. For the current (α, β), find F(x) by solving

       F* = argmin_F L(F, α, β)

  2. For the given F(x), solve for each (α_q, β_q), q = 1, 2, …, Q:

       (α_q*, β_q*) = argmin_{α_q, β_q} Σ_{i=1..N_q} [ α_q · y_qi + β_q - F(x(q, d_qi)) ]^2 + λ_α (α_q - 1)^2 + λ_β β_q^2

  3. Repeat from step 1 until (α_q^{k+1}, β_q^{k+1}) ≈ (α_q^k, β_q^k) for all q.

• Norm p = 1 solution: solve by conditional quadratic programming [Lasso/LARS].

• Convergence analysis: assuming F(x) ∈ RKHS (a reproducing kernel Hilbert space).
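
A rough sketch of the alternating p = 2 procedure above, under two assumptions not stated on the slide: F is fit with scikit-learn's GradientBoostingRegressor (the talk does not specify the learner), and each per-query subproblem in step 2 is solved by its closed-form 2x2 linear system, since it is quadratic in (α_q, β_q).

```python
# Alternating minimization for aTVT with p = 2; the learner and loop length are illustrative.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def solve_query_transform(y, f, lam_a, lam_b):
    """Minimize sum_i (a*y_i + b - f_i)^2 + lam_a*(a-1)^2 + lam_b*b^2 over (a, b)."""
    A = np.array([[np.dot(y, y) + lam_a, y.sum()],
                  [y.sum(), len(y) + lam_b]])
    rhs = np.array([np.dot(y, f) + lam_a, f.sum()])
    return np.linalg.solve(A, rhs)                      # (alpha_q, beta_q)

def fit_atvt(X_by_q, y_by_q, lam_a=1.0, lam_b=1.0, n_iter=5):
    queries = list(X_by_q)
    y_by_q = {q: np.asarray(y_by_q[q], dtype=float) for q in queries}
    alpha = {q: 1.0 for q in queries}                   # start from the identity transform
    beta = {q: 0.0 for q in queries}
    for _ in range(n_iter):                             # in practice: iterate until (alpha, beta) converge
        # Step 1: for fixed (alpha, beta), fit F on the transformed targets.
        X = np.vstack([X_by_q[q] for q in queries])
        t = np.concatenate([alpha[q] * y_by_q[q] + beta[q] for q in queries])
        F = GradientBoostingRegressor().fit(X, t)
        # Step 2: for fixed F, solve the 2x2 system for each query's (alpha_q, beta_q).
        for q in queries:
            alpha[q], beta[q] = solve_query_transform(
                y_by_q[q], F.predict(X_by_q[q]), lam_a, lam_b)
    return F, alpha, beta
```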

Page 14: Multi-Task Learning and Web Search Ranking


Adaptive Target Value Transformation

Experiments data:

Page 15: Multi-Task Learning and Web Search Ranking


Adaptive Target Value Transformation

Evaluation of aTVT on US and CN data

Page 16: Multi-Task Learning and Web Search Ranking


Adaptive Target Value Transformation

Page 17: Multi-Task Learning and Web Search Ranking


Adaptive Target Value Transformation

Page 18: Multi-Task Learning and Web Search Ranking


Adaptive Target Value Transformation

Observations:
1. The relevance gain (DCG5 ~ 2%) is visible.
2. Regularization is needed.
3. Different query types gain differently from aTVT.

Page 19: Multi-Task Learning and Web Search Ranking


Outline:

1. Brief Review: Machine Learning in web search ranking and Multi-Task learning.

2. MLR with Adaptive Target Value Transformation – each query is a task.

3. MLR for Multi-Languages – each language is a task.

4. MLR for Multi-query classes – each type of query is a task.

5. Future work and Challenges.

Page 20: Multi-Task Learning and Web Search Ranking


Multi-Language MLR

Objective:

• Make MLR globally scalable: >100 languages, >50 regions.

• Improve MLR for small regions/languages using data from other languages.

• Build a Universal MLR for all regions that do not have data or editorial support.

Page 21: Multi-Task Learning and Web Search Ranking


Multi-Language MLR

Part 1

1. Feature Differences between Languages

2. MLR function differences between Languages.

Page 22: Multi-Task Learning and Web Search Ranking


Multi-Language MLR: Distribution of Text Score

[Figure: distribution of text-match scores for Perfect+Excellent URLs vs. Bad URLs; one curve per language (JP, CN, DE, UK, KR).]

Page 23: Multi-Task Learning and Web Search Ranking


Multi-Language MLR: Distribution of Spam Score

[Figure: distribution of spam scores for Perfect+Excellent URLs vs. Bad URLs; one curve per language (JP, CN, DE, UK, KR). JP and KR are similar; DE and UK are similar.]

Page 24: Multi-Task Learning and Web Search Ranking

Page 24 websearch

Multi-Language MLR: Training and Testing on Different Languages

% DCG improvement over the base function (rows: test language; columns: training language):

Test \ Train    UK      DE      KR      JP      CN
UK              6.22    2.29   -0.21    2.96    0.32
DE              6.96   13.1     6.25    6.05    3.94
KR              1.50   -0.55    5.69    4.49    3.86
JP             -1.25   -3.79   -0.30    4.48    1.30
CN              1.91   -3.53    0.29    2.50    7.47

Page 25: Multi-Task Learning and Web Search Ranking


Multi-Language MLR

Language Differences: observations

Feature differences across languages are visible but not huge.

An MLR trained for one language does not work well for other languages.

Page 26: Multi-Task Learning and Web Search Ranking


Multi-Language MLR

Part 2

Transfer Learning with Region features

Page 27: Multi-Task Learning and Web Search Ranking


Multi-Language MLR: Query Region Feature

New feature: query region, encoded as multiple binary-valued features:
• Feature vector: qr = (CN, JP, UK, DE, KR)
• CN queries: (1, 0, 0, 0, 0)
• JP queries: (0, 1, 0, 0, 0)
• UK queries: (0, 0, 1, 0, 0)
• …

To test the trained Universal MLR on a new language (e.g. FR):
• Feature vector: qr = (0, 0, 0, 0, 0)
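
A minimal sketch of this encoding; the region order follows the slide, and a region unseen in training (e.g. FR) falls back to the all-zeros vector.

```python
# One-hot query-region feature over the training regions; unknown regions map to all zeros.
REGIONS = ("CN", "JP", "UK", "DE", "KR")

def query_region_feature(region):
    return [1 if region == r else 0 for r in REGIONS]

print(query_region_feature("CN"))   # [1, 0, 0, 0, 0]
print(query_region_feature("FR"))   # [0, 0, 0, 0, 0]  -> handled by the "universal" part of the model
```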

Page 28: Multi-Task Learning and Web Search Ranking


Multi-Language MLR: Query Region Feature, experiment results

% DCG-5 improvement over the base function:

Language   Combined Model   Combined Model with Query Region Feature
JP         3.07%            3.53%
CN         6.24%            7.02%
UK         4.34%            5.92%
DE         9.86%            10.51%
KR         5.79%            6.83%

Page 29: Multi-Task Learning and Web Search Ranking


Multi-Language MLR: Query Region Feature, experiment results for the CJK and UK/DE models

Test Language   All-Language Model   UK/DE Model   CJK Model
JP              3.53%                --            4.39%
CN              7.02%                --            7.17%
UK              5.92%                5.93%         --
DE              10.51%               12.5%         --
KR              6.83%                --            6.14%

All models include the query region feature.

Page 30: Multi-Task Learning and Web Search Ranking


Multi-Language MLR: Query Region Feature, observations

• The query region feature seems to improve combined-model performance in every case, though not always statistically significantly.
• It helped more when we had less data (KR).
• It helped more when introducing "near-language" models (CJK, EU).
• It would not help for languages with large amounts of training data (JP, CN).

Page 31: Multi-Task Learning and Web Search Ranking


Multi-Language MLR: Experiments with Overweighting the Target Language

This method addresses the common case where only a small amount of data is available for a language.

Use all available data, but change the weight of the data from the target language:
• When weight = 1, this is the "Universal Language Model".
• As the weight goes to infinity, it becomes the single-language model.
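
A minimal sketch of the overweighting scheme, assuming a scikit-learn-style regressor that accepts per-example sample weights; the weight value is illustrative.

```python
# Train on all languages, but give target-language examples a larger sample weight.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fit_with_target_weight(X, y, languages, target_lang, weight=10.0):
    """languages: per-example language codes aligned with the rows of X."""
    sample_weight = np.where(np.asarray(languages) == target_lang, weight, 1.0)
    return GradientBoostingRegressor().fit(X, y, sample_weight=sample_weight)
```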

Page 32: Multi-Task Learning and Web Search Ranking


Multi-Language MLR: Germany

[Figure: % DCG gain (0% to 14%) vs. number of target-language training samples (0 to 20,000); curves: Weight 1, Weight 10, Weight 100, DE Only.]

Page 33: Multi-Task Learning and Web Search Ranking


Multi-Language MLR: UK

[Figure: % DCG gain (-12% to 8%) vs. number of target-language training samples (0 to 20,000); curves: Weight 1, Weight 10, Weight 100, UK Only.]

Page 34: Multi-Task Learning and Web Search Ranking


Multi-Language MLR: China

[Figure: % DCG gain (-1% to 8%) vs. number of target-language training samples (0 to 20,000); curves: Weight 1, Weight 10, Weight 100, CN Only.]

Page 35: Multi-Task Learning and Web Search Ranking


Multi-Language MLR: Korea

[Figure: % DCG gain (0% to 6%) vs. number of target-language training samples (0 to 20,000); curves: Weight 1, Weight 10, Weight 100, KR Only.]

Page 36: Multi-Task Learning and Web Search Ranking


Multi-Language MLR: Japan

[Figure: % DCG gain (-8% to 4%) vs. number of target-language training samples (0 to 20,000); curves: Weight 1, Weight 10, Weight 100, JP Only.]

Page 37: Multi-Task Learning and Web Search Ranking


Multi-Language MLR: Average DCG Gain for JP, CN, DE, UK, KR

[Figure: average DCG gain (-0.04 to 0.07) vs. number of target-language training samples (0 to 20,000); curves: Weight 1, Weight 10, Weight 100, UK Only.]

Page 38: Multi-Task Learning and Web Search Ranking


Multi-Language MLR: Overweighting the Target Language

Observations:
• It helps for certain languages with small amounts of data (KR, DE).
• It does not help for some languages (CN, JP).
• For languages with enough data, it will not help.
• A weight of 10 seems better than 1 or 100 on average.

Page 39: Multi-Task Learning and Web Search Ranking


Multi-Language MLR

Part 3

Transfer Learning with Language Neutral Data and Regression Diff

Page 40: Multi-Task Learning and Web Search Ranking


Multi-Language MLR: Selection of Language-Neutral Queries

• For each of CN, JP, KR, DE, and UK, train an MLR with its own data.
• Test the queries of one language with the MLRs of all languages.
• Select the queries that showed the best DCG across the different language MLRs.
• Treat these queries as language neutral; they can be shared across all language MLR development.
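
The slide does not give the exact selection criterion, so the sketch below uses one plausible reading: keep a query if its DCG is high under every other language's MLR. The models, dcg, and threshold arguments are assumptions made for illustration.

```python
# Select "language neutral" queries: those ranked well by every foreign-language MLR.
def select_language_neutral(queries, own_lang, models, dcg, threshold=5.0):
    """models: {lang: model}; dcg(model, query) evaluates one query with one model."""
    neutral = []
    for q in queries:
        foreign_dcgs = [dcg(m, q) for lang, m in models.items() if lang != own_lang]
        if foreign_dcgs and min(foreign_dcgs) >= threshold:   # good DCG under all other languages' MLRs
            neutral.append(q)
    return neutral
```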

Page 41: Multi-Task Learning and Web Search Ranking


Multi-Language MLR

Evaluation of language-neutral queries on the CN-Simplified dataset (2,753 queries). DCG5:

Language         All queries   Language-neutral queries only (top ~500)
CN-Traditional   5.64          5.79 (+2.7%)
Korean           5.19          5.50 (+6%)
Japanese         5.85          5.83

Page 42: Multi-Task Learning and Web Search Ranking


Outline:

1. Brief Review: Machine Learning in web search ranking and Multi-Task learning.

2. MLR with Adaptive Target Value Transformation – each query is a task.

3. MLR for Multi-Languages – each language is a task.

4. MLR for Multi-query classes – each type of query is a task.

5. Future work and Challenges.

Page 43: Multi-Task Learning and Web Search Ranking


Multi-Query Class MLR

Intuitions:
• Different types of queries behave differently:
  • They require different ranking features (e.g. time-sensitive queries need page_time_stamps).
  • They expect different results (e.g. navigational queries expect one official page at the top).
• Also, different types of queries could share the same ranking features.

• Multi-class learning could be done in a unified MLR by:
  • introducing query classification and using the query class as an input ranking feature;
  • adding page-level features for the corresponding classes.

Page 44: Multi-Task Learning and Web Search Ranking


Multi-Query Class MLR

Time recency experiments:
• Feature implementation:
  • Binary query feature: time sensitive (0/1).
  • Binary page feature: page discovered within the last three months.
• Data:
  • 300 time-sensitive queries (editorially selected).
  • ~2,000 ordinary queries.
  • Time-sensitive queries overweighted by 3.
• 10-fold cross-validation on MLR training/testing.
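
An illustrative encoding of the two binary features and the 3x overweighting; the field names and the 90-day window standing in for "last three months" are assumptions made for the sketch.

```python
# Binary query/page features for time recency, plus the 3x query overweighting.
from datetime import datetime, timedelta

def time_recency_features(query_is_time_sensitive, page_discovery_date, now=None):
    now = now or datetime.utcnow()
    recently_discovered = (now - page_discovery_date) <= timedelta(days=90)   # "last three months"
    return {
        "query_time_sensitive": int(query_is_time_sensitive),   # binary query feature
        "page_recent": int(recently_discovered),                # binary page feature
    }

def sample_weight(query_is_time_sensitive):
    return 3.0 if query_is_time_sensitive else 1.0              # overweight time-sensitive queries by 3
```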

Page 45: Multi-Task Learning and Web Search Ranking


Multi-Query Class MLR

Time recency experiment results: comparing MLR with and without the page_time feature.

                         DCG gain   P-value
Time-sensitive queries   2.31%      1.08e-6
All queries              0.52%      0.0017

Page 46: Multi-Task Learning and Web Search Ranking


Multi-Query Class MLR

Named-entity queries:
• Feature implementation:
  • Binary query feature: named-entity query (0/1).
  • 11 new page features implemented, including:
    o Path length
    o Host length
    o Number of host components (URL depth)
    o Path contains "index"
    o Path contains "cgi", "asp", "jsp", or "php"
    o Path contains "search" or "srch", …
• Data:
  • 142 place-name entity queries.
  • ~2,000 ordinary queries.
• 10-fold cross-validation on MLR training/testing.
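
A sketch of a few of the URL-derived page features named above (the slide lists 11 in total; only the ones quoted here are reproduced), using only the Python standard library.

```python
# URL-derived page features for named-entity queries (subset; feature names are illustrative).
from urllib.parse import urlparse

def url_features(url):
    p = urlparse(url)
    path = p.path.lower()
    return {
        "path_length": len(p.path),
        "host_length": len(p.netloc),
        "host_components": len(p.netloc.split(".")),     # slide: "number of host component (url depth)"
        "path_has_index": int("index" in path),
        "path_is_dynamic": int(any(s in path for s in ("cgi", "asp", "jsp", "php"))),
        "path_has_search": int("search" in path or "srch" in path),
    }

print(url_features("http://www.example.com/cgi-bin/search/index.php"))
```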

Page 47: Multi-Task Learning and Web Search Ranking


Multi-Query Class MLR

Named-entity query experiment results: compared against the base MLR model without the named-entity features.

                             DCG gain   P-value
Named-entity queries (142)   0.82%      0.09
All queries                  0.28%      0.09

Page 48: Multi-Task Learning and Web Search Ranking


Multi-Query Class MLR

Observations:

• Query class, combined with page-level features, could help MLR relevance.

• More research is needed on query classification and page-level feature optimization.

Page 49: Multi-Task Learning and Web Search Ranking


Outline:

1. Brief Review: Machine Learning in web search ranking and Multi-Task learning.

2. MLR with Adaptive Target Value Transformation – each query is a task.

3. MLR for Multi-Languages – each language is a task.

4. MLR for Multi-query classes – each type of query is a task.

5. Future work and Challenges.

Page 50: Multi-Task Learning and Web Search Ranking


Future Work and Challenges

Multi-task learning extended to different types of training data:
• editorial judgment data
• user click-through data

Multi-task learning extended to different types of relevance judgments:
• absolute relevance judgments
• relative relevance judgments

Multi-task learning extended to use both:
• labeled data
• unlabeled data

Multi-task learning extended to different types of search user intentions.

Page 51: Multi-Task Learning and Web Search Ranking


Contributors from the Yahoo! International Search Relevance team:

• Algorithm and model development: Zhaohui Zheng, Hongyuan Zha, Lukas Biewald, Haoying Fu
• Data exporting/processing/QA: Jianzhang He, Srihari Reddy
• Director: Gordon Sun

Page 52: Multi-Task Learning and Web Search Ranking


Thank you.

Q&A?