Time is of the Essence: Improving Recency Ranking Using Twitter Data

Post on 16-Jan-2017

Time is of the Essence: Improving Recency Ranking Using Twitter Data (WWW 2010)

Anlei Dong, Ruiqiang Zhang, Pranam Kolari, Jing Bai, Fernando Diaz, Yi Chang, Zhaohui Zheng

Presenter: ChinHui Chen (陳晉暉)

Author

• Anlei Dong
• Area: Yahoo! Search Sciences
• Location: Yahoo! Labs Silicon Valley

Agenda

• Introduction
• Motivation
• Method
• Experiment
• Discussion

Introduction

• Recency-sensitive queries
– e.g. earthquake -> results must be relevant and timely

• Problems:
1. Zero-recall problem (relevant fresh content may not be indexed yet)
2. The user’s need for relevant content is immediate

• Use a micro-blogging site (Twitter)

Motivation

• General web search algorithm:
– match signals (language model / vector space model)
– query-independent signals (PageRank)

• But when we issue recency-sensitive queries:
– Fresh docs may have very few in-links.
– Fresh docs may have very few clicks.

Motivation (con’t)

• So how do we rank fresh documents?

Methods

• Learning to rank
• What is it?
– Think of it as a black box
– Feed it the features of many query-doc pairs
– It trains a model
– For an unseen query, extract the features and the model can rank the docs

[Diagram labels: Feature, Score, Prediction]

Methods (con’t)

• So the goal is:

[Diagram: Query: 章魚哥 (Squidward) -> ranking list containing regular URLs and fresh URLs]

Methods (con’t)

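• So the goal is:

[Diagram: Train vs. Predict]
– Train (ground truth): Query: 章魚哥 (Squidward) -> extract features from the regular URLs and fresh URLs -> known labels (3514, 4425)
– Predict (test): Query: a new query -> extract features from the regular URLs and fresh URLs -> unknown scores (????, ????)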

Methods(con’t)

• Therefore, the main steps are:
1. Extract features.
2. Apply learning to rank.
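The two steps can be sketched in a few lines of Python. This is only a minimal illustration, not the ranking function used in the paper: it assumes a pointwise linear model fitted by stochastic gradient descent, and the two features (a query term match score and a freshness score), the grades, and the URL names are all made up for the example.

```python
from typing import Dict, List, Sequence, Tuple

# (feature vector, labelled relevance grade); both are hypothetical toy values
Example = Tuple[Sequence[float], float]


def score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear score; the last weight acts as a bias term."""
    x = list(features) + [1.0]
    return sum(w * v for w, v in zip(weights, x))


def train_linear_ranker(examples: List[Example], epochs: int = 2000,
                        lr: float = 0.05) -> List[float]:
    """Step 2 (training): fit weights by SGD on squared error
    between the predicted score and the labelled grade."""
    dim = len(examples[0][0])
    weights = [0.0] * (dim + 1)
    for _ in range(epochs):
        for features, grade in examples:
            x = list(features) + [1.0]
            error = score(weights, features) - grade
            weights = [w - lr * error * v for w, v in zip(weights, x)]
    return weights


def rank(weights: Sequence[float],
         candidates: Dict[str, Sequence[float]]) -> List[str]:
    """Step 2 (prediction): sort candidate URLs by descending score."""
    return sorted(candidates,
                  key=lambda url: score(weights, candidates[url]),
                  reverse=True)


# Step 1 (feature extraction) is assumed to have produced these vectors:
# [query term match, freshness]
training = [([0.9, 0.8], 4.0), ([0.8, 0.1], 2.0),
            ([0.2, 0.9], 1.0), ([0.1, 0.1], 0.0)]
model = train_linear_ranker(training)

# Features of the candidate URLs for a new, unseen query
new_query_docs = {"url_a": [0.9, 0.9], "url_b": [0.5, 0.5], "url_c": [0.1, 0.1]}
ranking = rank(model, new_query_docs)
```

Any of the ranking functions named later in the slides could replace the linear model here; only `train_linear_ranker` and `score` would change.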

Methods – FeatureSet

• Content Features
– Functions of the content of the doc (e.g. query term match, …)
• Aggregate Features
– A doc’s long-term popularity and usage (e.g. in-link statistics, clicks, PageRank, …)
• Twitter Features

Methods – FeatureSet (con’t)

• Content Features
• Aggregate Features
• Twitter Features
– Textual Features
– Social Network Features
– Other Features

Methods – FeatureSet (con’t)

• Twitter - Textual Features (q vs url)
– Goal: represent a URL as text, so that its cosine similarity with the query can be computed
– M: m posts × w URLs
– D: m posts × v words
– Represents a URL by a combination of the Twitter contents
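A minimal Python sketch of this idea follows. It assumes that M is a binary post-by-URL matrix and D holds raw word counts, so that a URL's text vector is the sum of the word counts of every post containing it (equivalent to a row of MᵀD under that assumption). The slides do not show the exact weighting, and the tweets and URLs below are invented.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Tweet = Tuple[str, List[str]]  # (post text, URLs contained in the post)


def url_term_vectors(tweets: List[Tweet]) -> Dict[str, Counter]:
    """Represent each URL by the summed word counts of the posts that contain it."""
    vectors: Dict[str, Counter] = {}
    for text, urls in tweets:
        words = Counter(text.lower().split())
        for url in urls:
            vectors.setdefault(url, Counter()).update(words)
    return vectors


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Hypothetical posts and tiny URLs
tweets = [
    ("earthquake hits chile", ["http://t.example/1"]),
    ("chile earthquake update", ["http://t.example/1"]),
    ("new phone released", ["http://t.example/2"]),
]
vectors = url_term_vectors(tweets)
query = Counter("chile earthquake".split())

sim_quake = cosine(query, vectors["http://t.example/1"])
sim_phone = cosine(query, vectors["http://t.example/2"])
```

The resulting similarity can then be used as one textual feature of the (query, URL) pair.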

Methods – FeatureSet (con’t)

• Twitter - Textual Features
– Goal: terms that do not match should be penalized

PS:

Methods – FeatureSet (con’t)

• Twitter - Textual Features
– Goal: take phrases into account

Methods – FeatureSet (con’t)

• Summary - Textual Features (q vs url)

Methods – FeatureSet (con’t)

• Twitter - Social Network Features

A user i posted URL j

Methods – FeatureSet (con’t)

• Twitter - Other Features
– See the next page

Average statistics of the users who issued the tiny URL.

The first user who issued the tiny URL.

The user who issued the tiny URL with the highest Twitter score.

Methods – FeatureSet (con’t)

Feature set | Regular URL | Twitter URL
Content Features (e.g. term info) | O | O
Aggregate Features (e.g. PageRank) | O | poor
Twitter Features | not available | available

Methods – Ranking

• We have the feature set now.
• How does learning to rank work?

1. Build a relevance model (training)
2. Predict the ranking (prediction)

Methods – Ranking(con’t)

• 1. Build Relevance Model:
– Train a model that predicts whether a query and a URL are relevant.
– Straightforward:
• 1. Sample query-url pairs (regular + twitter) and label them.
• 2. Train a ranking function (RankSVM, RankBoost, GBrank, RankNet, …)

• But … regular URLs vastly outnumber Twitter URLs (only Twitter URLs have Twitter features, so those features would be ignored)

Methods – Ranking(con’t)

• 1. Build Relevance Model:
– Modified:

M represents a ranking function
D represents a data set
F represents a feature set
TRAIN-MLR(D, F): train a ranking function on D using F
PREDICT(D, M): score data set D using M

Methods – Ranking(con’t)

• 2. Predict Ranking
– Straightforward:
1. Apply the model from step 1 to the regular/Twitter URLs.
2. Rank the URLs by sorting their scores.

[Diagram labels: Regular, Twitter]
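The straightforward prediction step amounts to scoring every candidate and sorting. The sketch below is an illustration under assumptions: the scorers are hypothetical stand-ins for whatever model step 1 produced, and it allows one scorer per URL source because the two sources carry different feature sets. Passing the same scorer twice reduces it to a single shared model.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Scorer = Callable[[Sequence[float]], float]


def blend_and_rank(regular: Dict[str, Sequence[float]],
                   twitter: Dict[str, Sequence[float]],
                   regular_model: Scorer,
                   twitter_model: Scorer) -> List[Tuple[str, float]]:
    """Score regular and Twitter URLs, then merge them into one list
    sorted by descending score."""
    scored = [(url, regular_model(f)) for url, f in regular.items()]
    scored += [(url, twitter_model(f)) for url, f in twitter.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy scorers and feature vectors, invented for the example
toy_regular_model: Scorer = lambda f: f[0]
toy_twitter_model: Scorer = lambda f: f[0]

regular_urls = {"regular_1": [0.9], "regular_2": [0.2]}
twitter_urls = {"twitter_1": [0.6]}

blended = blend_and_rank(regular_urls, twitter_urls,
                         toy_regular_model, toy_twitter_model)
blended_order = [url for url, _ in blended]
```

Merging by raw score only makes sense when the scores of the two sources are comparable, which is one reason the slides move to a modified training procedure.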

Methods – Ranking(con’t)

Experiments

• Dataset:
– Queries:
• only time-sensitive queries issued within one hour are considered.
– Regular URL:
• URLs in the search engine index during one hour.
– Twitter URL:
• URLs posted in the 9-hour period before the query time.

Experiments(con’t)

• Label:
– query-url pairs: perfect, excellent, good, fair, bad.
– documents: promote fresh ones, demote out-of-date ones
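The label adjustment for documents (promote fresh, demote out of date) can be illustrated as a shift on the five-grade scale. The slide does not say by how much a grade moves, so the one-grade step and the recency label names below are assumptions made only for this sketch.

```python
# Five relevance grades from the slide, ordered from worst to best
GRADES = ["bad", "fair", "good", "excellent", "perfect"]

# ASSUMPTION: a one-grade shift; the slides do not state the actual amount
RECENCY_SHIFT = {"fresh": 1, "out of date": -1}


def adjust_grade(grade: str, recency: str) -> str:
    """Promote fresh documents and demote out-of-date ones,
    clamping the result to the ends of the grade scale."""
    shift = RECENCY_SHIFT.get(recency, 0)
    index = min(max(GRADES.index(grade) + shift, 0), len(GRADES) - 1)
    return GRADES[index]


promoted = adjust_grade("good", "fresh")
demoted = adjust_grade("fair", "out of date")
```

The adjusted grade then serves as the training target, so the learned ranking function is pushed toward results that are both relevant and recent.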

Experiments(con’t)

Experiments(con’t)

Evaluation:

Experiments(con’t)

• Q :

Experiments(con’t)

Mregular = content + aggregate features
Mcontent = content features only
Mtwitter = Twitter features only

Feature set | Regular | Twitter
Content Features (e.g. term info) | O | O
Aggregate Features (e.g. PageRank) | O | poor
Twitter Features | not available | available

• Result:

Experiments(con’t)

• Feature Importance

Authority & activity of users

Q&A