DESIGNING STATES, ACTIONS, AND REWARDS FOR USING POMDP IN SESSION SEARCH
Jiyun Luo, Sicong Zhang, Xuchu Dong, Grace Hui Yang
InfoSense, Department of Computer Science
Georgetown University
{jl1749,sz303,xd47}@georgetown.edu
E.g., find what city and state Dulles Airport is in; what shuttles, ride-sharing vans, and taxi cabs connect the airport to other cities; what hotels are close to the airport; what some cheap off-airport parking options are; and what metro stops are close to Dulles Airport.
DYNAMIC IR – A NEW PERSPECTIVE ON SEARCH
[Figure: the user and the search engine interact in a trial-and-error loop driven by an information need.]
CHARACTERISTICS OF DYNAMIC IR
q1 – "dulles hotels"
q2 – "dulles airport"
q3 – "dulles airport location"
q4 – "dulles metro stop"
Rich interactions: query formulation, document clicks, document examination, eye movements, mouse movements, etc.
CHARACTERISTICS OF DYNAMIC IR
Temporal dependency
CHARACTERISTICS OF DYNAMIC IR
[Figure: a search session driven by information need I. In iteration 1 the user issues query q1, the engine returns ranked documents D1, and the user clicks documents C1; this repeats through iteration n with qn, Dn, and Cn.]
REINFORCEMENT LEARNING (RL)
Fits well in this trial-and-error setting.
RL learns from repeated, varied attempts, continued until success.
The learner (also known as the agent) learns from its dynamic interactions with the world rather than from a labeled dataset as in supervised learning.
The stochastic model assumes that the system's current state depends on the previous state and action in a non-deterministic manner.
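As a concrete toy illustration of the trial-and-error learning described above (this example is ours, not from the paper): tabular Q-learning on a tiny 5-cell corridor, where the agent repeats episodes until it reliably reaches the goal, learning only from observed rewards rather than labels.

```python
import random

N_STATES, ACTIONS = 5, [-1, +1]   # move left / move right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.2

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    """Toy environment: reward 1 only when the goal cell is reached."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    r = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, r, s2 == N_STATES - 1

random.seed(0)
for episode in range(200):
    s, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit the current estimate, sometimes explore
        a = random.choice(ACTIONS) if random.random() < EPS else \
            max(ACTIONS, key=lambda a: Q[(s, a)])
        s2, r, done = step(s, a)
        # Q-learning update: learn from the observed transition, no labeled data
        Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

# After training, the greedy policy moves right (+1) from every non-goal state
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)}
print(policy)
```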
PARTIALLY OBSERVABLE MARKOV DECISION PROCESS (POMDP)
[Figure: a chain of hidden states s0, s1, s2, s3, …; at each step the agent takes action a_i, receives reward r_i, and sees observation o_{i+1} emitted by the hidden state.]
Key ingredients: Markov property, long-term optimization, observations, beliefs.
[R. D. Smallwood et al., '73]
GOAL OF THIS PAPER
Study the design of states, actions, and reward functions of RL algorithms in session search.
WIN-WIN SEARCH: DUAL-AGENT STOCHASTIC GAME
A Partially Observable Markov Decision Process with two agents: a cooperative game with joint optimization.
Hidden states, actions, rewards, Markov property.
[Luo, Zhang, and Yang SIGIR 2014]
PARTIALLY OBSERVABLE MARKOV DECISION PROCESS (POMDP)
A tuple (S, M, A, R, γ, O, Θ, B):
S: state space
M: state transition function
A: actions
R: reward function
γ: discount factor, 0 < γ ≤ 1
O: observations – symbols emitted according to a hidden state
Θ: observation function – Θ(s, a, o) is the probability that o is observed when the system transitions into state s after taking action a, i.e. P(o|s, a)
B: belief space – a belief is a probability distribution over the hidden states
[R. D. Smallwood et al., '73]
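Given M and Θ as defined above, the standard POMDP belief update is b'(s') ∝ Θ(s', a, o) · Σ_s M(s, a, s') · b(s). A minimal sketch with two hidden states, "relevant" (R) and "non-relevant" (NR); all numbers are illustrative, not from the paper:

```python
STATES = ["R", "NR"]

# M[(s, a)][s2] = P(s2 | s, a): transition function for one action, "rerank"
M = {("R", "rerank"):  {"R": 0.8, "NR": 0.2},
     ("NR", "rerank"): {"R": 0.3, "NR": 0.7}}

# Theta[(s2, a)][o] = P(o | s2, a): observation function over {click, skip}
Theta = {("R", "rerank"):  {"click": 0.7, "skip": 0.3},
         ("NR", "rerank"): {"click": 0.2, "skip": 0.8}}

def belief_update(b, a, o):
    """Return the new belief after taking action a and observing o."""
    unnorm = {s2: Theta[(s2, a)][o] * sum(M[(s, a)][s2] * b[s] for s in STATES)
              for s2 in STATES}
    z = sum(unnorm.values())          # normalizer: P(o | b, a)
    return {s2: p / z for s2, p in unnorm.items()}

b0 = {"R": 0.5, "NR": 0.5}
b1 = belief_update(b0, "rerank", "click")
print(b1)   # a click shifts belief mass toward the "relevant" state
```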
HIDDEN DECISION MAKING STATES
S_RT: Relevant & Exploitation
S_RR: Relevant & Exploration
S_NRT: Non-Relevant & Exploitation
S_NRR: Non-Relevant & Exploration
Example query changes (from q0 onward):
scooter price → scooter stores
collecting old US coins → selling old US coins
Philadelphia NYC travel → Philadelphia NYC train
Boston tourism → NYC tourism
[Luo, Zhang, and Yang SIGIR 2014]
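A hypothetical toy heuristic for guessing one of the four hidden states above (the function, thresholds, and proxies are our own illustration, not the paper's inference procedure): SAT clicks stand in for "Relevant", and query-term overlap stands in for exploitation vs. exploration.

```python
def guess_state(prev_query, curr_query, had_sat_click):
    """Guess S_RT / S_RR / S_NRT / S_NRR from a query transition (toy heuristic)."""
    prev, curr = set(prev_query.split()), set(curr_query.split())
    overlap = len(prev & curr) / max(len(prev | curr), 1)
    exploit = overlap >= 0.5              # mostly shared terms -> exploitation
    rel = "R" if had_sat_click else "NR"  # SAT click -> previous results relevant
    return f"S_{rel}{'T' if exploit else 'R'}"

# Refining a query after a SAT click looks like relevant + exploitation:
print(guess_state("dulles airport", "dulles airport location", True))   # S_RT
# Jumping topics with no click looks like non-relevant + exploration:
print(guess_state("boston tourism", "nyc tourism", False))              # S_NRR
```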
ACTIONS
User actions (A_u): add query terms (+Δq), remove query terms (−Δq), keep query terms (q_theme)
Search engine actions (A_se): increase/decrease/keep term weights; switch a search technique on or off (e.g., whether to use query expansion); adjust parameters in search techniques (e.g., select the best k for the top-k docs used in PRF)
Messages from the user (Σ_u): clicked documents, SAT-clicked documents
Messages from the search engine (Σ_se): top k returned documents
Messages are essentially documents that an agent thinks are relevant.
[Luo, Zhang, and Yang SIGIR 2014]
2ND MODEL: QUERY CHANGE MODEL
Based on a Markov Decision Process (MDP).
States: queries.
Observable actions:
User actions: add/remove/keep query terms, which nicely correspond to our definition of query change.
Search engine actions: increase/decrease/keep term weights.
Rewards: nDCG.
[Guan, Zhang, and Yang SIGIR 2013]
SEARCH ENGINE AGENT'S ACTIONS
Term      ∈ D_{i−1}   Action      Example
q_theme   Y           increase    "pocono mountain" in s6
q_theme   N           increase    "france world cup 98 reaction" in s28: france world cup 98 reaction stock market → france world cup 98 reaction
+Δq       Y           decrease    'policy' in s37: Merck lobbyists → Merck lobbyists US policy
+Δq       N           increase    'US' in s37: Merck lobbyists → Merck lobbyists US policy
−Δq       Y           decrease    'reaction' in s28: france world cup 98 reaction → france world cup 98
−Δq       N           no change   'legislation' in s32: bollywood legislation → bollywood law
[Guan, Zhang, and Yang SIGIR 2013]
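The decision table above can be sketched in code as follows (a simplification assuming whitespace tokenization; `d_prev_terms` stands for the terms of the previously clicked document d*_{i−1}):

```python
def term_actions(prev_query, curr_query, d_prev_terms):
    """Map each term of the query change to a search engine weight action."""
    prev, curr = prev_query.split(), curr_query.split()
    theme = [t for t in curr if t in prev]          # q_theme
    added = [t for t in curr if t not in prev]      # +Δq
    removed = [t for t in prev if t not in curr]    # −Δq
    actions = {}
    for t in theme:
        actions[t] = "increase"                     # theme terms: always increased
    for t in added:
        # old added terms (already in d*_{i-1}) decreased, novel ones increased
        actions[t] = "decrease" if t in d_prev_terms else "increase"
    for t in removed:
        # removed terms: decreased if they came from d*_{i-1}, else unchanged
        actions[t] = "decrease" if t in d_prev_terms else "no change"
    return actions

# The s37 example from the table: 'policy' was in d*_{i-1}, 'us' was not
acts = term_actions("merck lobbyists", "merck lobbyists us policy",
                    d_prev_terms={"merck", "lobbyists", "policy"})
print(acts)
```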
QUERY CHANGE RETRIEVAL MODEL (QCM)
The Bellman equation gives the optimal value for an MDP:
V*(s) = max_a [ R(s, a) + γ Σ_{s'} P(s'|s, a) V*(s') ]
The reward function is used as the document relevance score function and is derived backwards from the Bellman equation:
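The Bellman optimality equation above can be solved by value iteration; a generic sketch on a toy two-state MDP (all numbers illustrative, not from the paper):

```python
GAMMA = 0.9
S, A = ["s0", "s1"], ["a0", "a1"]
# R[s][a]: immediate reward; P[s][a][s2]: transition probability
R = {"s0": {"a0": 0.0, "a1": 1.0}, "s1": {"a0": 2.0, "a1": 0.0}}
P = {"s0": {"a0": {"s0": 1.0, "s1": 0.0}, "a1": {"s0": 0.2, "s1": 0.8}},
     "s1": {"a0": {"s0": 0.9, "s1": 0.1}, "a1": {"s0": 0.0, "s1": 1.0}}}

# Repeatedly apply the Bellman backup until the values (nearly) converge
V = {s: 0.0 for s in S}
for _ in range(200):
    V = {s: max(R[s][a] + GAMMA * sum(P[s][a][s2] * V[s2] for s2 in S)
                for a in A)
         for s in S}

# Greedy policy with respect to the converged values
policy = {s: max(A, key=lambda a: R[s][a] + GAMMA *
                 sum(P[s][a][s2] * V[s2] for s2 in S))
          for s in S}
print(V, policy)
```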
[Figure: Score(q, d) decomposes into the current reward/relevance score, the query transition model, and the maximum past relevance of the document.]
[Guan, Zhang, and Yang SIGIR 2013]
CALCULATING THE TRANSITION MODEL
Score(q_i, d) = log P(q_i|d)
  + α Σ_{t_i ∈ q_theme} [1 − P(t_i|d*_{i−1})] log P(t_i|d)
  − β Σ_{t_i ∈ +Δq, t_i ∈ d*_{i−1}} P(t_i|d*_{i−1}) log P(t_i|d)
  + ε Σ_{t_i ∈ +Δq, t_i ∉ d*_{i−1}} idf(t_i) log P(t_i|d)
  − δ Σ_{t_i ∈ −Δq} P(t_i|d*_{i−1}) log P(t_i|d)
• According to query change and search engine actions, the current reward/relevance score:
  • increases weights for theme terms
  • decreases weights for removed terms
  • increases weights for novel added terms
  • decreases weights for old added terms
[Guan, Zhang, and Yang SIGIR 2013]
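The four rules above can be sketched as per-term query weights that feed a weighted query-likelihood score Σ_t w(t) · log P(t|d). This is our simplification, not the paper's exact formula; the coefficient names alpha/beta/eps/delta and their values are illustrative:

```python
def qcm_weights(theme, added, removed, d_prev_terms, p_t_dprev, idf,
                alpha=2.2, beta=1.8, eps=0.07, delta=0.4):
    """Adjust per-term weights according to the four QCM-style rules."""
    w = {}
    for t in theme:                          # theme terms: boosted
        w[t] = 1 + alpha * (1 - p_t_dprev.get(t, 0.0))
    for t in added:
        if t in d_prev_terms:                # old added terms: demoted
            w[t] = 1 - beta * p_t_dprev.get(t, 0.0)
        else:                                # novel added terms: boosted by idf
            w[t] = 1 + eps * idf.get(t, 1.0)
    for t in removed:                        # removed terms: penalized
        w[t] = -delta * p_t_dprev.get(t, 0.0)
    return w

w = qcm_weights(theme=["merck", "lobbyists"],
                added=["us", "policy"], removed=[],
                d_prev_terms={"merck", "lobbyists", "policy"},
                p_t_dprev={"merck": 0.1, "lobbyists": 0.1, "policy": 0.3},
                idf={"us": 2.0})
print(w)   # theme and novel terms get weight > 1, old added term 'policy' < 1
```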
RELATED WORK
Katja Hofmann, Shimon Whiteson, and Maarten de Rijke. Balancing exploration and exploitation in learning to rank online. In ECIR '11.
Xiaoran Jin, Marc Sloan, and Jun Wang. Interactive exploratory search for multi-page search results. In WWW '13.
Xuehua Shen, Bin Tan, and ChengXiang Zhai. Implicit user modeling for personalized search. In CIKM '05.
Norbert Fuhr. A probability ranking principle for interactive information retrieval. In IRJ, 11(3), 2008.
STATE DESIGN OPTIONS
(S1) Fixed number of states
use two binary relevance states, "Relevant" or "Irrelevant"
use four states: whether the previously retrieved documents are relevant, and whether the user desires to explore
(S2) Varying number of states
model queries as states: n queries give n states, a potentially infinite state space
model document relevance score distributions as states: one document corresponds to one state
ACTION DESIGN OPTIONS
(A1) Technology selection: a meta-level modeling of actions
implement multiple search methods, and select the best method for each query
select the best parameters for each method
(A2) Term weight adjustment: adjusted term weights
(A3) Ranked list: one possible ranking of a list of documents is one single action. If the corpus size is N and the number of retrieved documents is n, then the size of the action space is N!/(N−n)!.
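Since ordering n documents drawn from a corpus of N is a permutation, the (A3) action space has N!/(N−n)! elements, which explodes even for tiny corpora:

```python
import math

def action_space_size(N, n):
    """Number of distinct ranked lists of n documents from a corpus of N."""
    return math.perm(N, n)           # N! / (N - n)!

print(action_space_size(10, 3))      # already 720 ranked lists from 10 docs
```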
REWARD FUNCTION DESIGN OPTIONS
(R1) Explicit feedback: rewards generated from the user's relevance assessments, e.g., nDCG, MAP
(R2) Implicit feedback: rewards obtained from user behavior, e.g., clicks, SAT clicks
SYSTEMS UNDER COMPARISON
Luo, et al. Win-Win Search: Dual-Agent Stochastic Game in Session Search. SIGIR’14
Zhang, et al. A POMDP Model for Content-Free Document Re-ranking. SIGIR’14
Guan, et al. Utilizing Query Change for Session Search. SIGIR’13
Shen, et al. Implicit user modeling for personalized search. CIKM '05
Jin, et al. Interactive exploratory search for multi page search results. WWW '13
S1A1R1(win-win)
S1A3R2
S2A2R1(QCM)
S2A1R1(UCAIR)
S2A3R1(IES)
S1A1R2
S1A2R1
S2A1R1
EXPERIMENTS
Evaluate on the TREC 2012 and 2013 Session Tracks.
The session logs contain the session topic, user queries, previously retrieved URLs and snippets, user clicks, dwell time, etc.
Task: retrieve 2,000 documents for the last query in each session.
The evaluation is based on the whole session. Metrics include nDCG@10, nDCG, nERR@10, and MAP for accuracy, and wall clock time, CPU cycles, and Big O complexity for efficiency.
Datasets: ClueWeb09 CatB and ClueWeb12 CatB; spam documents and duplicate documents are removed.
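As a reminder of the main accuracy metric, a self-contained sketch of nDCG@10 in one common formulation (linear gains; the relevance grades below are made up for illustration):

```python
import math

def dcg(gains, k=10):
    """Discounted cumulative gain of the top-k graded relevance labels."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg(gains, k=10):
    """DCG normalized by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0

ranked = [3, 2, 3, 0, 1, 2]     # relevance grades in retrieved order
print(round(ndcg(ranked), 4))
```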
EFFICIENCY VS. # OF ACTIONS ON TREC 2012
When the number of actions increases, efficiency tends to drop dramatically.
S1A3R2, S1A2R1, S2A1R1(UCAIR), S2A2R1(QCM), and S2A1R1 are efficient.
S1A1R1(win-win) and S1A1R2 are moderately efficient.
S2A3R1(IES) is the slowest system.
ACCURACY VS. EFFICIENCY
TREC 2012 and TREC 2013:
Accuracy tends to increase when efficiency decreases.
S2A1R1(UCAIR) strikes a good balance between accuracy and efficiency.
S1A1R1(win-win) gives impressive accuracy with a fair degree of efficiency.
OUR RECOMMENDATION
The choice depends on whether the focus is on accuracy, whether the time limit is within one hour, or whether a balance of accuracy and efficiency is desired.
Note: the number of actions heavily affects efficiency and needs to be carefully designed.
CONCLUSIONS
POMDPs are a good fit for modeling session search and information seeking behaviors.
Design questions:
States: what changes with each time step?
Actions: how does our system change the state?
Rewards: how can we measure feedback or effectiveness?
Designing them is something between an art and empirical experimentation.
Balance between efficiency and accuracy.
RESOURCES
InfoSense: http://infosense.cs.georgetown.edu/
Dynamic IR website and tutorials: http://www.dynamic-ir-modeling.org/
Live online search engine – Dumpling: http://dumplingproject.org
Upcoming book: Dynamic Information Retrieval Modeling
TREC 2015 Dynamic Domain Track: http://trec-dd.org/ – please participate if you are interested in interactive and dynamic search.