Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams
Yuto Yamaguchi†, Toshiyuki Amagasa†, Hiroyuki Kitagawa†, and Yohei Ikawa‡
† University of Tsukuba ‡ IBM Research - Tokyo
14/11/05 CIKM 2014 - Yuto Yamaguchi
Tweets that help us
Shaking!!!
We can infer your home location immediately
Thunder
Social and Location
• Lots of social media users
• Frequent updates
• User home locations
Location-based Applications
• Event Detection
• Location-based Marketing
• Epidemics Analysis
Lack of home locations
Most users do not disclose their home locations
• 74% of Twitter users [Cheng+, 10]
• 94% of Facebook users [Backstrom+, 10]
Our Objective
To infer home locations of social media users
Focus & Contributions
[Figure: posts arranged along a time axis]
Our Focus
Social content is not static; it arrives as a stream
Our Contributions
1. Online & Incremental Inference
2. Exploiting Spatiotemporal Features
ONLINE & INCREMENTAL INFERENCE
Contribution 1
Existing methods: Batch inference
[Diagram: Batch Input → Existing Methods → Inference Results]
Perform batch inference just once, after "enough data" is stored
→ Can't update the results :(
→ What is "enough"? :(
→ When will it be enough? :(
Our method: Online & incremental inference method
[Diagram: Social Stream → Online & incremental method → Inference Results]
Perform location inference every time a new post arrives
→ Can keep the results up to date :)
EXPLOITING SPATIOTEMPORAL FEATURES
Contribution 2
Local words
Earthquake! Steelers!
Home location known
Local words: strongly correlated with a specific location
Steelers!
Home location unknown
Infer Pittsburgh?
Existing methods: Only static features
Earthquake! Thunder!
Home location known
Thunder!
Home location unknown
"Thunderbolt" is not a local word statically
[Figure: three users with known home locations each tweet "Thunder!"]
→ Can't utilize this word :(
Can't infer
Our method: Spatiotemporal correlation
Earthquake! Thunder!
Home location known
Thunder!
Home location unknown
"Thunderbolt" can be a local word temporally, in a specific time period
→ Our method can utilize this word :)
Can infer
PROPOSED METHOD
OLIM: Online Location Inference Method
The Algorithm
Preprocessing:
1. divideMap()
2. calcPopulationDistribution()
Main:
3. for post p from SocialStream
4.     user u <- getUser(p)
5.     if u is location-known
6.         updateLocalWords(p)
7.     else
8.         updateUserLocation(u, p)
Step 1 is covered under "divideMap", step 6 under "updateLocalWords", and step 8 under "updateUserLocation".
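The main loop above can be sketched in Python. This is only an illustration of the dispatch in steps 3 to 8: the post format `(user_id, text)`, the `known_home` mapping, and the two callbacks are assumptions made for the example, not names from the talk.

```python
from typing import Callable, Dict, Iterable, Tuple

Post = Tuple[str, str]  # hypothetical post format: (user_id, text)


def run_olim_loop(
    stream: Iterable[Post],
    known_home: Dict[str, int],
    update_local_words: Callable[[str, str], None],
    update_user_location: Callable[[str, str], None],
) -> Dict[str, int]:
    """Route each post to one of the two update steps; return how many went each way."""
    counts = {"known": 0, "unknown": 0}
    for user, text in stream:
        if user in known_home:
            # Posts by location-known users refresh the local-word statistics.
            update_local_words(user, text)
            counts["known"] += 1
        else:
            # Posts by location-unknown users refine that user's inferred location.
            update_user_location(user, text)
            counts["unknown"] += 1
    return counts
```

Each post is handled once as it arrives, so the inference results are always as fresh as the latest post.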
divideMap
Each region is treated as a categorical location
Quadtree decomposition
$L = \{l_1, l_2, \ldots, l_K\}$
Location inference is reduced to a classification problem
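A quadtree decomposition can be sketched as follows. The splitting rule used here (split a cell while it holds more than `capacity` points, up to `max_depth`) is an assumption for illustration; the talk does not state the exact criterion. Each leaf cell plays the role of one categorical location, and `locate` returns the class index k of a point.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def contains(box: Box, p: Point) -> bool:
    """Half-open test, so a point on a shared edge belongs to exactly one cell."""
    x0, y0, x1, y1 = box
    return x0 <= p[0] < x1 and y0 <= p[1] < y1


def quadtree_leaves(points: Sequence[Point], box: Box,
                    capacity: int = 2, max_depth: int = 8) -> List[Box]:
    """Recursively split `box` into four quadrants until each leaf is sparse enough."""
    if len(points) <= capacity or max_depth == 0:
        return [box]
    x0, y0, x1, y1 = box
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    quadrants = [(x0, y0, xm, ym), (xm, y0, x1, ym),
                 (x0, ym, xm, y1), (xm, ym, x1, y1)]
    leaves: List[Box] = []
    for quadrant in quadrants:
        inside = [p for p in points if contains(quadrant, p)]
        leaves.extend(quadtree_leaves(inside, quadrant, capacity, max_depth - 1))
    return leaves


def locate(leaves: Sequence[Box], p: Point) -> int:
    """Return the index of the leaf region that contains point p."""
    for k, box in enumerate(leaves):
        if contains(box, p):
            return k
    raise ValueError("point lies outside the decomposed area")
```

Dense areas end up with many small regions and sparse areas with a few large ones, which keeps the number of users per class roughly balanced.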
Population distribution
[Figure: categorical distribution over the locations $l_1, \ldots, l_K$]
What fraction of location-known users live in each location
Used for local word extraction
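As a small illustration, the population distribution can be computed by counting location-known users per region and normalizing. The function name and the input format (one region index per user) are assumptions for this sketch.

```python
from collections import Counter
from typing import List, Sequence


def population_distribution(home_regions: Sequence[int], num_regions: int) -> List[float]:
    """Fraction of location-known users whose home falls in each region."""
    if not home_regions:
        raise ValueError("need at least one location-known user")
    counts = Counter(home_regions)
    total = len(home_regions)
    return [counts[k] / total for k in range(num_regions)]
```

For example, `population_distribution([0, 0, 1, 3], 4)` gives `[0.5, 0.25, 0.0, 0.25]`.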
updateLocalWords: Sliding window and word distribution
Sliding window with length N (e.g., N = 5)
[Figure: word distribution over the locations $l_1, \ldots, l_K$]
Word distribution: where was the word posted from?
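One way to keep a per-word sliding window is sketched below. It assumes the window holds the home regions of the N most recent location-known users who posted the word; the class and method names are made up for the example.

```python
from collections import Counter, deque
from typing import Deque, List


class WordWindow:
    """Sliding window over the regions of the last n posts containing one word."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.window: Deque[int] = deque()
        self.counts: Counter = Counter()

    def add(self, region: int) -> None:
        """Record a new post; evict the oldest one once the window is full."""
        self.window.append(region)
        self.counts[region] += 1
        if len(self.window) > self.n:
            oldest = self.window.popleft()
            self.counts[oldest] -= 1
            if self.counts[oldest] == 0:
                del self.counts[oldest]

    def distribution(self, num_regions: int) -> List[float]:
        """Word distribution: share of the window that came from each region."""
        total = len(self.window)
        if total == 0:
            raise ValueError("window is empty")
        return [self.counts.get(k, 0) / total for k in range(num_regions)]
```

With N = 5, feeding the regions 0, 0, 1, 2, 2, 2, 2 leaves the window holding 1, 2, 2, 2, 2, so the word distribution over three regions is `[0.0, 0.2, 0.8]`.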
updateLocalWords: Local Word Intuition
[Figure: the population distribution and two word distributions over the locations $l_1, \ldots, l_K$]
Word distribution close to the population distribution: KL divergence small, not a local word
Word distribution far from the population distribution: KL divergence large, local word
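The intuition can be checked numerically with a plain KL divergence between a word distribution and the population distribution. The example distributions below are invented, and any threshold for calling a word "local" would be an assumption; this sketch only shows that a spatially concentrated word scores higher.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats; requires q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


population = [0.5, 0.25, 0.25]
everywhere_word = [0.5, 0.25, 0.25]   # posted in proportion to population
concentrated_word = [0.0, 0.0, 1.0]   # posted from a single region only

general_score = kl_divergence(everywhere_word, population)
local_score = kl_divergence(concentrated_word, population)
```

Here `general_score` is 0 because the word follows the population, while `local_score` equals log 4, so the concentrated word is the better local-word candidate.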
updateLocalWords: Online updating
Window length N is fixed
We can update the KL divergence in O(1) every time a new post arrives :)
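One way to get a constant-time update, sketched under assumptions of my own rather than taken from the talk: with window counts c_l summing to a fixed N, KL = (1/N) Σ_l c_l (log c_l − log q_l) − log N, where q is the population distribution. When one post enters and one leaves the window, only two terms of the sum change. The class name and the requirement q_l > 0 are assumptions of this example.

```python
import math
from collections import deque
from typing import Deque, List, Sequence


def _term(count: int, q: float) -> float:
    """One summand c * (log c - log q); zero counts contribute nothing."""
    return count * (math.log(count) - math.log(q)) if count > 0 else 0.0


class IncrementalKL:
    """KL(word distribution || population) over a full, fixed-length window."""

    def __init__(self, initial_regions: Sequence[int], population: Sequence[float]) -> None:
        if not initial_regions:
            raise ValueError("window must start full (non-empty)")
        self.q = list(population)          # assumed strictly positive
        self.window: Deque[int] = deque(initial_regions)
        self.n = len(self.window)          # fixed window length N
        self.counts: List[int] = [0] * len(self.q)
        for region in self.window:
            self.counts[region] += 1
        self.s = sum(_term(c, q) for c, q in zip(self.counts, self.q))

    def kl(self) -> float:
        return self.s / self.n - math.log(self.n)

    def push(self, region: int) -> float:
        """Slide the window by one post; touches only two counts, so O(1)."""
        oldest = self.window.popleft()
        self.window.append(region)
        for r, delta in ((oldest, -1), (region, +1)):
            self.s -= _term(self.counts[r], self.q[r])
            self.counts[r] += delta
            self.s += _term(self.counts[r], self.q[r])
        return self.kl()
```

Because the running sum is updated with floating-point additions, a long-running process may want to recompute it from the counts occasionally to limit drift.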
updateUserLocation: user distribution
[Figure: user distribution of u over the locations $l_1, \ldots, l_K$]
Denotes how likely it is that user u lives in each location
updateUserLocation: update
If user u posts local word w:
[Figure: the prior user distribution over $l_1, \ldots, l_K$ is updated with the word distribution of w to give the posterior user distribution]
Dirichlet-multinomial compound for Bayesian updates
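The textbook Dirichlet-multinomial update gives a feel for this step: a Dirichlet prior with pseudo-counts alpha, combined with observed counts, yields a Dirichlet posterior with pseudo-counts alpha plus the counts. The sketch below treats the window counts of the local word w as the observation; that choice, the `weight` parameter, and the function names are assumptions, and the exact rule used in OLIM may differ.

```python
from typing import List, Sequence


def update_user_distribution(alpha: Sequence[float], word_counts: Sequence[float],
                             weight: float = 1.0) -> List[float]:
    """Posterior pseudo-counts: prior pseudo-counts plus (weighted) word counts."""
    return [a + weight * c for a, c in zip(alpha, word_counts)]


def posterior_mean(alpha: Sequence[float]) -> List[float]:
    """Expected probability that the user lives in each region."""
    total = sum(alpha)
    return [a / total for a in alpha]


def infer_location(alpha: Sequence[float]) -> int:
    """Index of the most probable region under the current pseudo-counts."""
    return max(range(len(alpha)), key=lambda k: alpha[k])


prior = [1.0, 1.0, 1.0, 1.0]            # uniform prior over four regions
posterior = update_user_distribution(prior, [0, 4, 1, 0])
```

After the update the pseudo-counts are `[1.0, 5.0, 2.0, 1.0]`, so region index 1 becomes the inferred home location. The posterior then serves as the prior for the user's next post, which is what makes the inference incremental.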
EXPERIMENTS
Accuracy & Costs
Data from Twitter
• Data size
  – 200K location-known users in Japan
    • Location profiles geocoded into coordinates
  – 200 tweets for each user (40M in total)
  – 34M follow edges (for existing methods)
• 90% for training; 5% for validation; 5% for test
Inference accuracy
[Chart: inference accuracy, with existing methods labeled]
Cost per update
[Chart: cost per update for variants of our method and existing methods, with a "Better" direction marker]
Feed 40M tweets in the dataset chronologically
Conclusion
• Proposed location inference method
  – online & incremental inference
    • Constant time complexity
  – exploiting spatiotemporal correlation
    • Better accuracy