Fast and Exact Monitoring of Co-evolving Data Streams
Yasuko Matsubara, Yasushi Sakurai (Kumamoto University), Naonori Ueda (NTT), Masatoshi Yoshikawa (Kyoto University)
ICDM 2014
Speaker: Yasushi Sakurai

Motivation
Given: co-evolving data streams, e.g., MoCap (leg/arm sensors)
Video: http://www.youtube.com/watch?v=6UV3kRV46Zs&t=49s
The stream contains multiple distinct patterns with different durations, e.g., walking, punching.
Queries:
Q1: Walking
Q2: Running
Q3: Punching
Q. Can we find subsequences that have the characteristics of a query?

Outline
1. Motivation
2. Problem formulation
3. Main ideas
4. Experiments
5. StreamScan at work
6. Conclusions

Background
Goal: statistical monitoring of time-varying data streams
(Figure: a query (punch) is matched against the data stream: "Punch is here!")

Background: Hidden Markov models (HMMs)
An HMM consists of:
- Initial state probabilities
- State transition probabilities
- Output probabilities
(Figure: example HMM with three states: state 1, state 2, state 3)

Background: likelihood
Given a model and a sequence, the likelihood is computed over the trellis structure (states 1 to 3 across time-ticks), following the Viterbi path.
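To make the trellis and the Viterbi path concrete, here is a minimal sketch of the standard Viterbi recurrence for a discrete-output HMM. The three-state parameters `pi`, `A`, `B` and the symbol sequence are hypothetical toy values chosen for illustration, not taken from the talk; symbols are zero-based indices.

```python
def viterbi(xs, pi, A, B):
    """Return (likelihood of the best state path, the path itself)."""
    k = len(pi)
    # First trellis column: start in state i and emit xs[0].
    p = [pi[i] * B[i][xs[0]] for i in range(k)]
    back = []  # back[t][i] = best predecessor of state i at step t+1
    for x in xs[1:]:
        prev = p
        ptr = [max(range(k), key=lambda j: prev[j] * A[j][i]) for i in range(k)]
        p = [prev[ptr[i]] * A[ptr[i]][i] * B[i][x] for i in range(k)]
        back.append(ptr)
    # Backtrack from the most likely final state.
    last = max(range(k), key=lambda i: p[i])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return p[last], path


# Hypothetical toy model (assumption): 3 states, 3 output symbols.
pi = [1.0, 0.0, 0.0]                      # initial state probabilities
A = [[0.5, 0.5, 0.0],                     # state transition probabilities
     [0.25, 0.5, 0.25],
     [0.0, 0.0, 1.0]]
B = [[1.0, 0.0, 0.0],                     # output probabilities per state
     [0.75, 0.25, 0.0],
     [0.0, 0.0, 1.0]]

likelihood, path = viterbi([0, 0, 1, 2, 2, 2], pi, A, B)
print(likelihood, path)
```

Each trellis column costs O(k^2) for k states, which is the per-tick cost that the later slides build on.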

Background
Given: a data stream X and a model
Find: subsequences that have a high likelihood value

Requirements
1. Exponential threshold function
2. Minimum length of subsequences
3. Non-overlapping matches
(Figure: stream X = x1 ... xn segmented into Walk, Jump, Walk, Jump, Punch; some subsequences are marked "ignore"; among overlapping candidates, only the best match is kept)
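The non-overlapping requirement can be illustrated offline with a small sketch: given candidate matches as `(start, end, score)` tuples, group the ones that overlap and keep the highest-scoring one per group. The function name, the tuple layout, and the sample candidates are assumptions for illustration; this is a batch simplification, not the streaming procedure of the talk.

```python
def best_non_overlapping(cands):
    """Keep the best-scoring candidate from each group of overlapping intervals.

    cands: iterable of (start, end, score) with inclusive positions.
    Overlap is treated transitively: a chain of overlapping intervals
    forms one group (a simplifying assumption).
    """
    out = []
    best = None        # best candidate in the current group
    group_end = None   # right-most end position seen in the current group
    for s, e, score in sorted(cands):
        if best is not None and s <= group_end:
            group_end = max(group_end, e)
            if score > best[2]:
                best = (s, e, score)
        else:
            if best is not None:
                out.append(best)
            best, group_end = (s, e, score), e
    if best is not None:
        out.append(best)
    return out


# Hypothetical candidates: three overlapping matches and one separate match.
matches = best_non_overlapping(
    [(2, 5, 62.5), (2, 7, 15625.0), (2, 3, 50.0), (10, 12, 3.0)]
)
print(matches)
```

Sorting first keeps the grouping to a single pass; a streaming method has to make the same decision without seeing future candidates.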

Problem definition
Given: a data stream X, a model, and thresholds
Report: all subsequences that satisfy the threshold conditions, keeping only the local maximum among several overlapping matches
(Figure: stream X with the best match highlighted)

Previous solution
Sliding model method (SMM), by Wilpon et al.
SMM computes the likelihood starting from every time-tick.
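A rough sketch of this sliding approach, assuming a discrete-output HMM: one Viterbi column is kept per starting position, and every column is extended at each time-tick. The function `smm_tick` and the toy parameters are hypothetical and only illustrate why the work per tick grows with the stream length; symbols are zero-based indices.

```python
def smm_tick(trellises, x, pi, A, B):
    """Process one symbol: extend every open trellis, then open a new one.

    trellises[s] is the latest Viterbi column of the subsequence that
    started at tick s + 1. Returns the current likelihood of each one.
    """
    k = len(pi)
    for col in trellises:
        # O(k^2) per trellis, so O(n k^2) per tick once n trellises exist.
        col[:] = [max(col[j] * A[j][i] for j in range(k)) * B[i][x]
                  for i in range(k)]
    trellises.append([pi[i] * B[i][x] for i in range(k)])
    return [max(col) for col in trellises]


# Hypothetical toy model (assumption): 3 states, 3 output symbols.
pi = [1.0, 0.0, 0.0]
A = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.0, 1.0]]
B = [[1.0, 0.0, 0.0], [0.75, 0.25, 0.0], [0.0, 0.0, 1.0]]

trellises = []
for x in [2, 0, 0, 1, 2, 2, 2]:
    likes = smm_tick(trellises, x, pi, A, B)
print(len(trellises), likes[1])
```

After seven ticks the sketch holds seven trellis columns, one per possible starting position.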

SMM maintains O(n) trellis structures and needs O(nk^2) time to update them at each time-tick.

Main idea: StreamScan
(a) Cumulative likelihood function
(b) Subsequence trellis structure (STS)

Main idea (a): Cumulative likelihood function
- It requires a single structure
- It guarantees the best matches

Main idea (b): Subsequence trellis structure (STS)
Each cell stores the starting position and the cumulative likelihood.

Algorithm: example
(Figure: HMM with three states: state 1, state 2, state 3)
STS with k = 3. Each cell shows the cumulative likelihood, with the starting position in parentheses.

          x1=3   x2=1    x3=1      x4=2      x5=3        x6=3        x7=3       x8=1
State 1   0(1)   10(2)   50(2)     0(4)      0(5)        0(6)        0(7)       10(8)
State 2   0(1)   0(2)    37.5(2)   62.5(2)   0(5)        0(6)        0(7)       0(8)
State 3   0(1)   0(2)    0(3)      0(4)      156.25(2)   1562.5(2)   15625(2)   0(8)

The column for x2 is flagged "candidate!". After x8 arrives: Report X[2:7]
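The example above can be replayed with a short sketch of a single-trellis scan in which every cell carries a cumulative likelihood and a starting position. Everything concrete here is an assumption: the recurrence (open a new subsequence at t or extend the best predecessor, scaling by `inv_eps` at every tick), the model parameters `pi`, `A`, `B`, the values `inv_eps = 10` and `threshold = 10`, and the reporting rule. The parameters were picked so that the sketch reproduces the cell values shown in the example; they are not taken from the paper, and the paper's actual algorithm may differ.

```python
def stream_scan_sketch(xs, pi, A, B, inv_eps, threshold):
    """Single-pass scan; xs holds 1-based output symbols as in the example."""
    k = len(pi)
    v = [0.0] * k      # cumulative likelihood per state (previous column)
    s = [0] * k        # starting position per state (previous column)
    cand = None        # best qualifying match so far: (value, start, end)
    reports, table = [], []
    for t, x in enumerate(xs, start=1):
        nv, ns = [], []
        for i in range(k):
            out = B[i][x - 1] * inv_eps
            best_v, best_s = pi[i] * out, t       # open a new subsequence at t
            for j in range(k):                    # or extend one ending at t-1
                ext = v[j] * A[j][i] * out
                if ext > best_v:
                    best_v, best_s = ext, s[j]
            nv.append(best_v)
            ns.append(best_s)
        v, s = nv, ns
        table.append(list(zip(v, s)))             # kept only for inspection
        # Assumed rule: report once no live cell still overlaps the candidate.
        if cand is not None and all(
            v[i] == 0.0 or s[i] > cand[2] for i in range(k)
        ):
            reports.append((cand[1], cand[2], cand[0]))
            cand = None
        top = max(range(k), key=lambda idx: v[idx])
        if v[top] >= threshold and (cand is None or v[top] > cand[0]):
            cand = (v[top], s[top], t)
    return reports, table


# Assumed parameters, chosen to match the example table.
pi = [1.0, 0.0, 0.0]
A = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.0, 1.0]]
B = [[1.0, 0.0, 0.0], [0.75, 0.25, 0.0], [0.0, 0.0, 1.0]]

reports, table = stream_scan_sketch(
    [3, 1, 1, 2, 3, 3, 3, 1], pi, A, B, inv_eps=10.0, threshold=10.0
)
print(reports)
```

Apart from the `table` kept for inspection, the state carried between ticks is just the two length-k lists `v` and `s` plus one candidate, so its size does not depend on the stream length n.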

Theoretical analysis
- StreamScan guarantees no false dismissals
- O(1) space and time per time-tick
(Details in paper!)

Experiments
We answer the following questions:
Q1. Effectiveness: How successful is StreamScan in capturing sequence patterns?
Q2. Scalability: How does it scale with the sequence length n in terms of time and space?

Q1. Effectiveness (MoCap)
(Figure: query sequences and the StreamScan output on MoCap data; labeled patterns: Q1. Walk, Q2. Jump, Q3. Moving arms)

Q2. Scalability
Time/space vs. data size (length): n

StreamScan requires constant time/space, i.e., O(1)
(Figure: time vs. n and space vs. n, with SFR and SMM shown for comparison; plot annotations: SFR051383x, SMM89479,000x)

StreamScan at work
StreamScan is capable of various applications, e.g.,
App1. Social activity monitoring - Web-click streams
App2. Extreme detection in keyword stream - GoogleTrend streams

Application (1): Social activity monitoring
Given: Web-click sequences (1 month, 5 URLs: blog, news, dictionary, Q&A, mail)
Query: first Sunday
(Figure: Web-click sequences over time, per 10 seconds, with the query marked)
Detected! StreamScan identifies all weekends + holiday

Application (2): Extreme monitoring in keyword stream
Given: GoogleTrend streams (10 years, weekly)
Query: the first 3 years of each stream
Question: Any extreme behavior?

e.g., Finance-related keywords
(Figure: keyword streams over time, weekly, 10 years)
Detected! Global financial crisis in 2008

e.g., Flu-related keywords
Detected! Swine flu pandemic in 2009

e.g., Seasonal sweets-related keywords
Detected! Release of Android OS versions Gingerbread and Ice Cream Sandwich

Conclusions
StreamScan has the following properties:
- Effective: it finds fruitful patterns in diverse data streams
- Exact: it guarantees exactness
- Fast and nimble: it requires a single scan, with O(1) space and time

Thank you!
http://www.cs.kumamoto-u.ac.jp/~yasuko/