TWIX Trees WIth eXtra splits - Homepage of Martin Theus · Martin Theus, Sergej Potapov –...
Transcript of TWIX Trees WIth eXtra splits - Homepage of Martin Theus · Martin Theus, Sergej Potapov –...
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
Sergej PotapovMartin Theus
Simon Urbanek
TWIX
Trees WIth eXtra splits
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 2
Motivation
• Where the classical CART algoritm fails– Greedy algorithms never go for a (locally) second best solution,
which would result in a better overall (global) solution.
Example: XOR-data
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 2
Motivation
• Where the classical CART algoritm fails– Greedy algorithms never go for a (locally) second best solution,
which would result in a better overall (global) solution.
Example: XOR-data
-3 -2 -1 0 1 2 3 4 5 6 7 8
-3
-2
-1
0
1
2
3
4
5
6
7
8
9
10
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 2
Motivation
• Where the classical CART algoritm fails– Greedy algorithms never go for a (locally) second best solution,
which would result in a better overall (global) solution.
Example: XOR-data
-3 -2 -1 0 1 2 3 4 5 6 7 8
0
-3 -2 -1 0 1 2 3 4 5 6 7 8
-3
-2
-1
0
1
2
3
4
5
6
7
8
9
10
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 2
Motivation
• Where the classical CART algoritm fails– Greedy algorithms never go for a (locally) second best solution,
which would result in a better overall (global) solution.
Example: XOR-data
-3 -2 -1 0 1 2 3 4 5 6 7 8
0
>=3.532669
1
< 3.532669
2
1.437037
>=5.679445
y
< 3.151441
1
>=3.151441
2
1.290780
< 3.248369
x
>=2.703216
1
< 2.703216
2
1.806452
>=3.248369
x
1.532075
< 5.679445
y
1.500000
x
-3 -2 -1 0 1 2 3 4 5 6 7 8
-3
-2
-1
0
1
2
3
4
5
6
7
8
9
10
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 3
Trees: More Problems
• Non-orthogonal splitting directions …
-3 -2 -1 0 1 2 3
-3
-2
-1
0
1
2
3
-3 -2 -1 0 1 2 3
-3
-2
-1
0
1
2
3
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 3
Trees: More Problems
• Non-orthogonal splitting directions …
-3 -2 -1 0 1 2 3
-3
-2
-1
0
1
2
3
-3 -2 -1 0 1 2 3
-3
-2
-1
0
1
2
3
< -0.05426408
1
>=0.09766749
1
< 0.09766749
2
1
>=-0.05426408
V1
1
>=-0.5117182
V2
>=-1.571775
1
< -1.758858
1
>=-1.758858
2
1
< -1.571775
V2
1
< -0.8793875
V1
>=-0.8793875
2
1
< -0.5117182
V2
1
< 0.2145846
V1
< 1.029791
1
>=1.029791
2
2
>=0.8353015
V2
< 0.8353015
2
2
>=0.2145846
V1
1
V2
-3 -2 -1 0 1 2 3
-3
-2
-1
0
1
2
3
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 3
Trees: More Problems
• Non-orthogonal splitting directions …
-3 -2 -1 0 1 2 3
-3
-2
-1
0
1
2
3
-3 -2 -1 0 1 2 3
-3
-2
-1
0
1
2
3
< -0.05426408
1
>=0.09766749
1
< 0.09766749
2
1
>=-0.05426408
V1
1
>=-0.5117182
V2
>=-1.571775
1
< -1.758858
1
>=-1.758858
2
1
< -1.571775
V2
1
< -0.8793875
V1
>=-0.8793875
2
1
< -0.5117182
V2
1
< 0.2145846
V1
< 1.029791
1
>=1.029791
2
2
>=0.8353015
V2
< 0.8353015
2
2
>=0.2145846
V1
1
V2
-3 -2 -1 0 1 2 3
-3
-2
-1
0
1
2
3
… can not be handled by single trees, no matter how we split
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 4
Bagging Revisited
Bagging = Boostrap Aggegation, tries to simulate an infinite sam-ple by bootstrapping, i.e. sampling from the original sample withreplacement.
Repeat N times:
1. Generate a bootstrap sample Di of size n.
2. Fit model f̂Di.
Depending on the problem the N results are aggregated:
• Classification: g(x) = argmaxc!C
N!
i=1
I(fDi(x) = c)
• Regression: g(x) =1
N
N!
i=1
fDi(x)
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 5
Ensembles
• General IdeaUse many “different” classifier and combine them to get more accurate results.
• Bagging: Instability of trees yields different models
• Random Forests: Restrict input space randomly to get wider range of models
• Boosting: Iterate to up-weight “bad” points
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 5
Ensembles
• General IdeaUse many “different” classifier and combine them to get more accurate results.
• Bagging: Instability of trees yields different models
• Random Forests: Restrict input space randomly to get wider range of models
• Boosting: Iterate to up-weight “bad” points
Question:Why use randomly generated (sub-optimal) models?
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 6
Tree Mechanics
• CART is a recursive partitioning algorithm
• Each node is split according to the maximum gain in the loss function
• Mountain plots shows the loss function for a variable for all possible split points
Loss Functions Mountain Plot
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.1
0.2
0.3
0.4
0.5
p
Entropy
Gini
Missclassification
6400 6600 6800 7000 7200 7400 7600 7800 8000 8200 8400
0
100
200
300
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 6
Tree Mechanics
• CART is a recursive partitioning algorithm
• Each node is split according to the maximum gain in the loss function
• Mountain plots shows the loss function for a variable for all possible split points
Loss Functions Mountain Plot
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.1
0.2
0.3
0.4
0.5
p
Entropy
Gini
Missclassification
6400 6600 6800 7000 7200 7400 7600 7800 8000 8200 8400
0
100
200
300
6400 6600 6800 7000 7200 7400 7600 7800 8000 8200 8400
1
3
5
7
9
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 6
Tree Mechanics
• CART is a recursive partitioning algorithm
• Each node is split according to the maximum gain in the loss function
• Mountain plots shows the loss function for a variable for all possible split points
Loss Functions Mountain Plot
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.1
0.2
0.3
0.4
0.5
p
Entropy
Gini
Missclassification
6400 6600 6800 7000 7200 7400 7600 7800 8000 8200 8400
0
100
200
300
6400 6600 6800 7000 7200 7400 7600 7800 8000 8200 8400
1
3
5
7
9
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 6
Tree Mechanics
• CART is a recursive partitioning algorithm
• Each node is split according to the maximum gain in the loss function
• Mountain plots shows the loss function for a variable for all possible split points
Loss Functions Mountain Plot
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.1
0.2
0.3
0.4
0.5
p
Entropy
Gini
Missclassification
6400 6600 6800 7000 7200 7400 7600 7800 8000 8200 8400
0
100
200
300
6400 6600 6800 7000 7200 7400 7600 7800 8000 8200 8400
1
3
5
7
9
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 6
Tree Mechanics
• CART is a recursive partitioning algorithm
• Each node is split according to the maximum gain in the loss function
• Mountain plots shows the loss function for a variable for all possible split points
Loss Functions Mountain Plot
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.1
0.2
0.3
0.4
0.5
p
Entropy
Gini
Missclassification
6400 6600 6800 7000 7200 7400 7600 7800 8000 8200 8400
0
100
200
300
6400 6600 6800 7000 7200 7400 7600 7800 8000 8200 8400
1
3
5
7
9
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 7
Idea behind TWIX
• Since the greedy CART algorithm not necessarily finds the “optimal” tree, try second best splits.
• Use these forests for aggregation
• Expect better results for both single trees and aggregations
• How to find “good” candidates for second best splits?
• Number of inner nodes grows exponentially with the number of levels in the tree
⇒ so does the number of alternative trees
Problems
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 8
Second Best Splits: South African Heart Data
100 120 140 160 180 200
05
10
15
20
sbp
Deviance
0 5 10 15 20 25 30
05
10
15
20
tobacco
Deviance
2 4 6 8 10 12 14
05
10
15
20
ldl
Deviance
10 15 20 25 30 35 40
05
10
15
20
adiposity
Deviance
20 30 40 50 60 70
05
10
15
20
typea
Deviance
20 25 30 35 40 45
05
10
15
20
obesityDeviance
0 20 40 60 80 100 120
05
10
15
20
alcohol
Deviance
20 30 40 50 60
05
10
15
20
age
Deviance
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 8
Second Best Splits: South African Heart Data
100 120 140 160 180 200
05
10
15
20
sbp
Deviance
0 5 10 15 20 25 30
05
10
15
20
tobacco
Deviance
2 4 6 8 10 12 14
05
10
15
20
ldl
Deviance
10 15 20 25 30 35 40
05
10
15
20
adiposity
Deviance
20 30 40 50 60 70
05
10
15
20
typea
Deviance
20 25 30 35 40 45
05
10
15
20
obesityDeviance
0 20 40 60 80 100 120
05
10
15
20
alcohol
Deviance
20 30 40 50 60
05
10
15
20
age
Deviance
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 8
Second Best Splits: South African Heart Data
100 120 140 160 180 200
05
10
15
20
sbp
Deviance
0 5 10 15 20 25 30
05
10
15
20
tobacco
Deviance
2 4 6 8 10 12 14
05
10
15
20
ldl
Deviance
10 15 20 25 30 35 40
05
10
15
20
adiposity
Deviance
20 30 40 50 60 70
05
10
15
20
typea
Deviance
20 25 30 35 40 45
05
10
15
20
obesityDeviance
0 20 40 60 80 100 120
05
10
15
20
alcohol
Deviance
20 30 40 50 60
05
10
15
20
age
Deviance
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 8
Second Best Splits: South African Heart Data
100 120 140 160 180 200
05
10
15
20
sbp
Deviance
0 5 10 15 20 25 30
05
10
15
20
tobacco
Deviance
2 4 6 8 10 12 14
05
10
15
20
ldl
Deviance
10 15 20 25 30 35 40
05
10
15
20
adiposity
Deviance
20 30 40 50 60 70
05
10
15
20
typea
Deviance
20 25 30 35 40 45
05
10
15
20
obesityDeviance
0 20 40 60 80 100 120
05
10
15
20
alcohol
Deviance
20 30 40 50 60
05
10
15
20
age
Deviance
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 9
Second Best Splits: Global vs. Local
• When searching for a “best” split point, we can either look for– all top n greatest deviance gains, or– only look for local maxima
• ExampleTop 6 splits
100 120 140 160 180 200
05
10
15
20
sbp
Deviance
0 5 10 15 20 25 30
05
10
15
20
tobacco
Deviance
2 4 6 8 10 12 14
05
10
15
20
ldl
Deviance
10 15 20 25 30 35 40
05
10
15
20
adiposity
Deviance
20 30 40 50 60 70
05
10
15
20
typea
Deviance
20 25 30 35 40 45
05
10
15
20
obesity
Deviance
0 20 40 60 80 100 120
05
10
15
20
alcohol
Deviance
20 30 40 50 60
05
10
15
20
age
Deviance
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 9
Second Best Splits: Global vs. Local
• When searching for a “best” split point, we can either look for– all top n greatest deviance gains, or– only look for local maxima
• ExampleTop 6 splits
100 120 140 160 180 200
05
10
15
20
sbp
Deviance
0 5 10 15 20 25 30
05
10
15
20
tobacco
Deviance
2 4 6 8 10 12 14
05
10
15
20
ldl
Deviance
10 15 20 25 30 35 40
05
10
15
20
adiposity
Deviance
20 30 40 50 60 70
05
10
15
20
typea
Deviance
20 25 30 35 40 45
05
10
15
20
obesity
Deviance
0 20 40 60 80 100 120
05
10
15
20
alcohol
Deviance
20 30 40 50 60
05
10
15
20
age
Deviance
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 9
Second Best Splits: Global vs. Local
• When searching for a “best” split point, we can either look for– all top n greatest deviance gains, or– only look for local maxima
• ExampleTop 6 splits
100 120 140 160 180 200
05
10
15
20
sbp
Deviance
0 5 10 15 20 25 30
05
10
15
20
tobacco
Deviance
2 4 6 8 10 12 14
05
10
15
20
ldl
Deviance
10 15 20 25 30 35 40
05
10
15
20
adiposity
Deviance
20 30 40 50 60 70
05
10
15
20
typea
Deviance
20 25 30 35 40 45
05
10
15
20
obesity
Deviance
0 20 40 60 80 100 120
05
10
15
20
alcohol
Deviance
20 30 40 50 60
05
10
15
20
age
Deviance
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 10
Second Best Splits: Forcing Variables
• Often a single variable dominates the potential deviance gain, and shadows all other variables⇒ Many probably good split points are lost.
• Solution:Force a minimum number of split points for each variable.
• Example: top 6 vs. top 3
100 120 140 160 180 200
05
10
15
20
sbp
Deviance
0 5 10 15 20 25 30
05
10
15
20
tobacco
Deviance
2 4 6 8 10 12 14
05
10
15
20
ldl
Deviance
10 15 20 25 30 35 40
05
10
15
20
adiposity
Deviance
20 30 40 50 60 70
05
10
15
20
typea
Deviance
20 25 30 35 40 45
05
10
15
20
obesity
Deviance
0 20 40 60 80 100 120
05
10
15
20
alcohol
Deviance
20 30 40 50 60
05
10
15
20
age
Deviance
100 120 140 160 180 200
05
10
15
20
sbp
Deviance
0 5 10 15 20 25 30
05
10
15
20
tobaccoDeviance
2 4 6 8 10 12 14
05
10
15
20
ldl
Deviance
10 15 20 25 30 35 40
05
10
15
20
adiposity
Deviance
20 30 40 50 60 70
05
10
15
20
typea
Deviance
20 25 30 35 40 45
05
10
15
20
obesity
Deviance
0 20 40 60 80 100 120
05
10
15
20
alcohol
Deviance
20 30 40 50 60
05
10
15
20
age
Deviance
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 10
Second Best Splits: Forcing Variables
• Often a single variable dominates the potential deviance gain, and shadows all other variables⇒ Many probably good split points are lost.
• Solution:Force a minimum number of split points for each variable.
• Example: top 6 vs. top 3
100 120 140 160 180 200
05
10
15
20
sbp
Deviance
0 5 10 15 20 25 30
05
10
15
20
tobacco
Deviance
2 4 6 8 10 12 14
05
10
15
20
ldl
Deviance
10 15 20 25 30 35 40
05
10
15
20
adiposity
Deviance
20 30 40 50 60 70
05
10
15
20
typea
Deviance
20 25 30 35 40 45
05
10
15
20
obesity
Deviance
0 20 40 60 80 100 120
05
10
15
20
alcohol
Deviance
20 30 40 50 60
05
10
15
20
age
Deviance
100 120 140 160 180 200
05
10
15
20
sbp
Deviance
0 5 10 15 20 25 30
05
10
15
20
tobaccoDeviance
2 4 6 8 10 12 14
05
10
15
20
ldl
Deviance
10 15 20 25 30 35 40
05
10
15
20
adiposity
Deviance
20 30 40 50 60 70
05
10
15
20
typea
Deviance
20 25 30 35 40 45
05
10
15
20
obesity
Deviance
0 20 40 60 80 100 120
05
10
15
20
alcohol
Deviance
20 30 40 50 60
05
10
15
20
age
Deviance
56
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 10
Second Best Splits: Forcing Variables
• Often a single variable dominates the potential deviance gain, and shadows all other variables⇒ Many probably good split points are lost.
• Solution:Force a minimum number of split points for each variable.
• Example: top 6 vs. top 3
100 120 140 160 180 200
05
10
15
20
sbp
Deviance
0 5 10 15 20 25 30
05
10
15
20
tobacco
Deviance
2 4 6 8 10 12 14
05
10
15
20
ldl
Deviance
10 15 20 25 30 35 40
05
10
15
20
adiposity
Deviance
20 30 40 50 60 70
05
10
15
20
typea
Deviance
20 25 30 35 40 45
05
10
15
20
obesity
Deviance
0 20 40 60 80 100 120
05
10
15
20
alcohol
Deviance
20 30 40 50 60
05
10
15
20
age
Deviance
100 120 140 160 180 200
05
10
15
20
sbp
Deviance
0 5 10 15 20 25 30
05
10
15
20
tobaccoDeviance
2 4 6 8 10 12 14
05
10
15
20
ldl
Deviance
10 15 20 25 30 35 40
05
10
15
20
adiposity
Deviance
20 30 40 50 60 70
05
10
15
20
typea
Deviance
20 25 30 35 40 45
05
10
15
20
obesity
Deviance
0 20 40 60 80 100 120
05
10
15
20
alcohol
Deviance
20 30 40 50 60
05
10
15
20
age
Deviance
56
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 10
Second Best Splits: Forcing Variables
• Often a single variable dominates the potential deviance gain, and shadows all other variables⇒ Many probably good split points are lost.
• Solution:Force a minimum number of split points for each variable.
• Example: top 6 vs. top 3
100 120 140 160 180 200
05
10
15
20
sbp
Deviance
0 5 10 15 20 25 30
05
10
15
20
tobacco
Deviance
2 4 6 8 10 12 14
05
10
15
20
ldl
Deviance
10 15 20 25 30 35 40
05
10
15
20
adiposity
Deviance
20 30 40 50 60 70
05
10
15
20
typea
Deviance
20 25 30 35 40 45
05
10
15
20
obesity
Deviance
0 20 40 60 80 100 120
05
10
15
20
alcohol
Deviance
20 30 40 50 60
05
10
15
20
age
Deviance
100 120 140 160 180 200
05
10
15
20
sbp
Deviance
0 5 10 15 20 25 30
05
10
15
20
tobaccoDeviance
2 4 6 8 10 12 14
05
10
15
20
ldl
Deviance
10 15 20 25 30 35 40
05
10
15
20
adiposity
Deviance
20 30 40 50 60 70
05
10
15
20
typea
Deviance
20 25 30 35 40 45
05
10
15
20
obesity
Deviance
0 20 40 60 80 100 120
05
10
15
20
alcohol
Deviance
20 30 40 50 60
05
10
15
20
age
Deviance
56
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 11
Second Best Splits: Grid Search
• In some situations good split points might not even be associated with some (local) maximum in deviance gain.
(Remember the XOR Example)
• Grid searches are most exhaustive, but also most expensive.
100 120 140 160 180 200
05
10
15
20
sbp
Deviance
0 5 10 15 20 25 30
05
10
15
20
tobacco
Deviance
2 4 6 8 10 12 14
05
10
15
20
ldl
Deviance
10 15 20 25 30 35 40
05
10
15
20
adiposity
Deviance
20 30 40 50 60 70
05
10
15
20
typea
Deviance
20 25 30 35 40 45
05
10
15
20
obesity
Deviance
0 20 40 60 80 100 120
05
10
15
20
alcohol
Deviance
20 30 40 50 60
05
10
15
20
age
Deviance
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 11
Second Best Splits: Grid Search
• In some situations good split points might not even be associated with some (local) maximum in deviance gain.
(Remember the XOR Example)
• Grid searches are most exhaustive, but also most expensive.
100 120 140 160 180 200
05
10
15
20
sbp
Deviance
0 5 10 15 20 25 30
05
10
15
20
tobacco
Deviance
2 4 6 8 10 12 14
05
10
15
20
ldl
Deviance
10 15 20 25 30 35 40
05
10
15
20
adiposity
Deviance
20 30 40 50 60 70
05
10
15
20
typea
Deviance
20 25 30 35 40 45
05
10
15
20
obesity
Deviance
0 20 40 60 80 100 120
05
10
15
20
alcohol
Deviance
20 30 40 50 60
05
10
15
20
age
Deviance
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 12
Implementation: The Grid
• If we allow sj splits per node on level j of the tree, we get a max-imum of
S =k!
i=1
s2i!1
i
trees for a tree with no more than k levels. Example:s = (7, 4, 2) ! S = 72
0
· 421
· 222
= 7 · 16 · 16 = 1792
!Work on a grid of computers
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 12
Implementation: The Grid
• If we allow sj splits per node on level j of the tree, we get a max-imum of
S =k!
i=1
s2i!1
i
trees for a tree with no more than k levels. Example:s = (7, 4, 2) ! S = 72
0
· 421
· 222
= 7 · 16 · 16 = 1792
!Work on a grid of computers
PVMController
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 13
Using the R Package
• The most important tuning parameters are– method
Which split points will be used? This can be "deviance" (default), "grid" or "local". If the method is set to: local the program uses the local maxima of the split function(entropy), deviance all values of the entropy, grid grid points.
– topn.methodone of "complete"(default) or "single". A specification of the consideration of the split points. If set to "complete" it uses split points from all variables, else it uses split points per variable.
– topNinteger vector. How many splits will be selected and at which level? If length 1, the same size of splits will be selected at each level. If length > 1, for example topN=c(3,2), 3 splits will be chosen at first level, 2 splits at second level and for all next levels 1 split.
– levelmaximum depth of the trees. If level set to 1, trees consist of root node.
– Stopping Rules:■ minsplit
the minimum number of observations that must exist in a node.■ minbucket
the minimum number of observations in any terminal <leaf> node.■ Devmin
the minimum improvement on entropy by splitting.
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 14
South African Heart Desease Data cont.
Training Validation Test
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 14
South African Heart Desease Data cont.
• To get a “fair”, i.e. generalizable and not too overfitted classifier, we usually split the data into 3 chunks:
Training Validation Test
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 14
South African Heart Desease Data cont.
• To get a “fair”, i.e. generalizable and not too overfitted classifier, we usually split the data into 3 chunks:
– TrainingAll models are trained using the training data
Training Validation Test
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 14
South African Heart Desease Data cont.
• To get a “fair”, i.e. generalizable and not too overfitted classifier, we usually split the data into 3 chunks:
– TrainingAll models are trained using the training data
– ValidationThe “best” model is selected using the validation data(The chosen model is then estimated with training+validation)
Training Validation Test
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 14
South African Heart Desease Data cont.
• To get a “fair”, i.e. generalizable and not too overfitted classifier, we usually split the data into 3 chunks:
– TrainingAll models are trained using the training data
– ValidationThe “best” model is selected using the validation data(The chosen model is then estimated with training+validation)
– TestThe performance is then assessed with the test data
Training Validation Test
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 14
South African Heart Desease Data cont.
• To get a “fair”, i.e. generalizable and not too overfitted classifier, we usually split the data into 3 chunks:
– TrainingAll models are trained using the training data
– ValidationThe “best” model is selected using the validation data(The chosen model is then estimated with training+validation)
– TestThe performance is then assessed with the test data
Training Validation Test
What about trees?Model Structure = Model parameters
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 15
The Dataset
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 15
The Dataset
• 10 Variables, 462 Observations
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 15
The Dataset
• 10 Variables, 462 Observations
• Target: Coronary Heart Disease (chd), 34,63% = 160 cases
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 15
The Dataset
• 10 Variables, 462 Observations
• Target: Coronary Heart Disease (chd), 34,63% = 160 cases
• Inputs:continuous– sbp systolic blood pressure– tobacco cumulative tobacco (kg)– ldl low density lipoprotein cholesterol– adiposity– typea type-A behavior– obesity– alcohol current alcohol consumption– age age at onset
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 15
The Dataset
• 10 Variables, 462 Observations
• Target: Coronary Heart Disease (chd), 34,63% = 160 cases
• Inputs:continuous– sbp systolic blood pressure– tobacco cumulative tobacco (kg)– ldl low density lipoprotein cholesterol– adiposity– typea type-A behavior– obesity– alcohol current alcohol consumption– age age at onset
discrete– famhist family history of heart disease (Present, Absent)
101 218
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 16
The Dataset: Univariate
sbp
101 218101 218
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 16
The Dataset: Univariate
sbp
101 218101 218101 218
0.0
1.0
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 16
The Dataset: Univariate
sbp
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 17
The Dataset: Univariate
101 218
0.0
1.0
0 31.2
0.0
1.0
0.98 15.33
0.0
1.0
6.7 42.5
0.0
1.0
13 78
0.0
1.0
14.7 46.6
0.0
1.0
0 147.2
0.0
1.0
15 64
0.0
1.0
sbp tobacco ldl adiposity
typea obesity alcohol age
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 18
The Dataset:Bivariate sbp
0 5 10 15 20 25 30 10 20 30 40 15 25 35 45 20 30 40 50 60
100
140
180
220
05
1015202530
tobacco
ldl
24
68
10
14
10
20
30
40
adiposity
typea
20
40
60
80
15
25
35
45
obesity
alcohol
050
100
150
100 140 180 220
20
30
40
50
60
2 4 6 8 10 14 20 40 60 80 0 50 100 150
age
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 19
The Dataset: Multivariate
sbp
tobacco
ldl
adiposity
typea
obesity
alcohol
age
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 19
The Dataset: Multivariate
sbp
tobacco
ldl
adiposity
typea
obesity
alcohol
age
sbp
tobacco
ldl
adiposity
typea
obesity
alcohol
age
!
! !
!
! ! !!
! ! ! !
!!!!!! ! !
! !!!! !!!
! !! ! ! !
! !!! !
! ! !!! !! !!! ! !! ! !!!
! ! !
! !
! !!!!
!!! !!!
!
0.60 0.65 0.70 0.75 0.80
0.6
00
.65
0.7
00
.75
0.8
0
CCR of 130 trees
CCR of training data
CC
R o
f te
tst
da
ta
+
!
!!
!
!
!!!
!!!
!!!
!!
!
!
!
!
!!
!! !
!
!
!!
!
!!!
!!!!!!
!
!
!
!
!!!
!
!
!!!!
!
!!!!
!!
!
!
!
!
!
! !
!
!!!
!
!
! !
!!!
!
!!
!!!
!!!
!!
!!
!!
!
!
!
!!!!!!!
!!!!!!
!!
!
!!!!
!
!!
!
!
!!!
!!!
!!!
!
50 100 150
51
01
5
Training Deviance
Te
st
De
via
nce
!
Deviances of trees (n= 130 )
Training Deviance
Fre
qu
en
cy
50 100 150
05
10
15
20
25
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 20
TWIX: Diagnostics
• For a given “Multitree” we can compare deviance and classification rate on training and test/validation data.
Example:+ rpart Tree
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 21
TWIX: Tree Selection
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 21
TWIX: Tree Selection
• The CCR (Correct Classification Rate) of the top TWIX trees are better than those of greedy trees and many other classif. methods
Quest:How to find the “best” trees from the validation data?
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 21
TWIX: Tree Selection
• The CCR (Correct Classification Rate) of the top TWIX trees are better than those of greedy trees and many other classif. methods
Quest:How to find the “best” trees from the validation data?
• Several approaches:
(a) Sort trees according to: ■ training deviance, ■ validation deviance, ■ validation CCR,
and pick the best!
(b) Avoid extreme trees, i.e. forget trees having the worst deviances or CCRs and repeat (a).
(c) Look for structural properties like balance of tree, purity and size of leaves
(d) Identify clusters among the trees and avoid selecting trees from a “bad” cluster
(e) Use a mixture from (a) – (d) …
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
Looking at Tree Clusters
• Metric: Jaccard Coefficient
• Create groups via MDS and hierarchical clustering
22
1 2 3 4 5 6 7 8
020
40
60
80
100
120
Dimensionen
stress
!!!!!!
0.60
0.65
0.70
0.75
0.80
!!
0.60
0.65
0.70
0.75
0.80
0.60
0.65
0.70
0.75
0.80
0.60
0.65
0.70
0.75
0.80
1 2 3 4
010
20
30
40
50
16 KAPITEL 2. METRIKEN AUS DER LITERATUR
Somit unterscheiden sich die Bäume in genau drei Variablen, nämlich sbp, ldl undtypea. Für den Matching Koeffizienten erhält man dann:
dmc(B1, B2) =19
·9!
i=1
| !1,i ! !2,i |=
=19
· (0 + 1 + 1 + 0 + 0 + 0 + 0 + 1 + 0) =
=19
· 3 =13" 0.33
Aufgund diese Ergbnisses beträgt die Ähnlichkeit der beiden Bäume 1! 13 " 67%.
2.2.2 Der Jaccard Koeffizient
Der Jaccard Koeffizient basiert auf der selben Überlegung wie der Matching Koeffizient,mit dem Unterschied, dass die Normierung beim Jaccard Koeffizient nur mit der tatsäch-lich verwendeten Anzahl von Splitvariablen stattfindet.Somit ergibt sich für den Jaccard Koeffizienten folgende Definition:
djacc(Bi, Bj) :=1
| Vi # Vj | ·n!
k=1
| !i,k ! !j,k |
Der Normierungsfaktor | Vi # Vj | entspricht der Anzahl derjenigen Variablen, die tat-sächlich in den Bäumen Bi und Bj als Splitvariablen verwendet worden sind, dabei stelltVj die Menge der im Baum j verwendeten Splitvariablen dar. Analog zum Matching Ko-effizienten entspricht ! der boolsche Vektor der verwendeten Splitvariablen.
Beweis: djacc ist eine MetrikDer Jaccard Koeffizient kann durch den Matching Koeffizienten dargestellt werden:
djacc(Bi, Bj) :=1
| Vi # Vj | ·n!
k=1
| !i,k ! !j,k |= n
| Vi # Vj | · dmc(Bi, Bj)
Somit erfolgt der Beweis, dass djacc eine Metrik ist analog zu dmc. !
Beispiel 2: Es werden wie im Beispiel 1 die Bäume aus der Abbildung 2.1 verwendet. Daaus dem Beispiel 1 der Matching Koeffizienten für diese zwei Bäume bereits bekannt ist,muss man nur noch | V1 # V2 | ermitteln.V1 = {age,sbp,ldl,famhist}V2 = {age,typea,famhist}$ | V1 # V2 |=| {age,sbp,ldl,typea,famhist} |= 5Folglich ergibt sich für den Jaccard Koffizienten folgendes Ergebnis:
dj(B1, B2) =9
| V1 # V2 | · dmc(B1, B2) =95
· 13
=35
Damit beträgt die Ähnlichkeit der beiden Bäume approximativ 40%.
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
Looking at Forests
• Traceplots show a tree ensemble in a single framework
Cluster 1
23
tobacco age
age ldltypea tobacco
famhisttypea ldltobacco
ldltobacco agetypea famhist
typea famhisttobacco
ldltobacco
Level
1
2
3
4
5
6
7
8
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
Looking at Forests
• Traceplots show a tree ensemble in a single framework
Cluster 1
23
tobacco age
age ldltypea tobacco
famhisttypea ldltobacco
ldltobacco agetypea famhist
typea famhisttobacco
ldltobacco
tobacco age
tobacco age famhisttypea ldl
famhisttypea tobacco ldlalcohol age adiposityobesity sbp
ldl adiposity sbpagetypeaobesity tobacco famhist
typeaobesity ldlage sbpfamhisttobaccoalcohol
ldlagetypeaobesity tobacco sbpfamhistalcohol adiposity
adiposity sbpldlalcoholobesity tobacco
alcohol typeaobesity age
Cluster 3Level
1
2
3
4
5
6
7
8
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 24
The Competitors: On 100 random samples
• Logistic Regression
glm(response~., data=dataTrain, family="binomial")
• Traditional CART (from rpart)
rpart(response ~ ., data=dataTrain, parms=list(split='information'))
• Bagging (from ipred)
bagging(response~., data=dataTrain)
• SVM (from e1070)
svm(response~.,data=dataTrain)
!! None of the methods has been fine-tuned !!
!
!
! ! ! !
! !!
! !!!!!
rpart.trsvm.tr
log..reg
TWIX.Agg
bagging
0.15 0.20 0.25 0.30 0.35 0.40
!
!
!
!
!
!
!
!
!
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 25
TWIX: Results
• 100 runs of a (20,3) TWIX tree, local maxima
rpart (training)
svm (training)
TWIX (training)
!
!
! ! ! !
! !!
! !!!!!
rpart.trsvm.tr
log..reg
TWIX.Agg
bagging
0.15 0.20 0.25 0.30 0.35 0.40
!
!
!
!
!
!
!
!
!
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 25
TWIX: Results
• 100 runs of a (20,3) TWIX tree, local maxima
rpart (training)
svm (training)
TWIX (training)
logistic (test)
!
!
! ! ! !
! !!
! !!!!!
rpart.trsvm.tr
log..reg
TWIX.Agg
bagging
0.15 0.20 0.25 0.30 0.35 0.40
!
!
!
!
!
!
!
!
!
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 25
TWIX: Results
• 100 runs of a (20,3) TWIX tree, local maxima
rpart (training)
svm (training)
TWIX (training)
logistic (test)
svm (test)
!
!
! ! ! !
! !!
! !!!!!
rpart.trsvm.tr
log..reg
TWIX.Agg
bagging
0.15 0.20 0.25 0.30 0.35 0.40
!
!
!
!
!
!
!
!
!
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 25
TWIX: Results
• 100 runs of a (20,3) TWIX tree, local maxima
rpart (training)
svm (training)
TWIX (training)
logistic (test)
svm (test)
TWIX agg (test)
!
!
! ! ! !
! !!
! !!!!!
rpart.trsvm.tr
log..reg
TWIX.Agg
bagging
0.15 0.20 0.25 0.30 0.35 0.40
!
!
!
!
!
!
!
!
!
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 25
TWIX: Results
• 100 runs of a (20,3) TWIX tree, local maxima
rpart (training)
svm (training)
TWIX (training)
logistic (test)
svm (test)
TWIX agg (test)
TWIX (test)
!
!
! ! ! !
! !!
! !!!!!
rpart.trsvm.tr
log..reg
TWIX.Agg
bagging
0.15 0.20 0.25 0.30 0.35 0.40
!
!
!
!
!
!
!
!
!
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 25
TWIX: Results
• 100 runs of a (20,3) TWIX tree, local maxima
rpart (training)
svm (training)
TWIX (training)
logistic (test)
svm (test)
TWIX agg (test)
TWIX (test)
Bagging (test)
!
!
! ! ! !
! !!
! !!!!!
rpart.trsvm.tr
log..reg
TWIX.Agg
bagging
0.15 0.20 0.25 0.30 0.35 0.40
!
!
!
!
!
!
!
!
!
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 25
TWIX: Results
• 100 runs of a (20,3) TWIX tree, local maxima
rpart (training)
svm (training)
TWIX (training)
logistic (test)
svm (test)
TWIX agg (test)
TWIX (test)
Bagging (test)
rpart (test)
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
TWIX Results: Parallel Coordinates
26
rpart (training)
svm (training)
TWIX (training)
logistic (test)
svm (test)
TWIX agg (test)
TWIX (test)
Bagging (test)
rpart (test)
0.12 0.435
rpart tr
svm tr
TWIX Tr
log. reg
svm te
TWIX Agg
TWIX Te
bagging
rpart te
0.12 0.435
rpart tr
svm tr
TWIX Tr
log. reg
svm te
TWIX Agg
TWIX Te
bagging
rpart te
0.12 0.435
rpart tr
svm tr
TWIX Tr
log. reg
svm te
TWIX Agg
TWIX Te
bagging
rpart te
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
TWIX Results: Parallel Coordinates
26
rpart (training)
svm (training)
TWIX (training)
logistic (test)
svm (test)
TWIX agg (test)
TWIX (test)
Bagging (test)
rpart (test)
0.12 0.435
rpart tr
svm tr
TWIX Tr
log. reg
svm te
TWIX Agg
TWIX Te
bagging
rpart te
0.12 0.435
rpart tr
svm tr
TWIX Tr
log. reg
svm te
TWIX Agg
TWIX Te
bagging
rpart te
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
TWIX Results: Parallel Coordinates
26
rpart (training)
svm (training)
TWIX (training)
logistic (test)
svm (test)
TWIX agg (test)
TWIX (test)
Bagging (test)
rpart (test)
0.12 0.435
rpart tr
svm tr
TWIX Tr
log. reg
svm te
TWIX Agg
TWIX Te
bagging
rpart te
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
TWIX Results: Parallel Coordinates
26
rpart (training)
svm (training)
TWIX (training)
logistic (test)
svm (test)
TWIX agg (test)
TWIX (test)
Bagging (test)
rpart (test)
0.12 0.435
rpart tr
svm tr
TWIX Tr
log. reg
svm te
TWIX Agg
TWIX Te
bagging
rpart te
0.12 0.435
rpart tr
svm tr
TWIX Tr
log. reg
svm te
TWIX Agg
TWIX Te
bagging
rpart te
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
How does TWIX compare?
• Are the resulting ensembles similar to other tree-based ensemble methods?
27
TWIX
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
How does TWIX compare?
• Are the resulting ensembles similar to other tree-based ensemble methods?
27
rForests
Force the useof splits in all
variables
TWIX
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
How does TWIX compare?
• Are the resulting ensembles similar to other tree-based ensemble methods?
27
Bagging
many splitsclose to
greedy split
rForests
Force the useof splits in all
variables
TWIX
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 28
Conclusion
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 28
Conclusion
• For the South-African-Heart Data– Single TWIX-trees out-perform traditional trees and usually bagged trees – Aggregated TWIX beats bagged trees and reaches top performance– TWIX gives good single alternative tree models
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 28
Conclusion
• For the South-African-Heart Data– Single TWIX-trees out-perform traditional trees and usually bagged trees – Aggregated TWIX beats bagged trees and reaches top performance– TWIX gives good single alternative tree models
• Still room for performance improvement– Better (more stable) tree selection– Improved selection of “second best” splits– Improved aggregation of the trees (weights, boosting, …)
– More tests on more datasets (mlbench, “report78”)– Better understanding of the tree-families
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 28
Conclusion
• For the South-African-Heart Data– Single TWIX-trees out-perform traditional trees and usually bagged trees – Aggregated TWIX beats bagged trees and reaches top performance– TWIX gives good single alternative tree models
• Still room for performance improvement– Better (more stable) tree selection– Improved selection of “second best” splits– Improved aggregation of the trees (weights, boosting, …)
– More tests on more datasets (mlbench, “report78”)– Better understanding of the tree-families
• Computational effort can be high, but parallel computing helps
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 28
Conclusion
• For the South-African-Heart Data– Single TWIX-trees out-perform traditional trees and usually bagged trees – Aggregated TWIX beats bagged trees and reaches top performance– TWIX gives good single alternative tree models
• Still room for performance improvement– Better (more stable) tree selection– Improved selection of “second best” splits– Improved aggregation of the trees (weights, boosting, …)
– More tests on more datasets (mlbench, “report78”)– Better understanding of the tree-families
• Computational effort can be high, but parallel computing helps
• Understanding of tree ensembles still poor, classical metrics fail
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits 28
Conclusion
• For the South-African-Heart Data– Single TWIX-trees out-perform traditional trees and usually bagged trees – Aggregated TWIX beats bagged trees and reaches top performance– TWIX gives good single alternative tree models
• Still room for performance improvement– Better (more stable) tree selection– Improved selection of “second best” splits– Improved aggregation of the trees (weights, boosting, …)
– More tests on more datasets (mlbench, “report78”)– Better understanding of the tree-families
• Computational effort can be high, but parallel computing helps
• Understanding of tree ensembles still poor, classical metrics fail
• Complex methods are hard to implement and hard to test, importance of “reproducible research” cannot be underestimated!