Fuzzy Decision Tree (Soft Decision Tree)


Fuzzy Decision Tree

Soft Decision Tree

bookcold@msn.com June 2010

Crisp Decision Tree

Data preprocessing: attribute values and classes are crisp; continuous-valued attributes must be discretized; each attribute value (term) is a classical set over the attribute space.

Tree construction: each node of the decision tree is a classical subset of the attribute space; each root-to-leaf path corresponds to one crisp rule; nodes on the same level have pairwise empty intersections.

Matching: a test example matches exactly one path.

Applicability: suited to small and medium databases with symbolic attributes, fairly clear-cut classes, and little noise.

Fuzzy Decision Tree

Data preprocessing: attribute values and classes are fuzzy; continuous-valued attributes must be fuzzified; each attribute value (term) is a fuzzy set over the attribute space.

Tree construction: each node of the decision tree is a fuzzy subset of the attribute space; each root-to-leaf path corresponds to one fuzzy rule; nodes on the same level generally have non-empty intersections.

Matching: a test example can approximately match several paths.

Applicability: suited to databases of all kinds, especially noisy databases whose attributes and classes are strongly fuzzy.

Fuzzy set theory

• A fuzzy set describes the concept of a vague thing. The basic idea is to relax the absolute membership relation of classical sets into a graded one.

Fuzzy set theory

• Let U be a collection of objects denoted generically by {u}. U is called the universe of discourse and u represents the generic element of U.

Fuzzy set theory

• Definition 1. A fuzzy set A in a universe of discourse U is characterized by a membership function μA which takes values in the interval [0, 1].

For u ∈ U, μA(u) = 1 means that u is definitely a member of A, μA(u) = 0 means that u is definitely not a member of A, and 0 < μA(u) < 1 means that u is partially a member of A. If either μA(u) = 0 or μA(u) = 1 for all u ∈ U, A is a crisp set.
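To make the definition concrete, here is a minimal Python sketch of one common membership function shape (the triangular form and the "warm" example are illustrative assumptions, not part of the definition):

```python
def triangular_membership(u, left, peak, right):
    """Membership degree of u in a triangular fuzzy set.

    Returns 1.0 at the peak, 0.0 outside [left, right],
    and a value in (0, 1) on the slopes (partial membership).
    """
    if u <= left or u >= right:
        return 0.0
    if u <= peak:
        return (u - left) / (peak - left)
    return (right - u) / (right - peak)

# A fuzzy set "warm" on a temperature universe of discourse:
print(triangular_membership(20.0, 15.0, 22.0, 30.0))  # ~0.71, partial member
print(triangular_membership(22.0, 15.0, 22.0, 30.0))  # 1.0, definite member
```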

Soft Decision Tree

A variant of classical decision tree inductive learning using fuzzy set theory

Soft decision trees vs. crisp regression trees

[Figure: an object's path T1 → T2 → D5 through the tree]

Crisp regression tree

• each test node uses a single threshold and has two possible answers: yes or no (left or right)

• each split partitions the objects into two (in our case of binary trees) non-overlapping subregions

Leaf L4 is reached with membership degree 0.43 and leaf L5 with membership degree 0.57. The final output aggregates the leaf labels weighted by these degrees:

1.0 · 0.43 · label(L4) + 1.0 · 0.57 · label(L5) = 1.0 · 0.43 · 0.44 + 1.0 · 0.57 · 1.00 = 0.76
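A quick check of this aggregation in Python (values copied from the example above):

```python
# Weighted aggregation of the two leaf labels by the reached membership degrees
mu_L4, label_L4 = 0.43, 0.44
mu_L5, label_L5 = 0.57, 1.00
output = 1.0 * mu_L4 * label_L4 + 1.0 * mu_L5 * label_L5
print(round(output, 2))  # 0.76
```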

Soft decision tree

• discriminator function: piecewise linear (widely used), with two parameters

• α: corresponds to the split threshold in a test node of a decision or regression tree

• β: the width, the degree of spread that defines the transition region on the attribute chosen in that node

– each node is split (fuzzy partitioned) into two overlapping subregions

– an object reaches multiple terminal nodes; the output estimations given by all these terminal nodes are aggregated through some defuzzification scheme in order to obtain the final estimated membership degree to the target class (a discriminator sketch follows below)
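A minimal sketch of such a piecewise linear discriminator, assuming the transition region of width β is centered on the threshold α and that smaller attribute values map to the left successor (the paper's exact parametrization may differ):

```python
def discriminator(a, alpha, beta):
    """Piecewise linear discriminator v(a; alpha, beta) -> [0, 1].

    Returns the degree with which an attribute value a is sent to the
    left successor: 1 well below the threshold, 0 well above it, and a
    linear transition of width beta centered on alpha.
    """
    if beta == 0.0:                      # crisp split: plain threshold test
        return 1.0 if a <= alpha else 0.0
    lo, hi = alpha - beta / 2.0, alpha + beta / 2.0
    if a <= lo:
        return 1.0
    if a >= hi:
        return 0.0
    return (hi - a) / beta               # linear transition inside [lo, hi]

print(discriminator(4.0, 5.0, 2.0))  # 1.0: well below threshold, fully left
print(discriminator(5.5, 5.0, 2.0))  # 0.25: inside the transition region
```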

Building a soft decision tree

GS: growing set
PS: pruning set
LS: learning set
TS: test set
LS = GS ∪ PS

Soft tree semantics

• Each test node splits a fuzzy set S into two fuzzy subsets: SL, the left one, and SR, the right one.

• Discriminator function: v(a(o), α, β, γ, …) → [0, 1]

Soft tree semantics

• Membership degree of an object to the left successor's subset SL:

– μSL(o) = μS(o) · v(a(o), α, β)

• Membership degree of an object to the right successor's subset SR:

– μSR(o) = μS(o) · (1 − v(a(o), α, β))

• An object is propagated only to successors for which its membership degree is strictly positive (see the sketch below).
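Under these definitions, propagating an object's membership through one test node could be sketched as follows (hypothetical helper; v stands for the discriminator value computed as in the previous sketch):

```python
def propagate(mu_S, v):
    """Split an object's membership mu_S(o) between the two successors.

    v is the discriminator value v(a(o), alpha, beta) for that object.
    The two degrees sum back to mu_S(o); only successors with a strictly
    positive degree need to process the object further.
    """
    mu_left = mu_S * v
    mu_right = mu_S * (1.0 - v)
    return mu_left, mu_right

# Example: an object with full membership at the node, v = 0.43
print(propagate(1.0, 0.43))  # (0.43, 0.57) -> both subtrees are traversed
```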

Soft tree semantics

• j: a node of the tree

• Lj: numerical value (or label) attached to node j

• SLj: the fuzzy subset corresponding to this node

SDT growing

• a method to select a (fuzzy) split at every new node of the tree

• a rule for determining when a node should be considered terminal

• a rule for assigning a label to every identified terminal node

Automatic fuzzy partitioning of a node

Objective: given S, a fuzzy set in a soft decision tree, find the attribute a(·), threshold α, and width β, together with the successor labels LL and LR, so as to minimize the squared error function.

Automatic fuzzy partitioning of a node

Strategy: • Searching for the attribute and split location. With β fixed at 0 (a crisp split), we search among all the attributes for the attribute a(·) yielding the smallest crisp squared error ES, its optimal crisp split threshold α, and its corresponding (provisional) successor labels LL and LR, by using crisp heuristics adapted from CART regression trees (see the sketch below).
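A sketch of this crisp search for a single attribute, assuming a numeric target y and using subset means as the provisional labels, in the spirit of CART regression trees (simplified: growing-set membership weights are ignored):

```python
import numpy as np

def best_crisp_split(a, y):
    """Search midpoints between sorted attribute values for the threshold
    alpha minimizing the total squared error when each side is predicted
    by its mean (the provisional labels L_L and L_R)."""
    order = np.argsort(a)
    a, y = a[order], y[order]
    best = (np.inf, None, None, None)
    for i in range(1, len(a)):
        if a[i] == a[i - 1]:
            continue
        alpha = (a[i] + a[i - 1]) / 2.0
        L_L, L_R = y[:i].mean(), y[i:].mean()
        err = ((y[:i] - L_L) ** 2).sum() + ((y[i:] - L_R) ** 2).sum()
        if err < best[0]:
            best = (err, alpha, L_L, L_R)
    return best  # (squared error, alpha, provisional L_L, L_R)

a = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([0.1, 0.2, 0.1, 0.9, 1.0, 0.95])
print(best_crisp_split(a, y))  # alpha = 6.5 separates the two groups
```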

Automatic fuzzy partitioning of a node

Strategy: • Fuzzification and labeling. With the optimal attribute a(·) and threshold α kept frozen, we search for the optimal width β by Fibonacci search; for every candidate value of β, the two successor labels LL and LR are automatically updated by explicit linear regression formulas (see the sketch below).
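A sketch of this one-dimensional search, substituting a golden-section search for the Fibonacci search named above (the two methods are close relatives); err_for_beta is a hypothetical callback that refits LL and LR for a candidate β and returns the resulting squared error:

```python
import math

def golden_section_min(err_for_beta, lo, hi, tol=1e-4):
    """Minimize a unimodal error function of the width beta on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0   # ~0.618
    x1 = hi - inv_phi * (hi - lo)
    x2 = lo + inv_phi * (hi - lo)
    f1, f2 = err_for_beta(x1), err_for_beta(x2)
    while hi - lo > tol:
        if f1 < f2:                           # minimum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - inv_phi * (hi - lo)
            f1 = err_for_beta(x1)
        else:                                 # minimum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + inv_phi * (hi - lo)
            f2 = err_for_beta(x2)
    return (lo + hi) / 2.0

# Demo with a toy error function whose minimum is at beta = 0.3:
print(round(golden_section_min(lambda b: (b - 0.3) ** 2, 0.0, 1.0), 3))
```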

SDT pruning

• Objective: given a complete SDT and a pruning sample of objects PS, find the subtree of the given SDT with the best mean absolute error (MAE) on the pruning set among all subtrees that could be generated from the complete SDT.

SDT pruning

Strategy: • Subtree sequence generation. At each step, the first node in the list is removed and contracted, and the resulting tree is stored in the tree sequence. Finally, we obtain a sequence of trees in decreasing order of complexity.

SDT pruning

Strategy: • Best subtree selection.

– Apply the "one-standard-error rule" to select a tree from the pruning sequence.

– Use the PS to get an unbiased estimate of the MAE, together with its standard error estimate.

– Select not the tree of minimal MAE but rather the smallest tree in the sequence whose MAE stays within one standard error of that minimum (see the sketch below).
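A sketch of this selection rule, assuming each candidate subtree is summarized by its size, its MAE on PS, and the standard error of that MAE (hypothetical data layout):

```python
def one_se_select(trees):
    """trees: list of (num_nodes, mae, se) tuples for the pruning sequence.

    Pick the smallest (least complex) tree whose MAE is within one
    standard error of the overall minimum MAE.
    """
    best_mae, best_se = min((mae, se) for _, mae, se in trees)
    threshold = best_mae + best_se
    # scanning from least to most complex, take the first that qualifies
    for num_nodes, mae, se in sorted(trees, key=lambda t: t[0]):
        if mae <= threshold:
            return num_nodes, mae, se

trees = [(15, 0.20, 0.02), (9, 0.21, 0.02), (5, 0.26, 0.03), (3, 0.30, 0.03)]
print(one_se_select(trees))  # (9, 0.21, 0.02): smallest tree within 0.22
```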

SDT tuning

• Refitting

– optimize only the terminal node parameters

– based on linear least squares

• Backfitting

– optimize all free parameters of the model

– based on a Levenberg-Marquardt non-linear optimization technique (both variants are sketched below)
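A sketch of the two variants, assuming NumPy/SciPy; refit_leaf_labels exploits the fact that with split parameters frozen the output is linear in the leaf labels, while backfit hands all packed parameters to a Levenberg-Marquardt solver (residuals_fn and the parameter packing are hypothetical stand-ins for the tree's actual prediction function):

```python
import numpy as np
from scipy.optimize import least_squares

def refit_leaf_labels(memberships, y):
    """Refitting: with all alpha/beta frozen, the model output is linear in
    the leaf labels, so optimal labels follow from linear least squares.
    memberships: (n_objects, n_leaves) matrix of degrees reaching each leaf."""
    labels, *_ = np.linalg.lstsq(memberships, y, rcond=None)
    return labels

def backfit(residuals_fn, theta0):
    """Backfitting: optimize all free parameters (thresholds, widths, and
    labels packed into theta) with Levenberg-Marquardt.
    residuals_fn(theta) must return the vector of prediction residuals."""
    return least_squares(residuals_fn, theta0, method="lm").x
```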

Empirical results

References

• Cristina Olaru, Louis Wehenkel. A complete fuzzy decision tree technique. Fuzzy Sets and Systems 138 (2003) 221–254.

• Yufei Yuan, Michael J. Shaw. Induction of fuzzy decision trees. Fuzzy Sets and Systems 69 (1995) 125–139.

• Wang Xizhao, Sun Juan, Yang Hongwei, Zhao Minghua (王熙照, 孙娟, 杨宏伟, 赵明华). A comparative study of fuzzy decision tree and crisp decision tree algorithms. Computer Engineering and Applications (计算机工程与应用), 2003(21).