7/31/2019 ddagarc.pdf
Feature Assignment in DDAG

Santosh Arvind Adimoolam
CONTENTS

I Error of DDAG architecture
I-A Error of paths
II Feature Assignment
II-A Feature assignment to subDAGs
III Algorithm
III-A Complexity
III-B Alternate algorithm
IV Feature extraction techniques
IV-A Principal Component Analysis
IV-B Multiclass linear discriminant analysis
IV-C Discrete Fourier transform
IV-D Discrete cosine transform
IV-E Central Moments
IV-F Distance transform
I. ERROR OF DDAG ARCHITECTURE
Definition I-.1. Let N0 be a root node in the DAG and let N1 and N2 be its daughter nodes, N1 on the left and N2 on the right as shown in figure 1. Then the transmission probability t_{N0} of the node is defined as the fraction of samples, in the union of all classes, that pass to N1. We may also call this the left transmission probability. The right transmission probability is 1 − (left transmission probability).
Fig. 1. Transmission probability.
Remark I-.2. Let Q_{N0} be the probability of misclassification at the node N0. A change in Q_{N0} affects the transmission probability t_{N0}, and vice versa.
A. Error of paths
Implicitly we will always be referring to paths from the root to
the end nodes of the DAG unless otherwise specified.
Let us consider a DAG architecture for n classes consisting of C(n, 2) nodes. A sample passes through a path of n − 1 nodes in the DAG. However, if it is misclassified, then the misclassification has occurred at only one particular node and not at multiple nodes. In the DAG hierarchy, we write N_j < N_i if node N_i is higher than N_j in the hierarchy. Let X be any node in a path P. We denote X as X_l or X_r depending on whether X is a left or right daughter of its parent node in the path. The root node is included among the left nodes, and the transmission probability of an empty set of nodes is taken to be 1. Then the error of misclassification at a node N_j via a path P in the DAG is
Q_P(N_j) = Q_{N_j} · ∏_{X_l ∈ P, X_l ≥ N_j} t_{pa(X_l)} · ∏_{X_r ∈ P, X_r ≥ N_j} (1 − t_{pa(X_r)})   (1)

where pa(X) denotes the parent of X in the path (for the root node, pa(X) is the empty set of nodes).
Fig. 2. DDAG
II. FEATURE ASSIGNMENT

A. Feature assignment to subDAGs
The following theorem leads us to propose an algorithm for optimal
feature assignment.
Theorem II-A.1. Consider a DAG D with root node N0, and N1 and N2 as its two daughter nodes, as shown in figure 2. Let Fa_D be the feature assignment to D. If Fa_D(N1) and Fa_D(N2) are optimal feature assignments, then by changing only Fa_D(N0) to a value for which Q(Fa_D) is least, we get an optimal feature assignment.
The theorem states that if the two subDAGs at the daughter nodes
of the root node are optimal w.r.t feature assignment to the DAG,
then optimizing the DAG by just changing the value of the root
node will give us an optimal DAG.
Proof: Let t be the transmission probability of N0. By using equations (1) and (2) we can arrive at

Q_D = Q_{N0} + t Q_{D(N1)} + (1 − t) Q_{D(N2)}   (3)

Let Fa1_D be an optimal feature assignment to D which is different from the Fa_D in the theorem. Define Fa2_D as

Fa2_D(N1) = Fa_D(N1),
Fa2_D(N2) = Fa_D(N2),
Fa2_D(N0) = Fa1_D(N0).

The transmission probability of a node depends only on the features assigned to that node and not on any other node. The transmission probability of N0 is therefore the same under Fa2_D and Fa1_D, since Fa2_D(N0) = Fa1_D(N0); let t denote it. By equation (3),

Q(Fa2_D) = Q(Fa2_D(N0)) + t Q(Fa2_D(N1)) + (1 − t) Q(Fa2_D(N2))   (4)

Q(Fa1_D) = Q(Fa1_D(N0)) + t Q(Fa1_D(N1)) + (1 − t) Q(Fa1_D(N2))   (5)

Also, Q(Fa1_D(N0)) = Q(Fa2_D(N0)), as defined for the function Fa2_D. Subtracting equation (5) from (4) we get

Q(Fa2_D) − Q(Fa1_D) = t [Q(Fa2_D(N1)) − Q(Fa1_D(N1))] + (1 − t) [Q(Fa2_D(N2)) − Q(Fa1_D(N2))]   (6)

Fa2_D(N1) and Fa2_D(N2) are optimal because they are the same as Fa_D(N1) and Fa_D(N2). Hence equation (6) implies

Q(Fa2_D) − Q(Fa1_D) ≤ 0.

However, Fa1_D is an optimal assignment to D. So Q(Fa2_D) = Q(Fa1_D), and Fa2_D is also optimal. Now

Fa_D(N1) = Fa2_D(N1),
Fa_D(N2) = Fa2_D(N2),

and only Fa_D(N0) is different. So, by changing Fa_D(N0) to Fa2_D(N0) we get an optimal DAG, the same as Fa2_D. This proves the theorem.
Remark II-A.2. If Fa_D is a feature assignment to D, changing Fa_D(N) does not change the total probability of misclassification Q(Fa_{D(M)}) of the subDAG D(M) if N is not a node in D(M).
III. ALGORITHM
Remark II-A.2 implies that all nodes at the same level can be
optimized simultaneously with respect to their subDAGs. This result
is used in the algorithm.
Algorithm III-.1. For i = n − 2 down to 0 do:
Optimize all nodes at level i with respect to their subDAGs.
The algorithm directly follows from the theorem.
Notice that the algorithm proceeds by optimizing from end nodes to
the start node. By optimizing a node we mean reducing the error of
the entire subDAG with the node as its root, by changing the feature
assigned to that one particular node. This does not entail that the
error at the node reduces.
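The bottom-up optimization can be sketched in Python as follows. The node labelling (i, j) (the classifier separating classes i and j, with the subDAG recursion Q_{D(N)} = Q_N + t·Q_{D(N_l)} + (1 − t)·Q_{D(N_r)}), the per-node error table `err`, and the uniform transmission probability t = 0.5 are all illustrative assumptions; in practice Q_N and t would come from running the classifiers on training data.

```python
def subdag_q(i, j, assign, err, t=0.5):
    """Misclassification probability Q of the subDAG rooted at node (i, j)."""
    if i == j:                       # leaf: a single class, no classifier
        return 0.0
    q = err[(i, j)][assign[(i, j)]]  # Q_N for the currently assigned feature
    return (q + t * subdag_q(i, j - 1, assign, err, t)
              + (1 - t) * subdag_q(i + 1, j, assign, err, t))

def optimize_ddag(n, features, err):
    """Bottom-up pass: give every node the feature minimizing its subDAG error."""
    assign = {(i, j): features[0] for i in range(n) for j in range(i + 1, n)}
    # span j - i = 1 is the bottom level; the root (0, n - 1) comes last
    for span in range(1, n):
        for i in range(n - span):
            node = (i, i + span)
            assign[node] = min(
                features,
                key=lambda f: subdag_q(node[0], node[1],
                                       {**assign, node: f}, err))
    return assign
```

Note that by the time a node is optimized, both of its subDAGs already carry their optimal assignments, which is exactly the situation Theorem II-A.1 requires.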
A. Complexity
At every node we run the subDAG rooted there in order to assign a feature to the node. A level with n − i nodes has subDAGs of C(i + 1, 2) nodes each, so the total number of classifications is

∑_{i=1}^{n−1} (n − i) · C(i + 1, 2) = (n⁴ + 2n³ − n² − 2n)/24   (7)

This is multiplied by the number of feature classes C. As the complexity is high, a large number of samples cannot be used for training. If, however, we can estimate the transmission probability of every node, then we can use the simpler algorithm proposed below.
B. Alternate algorithm
Consider a node N and its two daughter nodes, denoted N_l and N_r. If t is the transmission probability of the node N, the following equation holds:

Q_{D(N)} = Q_N + t Q_{D(N_l)} + (1 − t) Q_{D(N_r)}

Algorithm III-B.1. For i = n − 2 down to 0 do:
At every node N at level i, optimize

Q_N + t Q_{D(N_l)} + (1 − t) Q_{D(N_r)}

and store

Q_{D(N)} = Q_N + t Q_{D(N_l)} + (1 − t) Q_{D(N_r)}.
The complexity of algorithm III-B.1 is C(n, 2) · C classifications, which is second degree in n. This is much faster than the previous algorithm. However, this algorithm may not be accurate, because it is difficult to estimate the transmission probability of every node accurately.
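Because each node now stores its subDAG error Q_{D(N)}, every node is visited exactly once. A minimal sketch, again assuming the (i, j) node labelling and pre-estimated tables `q` (per-node error by feature) and `t` (transmission probabilities), neither of which is specified in the text:

```python
def alternate_optimize(n, features, q, t):
    """Algorithm III-B.1: optimize each node locally, storing Q_{D(N)}."""
    Q = {(i, i): 0.0 for i in range(n)}   # leaves carry no classifier
    assign = {}
    for span in range(1, n):              # bottom level first, root last
        for i in range(n - span):
            node = (i, i + span)
            left, right = (i, i + span - 1), (i + 1, i + span)
            cost = lambda f: (q[node][f] + t[node] * Q[left]
                              + (1 - t[node]) * Q[right])
            assign[node] = min(features, key=cost)
            Q[node] = cost(assign[node])  # store Q_{D(N)} for the level above
    return assign, Q
```

Since Q[left] and Q[right] are fixed when a node is processed, the minimization effectively picks the feature with the smallest node error q, which is why the quality of the estimated t values matters.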
IV. FEATURE EXTRACTION TECHNIQUES
A. Principal Component Analysis
Pseudocode:
1) Calculate the covariance matrix Σ (d × d).
2) Find the d̃ largest eigenvalues of Σ.
3) Find the d̃ eigenvectors corresponding to those eigenvalues.
4) Project the data matrix to d̃ dimensions.

The covariance matrix has d² entries, and each entry requires n multiplications. So the number of multiplications for step 1 is d²n. In steps 2 and 3, calculating eigenvalues and the corresponding eigenvectors requires QR decomposition in general, which takes about 25d³ operations. Projecting to d̃ dimensions in the final step requires n·d·d̃ multiplications.

Total number of multiplications = d²n + 25d³ + n·d·d̃.
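The four steps can be sketched with numpy; `np.linalg.eigh` stands in for the QR iteration mentioned above, `X` is an n × d data matrix, and `d_red` plays the role of d̃:

```python
import numpy as np

def pca_project(X, d_red):
    """Project the n x d data matrix X onto its d_red principal components."""
    Xc = X - X.mean(axis=0)                 # center the data
    cov = Xc.T @ Xc / len(X)                # step 1: d x d covariance matrix
    vals, vecs = np.linalg.eigh(cov)        # steps 2-3: eigen-decomposition
    top = vecs[:, np.argsort(vals)[::-1][:d_red]]  # d_red largest eigenvalues
    return Xc @ top                         # step 4: n x d_red projection
```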
B. Multiclass linear discriminant analysis
Pseudocode:
1) Find the means of the samples in every class.
2) Find the covariance matrix Σ_b of the means.
3) Find the covariance matrix Σ of all samples.
4) Calculate A = Σ⁻¹Σ_b.
5) Calculate the eigenvalues and eigenvectors of A.

The eigenvectors are used for feature extraction by projections.

Calculating the covariance matrix Σ_b in step 2 requires C·d² multiplications, where C is the number of classes and d is the dimension of the vectors. Step 3 requires n·d² multiplications, where n is the total number of samples. In step 4, calculating the inverse of Σ takes nearly d³ operations: Gauss-Jordan elimination requires d(d − 1)(4d + 1)/6 multiplications and d(3d − 1)/2 divisions. Multiplying Σ⁻¹ and Σ_b requires d³ multiplications. Calculating the eigenvalues and eigenvectors by QR requires 25d³ multiplications.

Total number of multiplications is

(26 + 2/3)d³ + (n + C − 0.5)d² − (1/6)d.

Total number of divisions is d(3d − 1)/2.
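A numpy sketch of the five steps, assuming labelled data `X` (n × d) and `y`; `np.linalg.solve` stands in for the Gauss-Jordan inversion and `np.linalg.eig` for the QR iteration:

```python
import numpy as np

def lda_directions(X, y):
    """Projection directions from multiclass LDA: eigenvectors of Sigma^-1 Sigma_b."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])  # step 1
    sigma_b = np.cov(means.T, bias=True)     # step 2: covariance of the means
    sigma = np.cov(X.T, bias=True)           # step 3: covariance of all samples
    A = np.linalg.solve(sigma, sigma_b)      # step 4: Sigma^-1 Sigma_b
    vals, vecs = np.linalg.eig(A)            # step 5: eigen-decomposition
    order = np.argsort(vals.real)[::-1]      # most discriminative first
    return vecs.real[:, order]
```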
C. Discrete Fourier transform
Let (x1, x2, ..., xd) be a vector of dimension d. If we want to obtain a d̃-dimensional vector by eliminating the higher frequencies, a discrete Fourier transform can be used. Pseudocode:
1) Initialize y1, y2, ..., y_d̃.
2) For t = 1 to d̃ do
   y_t = ∑_{j=1}^{d} x_j exp(−2πi(j − 1)(t − 1)/d)

The number of multiplications for the discrete Fourier transform is d·d̃.
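With 0-based indices, the truncated transform can be sketched as follows (`d_red` plays the role of d̃):

```python
import cmath

def dft_features(x, d_red):
    """First d_red DFT coefficients of the length-d signal x."""
    d = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * t / d)
                for k in range(d))
            for t in range(d_red)]
```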
D. Discrete cosine transform
Let (x1, x2, ..., xd) be a vector of dimension d. If we want to obtain a d̃-dimensional vector by eliminating the higher frequencies, a discrete cosine transform can be used. Pseudocode:
1) Initialize y1, y2, ..., y_d̃.
2) For t = 1 to d̃ do
   y_t = 0.5(x1 + (−1)^{t−1} xd) + ∑_{j=2}^{d−1} x_j cos(π(t − 1)(j − 1)/(d − 1))

The number of multiplications for the discrete cosine transform is d·d̃.
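The formula above (a DCT-I, with π and the signs restored) translates directly, again with 0-based indices and `d_red` for d̃:

```python
import math

def dct_features(x, d_red):
    """First d_red DCT-I coefficients of the length-d signal x."""
    d = len(x)
    return [0.5 * (x[0] + (-1) ** t * x[d - 1])
            + sum(x[k] * math.cos(math.pi * t * k / (d - 1))
                  for k in range(1, d - 1))
            for t in range(d_red)]
```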
E. Central Moments
If an image is two dimensional, for j and k being integers, let f(x, y) be the pixel value at the point (x, y). Then the central moment μ_{j,k} is given by:

μ_{j,k} = ∑_{x,y} (x − x̄)^j (y − ȳ)^k f(x, y) / ∑_{x,y} f(x, y)

Number of multiplications = number of pixels in the image. Number of divisions = 1.
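A sketch of the formula, taking (x̄, ȳ) to be the intensity centroid of the image (an assumption; the text does not define the mean):

```python
def central_moment(f, j, k):
    """Normalized central moment mu_{j,k} of image f given as a list of rows."""
    total = float(sum(sum(row) for row in f))
    xbar = sum(x * v for row in f for x, v in enumerate(row)) / total
    ybar = sum(y * v for y, row in enumerate(f) for v in row) / total
    return sum((x - xbar) ** j * (y - ybar) ** k * v
               for y, row in enumerate(f) for x, v in enumerate(row)) / total
```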
F. Distance transform
The distance map labels each pixel in the image with its nearest boundary pixel. If we take an n × m pixel binary image, then nm searches have to be performed. Each search takes nm/d² queries, where d is the reduced dimension. The complexity is therefore n²m²/d² queries for an n × m binary image reduced to a d-dimensional vector.
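A brute-force sketch of the transform, taking "boundary pixels" to be the 1-pixels of the binary image and using Euclidean distance (both assumptions; the text specifies neither):

```python
import math

def distance_transform(img):
    """Distance from each pixel of a binary image to the nearest 1-pixel."""
    boundary = [(y, x) for y, row in enumerate(img)
                for x, v in enumerate(row) if v]
    return [[min(math.hypot(y - by, x - bx) for by, bx in boundary)
             for x in range(len(img[0]))]
            for y in range(len(img))]
```

Each of the nm output pixels scans all boundary pixels, matching the quadratic query count discussed above.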