7/31/2019 ddagarc.pdf
Feature Assignment in DDAG

Santosh Arvind Adimoolam
CONTENTS

I Error of DDAG architecture
I-A Error of paths
II Feature Assignment
II-A Feature assignment to subDAGs
III Algorithm
III-A Complexity
III-B Alternate algorithm
IV Feature extraction techniques
IV-A Principal Component Analysis
IV-B Multiclass linear discriminant analysis
IV-C Discrete Fourier transform
IV-D Discrete cosine transform
IV-E Central Moments
IV-F Distance transform
I. ERROR OF DDAG ARCHITECTURE
Definition I-.1. Let N0 be a root node in the DAG and let N1 and N2 be its daughter nodes, N1 on the left and N2 on the right as shown in figure 1. Then the transmission probability t_{N0} of the node is defined as the fraction of samples, in the union of all classes, that pass to N1. We may also call this the left transmission probability. The right transmission probability is 1 − (left transmission probability).
Fig. 1. Transmission probability.
Remark I-.2. Let Q_{N0} be the probability of misclassification at the node N0. A change in Q_{N0} affects the transmission probability t_{N0}, and vice versa.
A. Error of paths
Implicitly we will always be referring to paths from the root to
the end nodes of the DAG unless otherwise specified.
Let us consider a DAG architecture for n classes consisting of C(n, 2) nodes. A sample passes through a path of n − 1 nodes in the DAG. However, if it is misclassified, then the misclassification has occurred at only one particular node and not at multiple nodes. In the DAG hierarchy, we write N_j < N_i if node N_i is higher than N_j in the hierarchy. Let X be any node in a path P. We denote X as X_l or X_r depending on whether X is a left or right daughter of its parent node in the path. The root node is included among the left nodes, and the transmission probability of an empty set of nodes is taken to be 1. Then the error of misclassification at a node N_j via a path P in the DAG is
Q_P(N_j) = Q_{N_j} · ∏_{X_l ∈ P, X_l ≥ N_j} t_{pa(X_l)} · ∏_{X_r ∈ P, X_r ≥ N_j} (1 − t_{pa(X_r)})   (1)

where pa(X) denotes the parent of X in the path (for the root node, pa(X) is the empty set of nodes).
Fig. 2. DDAG
II. FEATURE ASSIGNMENT

A. Feature assignment to subDAGs
The following theorem leads us to propose an algorithm for optimal
feature assignment.
Theorem II-A.1. Consider a DAG D with root node N0, and N1 and N2 as its two daughter nodes, as shown in figure 2. Let Fa_D be the feature assignment to D. If Fa_D(N1) and Fa_D(N2) are optimal feature assignments, then by changing only Fa_D(N0) to a value for which Q(Fa_D) is least, we get an optimal feature assignment.
The theorem states that if the two subDAGs at the daughter nodes
of the root node are optimal w.r.t feature assignment to the DAG,
then optimizing the DAG by just changing the value of the root
node will give us an optimal DAG.
Proof: Let t be the transmission probability of N0. By using equations (1) and (2) we can arrive at

Q_D = Q_{N0} + t Q_{D(N1)} + (1 − t) Q_{D(N2)}   (3)

Let Fa1_D be an optimal feature assignment to D which is different from the Fa_D in the theorem. Define Fa2_D as

Fa2_D(N1) = Fa_D(N1),
Fa2_D(N2) = Fa_D(N2),
Fa2_D(N0) = Fa1_D(N0).

The transmission probability of a node depends only on the features assigned to that node and not on any other node. The transmission probability of N0 is therefore the same under Fa2_D and Fa1_D, since Fa2_D(N0) = Fa1_D(N0); let t denote it. By equation (3),

Q(Fa2_D) = Q(Fa2_D(N0)) + t Q(Fa2_D(N1)) + (1 − t) Q(Fa2_D(N2))   (4)

Q(Fa1_D) = Q(Fa1_D(N0)) + t Q(Fa1_D(N1)) + (1 − t) Q(Fa1_D(N2))   (5)

Also, Q(Fa1_D(N0)) = Q(Fa2_D(N0)), as defined for the function Fa2_D. Subtracting equation (5) from (4) we get

Q(Fa2_D) − Q(Fa1_D) = t [Q(Fa2_D(N1)) − Q(Fa1_D(N1))] + (1 − t) [Q(Fa2_D(N2)) − Q(Fa1_D(N2))]   (6)

Fa2_D(N1) and Fa2_D(N2) are optimal because they are the same as Fa_D(N1) and Fa_D(N2). Hence equation (6) implies

Q(Fa2_D) − Q(Fa1_D) ≤ 0.

However, Fa1_D is an optimal assignment to D. So Q(Fa2_D) = Q(Fa1_D), and Fa2_D is also optimal. Now

Fa_D(N1) = Fa2_D(N1),
Fa_D(N2) = Fa2_D(N2),

and only Fa_D(N0) is different. So, by changing Fa_D(N0) to Fa2_D(N0) we get an optimal DAG, the same as Fa2_D. This proves the theorem.
Remark II-A.2. If Fa_D is a feature assignment to D, changing Fa_D(N) does not change the total probability of misclassification Q(Fa_{D(M)}) of the subDAG D(M) if N is not a node in D(M).
III. ALGORITHM
Remark II-A.2 implies that all nodes at the same level can be
optimized simultaneously with respect to their subDAGs. This result
is used in the algorithm.
Algorithm III-.1. For i = n − 2 down to 0 do:
Optimize all nodes at level i with respect to their subDAGs.
The algorithm directly follows from the theorem.
Notice that the algorithm proceeds by optimizing from end nodes to
the start node. By optimizing a node we mean reducing the error of
the entire subDAG with the node as its root, by changing the feature
assigned to that one particular node. This does not entail that the
error at the node reduces.
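The bottom-up optimization can be sketched in Python as follows. The node labelling (i, j) (the classifier separating classes i and j, with the subDAG recursion Q_{D(N)} = Q_N + t·Q_{D(N_l)} + (1 − t)·Q_{D(N_r)}), the per-node error table `err`, and the uniform transmission probability t = 0.5 are all illustrative assumptions; in practice Q_N and t would come from running the classifiers on training data.

```python
def subdag_q(i, j, assign, err, t=0.5):
    """Misclassification probability Q of the subDAG rooted at node (i, j)."""
    if i == j:                       # leaf: a single class, no classifier
        return 0.0
    q = err[(i, j)][assign[(i, j)]]  # Q_N for the currently assigned feature
    return (q + t * subdag_q(i, j - 1, assign, err, t)
              + (1 - t) * subdag_q(i + 1, j, assign, err, t))

def optimize_ddag(n, features, err):
    """Bottom-up pass: give every node the feature minimizing its subDAG error."""
    assign = {(i, j): features[0] for i in range(n) for j in range(i + 1, n)}
    # span j - i = 1 is the bottom level; the root (0, n - 1) comes last
    for span in range(1, n):
        for i in range(n - span):
            node = (i, i + span)
            assign[node] = min(
                features,
                key=lambda f: subdag_q(node[0], node[1],
                                       {**assign, node: f}, err))
    return assign
```

Note that by the time a node is optimized, both of its subDAGs already carry their optimal assignments, which is exactly the situation Theorem II-A.1 requires.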
A. Complexity
At every node we run the subDAG rooted there in order to assign a feature to the node. A level with n − i nodes has subDAGs of C(i + 1, 2) nodes each, so the total number of classifications is

∑_{i=1}^{n−1} (n − i) · C(i + 1, 2) = (n⁴ + 2n³ − n² − 2n)/24   (7)

This is multiplied by the number of feature classes C. As the complexity is high, a large number of samples cannot be used for training. If, however, we can estimate the transmission probability of every node, then we can use the simpler algorithm proposed below.
B. Alternate algorithm
Consider a node N and its two daughter nodes, denoted N_l and N_r. If t is the transmission probability of the node N, the following equation holds:

Q_{D(N)} = Q_N + t Q_{D(N_l)} + (1 − t) Q_{D(N_r)}

Algorithm III-B.1. For i = n − 2 down to 0 do:
At every node N at level i, optimize

Q_N + t Q_{D(N_l)} + (1 − t) Q_{D(N_r)}

and store

Q_{D(N)} = Q_N + t Q_{D(N_l)} + (1 − t) Q_{D(N_r)}.
The complexity of algorithm III-B.1 is C(n, 2) · C classifications, which is second degree in n. This is much faster than the previous algorithm. However, this algorithm may not be accurate, because it is difficult to estimate the transmission probability of every node accurately.
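Because each node now stores its subDAG error Q_{D(N)}, every node is visited exactly once. A minimal sketch, again assuming the (i, j) node labelling and pre-estimated tables `q` (per-node error by feature) and `t` (transmission probabilities), neither of which is specified in the text:

```python
def alternate_optimize(n, features, q, t):
    """Algorithm III-B.1: optimize each node locally, storing Q_{D(N)}."""
    Q = {(i, i): 0.0 for i in range(n)}   # leaves carry no classifier
    assign = {}
    for span in range(1, n):              # bottom level first, root last
        for i in range(n - span):
            node = (i, i + span)
            left, right = (i, i + span - 1), (i + 1, i + span)
            cost = lambda f: (q[node][f] + t[node] * Q[left]
                              + (1 - t[node]) * Q[right])
            assign[node] = min(features, key=cost)
            Q[node] = cost(assign[node])  # store Q_{D(N)} for the level above
    return assign, Q
```

Since Q[left] and Q[right] are fixed when a node is processed, the minimization effectively picks the feature with the smallest node error q, which is why the quality of the estimated t values matters.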
IV. FEATURE EXTRACTION TECHNIQUES
A. Principal Component Analysis
Pseudocode:
1) Calculate the covariance matrix Σ (d × d).
2) Find the d̃ largest eigenvalues of Σ.
3) Find the d̃ eigenvectors corresponding to those eigenvalues.
4) Project the data matrix to d̃ dimensions.

The covariance matrix has d² entries, and each entry requires n multiplications. So the number of multiplications for step 1 is d²n. In steps 2 and 3, calculating eigenvalues and the corresponding eigenvectors requires QR decomposition in general, which takes about 25d³ operations. Projecting to d̃ dimensions in the final step requires n·d·d̃ multiplications.

Total number of multiplications = d²n + 25d³ + n·d·d̃.
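The four steps can be sketched with numpy; `np.linalg.eigh` stands in for the QR iteration mentioned above, `X` is an n × d data matrix, and `d_red` plays the role of d̃:

```python
import numpy as np

def pca_project(X, d_red):
    """Project the n x d data matrix X onto its d_red principal components."""
    Xc = X - X.mean(axis=0)                 # center the data
    cov = Xc.T @ Xc / len(X)                # step 1: d x d covariance matrix
    vals, vecs = np.linalg.eigh(cov)        # steps 2-3: eigen-decomposition
    top = vecs[:, np.argsort(vals)[::-1][:d_red]]  # d_red largest eigenvalues
    return Xc @ top                         # step 4: n x d_red projection
```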
B. Multiclass linear discriminant analysis
Pseudocode:
1) Find the means of the samples in every class.
2) Find the covariance matrix Σ_b of the means.
3) Find the covariance matrix Σ of all samples.
4) Calculate A = Σ⁻¹Σ_b.
5) Calculate the eigenvalues and eigenvectors of A.

The eigenvectors are used for feature extraction by projections.

Calculating the covariance matrix Σ_b in step 2 requires C·d² multiplications, where C is the number of classes and d is the dimension of the vectors. Step 3 requires n·d² multiplications, where n is the total number of samples. In step 4, calculating the inverse of Σ takes nearly d³ operations: Gauss-Jordan elimination requires d(d − 1)(4d + 1)/6 multiplications and d(3d − 1)/2 divisions. Multiplying Σ⁻¹ and Σ_b requires d³ multiplications. Calculating the eigenvalues and eigenvectors by QR requires 25d³ multiplications.

Total number of multiplications is

(26 + 2/3)d³ + (n + C − 0.5)d² − (1/6)d.

Total number of divisions is d(3d − 1)/2.
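A numpy sketch of the five steps, assuming labelled data `X` (n × d) and `y`; `np.linalg.solve` stands in for the Gauss-Jordan inversion and `np.linalg.eig` for the QR iteration:

```python
import numpy as np

def lda_directions(X, y):
    """Projection directions from multiclass LDA: eigenvectors of Sigma^-1 Sigma_b."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])  # step 1
    sigma_b = np.cov(means.T, bias=True)     # step 2: covariance of the means
    sigma = np.cov(X.T, bias=True)           # step 3: covariance of all samples
    A = np.linalg.solve(sigma, sigma_b)      # step 4: Sigma^-1 Sigma_b
    vals, vecs = np.linalg.eig(A)            # step 5: eigen-decomposition
    order = np.argsort(vals.real)[::-1]      # most discriminative first
    return vecs.real[:, order]
```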
C. Discrete Fourier transform
Let (x1, x2, ..., xd) be a vector of dimension d. If we want to obtain a d̃-dimensional vector by eliminating the higher frequencies, a discrete Fourier transform can be used. Pseudocode:
1) Initialize y1, y2, ..., y_d̃.
2) For t = 1 to d̃ do
   y_t = ∑_{j=1}^{d} x_j exp(−2πi(j − 1)(t − 1)/d)

The number of multiplications for the discrete Fourier transform is d·d̃.
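With 0-based indices, the truncated transform can be sketched as follows (`d_red` plays the role of d̃):

```python
import cmath

def dft_features(x, d_red):
    """First d_red DFT coefficients of the length-d signal x."""
    d = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * t / d)
                for k in range(d))
            for t in range(d_red)]
```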
D. Discrete cosine transform
Let (x1, x2, ..., xd) be a vector of dimension d. If we want to obtain a d̃-dimensional vector by eliminating the higher frequencies, a discrete cosine transform can be used. Pseudocode:
1) Initialize y1, y2, ..., y_d̃.
2) For t = 1 to d̃ do
   y_t = 0.5(x1 + (−1)^{t−1} xd) + ∑_{j=2}^{d−1} x_j cos(π(t − 1)(j − 1)/(d − 1))

The number of multiplications for the discrete cosine transform is d·d̃.
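The formula above (a DCT-I, with π and the signs restored) translates directly, again with 0-based indices and `d_red` for d̃:

```python
import math

def dct_features(x, d_red):
    """First d_red DCT-I coefficients of the length-d signal x."""
    d = len(x)
    return [0.5 * (x[0] + (-1) ** t * x[d - 1])
            + sum(x[k] * math.cos(math.pi * t * k / (d - 1))
                  for k in range(1, d - 1))
            for t in range(d_red)]
```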
E. Central Moments
If an image is two dimensional, for j and k being integers, let f(x, y) be the pixel value at the point (x, y). Then the central moment μ_{j,k} is given by:

μ_{j,k} = ∑_{x,y} (x − x̄)^j (y − ȳ)^k f(x, y) / ∑_{x,y} f(x, y)

Number of multiplications = number of pixels in the image. Number of divisions = 1.
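A sketch of the formula, taking (x̄, ȳ) to be the intensity centroid of the image (an assumption; the text does not define the mean):

```python
def central_moment(f, j, k):
    """Normalized central moment mu_{j,k} of image f given as a list of rows."""
    total = float(sum(sum(row) for row in f))
    xbar = sum(x * v for row in f for x, v in enumerate(row)) / total
    ybar = sum(y * v for y, row in enumerate(f) for v in row) / total
    return sum((x - xbar) ** j * (y - ybar) ** k * v
               for y, row in enumerate(f) for x, v in enumerate(row)) / total
```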
F. Distance transform
The distance map labels each pixel in the image with its nearest boundary pixel. If we take an n × m pixel binary image, then nm searches have to be performed. Each search takes nm/d² queries, where d is the reduced dimension. The complexity is therefore n²m²/d² queries for an n × m binary image reduced to a d-dimensional vector.
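A brute-force sketch of the transform, taking "boundary pixels" to be the 1-pixels of the binary image and using Euclidean distance (both assumptions; the text specifies neither):

```python
import math

def distance_transform(img):
    """Distance from each pixel of a binary image to the nearest 1-pixel."""
    boundary = [(y, x) for y, row in enumerate(img)
                for x, v in enumerate(row) if v]
    return [[min(math.hypot(y - by, x - bx) for by, bx in boundary)
             for x in range(len(img[0]))]
            for y in range(len(img))]
```

Each of the nm output pixels scans all boundary pixels, matching the quadratic query count discussed above.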