
Feature Assignment in DDAG

Santosh Arvind Adimoolam

CONTENTS

I Error of DDAG architecture
  I-A Error of paths

II Feature Assignment
  II-A Feature assignment to subDAGs

III Algorithm
  III-A Complexity
  III-B Alternate algorithm

IV Feature extraction techniques
  IV-A Principal Component Analysis
  IV-B Multiclass linear discriminant analysis
  IV-C Discrete Fourier transform
  IV-D Discrete cosine transform
  IV-E Central Moments
  IV-F Distance transform

    I. ERROR OF DDAG ARCHITECTURE

Definition I-.1. Let $N_0$ be a root node in a DAG and let $N_1$ and $N_2$ be its daughter nodes, $N_1$ on the left and $N_2$ on the right as shown in Figure 1. Then the transmission probability $t_{N_0}$ of the node is defined as the fraction of samples in the union of all classes that pass to $N_1$. We may also call this the left transmission probability. The right transmission probability is $1 - t_{N_0}$.

Fig. 1. Transmission probability.

Remark I-.2. Let $Q_{N_0}$ be the probability of misclassification at the node $N_0$. A change in $Q_{N_0}$ affects the transmission probability $t_{N_0}$, and vice versa.

    A. Error of paths

Implicitly, we will always be referring to paths from the root to the end nodes of the DAG unless otherwise specified.

Let us consider a DAG architecture for $n$ classes consisting of $\binom{n}{2}$ nodes. A sample passes through a path of $n$ nodes in the DAG. However, if it is misclassified, then the misclassification has occurred at only one particular node, not at multiple nodes. In the DAG hierarchy, we write $N_j < N_i$ if the node $N_i$ is higher than $N_j$ in the hierarchy. Let $X$ be any node in a path $P$. We denote $X$ as $X_l$ or $X_r$ depending on whether $X$ is a left or a right daughter of its parent node in the path. The root node is included among the left nodes, and the transmission probability over an empty set of nodes is taken to be 1. Then the error of misclassification at a node $N_j$ via a path $P$ in the DAG is

$$Q^P_{N_j} = Q_{N_j} \prod_{X_l \in P,\; X_l > N_j} t_{X_l} \prod_{X_r \in P,\; X_r > N_j} (1 - t_{X_r}) \qquad (1)$$

Summing these errors over all nodes, with $P$ ranging over the distinct paths from the root to each node, gives the total probability of misclassification of the DAG:

$$Q_D = \sum_{N_j} \sum_{P:\, \mathrm{root} \to N_j} Q^P_{N_j} \qquad (2)$$
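For concreteness, the following Python sketch evaluates equations (1) and (2) on a toy three-class DDAG; the node names, transmission probabilities, and error values are illustrative assumptions, not data from the text.

```python
# Toy DDAG for n = 3 classes: root N0 with daughter decision nodes N1 (left)
# and N2 (right); class leaves carry no error. All numbers are illustrative.
t = {"N0": 0.6, "N1": 0.5, "N2": 0.4}     # left transmission probabilities
q = {"N0": 0.02, "N1": 0.03, "N2": 0.01}  # per-node misclassification probabilities
children = {"N0": ("N1", "N2")}           # decision-node daughters only

def paths_to(target, current="N0"):
    """All paths from the root to `target` (a DDAG node can be reached
    along several paths, since subDAGs share nodes)."""
    if current == target:
        return [[current]]
    if current not in children:
        return []
    left, right = children[current]
    return [[current] + p for p in paths_to(target, left) + paths_to(target, right)]

def path_error(path):
    """Equation (1): error at the last node of `path`, reached via `path`.
    Multiply t for each left branch and (1 - t) for each right branch."""
    prob_reach = 1.0  # transmission probability over the empty set of nodes
    for parent, child in zip(path, path[1:]):
        prob_reach *= t[parent] if children[parent][0] == child else 1 - t[parent]
    return prob_reach * q[path[-1]]

# Equation (2): total DAG error, summed over nodes and the paths to them.
# The result agrees with the recursion (3) derived in Section II.
Q_D = sum(path_error(p) for node in t for p in paths_to(node))
print(Q_D)  # 0.02 + 0.6*0.03 + 0.4*0.01 = 0.042
```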


II. FEATURE ASSIGNMENT

Fig. 2. DDAG.

    A. Feature assignment to subDAGs

The following theorem leads us to propose an algorithm for optimal feature assignment.

Theorem II-A.1. Consider a DAG $D$ with root node $N_0$, and $N_1$ and $N_2$ as its two daughter nodes, as shown in Figure 2. Let $Fa_D$ be the feature assignment to $D$. If $Fa_D(N_1)$ and $Fa_D(N_2)$ are optimal feature assignments, then by changing only $Fa_D(N_0)$ to a value for which $Q(Fa_D)$ is least, we get an optimal feature assignment.

The theorem states that if the two subDAGs at the daughter nodes of the root node are optimal w.r.t. the feature assignment to the DAG, then optimizing the DAG by changing only the value at the root node gives an optimal DAG.

Proof: Let $t$ be the transmission probability of $N_0$. By using equations (1) and (2) we can arrive at

$$Q_D = Q_{N_0} + t\,Q_{D(N_1)} + (1 - t)\,Q_{D(N_2)} \qquad (3)$$

Let $Fa^1_D$ be an optimal feature assignment to $D$ which is different from the $Fa_D$ in the theorem. Define $Fa^2_D$ as

$$Fa^2_D(N_1) = Fa_D(N_1), \qquad Fa^2_D(N_2) = Fa_D(N_2), \qquad Fa^2_D(N_0) = Fa^1_D(N_0).$$

The transmission probability of a node depends only on the features assigned to that node and not on any other node. The transmission probability of $N_0$ is therefore the same under $Fa^2_D$ and $Fa^1_D$, as $Fa^2_D(N_0) = Fa^1_D(N_0)$. Let $t$ be this transmission probability. By equation (3),

$$Q_{Fa^2_D} = Q_{Fa^2_D(N_0)} + t\,Q_{Fa^2_D(N_1)} + (1 - t)\,Q_{Fa^2_D(N_2)} \qquad (4)$$

$$Q_{Fa^1_D} = Q_{Fa^1_D(N_0)} + t\,Q_{Fa^1_D(N_1)} + (1 - t)\,Q_{Fa^1_D(N_2)} \qquad (5)$$

Here $Q_{Fa^1_D(N_0)} = Q_{Fa^2_D(N_0)}$, by the definition of the function $Fa^2_D$.

By subtracting equation (5) from (4) we get

$$Q_{Fa^2_D} - Q_{Fa^1_D} = t\left(Q_{Fa^2_D(N_1)} - Q_{Fa^1_D(N_1)}\right) + (1 - t)\left(Q_{Fa^2_D(N_2)} - Q_{Fa^1_D(N_2)}\right) \qquad (6)$$

$Fa^2_D(N_1)$ and $Fa^2_D(N_2)$ are optimal because they are the same as $Fa_D(N_1)$ and $Fa_D(N_2)$, so $Q_{Fa^2_D(N_i)} \le Q_{Fa^1_D(N_i)}$ for $i = 1, 2$. Hence equation (6) implies

$$Q_{Fa^2_D} - Q_{Fa^1_D} \le 0.$$

However, $Fa^1_D$ is an optimal assignment to $D$. So

$$Q_{Fa^2_D} = Q_{Fa^1_D}$$

and $Fa^2_D$ is also optimal. Now $Fa_D(N_1) = Fa^2_D(N_1)$ and $Fa_D(N_2) = Fa^2_D(N_2)$; only $Fa_D(N_0)$ is different. So, by changing $Fa_D(N_0)$ to $Fa^2_D(N_0)$, we get an optimal DAG, the same as $Fa^2_D$. This proves the theorem.

Remark II-A.2. If $Fa_D$ is a feature assignment to $D$, changing $Fa_D(N)$ does not change the total probability of misclassification $Q_{Fa_D(M)}$ of a subDAG $D(M)$ if $N$ is not a node in $D(M)$.

    III. ALGORITHM

Remark II-A.2 implies that all nodes at the same level can be optimized simultaneously with respect to their subDAGs. This result is used in the algorithm.

Algorithm III-.1. For $i = n - 2$ down to $0$ do:
optimize all nodes at level $i$ with respect to their subDAGs.

The algorithm directly follows from the theorem. Notice that the algorithm proceeds by optimizing from the end nodes to the start node. By optimizing a node we mean reducing the error of the entire subDAG that has the node as its root, by changing the feature assigned to that one particular node. This does not entail that the error at the node itself reduces.
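As a sketch of how Algorithm III-.1 might be implemented, consider the following Python outline. The `levels` list, the `features` pool, and the `subdag_error` callback (assumed to run validation samples through the subDAG rooted at a node and return its misclassification error) are illustrative assumptions, not definitions from the text.

```python
# A minimal sketch of Algorithm III-.1 (bottom-up feature assignment).
def optimize_ddag(levels, features, subdag_error):
    """levels[i] lists the decision nodes at level i (root is levels[0]).
    Returns a dict mapping each node to its chosen feature."""
    assignment = {node: features[0] for level in levels for node in level}
    # Proceed from the deepest level (i = n - 2) up to the root.
    for level in reversed(levels):
        for node in level:  # nodes at one level are independent (Remark II-A.2)
            best = min(features,
                       key=lambda f: subdag_error(node, {**assignment, node: f}))
            assignment[node] = best
    return assignment
```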

    A. Complexity

At every node we run the subDAG rooted there in order to assign a feature to the node. So the total number of classifications is

$$\sum_{i=1}^{n-1} (n - i)\binom{i+1}{2} = \frac{n(n-1)(n+1)(n+2)}{24} \qquad (7)$$

This is multiplied by the number of feature classes $C$. As the complexity is high, a large number of samples cannot be used for training. If, however, we can estimate the transmission probability of every node, then we can use the simpler algorithm proposed below.
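The closed form in (7) can be checked against the sum directly for small $n$; the following snippet does so:

```python
from math import comb

# Verify the closed form in (7) against the sum, for small n.
for n in range(2, 10):
    total = sum((n - i) * comb(i + 1, 2) for i in range(1, n))
    assert total == n * (n - 1) * (n + 1) * (n + 2) // 24, (n, total)
print("closed form matches for n = 2..9")
```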

    B. Alternate algorithm

Consider a node $N$ and its two daughter nodes denoted $N_l$ and $N_r$. If $t$ is the transmission probability of the node $N$, the following equation holds:

$$Q_{D(N)} = Q_N + t\,Q_{D(N_l)} + (1 - t)\,Q_{D(N_r)}.$$

Algorithm III-B.1. For $i = n - 2$ down to $0$ do:
at every node $N$ at level $i$, optimize

$$Q_N + t\,Q_{D(N_l)} + (1 - t)\,Q_{D(N_r)}$$

and store

$$Q_{D(N)} = Q_N + t\,Q_{D(N_l)} + (1 - t)\,Q_{D(N_r)}.$$

The complexity of Algorithm III-B.1 is $\binom{n}{2} \cdot C$, which is second degree in $n$. This is much faster than the previous algorithm. However, this algorithm may not be accurate, because it is difficult to estimate the transmission probability of every node accurately.
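A corresponding sketch of Algorithm III-B.1 is given below. The estimate tables `t_hat[node][f]` and `q_hat[node][f]`, holding the estimated transmission and misclassification probabilities of a node when feature `f` is assigned to it, are assumed inputs; how they are estimated is exactly the difficulty noted above.

```python
# A minimal sketch of Algorithm III-B.1 (dynamic programming with estimates).
def optimize_ddag_fast(levels, features, children, t_hat, q_hat):
    """levels[i] lists the decision nodes at level i (root is levels[0]);
    children maps a node to its (left, right) daughters, absent for end nodes.
    Returns the assignment and the stored subDAG errors Q_{D(N)}."""
    assignment, Q = {}, {}
    for level in reversed(levels):            # from i = n - 2 up to the root
        for node in level:
            left, right = children.get(node, (None, None))
            Ql = Q.get(left, 0.0)             # error over an empty subDAG is 0
            Qr = Q.get(right, 0.0)
            # Choose the feature minimizing Q_N + t Q_{D(N_l)} + (1-t) Q_{D(N_r)}.
            def score(f):
                t = t_hat[node][f]
                return q_hat[node][f] + t * Ql + (1 - t) * Qr
            best = min(features, key=score)
            assignment[node] = best
            Q[node] = score(best)             # store Q_{D(N)} for the next level
    return assignment, Q
```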


    IV. FEATURE EXTRACTION TECHNIQUES

A. Principal Component Analysis

    Pseudocode:

1) Calculate the $d \times d$ covariance matrix $\Sigma$.
2) Find the $d'$ largest eigenvalues of $\Sigma$.
3) Find the $d'$ eigenvectors corresponding to those eigenvalues.
4) Project the data matrix to $d'$ dimensions.

The covariance matrix $\Sigma$ has $d^2$ entries, and each entry requires $n$ multiplications, so the number of multiplications for step 1 is $d^2 n$. In steps 2 and 3, calculating the eigenvalues and the corresponding eigenvectors requires a QR decomposition in general, which takes about $25 d^3$ operations. Projecting to $d'$ dimensions in the final step requires $n d d'$ multiplications.

Total number of multiplications: $d^2 n + 25 d^3 + n d d'$.
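The steps above can be sketched in a few lines of numpy; the sample count, dimension, and reduced dimension $d'$ below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))          # n = 500 samples, d = 16 dimensions
d_prime = 3                             # reduced dimension d'

Xc = X - X.mean(axis=0)                 # center the data
cov = (Xc.T @ Xc) / (len(X) - 1)        # step 1: d x d covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # steps 2-3: eigendecomposition
top = eigvecs[:, np.argsort(eigvals)[::-1][:d_prime]]  # d' leading eigenvectors
Y = Xc @ top                            # step 4: project to d' dimensions
print(Y.shape)                          # (500, 3)
```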

    B. Multiclass linear discriminant analysis

    Pseudocode:

1) Find the means of the samples in every class.
2) Find the covariance matrix of the means, $\Sigma_b$.
3) Find the covariance matrix of all samples, $\Sigma$.
4) Calculate $A = \Sigma^{-1} \Sigma_b$.
5) Calculate the eigenvalues and eigenvectors of $A$.

The eigenvectors are used for feature extraction by projections.

Calculating the covariance matrix $\Sigma_b$ in step 2 requires $C \cdot d^2$ multiplications, where $C$ is the number of classes and $d$ is the dimension of the vectors. Step 3 requires $n \cdot d^2$ multiplications, where $n$ is the total number of samples. In step 4, calculating the inverse of $\Sigma$ takes nearly $d^3$ operations: Gauss-Jordan elimination requires $d(d-1)(4d+1)/6$ multiplications and $d(3d-1)/2$ divisions. Multiplying $\Sigma^{-1}$ and $\Sigma_b$ requires $d^3$ multiplications. Calculating the eigenvalues and eigenvectors by QR requires about $25 d^3$ multiplications.

The total number of multiplications is

$$(26 + \tfrac{2}{3})\,d^3 + (n + C - 0.5)\,d^2 - \tfrac{1}{6}\,d$$

and the total number of divisions is $d(3d-1)/2$.
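A minimal numpy sketch of steps 1-5, on illustrative random per-class data:

```python
import numpy as np

rng = np.random.default_rng(1)
d, classes = 8, 3
X = [rng.normal(loc=c, size=(100, d)) for c in range(classes)]  # per-class samples

means = np.stack([Xc.mean(axis=0) for Xc in X])   # step 1: class means
Sigma_b = np.cov(means.T)                         # step 2: covariance of the means
Sigma = np.cov(np.vstack(X).T)                    # step 3: covariance of all samples
A = np.linalg.inv(Sigma) @ Sigma_b                # step 4: A = Sigma^{-1} Sigma_b
eigvals, eigvecs = np.linalg.eig(A)               # step 5: eigendecomposition
# Project onto the leading eigenvectors to extract features.
W = np.real(eigvecs[:, np.argsort(np.real(eigvals))[::-1][:classes - 1]])
features = np.vstack(X) @ W
print(features.shape)                             # (300, 2)
```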

C. Discrete Fourier transform

Let $(x_1, x_2, \ldots, x_d)$ be a vector of dimension $d$. If we want to obtain a $d'$-dimensional vector by eliminating the higher frequencies, a discrete Fourier transform can be used. Pseudocode:

1) Initialize $y_1, y_2, \ldots, y_{d'}$.
2) For $t = 1$ to $d'$ do
$$y_t = \sum_{j=1}^{d} x_j \exp\left(-\frac{2\pi i\,(j-1)(t-1)}{d}\right)$$

The number of multiplications for the discrete Fourier transform is $d\,d'$.
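A minimal numpy sketch of this truncated transform, with an arbitrary input vector and retained-frequency count $d'$; the direct sum is checked against numpy's FFT:

```python
import numpy as np

x = np.random.default_rng(2).normal(size=64)   # d = 64
d_prime = 8                                    # number of low frequencies kept

d = len(x)
idx = np.arange(d)
# Direct O(d * d') evaluation of the sum above, for t = 1..d' (0-based here).
y = np.array([np.sum(x * np.exp(-2 * np.pi * 1j * idx * t / d))
              for t in range(d_prime)])
assert np.allclose(y, np.fft.fft(x)[:d_prime])  # matches numpy's FFT
```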

    D. Discrete cosine transform

Let $(x_1, x_2, \ldots, x_d)$ be a vector of dimension $d$. If we want to obtain a $d'$-dimensional vector by eliminating the higher frequencies, a discrete cosine transform can be used. Pseudocode:

1) Initialize $y_1, y_2, \ldots, y_{d'}$.
2) For $t = 1$ to $d'$ do
$$y_t = \frac{1}{2}\left(x_1 + (-1)^{t-1} x_d\right) + \sum_{j=2}^{d-1} x_j \cos\left(\frac{\pi\,(t-1)(j-1)}{d-1}\right)$$

The number of multiplications for the discrete cosine transform is $d\,d'$.
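A similar sketch for the cosine transform; the formula above is the DCT-I, so the direct sum below is checked against scipy's type-1 DCT (input vector and $d'$ are arbitrary):

```python
import numpy as np
from scipy.fft import dct

x = np.random.default_rng(3).normal(size=32)   # d = 32
d_prime = 6                                    # number of low frequencies kept

d = len(x)
j = np.arange(1, d - 1)                        # interior indices j = 2..d-1 (1-based)
y = np.array([0.5 * (x[0] + (-1) ** t * x[-1])
              + np.sum(x[1:-1] * np.cos(np.pi * t * j / (d - 1)))
              for t in range(d_prime)])
assert np.allclose(y, dct(x, type=1)[:d_prime] / 2)  # scipy's DCT-I, rescaled
```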

    E. Central Moments

If an image is two dimensional, for $j$ and $k$ being integers, let $f(x, y)$ be the pixel value at a point $(x, y)$. Then the central moment $\mu_{j,k}$ is given by

$$\mu_{j,k} = \frac{\sum_{x,y} (x - \bar{x})^j\,(y - \bar{y})^k\,f(x, y)}{\sum_{x,y} f(x, y)}$$

where $(\bar{x}, \bar{y})$ is the centroid of the image. The number of multiplications is the number of pixels in the image; the number of divisions is 1.
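A minimal numpy sketch of this formula, on an arbitrary random image:

```python
import numpy as np

f = np.random.default_rng(4).random((20, 30))   # pixel values f(x, y)

def central_moment(f, j, k):
    x, y = np.indices(f.shape)                  # coordinate grids
    total = f.sum()                             # the single division's denominator
    x_bar = (x * f).sum() / total               # centroid coordinates
    y_bar = (y * f).sum() / total
    return ((x - x_bar) ** j * (y - y_bar) ** k * f).sum() / total

print(central_moment(f, 2, 0))                  # a variance-like moment
```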

    F. Distance transform

The map labels each pixel in the image with its nearest boundary pixel. If we take an $n \times m$ pixel binary image, then $nm$ searches have to be performed. Each search takes $nm/d^2$ queries, where $d$ is the reduced dimension. The complexity is $n^2 m^2 / d^2$ queries for an $n \times m$ binary image reduced to a $d$-dimensional vector.
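For illustration, a library routine can compute such a map directly. The sketch below uses scipy's exact Euclidean distance transform on an arbitrary binary image; this replaces the brute-force search described above rather than implementing it.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

img = np.zeros((16, 16), dtype=bool)
img[4:12, 4:12] = True                  # a filled square as foreground

# Distance of every foreground pixel to the nearest background (boundary) pixel.
dist = distance_transform_edt(img)
print(dist.max())                       # deepest interior distance
```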