Improved data compression scheme for multi-scan designs

TSINGHUA SCIENCE AND TECHNOLOGY ISSN 1007-0214 16/49 pp89-94 Volume 12, Number S1, July 2007

Improved Data Compression Scheme for Multi-Scan Designs*

LIN Teng (林腾), FENG Jianhua (冯建华)**, WANG Yangyuan (王阳元)

Department of Microelectronics, Peking University, Beijing 100871, China

Abstract: This paper presents an improved test data compression scheme for multi-scan designs that combines test data compatibility with a dictionary to reduce test data volume and thus test cost. The proposed method has two steps. First, a drive bit matrix with fewer columns is generated by exploiting the compatibilities between the columns of the initial scan bit matrix, and then the inverse compatibilities and the logic dependencies between the columns of the mid bit matrices. Second, a dictionary bit matrix with a limited number of rows is constructed such that, for each row of the drive bit matrix, a compatible row exists in the dictionary bit matrix or can be generated by XORing multiple rows of it, and the total number of rows used to compute all compatible rows is minimal. The rows of the dictionary matrix are encoded to further reduce the number of ATE channels and the test data volume. Experimental results for the large ISCAS 89 benchmarks show that the proposed method significantly reduces test data volume for multi-scan designs.

Key words: test data compression; multi-scan design; test data volume

Introduction

As the feature size of technology shrinks and more and more transistors are integrated on a single chip, manufacturing test cost accounts for a larger portion of the total product cost. One of the most important factors driving test cost is the growing test data volume. ATE with larger channel memory and more channels is more expensive. Moreover, since multiple ATE reloads are time-consuming, for a given type of ATE a larger test data volume leads either to fewer devices that can be tested simultaneously or to longer test application time, and thus to higher test cost. Therefore, several techniques supported by commercial EDA tools have been proposed to reduce test data volume.

Test data compression based on coding techniques, including statistical coding [1], Golomb coding [2], FDR coding [3], EFDR coding [4], and Huffman coding [5], builds on traditional data compression techniques. However, these techniques typically induce additional synchronization overhead. For multi-scan chain designs, the number of scan chains, which should be large to reduce test application time (TAT), is also limited with these techniques [6], since one decoder and one test channel source are needed for each scan chain. It is therefore hard to apply these techniques to multi-scan designs. The methods proposed recently in Refs. [7-10] target multi-scan designs and try to reduce test data volume by driving a large number of scan chains with a small number of ATE channels.

In this paper, we present an improved test data compression scheme for multi-scan designs to reduce test data volume and thus test cost. In this scheme, no break cycle is inserted for the ATE to stop transferring compressed test data to the on-chip decompressor; therefore, the synchronization overhead of compression schemes based on coding techniques [6] is eliminated. Moreover, since this scheme requires no structural information and no intrusion into the ATPG process, it is applicable to multi-scan designs based on IP cores.

Received: 2007-02-01
* Supported by the National Natural Science Foundation of China (Nos. 90207018 and 60576030)
** To whom correspondence should be addressed. E-mail: [email protected]; Tel: 86-10-62767302

The rest of the paper is organized as follows. In Section 1, we present the proposed compression scheme. Section 2 describes the design of the decompression circuit. Experimental results and a comparison with some related work are presented in Section 3. Finally, we conclude the paper in Section 4.

1 Proposed Compression Scheme

A two-step test data compression scheme is presented for multi-scan designs. For a full scan design with $N_s$ scan cells, the initial test set with $N_t$ test patterns can be denoted as an $N_t \times N_s$ bit matrix T, where every element of the matrix is drawn from the set S = {0, 1, X}. Assuming that the number of scan chains is set to m, then to minimize test application time, the maximum length $L_s$ of the m scan chains should be $\lceil N_s/m \rceil$. Alternatively, if the maximum scan chain length $L_s$ is determined first, then to minimize the number of scan chains, m should be $\lceil N_s/L_s \rceil$. If $L_s > N_s/m$, some dummy X bits should be scanned into the shorter scan chains in the first or the last few cycles of each scan-in section, so that the numbers of bits scanned into the m scan chains are balanced during scan-in mode. Then a new $(N_t \cdot L_s) \times m$ bit matrix SM representing the scan slices can be constructed based on T and the partition scheme of scan chains. The i-th row of SM corresponds to the $(i \bmod L_s)$-th scan slice of the $(\lfloor i/L_s \rfloor + 1)$-th test pattern, and the j-th column of SM corresponds to all bits that are scanned into the j-th scan chain. For example, as shown in Fig. 1, the bit matrix TE denotes the initial 4 test patterns for a design with 25 scan cells. After 9 scan chains are constructed based on the scan chain partition scheme, bit matrix SME can be generated with two X bits inserted for each test pattern. In our scheme, the initial test set is compressed in two steps.
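The construction of SM can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `build_scan_slices`, the encoding of patterns as strings over '0', '1', 'X', and the partition rule (consecutive scan cells go to the same chain, with dummy X bits appended at the end of each pattern) are assumptions, since the text leaves the partition scheme open.

```python
from math import ceil

def build_scan_slices(T, m):
    """Turn a test set T (list of strings over '0','1','X') into scan slices.

    Assumed partition: chain j holds cells j*Ls .. j*Ls+Ls-1 of the padded
    pattern, and padding X bits are appended at the end of each pattern.
    """
    Ns = len(T[0])
    Ls = ceil(Ns / m)                      # maximum scan chain length
    SM = []
    for pattern in T:
        padded = pattern + "X" * (Ls * m - Ns)
        for s in range(Ls):                # one row of SM per scan slice
            SM.append("".join(padded[j * Ls + s] for j in range(m)))
    return SM, Ls
```

With 25 scan cells and 9 chains, as in the Fig. 1 example, this sketch gives $L_s = 3$ and pads each pattern with two X bits, so 4 patterns yield 12 rows of 9 bits.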

1.1 First step

The concept of compatibilities has been utilized in many compression schemes since it was first proposed in Ref. [11], while the concept of logic dependencies was first presented in Ref. [10]. In the first step of our scheme, the compatibilities of SM and the inverse compatibilities and the logic dependencies of the mid bit matrices are exploited to generate a new drive bit matrix DIM with fewer columns, of which each row will be an input slice of the scan chain driver circuit. For bit matrix SM, the i-th column and the j-th column are said to be compatible iff, for $1 \le k \le N_t \cdot L_s$, $(SM_{ik}, SM_{jk}) \notin \{(1,0), (0,1)\}$, where $SM_{ik}$ denotes the entry of the i-th column in the k-th row.

Fig. 1 Construction of SME based on TE and the scan chain partition scheme

Columns that are pairwise compatible can be merged into one single column by fixing the Xs in the k-th row to 0 (1) if there is any 0 (1) in the k-th row of those columns ($1 \le k \le N_t \cdot L_s$). The bits in the single generated column can be scanned into the scan chains corresponding to those mutually compatible columns during the scan section without loss of fault coverage. Thus, to reduce the number of columns as much as possible, we should find the fewest columns that can drive all m scan chains based on the compatibilities between the columns of SM. This problem can be mapped onto the vertex coloring problem in graph theory as follows. An undirected graph G = <V, E> with |V| = m is formed based on the compatibilities between the columns of SM. In graph G, vertex $v_i$ corresponds to the i-th column of SM, and edge $e_{ij}$ connecting $v_i$ and $v_j$ belongs to E iff the i-th and the j-th columns are not compatible. After an optimal vertex coloring of graph G is determined, a minimum number of columns can be computed by merging those columns whose corresponding vertices in graph G have the same color. Since the vertex coloring problem is a well-known NP-complete problem, the heuristic algorithm proposed in Ref. [12] is adopted to solve it. If the vertices in graph G are colored with $m_1 \le m$ colors, then an $(N_t \cdot L_s) \times m_1$ array MME1 can be generated as the first mid bit matrix. In Fig. 2, the undirected graph GE is formed based on the compatibilities between the columns of SME, and v4, v7, and v9 have the same color while the remaining vertices all have different colors. Based on the coloring result, a new bit matrix MME1 is generated by merging the 4th, the 7th, and the 9th columns into the 4th column.

Fig. 2 The compatibility graph GE and the merged mid bit matrix MME1 for SME
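The merging of compatible columns can be sketched as below. Note the substitution: this sketch uses a simple first-fit greedy coloring rather than the heuristic of Ref. [12], so it may use more colors than that heuristic. Columns are assumed to be strings over '0', '1', 'X', and all function names are hypothetical.

```python
def compatible(a, b):
    """Two columns are compatible if no row holds opposite specified bits."""
    return all(x == y or "X" in (x, y) for x, y in zip(a, b))

def merge(cols):
    """Merge mutually compatible columns: a specified bit overrides X."""
    out = []
    for bits in zip(*cols):
        specified = {b for b in bits if b != "X"}
        out.append(specified.pop() if specified else "X")
    return "".join(out)

def merge_compatible_columns(columns):
    """First-fit greedy coloring: each group is one color class."""
    groups = []
    for i, col in enumerate(columns):
        for g in groups:
            if all(compatible(col, columns[j]) for j in g):
                g.append(i)
                break
        else:
            groups.append([i])
    merged = [merge([columns[j] for j in g]) for g in groups]
    return groups, merged
```

Each returned group lists the scan chains driven by one merged column, which corresponds to one fan-out net in the scan chain driver circuit.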

The inverse compatibilities of columns are then exploited to further narrow the mid bit matrix. For MME1, the i-th column and the j-th column are said to be inverse compatible iff, for $1 \le k \le N_t \cdot L_s$, $(MME1_{ik}, MME1_{jk}) \notin \{(1,1), (0,0)\}$, where the two entries are those of the i-th and j-th columns in the k-th row. Two inverse compatible columns can be merged into one single column. Unlike the case of compatible columns, however, one of the two inverse compatible columns has to be selected as the basic column. Suppose that the i-th column is the basic column. When the i-th and the j-th columns are merged, for $1 \le k \le N_t \cdot L_s$, $MME1_{ik}$ is set to 0 (1) iff $MME1_{ik}$ is 0 (1) or $MME1_{jk}$ is 1 (0), which is well defined because $(MME1_{ik}, MME1_{jk}) \notin \{(1,1), (0,0)\}$, and the j-th column is deleted from MME1. In our scheme the first of the two inverse compatible columns is chosen as the basic column, and the pseudocode to narrow MME1 into an $(N_t \cdot L_s) \times m_2$ mid bit matrix MME2 with $m_2 \le m_1$ is shown in Appendix A. For the example of MME1 in Fig. 2, MME2 in Fig. 3 can be generated by merging the 3rd and 5th columns, which are inversely compatible, based on the 3rd column.

Fig. 3 The second bit matrix MME2, the drive bit matrix DIME and the scan chain drive circuit SDCE mapping DIME to SME
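The inverse-compatibility merge described above can be sketched as follows. The pseudocode of Appendix A is not reproduced here, so this is only a plausible reading of the rule: the earlier column of each pair is taken as the basic column, and all names are hypothetical.

```python
INV = {"0": "1", "1": "0", "X": "X"}

def inverse_compatible(a, b):
    """No row may hold equal specified bits, i.e. (1,1) or (0,0)."""
    return all("X" in (x, y) or x != y for x, y in zip(a, b))

def merge_inverse(basic, other):
    """Keep specified bits of the basic column; fill its Xs with the
    complement of the other column's bits."""
    return "".join(x if x != "X" else INV[y] for x, y in zip(basic, other))

def narrow_by_inverse(columns):
    """Merge each later inverse-compatible column into the earlier one.

    Returns the kept columns and a map from each deleted column index to
    the index of the basic column that drives it through an inverter.
    """
    cols = list(columns)
    removed, driven_by, kept = set(), {}, []
    for i in range(len(cols)):
        if i in removed:
            continue
        for j in range(i + 1, len(cols)):
            if j not in removed and inverse_compatible(cols[i], cols[j]):
                cols[i] = merge_inverse(cols[i], cols[j])
                removed.add(j)
                driven_by[j] = i
        kept.append(i)
    return [cols[i] for i in kept], driven_by
```

In hardware terms, every entry of the returned map corresponds to one inverter between a kept column and a scan chain input.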

We then apply the concept of logic dependencies proposed in Ref. [10] to the columns of MME2. For one column of MME2, if a compatible column can be generated by a logic operation on some other columns of MME2, then this column can be deleted from MME2. To limit the hardware overhead of the decoder and the computational complexity, only logic dependencies among triples of columns under the six two-input logic functions in {AND, NAND, OR, NOR, XOR, XNOR} are examined. The pseudocode presented in Ref. [10] is used in our scheme to determine the logic dependencies of MME2, so that some columns can be deleted from MME2 and some Xs in the remaining columns may be assigned 0 or 1, yielding an $(N_t \cdot L_s) \times n$ bit matrix DIM with $n \le m_2$; the details are not discussed here due to space limitations. As shown in Fig. 3, MME2 is narrowed into DIME since the 1st column of MME2 can be obtained by applying the XOR operation to the 2nd and 6th columns.
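A single logic-dependency check of this kind can be sketched as below. This is not the pseudocode of Ref. [10]; it is a simplified assumption that tests one triple of columns in isolation and ignores the fact that fixing Xs for one dependency constrains later ones. The names `FUNCS`, `row_feasible` and `find_dependency` are hypothetical.

```python
from itertools import product

FUNCS = {
    "AND":  lambda a, b: a & b,
    "NAND": lambda a, b: 1 - (a & b),
    "OR":   lambda a, b: a | b,
    "NOR":  lambda a, b: 1 - (a | b),
    "XOR":  lambda a, b: a ^ b,
    "XNOR": lambda a, b: 1 - (a ^ b),
}

def choices(bit):
    """An X may be assigned either value; a specified bit is fixed."""
    return (0, 1) if bit == "X" else (int(bit),)

def row_feasible(f, t, a, b):
    """Can the Xs in a and b be assigned so that f(a, b) matches t?"""
    return any(t == "X" or f(x, y) == int(t)
               for x, y in product(choices(a), choices(b)))

def find_dependency(target, a, b):
    """Names of the two-input functions f for which f(a, b) can be made
    compatible with the target column, row by row."""
    return [name for name, f in FUNCS.items()
            if all(row_feasible(f, t, x, y)
                   for t, x, y in zip(target, a, b))]
```

If the returned list is non-empty, the target column can be dropped and regenerated on chip by one gate fed from the other two columns.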

In the first step, based on the compatibilities of SM, the inverse compatibilities of MME1, and the logic dependencies of MME2, we obtain a scan chain driver circuit SDC with n inputs and m outputs and an $(N_t \cdot L_s) \times n$ bit matrix DIM, of which each row is an input slice of SDC that generates the corresponding row of SM. In the example, the circuit SDCE mapping DIME to SME is shown in Fig. 3.

1.2 Second step

For the $(N_t \cdot L_s) \times n$ bit matrix DIME, which represents the input slices of the scan chain driver circuit of m scan chains, a corresponding dictionary bit matrix DME is an $N_d \times n$ bit matrix with $N_d \le N_t \cdot L_s$. By applying the concept of compatibilities to the rows of DIME, DME can be constructed such that for each row of DIME there is at least one row of DME that is compatible with it. Test data volume can then be reduced by encoding all rows of DME into codes of $\lceil \log_2 N_d \rceil$ bits. However, the value of $N_d$ depends on the compatibilities between all rows of DIME, and to keep the decoder overhead low, $N_d$ should be limited to an acceptable value, which in turn may leave some rows of DIME incompatible with every row of DME. To resolve this dilemma, we construct a dictionary matrix DME with a limited $N_d$ that satisfies the following two properties.

(1) For each row of DIME, a compatible row exists among the rows of DME or can be generated by XORing multiple rows of DME.

(2) The total number of rows of DME required to generate compatible rows for all rows of DIME is minimal.

To construct such a dictionary matrix DME, the following two equations are utilized.

The bit matrix Z on the left side of Eq. (1) is an r × c matrix, while the first bit matrix on the right side, which can be considered the initial multiplier matrix, is an r × k matrix, and the second one, which can be considered the initial dictionary matrix for Z, is a k × c bit matrix. Here k can be any integer not smaller than c. In Eq. (2), P is an r × k bit matrix and Q is a k × c bit matrix.
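The two equations themselves are not legible in this text. The following is a hedged reconstruction from the surrounding description, offered as an assumption rather than as the authors' exact notation. All arithmetic is over GF(2), $I_c$ is the c × c identity matrix, X is an arbitrary r × (k − c) block, and 0 is a (k − c) × c zero block. Eq. (1) can then be read as

$$ Z = \begin{pmatrix} Z & X \end{pmatrix} \begin{pmatrix} I_c \\ 0 \end{pmatrix} \qquad (1) $$

and Eq. (2) as a product-preserving XOR step. Writing $p_l$ for the l-th column of P and $q_l$ for the l-th row of Q,

$$ P\,Q = \bigoplus_{l=1}^{k} p_l\, q_l = \Big( \bigoplus_{l \ne i,j} p_l\, q_l \Big) \oplus p_i\,(q_i \oplus q_j) \oplus (p_i \oplus p_j)\, q_j \qquad (2) $$

which holds because the term $p_i\, q_j$ appears twice and cancels. In this formulation, replacing column j of the multiplier matrix by $p_i \oplus p_j$ and row i of the dictionary matrix by $q_i \oplus q_j$ leaves the product unchanged, so the number of 1s in the multiplier can be changed without changing the matrix being represented.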

For an $(N_t \cdot L_s) \times n$ bit matrix DIM, in order to construct an initial $N_d \times n$ dictionary matrix that satisfies the first property, each row of the corresponding initial multiplier matrix should contain at least one element that is 1. Therefore, according to Eq. (1), n is large enough for $N_d$ if every row of DIM contains at least one 1, while if there are rows in which no element is 1, then n + 1 is large enough for $N_d$. To make full use of the encoding ability of the codes and increase the compression ratio, $N_d$ is set to $2^{\lceil \log_2 n \rceil}$ in the first situation and $2^{\lceil \log_2 (n+1) \rceil}$ in the second. After the value of $N_d$ is determined, the initial $N_d \times n$ dictionary matrix is constructed such that the uppermost n rows form an n × n identity matrix while all elements in the remaining rows are 0s. The leftmost n columns of the initial $(N_t \cdot L_s) \times N_d$ multiplier matrix are the same as the columns of DIME, while the remaining elements are all Xs, except that the elements in the (n+1)-th column are set to 1 if no element in the corresponding row of DIME is 1.
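The construction of the initial dictionary and multiplier matrices can be sketched as follows. The function name `initial_matrices` and the data layout (rows of DIM as strings over '0', '1', 'X') are assumptions made for illustration.

```python
def initial_matrices(DIM):
    """Build the initial dictionary and multiplier matrices for DIM."""
    n = len(DIM[0])
    # A row without any 1 needs the extra (n+1)-th dictionary entry.
    needs_extra = any("1" not in row for row in DIM)
    k = n + 1 if needs_extra else n
    Nd = 1 << (k - 1).bit_length()         # 2 ** ceil(log2(k))

    # Top n rows form the identity; the remaining rows are all zero.
    dictionary = [[1 if r == c else 0 for c in range(n)]
                  for r in range(Nd)]

    multiplier = []
    for row in DIM:
        m_row = list(row) + ["X"] * (Nd - n)
        if "1" not in row:
            m_row[n] = "1"                 # select the all-zero entry
        multiplier.append(m_row)
    return dictionary, multiplier
```

Rounding $N_d$ up to a power of two costs no extra code bits, and the spare dictionary rows give the later XOR step more freedom.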

Since the number of 1s in the multiplier matrix MDME equals the total number of rows of DME required to generate all rows of DIME, which determines the compressed test data volume, Eq. (2) is used to reduce the number of 1s in the initial multiplier matrix by XOR operations between the columns of the initial multiplier matrix and between the corresponding columns of the initial dictionary matrix. Before XORing the i-th column with the other columns of the initial multiplier matrix, we first map all Xs in the i-th column to 0 or 1. The i-th column is then XORed with a column of the initial multiplier matrix only if the number of 1s is reduced. The pseudocode used in our scheme to compute the dictionary bit matrix DME and the corresponding multiplier matrix MDME is shown in Appendix B. After the dictionary bit matrix DME is determined, each row of DME is encoded into a code of $\log_2 N_d$ bits, and a dictionary decoder circuit DDC can then be synthesized from the dictionary bit matrix and the encoding scheme. Since multiple rows of DME might have to be XORed to generate a row of DIM, which represents an input slice for SDC, a prefix bit is added to each code to indicate whether the current code is the final one of a sequence of consecutive codes whose corresponding rows of DME are XORed to generate an input slice.
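The prefix-bit encoding can be sketched as below, assuming the multiplier matrix is already fully specified (entries 0 or 1) and that each dictionary row is encoded as the binary code of its index, as in the example that follows. The function name `encode_rows` is hypothetical.

```python
def encode_rows(multiplier, code_bits):
    """Encode each multiplier row as a run of ATE words.

    Every 1 in a row selects one dictionary entry. Each word is a prefix
    bit followed by the entry index; the prefix is 1 only on the final
    word of the run that builds one input slice.
    """
    words = []
    for row in multiplier:
        ones = [idx for idx, bit in enumerate(row) if bit == 1]
        if not ones:
            raise ValueError("each multiplier row needs at least one 1")
        for pos, idx in enumerate(ones):
            prefix = "1" if pos == len(ones) - 1 else "0"
            words.append(prefix + format(idx, f"0{code_bits}b"))
    return words
```

The total stored volume is the number of 1s in the multiplier matrix times (1 + `code_bits`), which is why reducing the number of 1s reduces the test data volume directly.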


In Fig. 4, the initial dictionary matrix and the initial multiplier matrix for bit matrix DIME are first constructed as shown in the middle of the equation. By applying the procedure to these two matrices, we obtain the final dictionary matrix DME and the corresponding multiplier matrix MDME on the right side of the equation. The number of 1s in the multiplier matrix decreases from 26 to 13. If each row of DME is encoded into the binary code of its index, then the test data stored in ATE channel memory is as shown in Fig. 5, with $13 \times (1 + \lceil \log_2 5 \rceil) = 52$ bits in total. Note that two rows of DME are XORed to generate the lowest row of DIME, so the prefix of the second lowest row of the ATE stored data is 0 and the prefix of the lowest row is 1.
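As a quick check of this arithmetic, each ATE word consists of one prefix bit plus a $\lceil \log_2 5 \rceil = 3$-bit dictionary index. The comparison against the $4 \times 25 = 100$ bits of the original test set TE is added here for illustration only; the ratio is not stated in the text.

$$ 13 \times (1 + 3) = 52 \text{ bits}, \qquad \frac{100 - 52}{100} = 48\% \text{ reduction for this small example.} $$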

Fig. 4 Construction of the dictionary bit matrix DME and the corresponding MDME for DIME

2 Architecture of the Decompressor

The architecture of the decompressor, as illustrated in Fig. 5, consists of a dictionary decoder circuit DDC, an XOR operation array, and a scan chain driver circuit SDC. As mentioned before, DDC decodes the dictionary codes, which are supplied directly by the ATE channels, into the corresponding rows of DME, and SDC drives the scan chains. The function of the decompressor is simple. In every clock cycle, the prefix bit of the previous code, which is stored in the ppb register, determines the mode of the XOR operation array and whether a scan slice is scanned into the scan chains.

If ppb = 0, the previous code is not the final one of the consecutive codes that generate an input slice of SDC; thus the current decoded dictionary entry is XORed with the previous result stored in the registers of the XOR operation array, and the scan signal is not valid.

Fig. 5 The ATE stored data for SME and the decompressor architecture

On the other hand, if ppb = 1, the current code is the first one of the consecutive codes that generate an input slice of SDC; thus the current decoded dictionary entry is stored directly in the registers of the XOR operation array, and the output of SDC is scanned into the m scan chains, since the current input is a valid input slice and the corresponding output of SDC is a valid scan slice.
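The behaviour of the ppb register and the XOR operation array can be modelled functionally as below. This is a sketch under stated assumptions: dictionary rows are represented as Python integers, ppb is assumed to be initialised to 1, and the model emits a slice as soon as its final code arrives instead of reproducing the exact cycle in which the hardware shifts it into the scan chains. The function name `decompress` is hypothetical, and the SDC stage is omitted.

```python
def decompress(stream, dictionary):
    """Rebuild SDC input slices from (prefix, index) words.

    stream     -- list of (prefix_bit, dictionary_index) pairs
    dictionary -- list of ints, one bit vector per dictionary row
    """
    slices = []
    acc = 0        # registers of the XOR operation array
    ppb = 1        # assumed reset value: behave as if a slice just ended
    for prefix, index in stream:
        entry = dictionary[index]
        if ppb == 1:
            acc = entry            # first code of a run: load directly
        else:
            acc ^= entry           # continuation: XOR into the registers
        if prefix == 1:
            slices.append(acc)     # final code of the run: slice complete
        ppb = prefix
    return slices
```

Because only the previous prefix bit is needed to pick the mode, the ATE never has to pause, which matches the claim that no break cycle is inserted.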

3 Experimental Results

The proposed methodology was implemented in the C language and applied to the seven largest ISCAS 89 benchmark circuits. The initial test sets in the experiments were generated by the Mintest ATPG program [13]. Table 1 shows the experimental results, where we compare the compressed test data volume of our scheme with that of some previous works. Notably, our scheme outperforms 9C coding [9] for all circuits and LZ77 [7] for all available results. The compressed test data volume is also smaller than that of the space/time compression scheme [10], EFDR coding [4], and test data mutation coding [8] for all available results except s38584 and, for test data mutation coding, s15850. Since the size of the dictionary is limited and only one decompressor is required for all scan chains, the hardware overhead of our scheme is low. Moreover, the number of ATE channels used to test each circuit is smaller than 10 in our scheme, which allows multi-site testing on ATE to improve test throughput.

Table 1 Comparison with previous works on compressed test data volume |TE|

| Circuit | Proposed scheme | S/TC [10] | EFDR [4] | 9C [9] | LZ77 [7] | TDM [8] |
|---|---|---|---|---|---|---|
| s5378 | 6592 | 14 220 | 11 419 | 11 487 | 12 769 | N/A |
| s9234 | 16 312 | 30 144 | 21 250 | 19 279 | 19 429 | N/A |
| s13207 | 14 368 | 20 988 | 29 992 | 27 737 | N/A | 16 913 |
| s15850 | 20 280 | 25 140 | 24 643 | 30 271 | 37 980 | 14 676 |
| s35932 | 1446 | 3969 | 5554 | 9458 | 22 496 | 11 298 |
| s38417 | 47 880 | 85 225 | 64 962 | 74 938 | N/A | 55 848 |
| s38584 | 74 862 | 57 120 | 73 853 | 85 674 | N/A | 47 886 |

4 Conclusions

In this paper, we have presented a new test data compression scheme for multi-scan designs. The scheme requires no intrusion into the ATPG process since it takes a precomputed test set as its input. Test data volume, test application time, and the number of ATE channels are reduced in our scheme with a small scan chain driver circuit and a limited dictionary size. The experimental results for the ISCAS 89 benchmarks demonstrate the effectiveness of the proposed scheme.

References

[1] Ichihara H, Ogawa A, Inoue T, Tamura A. Dynamic test compression using statistical coding. In: Proceedings Asian Test Symposium. Kyoto, Japan, 2001: 143-148.

[2] Chandra A, Chakrabarty K. System-on-a-chip test data compression and decompression architectures based on Golomb codes. IEEE Trans. Computer-Aided Design, 2001, 20(3): 355-368.

[3] Chandra A, Chakrabarty K. Test data compression and test resource partitioning for system-on-a-chip using frequency-directed run-length (FDR) codes. IEEE Trans. Comput., 2003, 52(12): 1076-1088.

[4] El-Maleh A, Al-Abaji R. Extended frequency-directed run-length codes with improved application to systems-on-a-chip test data compression. In: International Conference on Electronics, Circuits and Systems. Dubrovnik, Croatia, 2002: 449-452.

[5] Long Jieyi, Feng Jianhua, Zhu Lida, Xu Wenhua, Wang Xinan. A new test data compression/decompression scheme to reduce SOC test time. In: International Conference on ASIC. Shanghai, China, 2005: 685-688.

[6] Gonciari P T, Al-Hashimi B, Nicolici N. Synchronization overhead in SOC compressed test. IEEE Trans. VLSI Syst., 2005, 13(1): 140-152.

[7] Wolff F G, Papachristou C. Multiscan-based test compression and hardware decompression using LZ77. In: Proceedings International Test Conference. Washington, 2002: 331-339.

[8] Reda S, Orailoglu A. Reducing test application time through test data mutation encoding. In: Proceedings Design, Automation and Test in Europe Conference. Paris, France, 2002: 331-339.

[9] Tehranipour M, Nourani M, Chakrabarty K. Nine-coded compression technique with application to reduced pin-count testing and flexible on-chip decompression. In: Proceedings Design, Automation and Test in Europe Conference. Paris, France, 2004: 1284-1289.

[10] Li L, Chakrabarty K, Kajihara S, Swaminathan S. Efficient space/time compression to reduce test data volume and testing time for IP cores. In: International Conference on VLSI Design. Kolkata, India, 2005: 53-58.

[11] Lee K J, Chen J J, Huang C H. Using a single input to support multiple scan chains. In: Proceedings International Conference on Computer-Aided Design. San Jose, California, USA, 1998: 74-78.

[12] Mehrotra A, Trick M A. A column generation approach for graph coloring. INFORMS J. Comput., 1996, 8(4): 344-353.

[13] Hamzaoglu I, Patel J H. Test set compaction algorithms for combinational circuits. In: Proceedings International Conference on Computer-Aided Design. San Jose, California, USA, 1998: 283-289.
