What Will Be the Mainstream Application That Incorporates 3D-TSV Technology?

How to make a true 3D-TSV IC application

Kanji Otsuka

Collaborative Research Center, Meisei University

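Who was the real culprit that held back system performance in the past?

• In the brain, the neuron is a computing element that also serves as a memory element: a logic-memory conjugate system.

• In a computer, by contrast, logic elements and memory are separate. As logic became faster, the time needed to move data in and out of memory lagged behind and became a drag on the logic.

• The remedies, starting with cache memory placed next to the logic, made architectures steadily more complex, which in turn required more logic to control them. The picture resembles a large company whose headquarters functions have become bloated.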

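Intel has also called this a big problem. (Source: Intel Developer Forum, Spring 2003)

The gap between the processing capability of processor chips and the speed at which they can take in data is widening, and it is impeding system performance improvement. (Source: Nikkei Microdevices, April 2009, pp. 17-21)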

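Growing complexity of Intel's microarchitecture

Most of this complexity is a countermeasure against insufficient bandwidth. It is a distortion that results from the worldwide reluctance to spend money on packaging. I assert that there are two causes: the status of the packaging field was so low that its voice never reached management, and packaging engineers, marginalized and cut off from information, remained uninformed.

[Figure: Changes in Intel architecture (from Intel Developer Forum, Spring 2003). Vertical axis: MIPS, log scale from 1 to 1,000,000. Horizontal axis: year, 1980 to 2010. Eras marked: microcode parallelism, instruction-level parallelism, task-level parallelism. Architectures plotted: Pentium (superscalar); Pentium Pro (speculative out-of-order); Pentium 4 (trace cache); Pentium 4 and Xeon with HT (multi-threaded); multi-threaded, multi-core.]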

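The key technology for More than Moore: 3D-TSV

• 3D-TSV raises high expectations because it is three-dimensional and because it looks able to widen the bandwidth discussed above.

• 3D-TSV is now recognized worldwide, and processes, design methods, and test methods are being developed for it. Here we examine whether there is an application that can use it effectively.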

We have still not found a major application for the TSV interconnection structure.

• As we understand it, the main figure of merit of the TSV structure is that 3-D interconnection escapes the restrictions of 2-D.

• Is this figure of merit correct?

• We should re-examine the concept behind this main figure of merit if we want to create major applications.

1. TSV diameter: still very large for interconnection.

Even if we choose a diameter of 2 μm, each TSV wastes active area and 2D wiring area.

[Figure: cross-section of a Si substrate with 2D interconnection and a TSV]

Current technology: 6 to 10 metal layers.

TSVs can provide roughly the equivalent of 2 more wiring layers by keeping wiring length from growing.

TSVs do not remove the wiring limitation. Their advantage lies rather in the 3D structure.

[Figure: Si substrate with TSV]

2. Trade-off between TSV aspect ratio and the intrinsic gettering layer

The intrinsic gettering (IG) layer is lost once the wafer thickness is 50 μm or less.

[Figure: Si substrate with a TSV in the via-last case, showing the thinning edge and the IG layer]

3. The known-good-die (KGD) issue is difficult to solve in wafer-to-wafer (W2W) stacking, so redundancy must be implemented.

[Figure: stacked wafers containing failed dies]
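To see why W2W stacking makes the known-good-die problem hard, a rough sketch follows. It assumes that every die position has the same independent yield and that bad dies cannot be screened out before bonding. The 90% die yield, the layer counts, and both function names are illustrative assumptions, not figures from this talk.

```python
def stack_yield(die_yield: float, layers: int) -> float:
    """Yield of a W2W stack in which every layer must be good (no pre-screening)."""
    return die_yield ** layers


def stack_yield_with_spare(die_yield: float, layers: int) -> float:
    """Yield when the stack carries one spare layer and can tolerate one bad die."""
    total = layers + 1
    all_good = die_yield ** total
    exactly_one_bad = total * (1 - die_yield) * die_yield ** (total - 1)
    return all_good + exactly_one_bad


for n in (2, 4, 8):
    print(n, round(stack_yield(0.9, n), 3), round(stack_yield_with_spare(0.9, n), 3))
```

Under these assumptions the stack yield falls geometrically with the number of layers, and a single spare layer recovers a large part of the loss, which is the motivation for redundancy.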


4. Thermal issues are difficult in structures with many stacked layers, so power saving is required.

[Figure: three stacked Si substrates with TSVs; thermal energy accumulates through the stack]

5. Effective functions must outweigh the cost issue.

6. Many other restrictions in process and design technology: increasing complexity.

Summary of 3D-TSV restrictions

(Items shown in red on the original slide are the current focus.)

1. Restriction and problem: low area efficiency, because TSVs waste active area and 2D wiring area.
   Task: find functions and performance that outweigh the TSV area penalty.

2. Restriction and problem: trade-off between TSV aspect ratio and loss of the IG layer.
   Task: process improvements are now coming into view.

3. Restriction and problem: difficulty with known-good-die.
   Task: introduce W2C or C2C for production, or build in redundancy.

4. Restriction and problem: thermal limitation, the most important issue for 3D.
   Task: choose power-saving circuits and systems; a fundamental approach is needed.

5. Restriction and problem: cost limitation.
   Task: effective functions and performance that outweigh the cost.

6. Restriction and problem: complicated process and design methodology.
   Task: a simple process and an easy design algorithm.

Several solutions have been announced, but the trend still seems insufficient.

(1) A tile or small-block array interconnected through TSVs suits memory or image sensor systems, giving wide-band interconnection through several thousand TSVs.

(2) Cache DRAM placed face to face with the CPU provides a large cache while saving area.

(3) Stacking self-contained function blocks, including FPGA and core, makes a scalable system with redundancy.

(4) A silicon interposer with TSVs raises the performance of 2D wiring.

(5) A stacked memory module and a stacked module of many small cores are connected through a wiring module for diagnosis-restoration and dynamic reconfiguration. This is close to an ideal system, but nothing has been specified yet.

[Figure labels: memory, diagnostic-restoration, many core; redundant memory; core CPU or bus controller; memory, FPGA, core; FPGA, Si interposer]

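[Figure: tile with an active area, TSVs, and interconnection pads]

In every published example, the TSVs are placed around the periphery of the active area of a tile or block.

A TSV array inside the active area is not feasible.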

An example of a processor structure that improves bandwidth

Stacked elements labeled in the figure: I/O interface; DRAMs; crossbar switch and cache memory; many-core processor; LCR-embedded interposer for higher-frequency I/O signals and power distribution; attached heat sink. Each chip has an active area and TSVs.

• Total stacked chip thickness: up to 0.8 mm
• Die size: 10 mm
• Via pitch: 20 μm
• Clock frequency: 500 MHz
• Maximum wiring length that can be handled as a lumped model up to 500 MHz: 15 mm
• I/O interface chip: handles up to 5 GHz

An example in which the many-core processor is split into multiple chips

Stacked elements labeled in the figure: I/O interface; DRAMs; crossbar switch and cache memory; many-core processor; LCR-embedded interposer for higher-frequency I/O signals and power distribution; attached heat sink. Each chip has an active area and TSVs.

• Total stacked chip thickness: up to 1.0 mm
• Die size: 10 mm
• Via pitch: 20 μm

An example of chip layout dimensions

• Via pitch: 20 μm
• Via diameter: 10 μm
• Number of vias: 10,000
• Share of a 10 mm square chip occupied by vias: 4.4%
• Signal pins versus power/ground pins: 5,000 versus 5,000

[Figure: chip layout with active area, TSVs, and interconnection pads]

Signal bandwidth is calculated from the pin count, the clock frequency, and the data rate as follows:

(number of signal pins) × (clock frequency) × (data rate) = 5,000 × 500 × 10⁶ × 2 = 5 Tbps

Bandwidth of a current high-end bus: up to 10 Gbps
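The bandwidth arithmetic can be checked with a short sketch. The function name and parameter names are illustrative; the numbers are the ones on the slide: 5,000 signal pins, a 500 MHz clock, and 2 transfers per clock. The 10 Gbps figure is the high-end bus bandwidth that the talk uses for comparison.

```python
def aggregate_bandwidth_bps(signal_pins: int, clock_hz: int, transfers_per_clock: int) -> int:
    """Aggregate bandwidth in bits per second for a parallel TSV interface."""
    return signal_pins * clock_hz * transfers_per_clock


tsv_bandwidth = aggregate_bandwidth_bps(5_000, 500_000_000, 2)
high_end_bus = 10_000_000_000  # 10 Gbps reference bus

print(tsv_bandwidth / 1e12, "Tbps")    # 5.0 Tbps
print(tsv_bandwidth // high_end_bus)   # 500
```

The wide, slow interface reaches its total through pin count rather than per-pin speed, which is why the clock can stay at 500 MHz.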

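An example of the interposer cross-section

[Figure: I/O interface on an interposer (10 mm) that has LCR-embedded distributed-constant wiring and power wiring. Labels: differential transmission line; decoupling capacitor; low-impedance power/ground pair transmission line; LCR equalizer; power supply distributed from the I/O driver to specific chips; differential signal transmission line.]

When the DC resistance of the differential transmission line between driver and receiver is within 2 Ω (maximum wiring length 400 mm), no equalizer is needed at 500 MHz. Above 2 Ω, or at higher frequencies (for example 6.4 Gbps), an equalizer is required, and the receiver gain must then be changed to a higher sensitivity. The wiring, including vias, is routed so that the characteristic impedance of the power/ground pair is 0.2 Ω or less per chip.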

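5 Tbps is impressive

• Compared with the current highest bandwidth: 5 Tbps / 10 Gbps = 500.

• If the performance of a current core is taken as equivalent to 10 Gbps, a system made of 500 cores can be built.

• Only a 3D-SiP structure offers this solution.

• The result is a robust and flexible system that can be used in any equipment.

• KGD is not strictly necessary because of the system's redundancy, but it would be even better if KGD were available.

• However, the architecture cannot be simplified.

• Reducing the drivability of the I/O drivers cuts power substantially. However, power concentrates in 3D, and the heat dissipation problem cannot be solved for now.

• Applications that need this much performance are limited, so this cannot be called the savior that brings TSV into general use.

• Can a simple system be built with a simple methodology?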

A small number of TSVs in each tile or small block would make the most effective structure.

However, tiles with different functions have different sizes and different connection requirements, so they cannot be stacked and interconnected efficiently.

This naturally leads to the idea of a unified circuit used throughout the whole system. With it, the tile structure can be made efficient.

A neuron in our brain is a unified function in which logical processing and memory are conjugated. Can we make such a circuit from a CMOS unit gate?

[Figure: neuron and axon network]

Dynamic reconfiguration algorithm using a unified function block

[Figure: array of mats. Logic is surrounded by cache. The cache grows and shrinks depending on the cache hit ratio. When the job load increases, the logic expands and cache is added around the newly generated logic. Multiple tasks run with shared cache.]

Neighboring blocks communicate efficiently, with high bandwidth and a high processing rate.

Unified circuit! It is easy to make with the following configuration. SRAM can be changed to serve any function, even wiring connection.

[Figure: the same SRAM block configured for memory and for logic, switched by a mode selector]

FPGA
○ Logic block: LUT (SRAM) and simple logic with a relatively small driver
○ Switching block: FF + switch
○ Connecting block: wiring
This is not a truly unified block: it is composed of primitive logic and additional memory, both of which are hard structures.

Toward a unified circuit (previous slide)
○ Logic block: SRAM with mode selector
○ Memory block: SRAM with mode selector
○ Switching block: SRAM
○ Basic cell connection (wiring): SRAM
Unified! However, implementing the switching block and the wiring in SRAM is inefficient.

Then, choose the optimum basic cell size and cluster size:
○ Logic block: SRAM with mode selector and a relatively small driver
○ Memory block: SRAM with mode selector and a relatively small driver
○ Cluster connection: bus with driver (through TSV)

[Figure: FPGA basic cell with logic block, connecting block, switching block (FF: 0 = off, 1 = on), and I/O]

[Figure: LUT architecture of Xilinx Virtex-5. Labels: 6-LUT, two 5-LUTs, MUX, FF; signals B1 to B6, BX, CIN, B, BMUX, BQ, COUT]

A unified-like algorithm is already in use in FPGAs.

Now I introduce our memory-logic conjugate system.

SRAM-Based 8-bit Processor: An Application of the Memory-Logic Conjugate System (MLCS) in Its Smallest Model

Meisei University: Yoichi Sato, Kanji Otsuka

Hitachi ULSI Systems: Masahiro Yoshida

Outlook of the Memory-Logic Conjugate System (MLCS)

1. Solves the problem of low bandwidth between memory and logic (because the memory is the logic itself).

2. Effective architecture: dynamic reconfiguration can be done just by rewriting registers (because the memory is the logic itself).

3. High-speed operation: the miscellaneous registers in a basic cell can be used through dynamic reconfiguration (the basic cell itself is programmable).

4. Suitable for 3D-TSV assembly and scalable, thanks to the small-block configuration.

5. Low power: no I/O circuits are needed between logic circuits and SRAMs, and access paths can be saved.

Structure of Basic Cell

[Figure: block diagram of the basic cell. Blocks: SRAM (LUT), 256 W × 8 bit, with R/W, CK, CE, DIN, D, ADD, and ADD (write) terminals; input control circuit (mode change control and channel control); output control circuit (register, switch, etc. control) with 4-bit REG × 8; mode set register; channel set register. Buses: control bus (CY etc.); sub control bus (8 bit); write command bus; reconfiguration bus (4 bit each); address and data (4 bit each, drawn in groups of 4 bit × 4 and 4 bit × 2); control signals (1 bit each); outputs of the route configuration register or mode register.]

Simple operations can be programmed using the rich set of internal registers. Bus wiring can be routed over the memory area (about 70%), which saves area.
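The idea that one 256-word × 8-bit SRAM can act either as memory or as logic can be sketched in a few lines. This is an illustration under assumptions, not the authors' circuit: the class name `BasicCell`, its method names, and the choice of a 4-bit adder and XOR as the loaded functions are all hypothetical. Only the 256 × 8-bit SRAM and the 4-bit operand width come from the slide.

```python
class BasicCell:
    """Toy model of one cell: a 256-word x 8-bit SRAM used as memory or as a LUT."""

    WORDS = 256

    def __init__(self) -> None:
        self.sram = [0] * self.WORDS

    # Memory mode: plain read and write by 8-bit address.
    def write(self, address: int, value: int) -> None:
        self.sram[address & 0xFF] = value & 0xFF

    def read(self, address: int) -> int:
        return self.sram[address & 0xFF]

    # Reconfiguration: fill the SRAM with the truth table of a two-operand 4-bit function.
    def load_function(self, func) -> None:
        for a in range(16):
            for b in range(16):
                self.write((a << 4) | b, func(a, b))

    # Logic mode: two 4-bit operands form the address; the stored word is the result.
    def evaluate(self, a: int, b: int) -> int:
        return self.read(((a & 0xF) << 4) | (b & 0xF))


cell = BasicCell()
cell.load_function(lambda a, b: a + b)  # 4-bit adder; bit 4 of the result is the carry
print(cell.evaluate(9, 8))              # 17
cell.load_function(lambda a, b: a ^ b)  # same storage, now a 4-bit XOR
print(cell.evaluate(9, 8))              # 1
```

Rewriting the table contents is all it takes to change the function, which is the sense in which reconfiguration needs only a memory write.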


Operation mode of basic cell (memory-logic conjugate cell)

[Figure: tree of operation modes selected by S/R = "L" (reset mode) and S/R = "H". Modes shown: Through Access mode (= initial mode); System mode; Memory mode, with internal memory mode and external memory mode; Logic mode, with arithmetic operation mode, combinational circuit mode, logic library mode (macro-cell), and external memory mode. For dynamic reconfiguration: Route Configuration mode by Mode Register; Route Configuration Register mode (making the LUT for dynamic reconfiguration); Information Update mode for the Route Configuration Register.]

Rich operation modes make it possible to construct flexible and variable systems.

Outlook of MLCS structure

Memory-Logic Conjugate System (MLCS): the total system, including some cluster memories

[Figure: a basic cell array of m rows × n columns with decoders, a control circuit with bus I/F, and cluster memory (CX, CY), connected to other systems (including cluster memory) over a multiple bus that carries addresses, clock and control signals, and data (8 bit × n). The address consists of the 8-bit memory address of a basic cell plus a q-bit extension address (the address space of cluster memory).]

The cluster size can be allocated to match the operation and the logic density.

Actual design of a four-basic-cell configuration

[Figure: layout. Labels: four basic cells; area for TSVs; memory (SRAM) for testing; 256 W × 8 bit × 4 cells]

Cluster memory layout example for a single 8-bit ALU

● Area is about 330 × 330 μm² in a 90 nm process (one cluster).

[Figure: 4 × 4 basic cell array addressed by X and Y (00, 01, 10, 11) with decoders. Blocks: program memory (512 W × 8 bit); logical judgment circuit; instruction decoder; shifter (8 bit); PC adder and 8-bit ALUs (one resource shared); reserve part (decoder control).]

Notes

(1) Program counter: 16 bit. Using the 8-bit ALU, an address operation takes 1 cycle without overflow and 2 cycles when overflow occurs.

(2) Structure of the 8-bit ALU: to enable 2-cycle 16-bit addition, a new type of adder with carry code input is introduced (it uses 4 basic cells).
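The 1-cycle and 2-cycle behavior of a 16-bit program counter that shares an 8-bit ALU can be sketched as follows. This is a behavioral model under assumptions: the slide does not detail the "adder with carry code input", so it is modeled here as an ordinary carry-in, and the function names `add8` and `advance_pc` are hypothetical.

```python
def add8(a: int, b: int, carry_in: int = 0) -> tuple[int, int]:
    """8-bit addition; returns (8-bit sum, carry out)."""
    total = (a & 0xFF) + (b & 0xFF) + carry_in
    return total & 0xFF, total >> 8


def advance_pc(pc: int, step: int = 1) -> tuple[int, int]:
    """Advance a 16-bit PC with an 8-bit ALU; returns (new PC, cycles used)."""
    low, carry = add8(pc & 0xFF, step)
    high = (pc >> 8) & 0xFF
    if carry == 0:
        return (high << 8) | low, 1      # low byte did not overflow: 1 cycle
    high, _ = add8(high, 0, carry)       # second pass propagates the carry
    return (high << 8) | low, 2


print(advance_pc(0x12FE))  # (4863, 1), i.e. 0x12FF in one cycle
print(advance_pc(0x12FF))  # (4864, 2), i.e. 0x1300 in two cycles
```

With a step of 1, the second cycle is needed only once every 256 increments, so the shared 8-bit resource costs little on sequential code.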

Performance comparison between pure logic and MLCS

Area consumption for the same logic with different peripheral circuits

Area | Pure logic | MLCS | FPGA
Ratio | 1 | 7 | 30

(The slide's legend distinguishes a constant size with some design allowance from a dynamic size with minimum design.)

Power consumption for the same logic with one thread

Power | Pure logic | MLCS | FPGA
Relative ratio | 1 | 2 | 20

Operation speed in processor mode

Band frequency | Pure logic** (8/32 bit) | MLCS (8 bit), non-parallel | MLCS (8 bit), four parallel* | MLCS (32 bit), non-parallel | MLCS (32 bit), four parallel*
Maximum | 4 GHz | 1 GHz | 4 GHz | 1 GHz | 4 GHz
Mean rate | ? | (1 GHz) | (3 GHz) | (1 GHz) | (4 GHz)

Notes: * In the case of 50% independence between the four threads. ** One thread in pure logic, which is superior to the SRAM-based MLCS.

Pure logic would be the best for processing; however, MLCS can also operate in dynamic reconfiguration mode and provide memory function.

[Figure: four-thread processing, showing rearrangement of program commands and data]

Configuring from cluster to mat structure controlled by synchronous clock

[Figure: a mat (unit processor element) made of several clusters. Each cluster is a basic cell array with decoders, a control circuit, and cluster memory. The space between clusters holds the wiring and TSVs that connect the clusters within the mat. The position of the clock supply is marked.]





A clock-synchronous cube, which we call a mat

[Figure: clock timing image for synchronous and asynchronous operation. Labels: cluster; sub-processor; master clock, asynchronous from mat to mat; hit signal from a neighboring mat, given by the header of a packet.]

Mat-to-mat access is dynamic: it uses an asynchronous clock together with dynamic reconfiguration.

Dynamic reconfiguration algorithm

[Figure: array of mats. The logic expands when the job load increases, and cache is added around the newly generated logic. The amount of cache increases and decreases depending on the cache hit ratio. Multiple tasks run with shared cache.]

Of course, a mat itself can dynamically set the number of registers it uses, depending on the requirement. A mat can also include penetrated caches inside.

Adjacent addressing keeps the latency within 1 clock inside a synchronous cube.

Other approaches in technical papers

Memory-structured LUT, presented by Masayuki Sato, RECONF Symposium, September 2006.

One idea that has been introduced is a memory-based logic circuit in a random array with half-quadrate interconnection; however, memory is still consumed for interconnection and switching. Rearrangement of the unit tile is now being developed by Mr. Sato and Prof. Hironaka of Hiroshima City University.

The next significant issue is power saving. Is there a drastic power-saving method?

Yes, we have one idea.

$$I = mv, \qquad K = \frac{1}{2}mv^2$$

[Figure: a body that starts and stops; its energy is radiated as heat]

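Physics of power consumption

Power consumption of a unit circuit:

$$Q\,[\mathrm{C}] = (C_T + C_L + C_I)\,V_{dd}$$

$$P\,[\mathrm{W}] = \frac{1}{2}(C_T + C_L + C_I)\,V_{dd}^2$$

[Figure: RC delay circuit with $R_I$, $R_{on}$, $C_I$, $C_L$, and $C_T$, showing the on current and the off current]

$$i_{max} = \frac{V_{dd}}{R_{on}}$$

$$v_r = V_{dd}\left(1 - \exp\left(-\frac{t}{R_{on}C_{sum}}\right)\right), \qquad v_f = V_{dd}\exp\left(-\frac{t}{R_{on}C_{sum}}\right)$$

The charging current $i_{ch}$ and the discharging current $i_{dis}$ are exponential transients with the same time constant $R_{on}C_{sum}$ and peak value $i_{max}$.

[Figure: voltage and current waveforms. Part of the current is marked "current to waste".]

We should recover it.

$$I = mv, \qquad K = \frac{1}{2}mv^2$$

[Figure: start and stop, with radiation of heat]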
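A small numeric sketch makes the scale of these quantities concrete. It uses the standard RC relations Q = C·V, E = ½·C·V², and v(t) = Vdd·(1 − exp(−t/RC)). Vdd = 1.8 V is the supply used elsewhere in this talk; the 1 pF total capacitance, the 100 Ω on-resistance, and all function names are illustrative assumptions.

```python
import math


def total_charge(c_total_f: float, vdd_v: float) -> float:
    """Charge in coulombs moved when the total capacitance is charged to Vdd."""
    return c_total_f * vdd_v


def stored_energy(c_total_f: float, vdd_v: float) -> float:
    """Energy in joules held on the capacitance, lost as heat on every discharge."""
    return 0.5 * c_total_f * vdd_v ** 2


def rise_voltage(t_s: float, vdd_v: float, r_on_ohm: float, c_sum_f: float) -> float:
    """Node voltage while charging through the on-resistance."""
    return vdd_v * (1.0 - math.exp(-t_s / (r_on_ohm * c_sum_f)))


def fall_voltage(t_s: float, vdd_v: float, r_on_ohm: float, c_sum_f: float) -> float:
    """Node voltage while discharging through the on-resistance."""
    return vdd_v * math.exp(-t_s / (r_on_ohm * c_sum_f))


C_SUM = 1e-12   # 1 pF, assumed
R_ON = 100.0    # 100 ohm, assumed
VDD = 1.8

print(total_charge(C_SUM, VDD))               # coulombs per transition
print(stored_energy(C_SUM, VDD))              # joules per transition
print(rise_voltage(100e-12, VDD, R_ON, C_SUM))  # volts after one time constant
```

The energy per transition is what a carrier-recovery scheme would try to reclaim instead of dissipating.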


K computer: performance 10 PFLOPS, currently the largest computer in the world.

[Photo: power supply building]

Huge power!!

One solution can be found in the operation of electric cars.

$$I = mv, \qquad K = \frac{1}{2}mv^2$$

[Figure: sports EV. The battery discharges to drive the car and is charged by braking.]

[Figure: two MOS transistor cross-sections (S, G, D; N-type source and drain in a P-type body, with the depletion layer). First: active carriers in the conduction band. Second, at 0 V: carriers diffuse and shift to the valence band, where recombination generates heat.]

We would all assume that a transistor cannot recover the energy of its active carriers. Is that true?

Recovering signal energy: active carriers reused in a differential CMOS circuit

The key structure is that the differential MOS transistors are positioned in the same well.

[Figure: layout of the differential pair in the same well, with drain, source, and gate marked. Input characteristic impedance Z0 = 100 Ω; output characteristic impedance Z0 = 100 Ω. Dimensions shown: 11.5 μm, 7.2 μm, 5 μm, 4.3 μm, 2 μm, and 1 μm space.]

Recovering signal energy: active carriers reused in a differential CMOS I/O driver

The differential transistors are arranged in the same well.

[Figure: circuit and cross-section of the driver. Signals: INP (IN-positive), INN (IN-negative), OUTP, OUTN, VRF, VDD. Blocks: input ESD, inverter, output ESD, current control. Cross-section: N-well and P-well on P_SUB.]

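Unit cell layout configuration: ESD, inverter, ESD

Active carrier reused model

Capacitance profile depending on bias in an nMOS transistor:

$$\frac{1}{C_{min}} = \frac{1}{C_{ox}} + \frac{1}{C_{s,min}}, \qquad C_{s,min} = \sqrt{\frac{q N_A K_s \varepsilon_0}{4\phi_F}}$$

$$\frac{C}{C_{ox}} = \frac{1}{\sqrt{1 + \dfrac{2 K_{ox}^2 \varepsilon_0 V_G}{q N_A K_s x_0^2}}}$$

[Figure: $C/C_{ox}$ (from 0.5 to 1) versus $V_G$, with the transient inversion region marked near 0 V]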


Forced release of carriers by capacitance change

Free carriers are moved to the other capacitance by the voltage sink.

[Figure: paired switches in the same well in three states: initial, transient, and after inversion. Inductance limits the discharge when carriers are rejected through the source or drain.]

Assume a hole mobility of 4×10² cm²/(V·s) at 300 K for a carrier density of 10¹⁴ to 10¹⁵ cm⁻³, and Vdd = 1.8 V. This gives D = 7.2×10² cm²/s. For a carrier travel length of 10 μm, 0.001 cm = √(Dt) = √(7.2×10² · t), so t = 1.3×10⁻⁹ s = 1.3 ns, which is long compared with our target pulse rise time of 100 ps (3 GHz equivalent). The electron travel time, however, is 130 ps, which is on the order of our rise time.
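The travel-time estimate can be reproduced with a few lines. The sketch follows the slide's own model, in which D is taken as mobility × Vdd and the travel length is L = √(D·t); the function name and argument names are illustrative.

```python
def transit_time_s(mobility_cm2_per_vs: float, vdd_v: float, length_cm: float) -> float:
    """Travel time from L = sqrt(D * t), with D = mobility * Vdd as on the slide."""
    d_cm2_per_s = mobility_cm2_per_vs * vdd_v
    return length_cm ** 2 / d_cm2_per_s


t_hole = transit_time_s(4e2, 1.8, 1e-3)  # 10 um = 0.001 cm
print(f"{t_hole * 1e9:.2f} ns")
```

With these inputs the result is close to 1.4 ns, in line with the 1.3 ns quoted on the slide and an order of magnitude longer than the 100 ps target rise time.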


Carrier reuse driver chip

Differential inverter current depending on frequency

Measurement setup

• Power current is measured from the voltage drop across a 4.7 Ω series resistor (R for current measurement).
• IC chip: conventional "0.18 μm node" CMOS process, flip-chip bonded.
• Differential input: Z0 = 100 Ω, 0.25 mm length.
• Substrate wiring for the differential output: 8 mm, Z0 = 100 Ω, with a 100 Ω terminator and differential probing.
• Capacitances: Cip = 0.47 pF (two places), Cin = 0.45 pF (two places), Cwel = 1.56 pF.

[Figure: two plots of current [mA] versus frequency [GHz] (0.001 to 10 GHz, log scale). Curves: current calculated from capacitance; ohmic current; current at Vdd 1.8 V. The depressed swing height region is marked, and the reduction is highlighted.]

We can save power with the carrier reuse circuit.

The DC current comes from the current control transistors and from the clamping drivers on other circuits.

Random-pulse eye patterns show high-speed operation even at the 0.18 μm process node.

Conditions: FR-4 substrate; transmission line = 100 Ω; ESD Z = 50 Ω; VCC = 1.8 V; termination = 100 Ω; input swing 1.8 V. Labels on the setup: termination, probe point, 4 mm.

[Figure: eye patterns at 8, 9, 10, 11, and 12 Gbps]

A more effective carrier reuse circuit structure is the double-gate fin type.

[Figure: double-gate fin structure with Drain 1, Gate 1, Source 1 and Drain 2, Gate 2, Source 2, separated by an insulating layer]

Power saving image for each device when carrier reuse transistor circuits are used

Device | Function | Power saving ratio
(1) Pure logic | ALU, peripheral, I/O | 15 to 30%
(2) DRAM | Memory mat, addressing, I/O | 10 to 30%
(3) SRAM | Memory mat, addressing, I/O | 25 to 45%
(4) MLCS with small cell | M/L mat, addressing, I/O | 30 to 50%

[Figure: bar chart of relative power consumption level, initial versus carrier reuse, for each function]

Notes on the slide: applicable to all differential circuits; less than SRAM due to the small cell.

Summary for a solution

1. Previously listed task: find functions and performance that outweigh the TSV area penalty.
   Solution: tile or small-block array structure interconnected through TSVs.

3. Previously listed task: build in redundancy.
   Solution: a unified circuit such as the memory-logic conjugate system.

4. Previously listed task: choose power-saving circuits and systems.
   Solution: carrier reuse transistor circuit.

5. Previously listed task: effective functions and performance that outweigh the cost.
   Solution: a unified circuit such as the memory-logic conjugate system.

6. Previously listed task: an easy design algorithm.
   Solution: a unified circuit such as the memory-logic conjugate system.

As in the example in my presentation, more fundamental physics and algorithm concepts should be developed for 3D structures with TSVs.

SiP Global Summit 2011, 3D IC Technology Forum, at SEMI Taiwan 2011

• Here I introduce some of the activity on other 3D-TSV issues.

Mr. Victor Peng, Senior Vice President, Programmable Platforms Development, Xilinx, Inc.
● Leads a global team responsible for development and delivery of programmable platforms including FPGA silicon, software, IP, and boards
● Served as Corporate VP of the Graphics Products Group (GPG) silicon engineering with AMD
● Held key engineering leadership roles at TZero Technologies, MIPS Technologies, SGI, and Digital Equipment Corporation
● BSEE from Rensselaer Polytechnic Institute and an ME in EE from Cornell University; holds four US patents

Keynote Speech: Realizing a Two Million Logic Cell 28nm FPGA with Stacked Silicon Interconnect Technology
Mr. Victor Peng, Senior Vice President, Programmable Platforms Development, Xilinx, Inc.

FPGA logic capacity is doubling with successive process technology nodes and has enabled systems on chip (SOC) of greater complexity to be implemented using FPGAs. The industry’s need for greater integration at the 28nm node and beyond continues and indeed is increasing for many applications. This talk will outline how Xilinx is realizing a 2M logic cell 28nm hi-k metal gate FPGA product using Stacked Silicon Interconnect (SSI) technology. SSI technology utilizes micro-bump and Through-Silicon Via (TSV) technologies, with multiple active die on a passive interposer to enable integration beyond what’s possible with monolithic die or Multi-Chip Modules (MCMs). An overview of the design, technology, and supply chain issues will be given, as well as potential future directions.

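I happened to sit next to him during the talks, and he showed me the very latest prototype, although it was a test coupon. The talk was mainly about that coupon. The appeal of the Si interposer is that it avoids many of the restrictions of TSV-3D, in particular the heat dissipation and TSV stress problems. Because of the side-by-side layout, the wiring between adjacent chips is short, which improves speed and power consumption (0.8 in operation and 0.5 in standby). The advantage comes from the fact that an FPGA only needs to be associated with its adjacent chips. The device measures 45 mm × 45 mm, has 200k micro-bumps (45 μm pitch), and has 8 wiring layers. There are 10k connections, which I take to be the I/O between each chip and its neighbors. This appears to remove the FPGA wiring bottleneck. A wiring layer runs through the center of the tiles in each chip, and the connections are made there. It includes 2 GB of memory. The latency is 1 ns, which corresponds to 1 Gbps, but the PLL accuracy is 6.59 ps. If that is taken as the device jitter, the overall accuracy is at the 30 ps level, and I think a 300 ps pulse (3 GHz, 6 Gbps) could also be handled. The drawback is that a heavy PLL is needed to adjust chip-to-chip timing. The BGA is one size larger (50 × 50 mm?), and the entire back side of the coupon I was shown was covered with bumps at about the pitch of a high-pin-count bump array. I noted "300 GB" but do not know what it refers to.

This is a high-performance Si interposer that is possible precisely because the device is an FPGA. The chips are made at TSMC, the Si interposer is handled at Amkor, and assembly is done at ASE. It is a typical case of Western system houses commissioning production in Korea and Taiwan.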

Mr. Takayuki Watanabe, VP of TSV Packaging Development Group, TD Office, Elpida Memory, Inc.
● TSV Pj., Elpida Memory
● BEOL, Akita Elpida Memory
● Semiconductor Memory, NEC
● Speaker at ICEP 2009

TSV Technology for 3D DRAM
Mr. Takayuki Watanabe, VP of TSV Packaging Development Group, TD Office, Elpida
● To approach different die functions and designs of Wide I/O DRAM for mobile applications and for computing.
● To make an interposer possible for 2.5D, interconnecting the processor and stacked TSV DRAMs.
● To take a DfX approach (DfT, DfM, etc.) in die design to support TSV improvement, such as the yield of stacked KGD DRAM.
● To clarify and establish a mutual triangle relation, via final quality assurance, between customer, processor vendor, and DRAM vendor.
● To standardize the TSV pin/array function and location for DRAM first, and then several miscellaneous items.
● To have the potential of creating a new TSV foundry business in the case of the via-last process.

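Elpida appeared to believe that the way for DRAM to survive is to concentrate on widening bandwidth in order to raise processing speed, for images and similar workloads. As part of this, it presented many approaches to wide-band communication through TSVs. Many of them were system-level approaches, reflecting the view that "raising I/O speed requires collaboration with the system side." Standardization after DDR3 is also being pursued in this vein, by building a group of supporters, and President Sakamoto's strong will can be felt. Elpida is not only advancing TSV-3D in collaboration with PTI, UMC, and other outside partners (OSATs), but is also strengthening its capability as an IDM, including Akita Elpida. On the TSV front, TSMC is also struggling with how to handle OSATs and the spread of the process, and it let slip the appeal of the IDM model; since Elpida makes many TSV products, I feel it gains strength as an IDM. I took notes on many TSV applications, but they were fragmentary and I could not turn them into a meaningful write-up. One thing I can say is that the direction is to use TSVs for a wide bus, lower the frequency, and reduce power consumption. With lower power, 3D stacking also becomes feasible. Elpida can therefore be called the company most eager in the world to commercialize TSV.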

Dr. C.H. Yu, Sr. Director of Integrated Interconnect and Package Division, R&D, TSMC• Establish TSMC Cu/low-K technology. Deliver the first Cu/FSG, Cu/Low-k(K=3.0) in industry at 0.13 micron node, first Cu/ELK(K=2.6) at 45nm node to production. Invent and develop first Low-R/ELK technology for 28nm node. • Received National Outstanding Invention Award (CAITA), Outstanding Scientific and Technological Worker Award (National Science Council), Industrial Technology Achievement Award (MOEA), National Outstanding R&D Managers (CPMA), and Outstanding Engineer Award (Chinese Institute of Engineer). Awarded 250+ US patents with numerous publications in semiconductor areas.

• Ph.D. in Materials Sciences and Technology from Georgia Institute of Technology. Worked at AT&T Bell Labs. Currently in charge of the R&D of the Integration of Interconnect and Package Technology at TSMC including intra- and inter-chip interconnect, bumping/assembly and TSV/3D-IC technologies.

Paradigm Shift and Foundry Integration
Dr. C.H. Yu, Sr. Director of Integrated Interconnect and Package Division, R&D, TSMC

In the semiconductor world, there is a new paradigm shift from chip scaling to system scaling to meet the ever increasing electronic system demands in power saving, performance and functionality (including memory bandwidth) increase, form factor improvement and cost reduction. This shift is also triggered by the growing concerns for industry to sustain Moore’s Law. Through-Silicon-Via (TSV) is considered as the most promising system integration approach to enable the system scaling. It involves thin wafer processing and various forms of interconnections including both intra-/inter-chips and intra-/inter-packages interconnect. New supply chain issues arise from the complications of the thin wafer handling, known-good-die/package, and the ownership of overall long TSV process, etc. This presentation will discuss the approaches to resolve these issues.

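TSMC is seriously studying 3D-TSV as something to build into the foundry. This was a strategic talk: the business does not work unless foundry and OSAT are combined. The advantages of the IDM keep showing through. As a unit of business, every kind of supply chain, that is, logistics, is conceivable, and the technology is also diverse. The case in which the foundry takes the lead is via-middle, and I sensed an intention to take the initiative there and also in 2.5D with the Si interposer, which is technically easier to clear. He also let slip that, in work already under way, cracking and chipping currently prevent reliability from being maintained. In the Q&A, the discussion of logistics became heated. There are various methodologies and each has its advantages, so nothing definite can be said. What can be said, as the moderator YJ Chan of ITRI concluded, is that the market is so large that everyone can push ahead with their own methodology. At lunch afterward, Dr. C.H. Yu said that my talk was interesting as a proposal of a new concept. When I said I expected it to take 10 years before it dominates the market, he replied that it is a proposal that can become mainstream and that he has high hopes for it.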

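The premise is that Japan's environment can be put to use

• It is an old story, but: "Japan has too many IDMs (integrated device manufacturers) and too few fabless companies." Hidetaka Fukuda, Director of the Information and Communication Electronics Division, Commerce and Information Policy Bureau, Ministry of Economy, Trade and Industry, said this in his keynote at the "STARC/ASPLA Joint Forum" held in Yokohama on July 12, 2004.

• To realize 3D integration, design must start with the whole system in mind. Under the current horizontal division of labor, questions such as who tests and who guarantees remain open, so it is extremely difficult for subcontractors, foundries, and fabless companies to cooperate and complete a 3D-IC. It is the IDM that has the edge. (Statements by Eric Beyne of IMEC at IMEC/ARRM 2007 and by C.H. Yu of TSMC.)

• Japan is weak where vested interests are involved, and should not take on systems governed by existing protocols. Once one takes on a system, a new protocol becomes the key, but...

• Japan's strength lies in standard products, made in volume, where superiority in manufacturing technology can be exploited → memory, image interfaces?, automotive sensors?

• What do you think of Otsuka's proposal?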
