What will be the mainstream application incorporating 3D-TSV technology?
How to make a true 3D-TSV IC application
Kanji Otsuka
Meisei University Collaborative Research Center
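Who was the real culprit that held back system performance in the past?
• In the brain, the neuron, which is the computing element, also serves as the memory element: it is a logic-memory conjugate system.
• A computer, by contrast, has separate logic elements and memory. As the logic elements became faster, the time needed to move data in and out of memory lagged behind and became a shackle on the logic.
• To relieve this, schemes grew more complex, starting with cache memory systems that place memory next to the logic, and still more logic became necessary to control them. The picture is that of a large corporation whose headquarters functions have become bloated.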
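From Intel Developer Forum, Spring 2003
Intel also said this is a big problem.
Nikkei Microdevices, April 2009, pp. 17-21
The gap between the processing capability of processor chips and their data intake speed is widening, and it hinders system performance improvement.
The growing complexity of Intel's microarchitecture is mostly a countermeasure against insufficient bandwidth. It is a distortion that results from the worldwide failure to invest in packaging. I assert that there are two causes: the status of the packaging field was so low that its voice never reached management, and packaging engineers, marginalized and unable to obtain information, remained ignorant.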
From Intel Developer Forum, Spring 2003
[Figure: Changes in Intel architectures. Vertical axis: MIPS, log scale from 1 to 1,000,000; horizontal axis: year, 1980 to 2010. Eras marked: microcode parallelism era; instruction-level parallelism era; task-level parallelism era. Architectures marked: Pentium Architecture (Super Scalar); Pentium Pro Architecture (Speculative Out of Order); Pentium 4 Architecture (Trace Cache); Pentium 4 and Xeon Architecture with HT (Multi-Threaded); Multi-Threaded, Multi-Core.]
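Key technology for More than Moore: 3D-TSV
• 3D-TSV raises high expectations because it is 3D and because it looks able to widen the bandwidth discussed above.
• 3D-TSV is now recognized worldwide, and processes, design methods, and test methods are being developed for it. Here we examine whether there are applications that can use it effectively.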
We have still not found a major application for the TSV interconnection structure.
• As we understand it, the main figure of merit of the TSV structure is that 3-D interconnection escapes the restrictions of 2-D.
• Is this figure of merit correct?
• We should re-examine this main figure of merit with a view to creating major applications.
1. TSV diameter: still very large for interconnection.
[Figure: Si substrate with 2D interconnection and a TSV. The TSV wastes active area and 2D wiring area, even if a diameter of 2 um is chosen.]
Current technology: 6 or 10 metal layers.
TSVs can provide the equivalent of approximately 2 more wiring layers while keeping wiring length from growing.
[Figure: Si substrate with TSV]
TSVs do not remove the wiring limitation. Their advantage lies rather in the 3D structure.
2. Trade-off between TSV aspect ratio and the intrinsic gettering (IG) layer
The intrinsic gettering layer is lost once the wafer thickness is 50 um or less.
[Figure: Si substrate with TSV in the via-last case. Labels: thinning edge, IG layer.]
3. The Known-Good-Die (KGD) issue is difficult to solve in wafer-to-wafer (W2W) stacking, so redundancy must be implemented.
[Figure label: failure die]
4. Thermal issues are difficult in structures with many stacked layers, so power saving is required.
[Figure: three stacked Si substrates with TSVs; thermal energy integrates through the stack.]
5. Cost issue: the function provided must be effective enough to overcome the cost.
6. Many other restrictions in process and design technology: increasing complexity.
Summary of 3D-TSV restrictions
(Red characters on the slide mark the items in focus now.)
No. | Restriction and problem | Task
1 | Lower area efficiency because active and 2D wiring area is wasted | Find function and performance beyond the TSV area penalty
2 | Trade-off between TSV aspect ratio and loss of the IG layer | Process improvement has now come into view
3 | Difficulty of Known-Good-Die | Introduce W2C or C2C for production, or build in redundancy
4 | Thermal limitation; the most important issue for 3D | Choose power-saving circuits and systems; a fundamental approach is needed
5 | Cost limitation | Effective function and performance that overcome the cost
6 | Complicated process and design methodology | Simple process and easy design algorithm
Several solutions have been announced, but the trend still seems insufficient.
(1) Tile or small-block arrays interconnected through TSVs suit memory or image sensor systems, with wide-band interconnection over several thousand TSVs.
(2) Cache DRAM faces the CPU, providing a large cache while saving area.
(3) Stacking closed function blocks, including FPGA and core, makes a scalable system with redundancy.
(4) A silicon interposer with TSVs raises the performance of 2D wiring.
(5) A stacked memory module and a stacked module of many small cores connect through a wiring module for diagnosis-restoration and dynamic reconfiguration. This is a kind of ideal system, but nothing has been specified so far.
[Figure labels: memory, diagnostic-restoration, many core; redundant memory; core CPU or bus controller; memory, FPGA, core; FPGA, Si interposer]
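[Figure: active area; TSV and interconnection pad]
In every published example, the TSVs sit around the periphery of a tile- or block-shaped active area.
A TSV array inside the active area is not feasible.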
An example of a processor structure that improves bandwidth
[Figure: stacked structure. Layers: I/O interface; DRAMs; crossbar switch and cache memory; many-core processor. Each layer has an active area and TSVs; a heat sink is attached.]
• LCR-embedded interposer for higher-frequency I/O signals and power distribution
• I/O interface chip handles up to 5 GHz
• Total stacked chip thickness: up to 0.8 mm
• Die size: 10 mm
• Via pitch: 20 μm
• Clock frequency: 500 MHz
• Maximum wiring length that can be handled as a lumped model at up to 500 MHz: 15 mm
An example in which the many-core layer becomes multi-chip
[Figure: stacked structure. Layers: I/O interface; DRAMs; crossbar switch and cache memory; many-core processor. Each layer has an active area and TSVs; a heat sink is attached.]
• LCR-embedded interposer for higher-frequency I/O signals and power distribution
• Total stacked chip thickness: up to 1.0 mm
• Die size: 10 mm
• Via pitch: 20 μm
• Clock frequency: 500 MHz
An example of chip layout dimensions
[Figure: chip layout with active area, TSVs, and interconnection pads]
• Via pitch: 20 um
• Via diameter: 10 um
• Number of vias: 10,000
• Area share of vias on a 10 mm square chip: 4.4%
• Ratio of signal pins to power/ground pins: 5,000 vs 5,000
• Clock frequency: 500 MHz
The signal bandwidth is calculated from the number of pins, the frequency, and the data rate as follows:
(# of signal pins) x (clock frequency) x (data rate) = 5,000 x 500 x 10^6 x 2 = 5 Tbps
Bandwidth of a current high-end bus: up to 10 Gbps
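The bandwidth arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function and variable names are hypothetical, the figures are the ones quoted on the slide, and the 10 Gbps reference bus is the comparison value used later in the deck.

```python
# Hypothetical helper: names are illustrative, values are taken from the slide.
def aggregate_bandwidth_bps(signal_pins: int, clock_hz: float, transfers_per_clock: int) -> float:
    """Aggregate signal bandwidth = pins x clock frequency x data rate."""
    return signal_pins * clock_hz * transfers_per_clock

total_vias = 10_000
signal_pins = total_vias // 2          # 5,000 signal vs 5,000 power/ground
clock_hz = 500e6                       # 500 MHz
transfers_per_clock = 2                # data rate of 2 per clock

bandwidth_bps = aggregate_bandwidth_bps(signal_pins, clock_hz, transfers_per_clock)
reference_bus_bps = 10e9               # assumed reference: a 10 Gbps high-end bus
ratio = bandwidth_bps / reference_bus_bps

print(f"aggregate bandwidth: {bandwidth_bps / 1e12:.1f} Tbps")
print(f"ratio to reference bus: {ratio:.0f}x")
```

Note that the clock must enter as 500 x 10^6 Hz for the product to come out at 5 Tbps.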
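An example cross-sectional structure of the interposer
[Figure: cross-section, 10 mm wide. Labels: I/O interface; interposer with LCR-embedded distributed-constant wiring and power wiring; differential transmission line; decoupling capacitor; low-impedance power/ground pair transmission line; LCR equalizer; power distributed from the I/O driver to specific chips; differential signal transmission line.]
The DC resistance of the differential transmission line between driver and receiver is kept within 2 Ω, with a maximum wiring length of 400 mm; at 500 MHz no equalizer is needed. Above 2 Ω, or at higher frequencies (for example 6.4 Gbps), an equalizer is required, and the receiver gain must then be changed to high sensitivity. The power/ground pair is routed, vias included, so that its characteristic impedance per chip is 0.2 Ω or less.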
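5 Tbps is impressive
• Compared with the current highest bandwidth: 5 Tbps / 10 Gbps = 500.
• If the performance of a current core is taken as equivalent to 10 Gbps, a system composed of 500 cores can be built.
• Only the 3D-SiP structure offers this solution.
• The result is a robust and flexible system that can be used in any equipment.
• KGD is not strictly necessary because of the system's redundancy, but having KGD would be even better.
• However, the architecture cannot be simplified.
• Reducing the drivability of the I/O drivers cuts power substantially. However, power concentrates in 3D, and the heat dissipation problem cannot be solved for now.
• Applications that need this much performance are limited, so this cannot be called the savior that brings TSV into general use.
• Can a simple system be built with a simple methodology?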
A small number of TSVs in each tile or small block would make the most effective structure.
However, tiles with different functions would differ in size and connection requirements, so they could not be stacked and interconnected efficiently.
A natural idea is therefore a unified circuit used throughout the whole system. Then the tile structure can be made efficiently.
The neurons of our brain are a unified function that conjugates logical processing and memory. Can we make such a circuit from CMOS unit gates?
[Figure: neuron and axon network]
Array of mat
Logic
Cache surrounded the logic
Increasing and decreasing depend on cache hit ratio
Adding cache by new generated logic
When job capacity increasing
Expanding Logic
Cache surrounded the logic
Multi task with shared cache
Dynamic reconfiguration algorithm by unified function block
Efficient communication between neighboring blocks with high bandwidth and high processing rate
Unified circuit! It is easy to make with the following configuration. SRAM can be changed to any function, even wiring connections.
[Figure: one circuit configured for memory or for logic, changed by a mode selector]
FPGA
○ Logic block: LUT (SRAM) and simple logic with a relatively small driver
○ Switching block: FF + switch
○ Connecting block: wiring
This is not a true unified block, since it is composed of primitive logic plus additional memory (both are hard structures).
Toward a unified circuit (previous slide)
○ Logic block: SRAM with mode selector
○ Memory block: SRAM with mode selector
○ Switching block: SRAM
○ Basic cell connection (wiring): SRAM
Unified! However, implementing the switching block and the wiring in SRAM is inefficient.
Therefore, arrange the optimum basic cell size and cluster size:
○ Logic block: SRAM with mode selector and a relatively small driver
○ Memory block: SRAM with mode selector and a relatively small driver
○ Cluster connection: bus with driver (through TSV)
[Figure: FPGA's basic cell. Labels: logic block, connecting block, switching block (FF; 0: off, 1: on), I/O.]
[Figure: LUT architecture of Xilinx Virtex-5. Labels: inputs B1 to B6 and BX, 6-LUT, two 5-LUTs, MUX, CIN, COUT, FF, outputs B, BQ, BMUX.]
A unified-like algorithm is already in use in FPGAs.
Now I introduce our memory-logic conjugate system.
SRAM-based 8-bit processor: an application of the Memory-Logic Conjugate System (MLCS) in its smallest model
Meisei University: Yoichi Sato, Kanji Otsuka
Hitachi ULSI Systems: Masahiro Yoshida
The outlook of the Memory-Logic Conjugate System (MLCS)
1. Solves the problem of low bandwidth between memory and logic (because the memory is the logic itself).
2. Effective architecture: dynamic reconfiguration can be done just by rewriting registers (because the memory is the logic itself).
3. High-speed operation: the miscellaneous registers in a basic cell can be used through dynamic reconfiguration (the basic cell itself is programmable).
4. Suitable for 3D-TSV assembly, and scalable thanks to the small-block configuration.
5. Low power: no I/O circuits are needed between logic circuits and SRAMs, and access paths can be saved.
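The phrase "memory is the logic itself" can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not the authors' design: the class `BasicCellModel`, its two modes, and the 4-bit operand split are hypothetical. It only shows how one 256-word x 8-bit SRAM (the size the deck gives for the basic cell) can serve as plain memory in one mode and as a lookup table that computes a function in another, and how rewriting its contents reconfigures the logic.

```python
class BasicCellModel:
    """Toy model (hypothetical): one 256-word x 8-bit SRAM used as memory or as a LUT."""
    WORDS = 256

    def __init__(self):
        self.sram = [0] * self.WORDS
        self.mode = "memory"           # assumed mode names: "memory" or "logic"

    def set_mode(self, mode):
        if mode not in ("memory", "logic"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode

    def write(self, address, value):
        # Plain SRAM write; address and data are truncated to 8 bits.
        self.sram[address & 0xFF] = value & 0xFF

    def read(self, address):
        return self.sram[address & 0xFF]

    def load_function(self, func):
        # Fill the SRAM so that address (a << 4 | b) holds func(a, b) for 4-bit a, b.
        for a in range(16):
            for b in range(16):
                self.write((a << 4) | b, func(a, b))

    def evaluate(self, a, b):
        # In logic mode the two 4-bit operands form the SRAM address.
        if self.mode != "logic":
            raise RuntimeError("cell is not in logic mode")
        return self.read(((a & 0xF) << 4) | (b & 0xF))


cell = BasicCellModel()
cell.load_function(lambda a, b: a + b)   # configure as a 4-bit adder (5-bit result)
cell.set_mode("logic")
sum_result = cell.evaluate(9, 8)

cell.load_function(lambda a, b: a ^ b)   # "dynamic reconfiguration": rewrite the table
xor_result = cell.evaluate(9, 8)

cell.set_mode("memory")                  # the same array used as ordinary memory
cell.write(0x10, 0xAB)
stored = cell.read(0x10)
print(sum_result, xor_result, hex(stored))
```

In this model, reconfiguration costs one pass over the table, which is why the deck's emphasis on rewriting only registers rather than whole tables matters for speed.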
Structure of Basic Cell
[Figure: block diagram of the basic cell. Blocks: SRAM (LUT), 256 W × 8 bit, with R/W, CK, CE, DIN, D, ADD, and ADD (write) signals; input control circuit (mode change control and channel control); output control circuit (register, switch, etc. control; 4-bit REG × 8); mode set register; channel set register. Buses: control bus (CY etc.); sub control bus (8 bit); write command bus; reconfiguration bus (4 bit each), carrying outputs of the route configuration register or mode register; control signals (1 bit each); address and data (4 bit each). Bus groups are labeled 4 bit × 4 or 4 bit × 2.]
Simple operations can be programmed by using the rich internal registers. Bus wiring can be routed over the memory area (about 70%), which saves area.
Operation mode of basic cell (memory-logic conjugate cell)
[Figure: tree of operation modes. Labels: S/R = "L" (reset mode); S/R = "H"; Through Access mode (= initial mode); System mode; Memory mode; Logic mode; Arithmetic operation mode; Combinational Circuit mode; Internal memory mode; External memory mode; Logic library mode (macro-cell); Route Configuration Register mode (making LUT); Information Update mode for Route Configuration Register; Route Configuration mode by Mode Register; Route Configuration Register mode (making LUT for dynamic reconfiguration). The route configuration modes are marked "for dynamic reconfiguration".]
Rich operation modes make it possible to construct flexible and variable systems.
Outlook of MLCS structure
[Figure: Memory-Logic Conjugate System (MLCS), the total system including some cluster memories. A basic cell array of m rows × n columns, with decoders, a control circuit plus bus I/F, and blocks labeled C X, C Y, and cluster memory, connects to other systems (including cluster memory) over a multiple bus carrying addresses, clock plus control signals, and data (8 bit × n). The address consists of an 8-bit memory address of the basic cell (B.C.) and a q-bit extension address (address space of cluster memory).]
The cluster size is allocated to match the operation and the logic density.
Actual design of a four-basic-cell configuration
[Figure: chip layout showing the four basic cells, the area for TSVs, and the memory (SRAM) for testing: 256 W x 8 bit x 4 cells]
Cluster memory layout example with a single 8-bit ALU
● Area is about 330 × 330 um² in a 90 nm process (one cluster).
[Figure: cluster layout on an X/Y grid addressed 00, 01, 10, 11 on each axis. Blocks: basic cell array; basic cell; program memory (512 W × 8 bit); logical judgment circuit; instruction decoder; reserve part (decoder control); shifter (8 bit); decoder; PC adder and 8-bit ALUs (one resource shared).]
(Note)
(1) Program counter: 16 bit. Using the 8-bit ALU, an address operation takes 1 cycle without overflow and 2 cycles when overflow occurs.
(2) Structure of the 8-bit ALU: to enable 2-cycle 16-bit addition, a new type of adder with carry code input is introduced (it uses 4 basic cells).
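The 1-cycle versus 2-cycle behaviour of a 16-bit program counter updated through an 8-bit ALU can be sketched as follows. This is a simplified behavioural model under stated assumptions, not the MLCS adder itself: the helper names `add8` and `pc_add` are hypothetical, the offset is assumed to fit in 8 bits, and the "carry code input" is modelled as an ordinary carry-in bit.

```python
def add8(a, b, carry_in=0):
    """8-bit add with carry-in; returns (8-bit result, carry-out)."""
    total = (a & 0xFF) + (b & 0xFF) + (carry_in & 1)
    return total & 0xFF, total >> 8


def pc_add(pc, offset):
    """Advance a 16-bit PC by an 8-bit offset using only the 8-bit adder.

    Cycle 1 adds the offset to the low byte. A second cycle is spent on the
    high byte only when the low byte overflows (carry-out = 1).
    """
    low, carry = add8(pc & 0xFF, offset)
    high = (pc >> 8) & 0xFF
    cycles = 1
    if carry:
        high, _ = add8(high, 0, carry)   # propagate the carry into the high byte
        cycles = 2
    return (high << 8) | low, cycles


no_overflow = pc_add(0x00FE, 1)     # stays inside the low byte
with_overflow = pc_add(0x00FF, 1)   # low byte wraps, high byte needs a second pass
wrap_around = pc_add(0xFFFF, 1)     # 16-bit wrap
print(no_overflow, with_overflow, wrap_around)
```

Because most sequential fetches do not cross a 256-byte boundary, the common case costs one cycle in this model.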
Performance comparison between pure logic and MLCS

Area consumption for the same logic with different peripheral circuits
Area | Pure logic | MLCS | FPGA
Ratio | 1 | 7 | 30
(Legend on the slide: constant size with some allowance in design; dynamic size with minimum design.)

Power consumption for the same logic with one thread
Power | Pure logic | MLCS | FPGA
Relative ratio | 1 | 2 | 20

Operation speed in processor mode
Band frequency | Pure logic** (8/32 bit) | MLCS (8 bit), non-parallel | MLCS (8 bit), four parallel* | MLCS (32 bit), non-parallel | MLCS (32 bit), four parallel*
Maximum | 4 GHz | 1 GHz | 4 GHz | 1 GHz | 4 GHz
Mean rate | ? | (1 GHz) | (3 GHz) | (1 GHz) | (4 GHz)
Note: * In the case of 50% independence among the four threads. ** One thread in pure logic, which is superior to the SRAM-based MLCS.

Pure logic would be the best for processing; however, MLCS can also operate in dynamic reconfiguration mode and provide memory function.
[Figure: four-way multi-thread processing; program command + data; rearrangement]
Configuring from cluster to mat structure controlled by a synchronous clock
[Figure: a mat (unit processor element) made of four clusters. Each cluster is a basic cell array with decoders, a control circuit, and cluster memory. The space between the clusters carries the wiring and TSVs that connect the clusters within the mat; the position of the clock supply is marked.]
Clock-synchronous cube, which we call a mat
[Figure: clock timing image for synchronous and asynchronous operation. Labels: cluster; sub-processor; master clock, asynchronous from mat to mat.]
Dynamic access between mats uses an asynchronous clock with dynamic reconfiguration.
The hit signal from a neighboring mat is given by the header of a packet.
Array of mat
Logic
Cache surrounded the logic
Increasing and decreasing depend on cache hit ratio
Adding cache by new generated logic
When job capacity increasing
Expanding Logic
Cache surrounded the logic
Multi task with shared cache
Of course, a mat itself can dynamically set its number of registers depending on requirements. A mat can also include penetrating caches inside.
Dynamic reconfiguration algorithm
Adjacent addressing can keep the latency within 1 clock inside a synchronous cube.
Other approaches in technical papers
Memory-structured LUT presented by Masayuki Sato, RECONF Symposium, September 2006.
One idea introduced is a memory-based logic circuit with half-quadrate interconnection in a random array; however, memories are still consumed for interconnection and switching. Rearrangement of the unit tile is now being developed by Mr. Sato and Prof. Hironaka of Hiroshima City University.
The next significant issue is power saving. Is there a drastic power-saving method?
Yes, we have one idea.
$K = \frac{1}{2}mv^2,\ I = mv$
[Figure: start, stop, radiation of heat]
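Physics of power consumption
Power consumption of a unit circuit:
$$Q\,[\mathrm{C}] = (C_T + C_L + C_I)\,V_{dd}$$
$$P\,[\mathrm{W}] = \frac{1}{2}(C_T + C_L + C_I)\,V_{dd}^2$$
RC delay circuit:
$$i_{max} = \frac{V_{dd}}{R_{on}}$$
$$v_r = V_{dd}\left(1 - \exp\left(-\frac{t}{R_{on}C_{sum}}\right)\right),\quad v_f = V_{dd}\exp\left(-\frac{t}{R_{on}C_{sum}}\right)$$
$$i_{ch} = i_{max}\exp\left(-\frac{t}{R_{on}C_{sum}}\right)$$
The discharge current $i_{dis}$ decays from $i_{max}$ with the same time constant $R_{on}C_{sum}$.
[Figure: RC delay circuit with $C_I$, $R_I$, $R_{on}$, $C_L$, $C_T$, on current and off current; voltage and current waveforms starting from 0. The current is marked "current to waste".]
We should recover it.
$K = \frac{1}{2}mv^2,\ I = mv$
[Figure: start, stop, radiation of heat]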
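Why the charging current counts as "current to waste" can be checked numerically: when a capacitor is charged through a resistance from a fixed supply, the energy dissipated in the resistance equals the energy stored, $\frac{1}{2}CV_{dd}^2$, whatever the resistance is. The sketch below integrates $i^2R$ over the charging transient. The function name and the component values (1 pF, 100 Ω and 1 kΩ) are assumptions chosen for illustration; only Vdd = 1.8 V comes from the deck.

```python
import math


def charging_energy_split(c_farad, r_ohm, vdd, steps=100_000, span_tau=20.0):
    """Return (stored energy, energy dissipated in R) for one RC charging event.

    The dissipated energy is a midpoint-rule integral of i(t)^2 * R with
    i(t) = (Vdd / R) * exp(-t / (R * C)), taken over span_tau time constants.
    """
    tau = r_ohm * c_farad
    dt = span_tau * tau / steps
    i_max = vdd / r_ohm
    dissipated = 0.0
    for k in range(steps):
        t_mid = (k + 0.5) * dt
        current = i_max * math.exp(-t_mid / tau)
        dissipated += current * current * r_ohm * dt
    stored = 0.5 * c_farad * vdd ** 2
    return stored, dissipated


VDD = 1.8          # volts, as used in the deck
C_LOAD = 1e-12     # assumed 1 pF load, for illustration only

stored_a, lost_a = charging_energy_split(C_LOAD, 100.0, VDD)     # assumed 100 ohm
stored_b, lost_b = charging_energy_split(C_LOAD, 1000.0, VDD)    # assumed 1 kohm
print(f"stored {stored_a:.3e} J, dissipated {lost_a:.3e} J (100 ohm)")
print(f"stored {stored_b:.3e} J, dissipated {lost_b:.3e} J (1 kohm)")
```

Changing the resistance changes how fast the energy is lost but not how much, which is why lowering the on-resistance alone does not save this energy.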
K computer, performance: 10 PFLOPS, currently the largest computer in the world
[Figure: power supply building]
Huge power!!
One solution can be found in the operation of electric motor cars.
$K = \frac{1}{2}mv^2,\ I = mv$
[Figure: sports EV with battery. Labels: discharge; charge by brake.]
[Figure: two cross-sections of a MOS transistor (S, G, D; N-type source and drain in a P-type body). First: active carriers in the conduction band. Second, at 0 V: carriers diffuse and shift to the valence band ("association"), generating heat; a vacancy layer is marked.]
We would all think that a transistor cannot recover the active carrier energy. Is that true?
Recovering signal energy method: active carriers reused in a differential CMOS circuit
The key structure is that the differential MOS transistors are positioned in the same well.
[Figure: layout of the differential pair (differential MOS transistors in the same well) with drain, source, and gate. Input characteristic impedance Z0 = 100 Ω; output characteristic impedance Z0 = 100 Ω. Dimensions marked: 11.5 um, 7.2 um, 5 um, 4.3 um, 2 um, and 1 um space.]
Recovering signal energy method: active carriers reused in a differential CMOS I/O driver
The differential transistors are arranged in the same well.
[Figure: circuit and cross-section. Blocks: input ESD, inverter, output ESD, current control. Signals: INP (IN-Positive), INN (IN-Negative), OUTP, OUTN, VDD, VRF. Cross-section labels: N-Well, P-Well, P_SUB, n+ and p+ diffusions.]
Unit cell layout configuration: ESD, inverter, ESD
Active carrier reused model
[Figure: capacitance profile depending on bias in an nMOS transistor. Axes: $C/C_{ox}$ (ticks at 0.5 and 1) versus $V_G$; 0 V and the transient inversion region are marked.]
$$\frac{C}{C_{ox}} = \frac{1}{\sqrt{1 + \dfrac{2K_{ox}^2\varepsilon_0 V_G}{qN_AK_sx_0^2}}}$$
$$\frac{1}{C_{min}} = \frac{1}{C_{ox}} + \frac{1}{C_{s,min}},\quad C_{s,min} = \sqrt{\frac{qN_AK_s\varepsilon_0}{4\phi_F}}$$
Forced release of carriers by capacitance change
Free carriers move to the other capacitance through the voltage sink.
[Figure: paired switch in the same well, shown in three states: initial, transient, after inversion. Label: discharge-limiting inductance at carrier rejection through source or drain.]
Assume a hole mobility of 4×10² cm²/Vs at 300 K for a carrier density of 10¹⁴ to 10¹⁵ cm⁻³, and Vdd = 1.8 V. The drift speed is then counted as D = 7.2×10² cm²/s. For a carrier travel length of 10 μm, 0.001 cm = √(Dt) = √(7.2×10² ・ t), so t = 1.3×10⁻⁹ s = 1.3 ns, which is longer than our target pulse rise time of 100 ps (3 GHz equivalent). The electron travel time, however, is 130 ps, which is of the order of our rise time.
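The hole travel-time estimate can be reproduced in a few lines. This sketch follows the slide's own arithmetic, in which D is taken as mobility times Vdd and the travel length satisfies L = √(Dt); the function and variable names are hypothetical. The computed value comes out slightly under 1.4 ns, in line with the 1.3 ns order quoted on the slide.

```python
def travel_time_s(length_cm: float, d_cm2_per_s: float) -> float:
    """Solve length = sqrt(D * t) for t."""
    return length_cm ** 2 / d_cm2_per_s


hole_mobility = 4e2        # cm^2/Vs at 300 K, from the slide
vdd = 1.8                  # V, from the slide
d_hole = hole_mobility * vdd          # slide's D = 7.2e2 cm^2/s
length_cm = 10e-4                     # 10 um expressed in cm

t_hole = travel_time_s(length_cm, d_hole)
target_rise_time = 100e-12            # 100 ps target rise time

print(f"D = {d_hole:.1f} cm^2/s")
print(f"hole travel time = {t_hole * 1e9:.2f} ns")
print(f"ratio to 100 ps rise time = {t_hole / target_rise_time:.1f}")
```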
Carrier reuse driver chip
Differential inverter current depending on frequency
Power current is measured from the voltage drop across a 4.7 ohm series resistance.
[Figure: measurement setup. IC chip ("0.18 um node" conventional CMOS process), flip-chip bonded; differential input over Z0 = 100 ohm lines of 0.25 mm length; substrate wiring for the differential output, 8 mm long, Z0 = 100 Ω; 100 ohm terminator; differential probing; R for current measurement in the Vdd line. Capacitances: Cip = 0.47 pF (two places), Cin = 0.45 pF (two places), Cwel = 1.56 pF.]
[Figure: two plots of current [mA] versus frequency [GHz] (log scale, 0.001 to 10). Vertical ranges: 0 to 8 mA and 0 to 14 mA. Curves: calculated current by capacitance; ohmic current; current at Vdd 1.8 V. Annotations: depressed swing height region; "Reduction!!".]
We can save power with the carrier reuse circuit.
The DC current is due to the current control transistors, the clamping drivers, and others.
Random-pulse eye patterns show high speed even in the 0.18 um process node.
FR-4 substrate: transmission line = 100 Ω, ESD Z = 50 Ω, VCC = 1.8 V, termination = 100 Ω, input swing 1.8 V
[Figure: eye patterns at 8 Gbps, 9 Gbps, 10 Gbps, 11 Gbps, and 12 Gbps. Setup labels: termination, probe point, 4 mm.]
A more effective carrier reuse circuit structure is the double-gate fin type.
[Figure: double-gate fin structure on an insulating layer. Labels: Source 1, Gate 1, Drain 1; Source 2, Gate 2, Drain 2; gate, drain, source.]
Power saving image in each device using the carrier reuse transistor circuit
(The slide shows the relative power consumption level, initial versus carrier reuse, as bars for each function.)
Device | Function | Power saving ratio
(1) Pure logic | ALU, peripheral, I/O | 15 to 30 %
(2) DRAM | Memory mat, addressing, I/O | 10 to 30 %
(3) SRAM | Memory mat, addressing, I/O | 25 to 45 %
(4) MLCS with small cell | M/L mat, addressing, I/O | 30 to 50 %
Applicable to all differential circuits.
MLCS: less than SRAM due to the small cell.
Summary for a solution
No. | Previously listed task | Solution
1 | Find function and performance beyond the TSV area penalty | Tile or small-block array structure through TSV interconnection
3 | Build in redundancy | Unified circuit such as the memory-logic conjugate system
4 | Choose power-saving circuits and systems | Carrier reuse transistor circuit
5 | Effective function and performance that overcome the cost | Unified circuit such as the memory-logic conjugate system
6 | Easy design algorithm | Unified circuit such as the memory-logic conjugate system
As my presentation illustrates, more fundamental physics and algorithm concepts should be developed for 3D structures with TSVs.
SiP Global Summit 2011 at SEMI Taiwan 2011, 3D IC Technology Forum
• Here I introduce some of the developments on other 3D-TSV issues.
Mr. Victor Peng, Senior Vice President, Programmable Platforms Development, Xilinx, Inc.
● Leads a global team responsible for development and delivery of programmable platforms including FPGA silicon, software, IP, and boards
● Served as Corporate VP of the Graphics Products Group (GPG) silicon engineering with AMD
● Held key engineering leadership roles at TZero Technologies, MIPS Technologies, SGI, and Digital Equipment Corporation
● BSEE from Rensselaer Polytechnic Institute & a ME in EE from Cornell University and holds four US patents
Keynote Speech - Realizing a Two Million Logic Cell 28nm FPGA with Stacked Silicon Interconnect Technology
FPGA logic capacity is doubling with successive process technology nodes and has enabled systems on chip (SOC) of greater complexity to be implemented using FPGAs. The industry’s need for greater integration at the 28nm node and beyond continues and indeed is increasing for many applications. This talk will outline how Xilinx is realizing a 2M logic cell 28nm hi-k metal gate FPGA product using Stacked Silicon Interconnect (SSI) technology. SSI technology utilizes micro-bump and Through-Silicon Via (TSV) technologies, with multiple active die on a passive interposer to enable integration beyond what’s possible with monolithic die or Multi-Chip Modules (MCMs). An overview of the design, technology, and supply chain issues will be given, as well as potential future directions.
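I happened to sit next to him at the talks and was shown the very latest prototype, although it was a test coupon. The presentation was mainly about that coupon. The appeal of the Si interposer is that it avoids various restrictions of TSV-3D, especially the heat dissipation and TSV stress problems, and that the side-by-side layout keeps the wiring between adjacent chips short, which improves speed and power consumption (0.8 in operation, 0.5 in standby). This advantage comes from the fact that an FPGA only needs associations between adjacent chips. The size is 45 mm × 45 mm, with 200k micro-bumps (45 μm pitch) and 8 wiring layers. There are 10k connections, presumably the I/O between each chip and its neighbor. This appears to resolve the FPGA wiring bottleneck. A wiring layer runs through the center of the tiles in the chip, and the connection is made there. It includes 2 GB of memory. The latency is 1 ns, which gives 1 Gbps, but the PLL accuracy is 6.59 ps; if this is the device jitter accuracy, the total accuracy is at the 30 ps level, so 300 ps pulses (3 GHz, 6 Gbps) could probably also be handled. The drawback is that a heavy PLL is needed to adjust inter-chip timing. The BGA is one size larger (50 × 50 mm?), and the back of the coupon I was shown was fully covered with bumps at about the pitch of high-pin-count bumps. I noted "300 GB" but do not know what it refers to.
This high-performance Si interposer is possible precisely because the product is an FPGA. The chips are made at TSMC, the Si interposer at Amkor, and the assembly is done at ASE. It is a typical picture of a Western system house commissioning manufacture in Korea and Taiwan.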
Mr. Takayuki Watanabe, VP of TSV Packaging Development Group, TD Office, Elpida Memory, Inc.
● TSV Pj., Elpida Memory
● BEOL, Akita Elpida Memory
● Semiconductor Memory, NEC
● Speaker of ICEP 2009
TSV Technology for 3D DRAM
● To approach different die functions and designs of Wide I/O DRAM for mobile applications and for computing.
● To make possible an interposer for 2.5D that interconnects the processor and stacked TSV DRAMs.
● To take a DfX approach (DfT, DfM, etc.) in die design to support TSV improvement, such as stacked KGD DRAM yield.
● To clarify and establish the mutual triangular relationship, via final quality assurance, among customer, processor vendor, and DRAM vendor.
● To standardize the TSV pin/array function and location of DRAM first, and then several miscellaneous items.
● To have the potential of creating a new TSV foundry business in the case of the via-last process.
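Elpida seemed to see DRAM's path to survival in devoting itself to expanding bandwidth for faster processing, such as imaging. As part of that, the talk introduced many approaches that pursue wide-band communication with TSV. Many of them were system-level approaches, reflecting the view that "speeding up I/O requires collaboration with the system." Standardization beyond DDR3 is also part of this, with the aim of building a group of supporters, and President Sakamoto's strong will can be felt. Elpida is not only advancing TSV-3D in collaboration with PTI, the OSAT UMC, and others, but is also strengthening its capability as an IDM, including Akita Elpida. On the TSV direction, TSMC too is struggling with how to handle the OSATs and the spread of that process, and let slip the appeal of the IDM; by making many TSV products, Elpida feels it gains strength as an IDM. I took notes on many TSV applications, but they were fragmentary and I could not turn them into meaningful writing. One thing I can say is that the direction is to use TSV for wide bandwidth, lower the frequency, and reduce power consumption. With lower power, 3D also becomes possible. Elpida can therefore be called the company most eager in the world to commercialize TSV.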
Dr. C.H. Yu, Sr. Director of Integrated Interconnect and Package Division, R&D, TSMC
• Establish TSMC Cu/low-K technology. Deliver the first Cu/FSG, Cu/Low-k (K=3.0) in industry at 0.13 micron node, first Cu/ELK (K=2.6) at 45nm node to production. Invent and develop first Low-R/ELK technology for 28nm node.
• Received National Outstanding Invention Award (CAITA), Outstanding Scientific and Technological Worker Award (National Science Council), Industrial Technology Achievement Award (MOEA), National Outstanding R&D Managers (CPMA), and Outstanding Engineer Award (Chinese Institute of Engineer). Awarded 250+ US patents with numerous publications in semiconductor areas.
• Ph.D. in Materials Sciences and Technology from Georgia Institute of Technology. Worked at AT&T Bell Labs. Currently in charge of the R&D of the Integration of Interconnect and Package Technology at TSMC including intra- and inter-chip interconnect, bumping/assembly and TSV/3D-IC technologies.
Paradigm Shift and Foundry Integration
In semiconductor world, there is a new paradigm shift from chip scaling to system scaling to meet the ever increasing electronic system demands in power saving, performance and functionality (including memory bandwidth) increase, form factor improvement and cost reduction. This shift is also triggered by the growing concerns for industry to sustain Moore’s Law. Through-Silicon-Via (TSV) is considered as the most promising system integration approach to enable the system scaling. It involves thin wafer processing and various forms of interconnections including both intra-/inter-chips and intra-/inter-packages interconnect. New supply chain issues arise from the complications of the thin wafer handling, known-good-die/package, and the ownership of overall long TSV process, etc. This presentation will discuss the approaches to resolve these issues.
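TSMC is seriously examining 3D-TSV as something built into the foundry. It is a strategic matter: it does not work unless foundry and OSAT are combined. The advantages of the IDM come in and out of view. As business units, every kind of supply chain, that is, logistics, is conceivable. The technologies are also diverse. The case in which the foundry takes the lead is via-middle, and I sensed an intention to take the initiative, including the 2.5D Si interposer, which is technically easier to clear. As a problem already encountered, the speaker also let slip that reliability cannot currently be maintained because of cracks and chipping. In the Q&A, the discussion on logistics became heated. There are various methodologies, each with its advantages, so nothing definite can be said. What can be said, as the chair, YJ Chan of ITRI, concluded, is that because the market size is huge, everyone should push ahead with their own methodology. At lunch afterwards, Dr. C.H. Yu said that my talk was interesting as a proposal of a new concept. When I said that I think it will take 10 years before it dominates the market, he said that it is a proposal that will become mainstream and that he has expectations for it.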
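Being able to use Japan's environment is the premise
• It is an old story, but "Japan has too many IDMs (integrated device manufacturers) and too few fabless companies." Hidetaka Fukuda, Director of the Information and Communication Electronics Division, Commerce and Information Policy Bureau, Ministry of Economy, Trade and Industry, said this in his keynote at the "STARC/ASPLA Joint Forum" held in Yokohama on July 12, 2004.
• To realize 3D integration, design must start with the whole system in mind. Under the current horizontal division of labor there are problems such as who tests and who guarantees, so it is extremely difficult for subcontractors, foundries, and fabless companies to cooperate to complete a 3D-IC; it is the IDM that has the edge. These are the words of Eric Beyne of IMEC (IMEC/ARRM 2007) and C.H. Yu of TSMC.
• Japan is weak on issues entangled with vested interests and should not take on systems governed by existing protocols. As long as we take on systems, a new protocol is the key, but...
• Japan's strengths are standard products, mass-produced, where superiority in manufacturing technology can be demonstrated → memory, image interfaces?, automotive sensors?
• How about Otsuka's proposal?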