Chapter 5
Memory system
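Figure 5.1: Connection of the memory to the processor. The processor (MAR, MDR) connects to the main memory (up to 2^k addressable locations, word length = n bits) through a k-bit address bus, an n-bit data bus, and control lines (R/W, MFC, etc.).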
Basic Computer Organization Revisited
[Diagram: the processor (general-purpose registers, control logic, PC, ALUs, MAR, MDR) connected to the memory (holding data and program) and to I/O.]
Access time vs cycle time
• Memory access time
  - A measurement of a single access
• Memory cycle time
  - A measurement of how quickly two back-to-back accesses of a memory chip can be made
  - Cycle time > access time, because of the latency between successive memory accesses
• DRAM (used to construct main memory)
  - Access time: 50 to 150 nanoseconds
  - Requires a pause (refresh) between back-to-back accesses
• SRAM (used to construct cache memory)
  - Access time: 10 nanoseconds
  - No pause between back-to-back accesses
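Figure 5.2: Organization of bit cells in a memory chip. A 16 × 8 array of memory cells (flip-flops, FF). An address decoder takes the address inputs A0-A3 and drives the word lines W0-W15. The bit lines of each column connect to a Sense/Write circuit, which drives the data input/output lines b7 ... b1, b0. Control inputs are R/W and CS. The chip needs 4 + 2 + 8 = 14 external connections (address, control, and data lines).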
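Figure 5.3: Organization of a 1K × 1 memory chip. The 10-bit address is divided into a 5-bit row address and a 5-bit column address. A 5-bit decoder uses the row address to select one of the word lines W0-W31 of a 32 × 32 memory cell array. The column address drives a 32-to-1 output multiplexer and input demultiplexer connected to the Sense/Write circuitry and a single data input/output line. Control inputs are R/W and CS.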
RAS: Row address strobe
CAS: Column address strobe
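The row/column split used by the 1K × 1 chip can be sketched in a few lines of Python. This is only an illustration: the function name `split_row_col` is hypothetical, and the choice of the upper 5 bits as the row address is an assumption, since the figure does not say which half of the address selects the row.

```python
ROW_BITS = 5   # selects one of 32 word lines
COL_BITS = 5   # selects one of 32 columns through the multiplexer


def split_row_col(address):
    """Split a 10-bit cell address into (row, column) for a 32 x 32 array.

    Assumption: the upper 5 bits form the row address and the lower
    5 bits form the column address.
    """
    if not 0 <= address < (1 << (ROW_BITS + COL_BITS)):
        raise ValueError("address must fit in 10 bits")
    row = address >> COL_BITS                  # upper 5 bits
    column = address & ((1 << COL_BITS) - 1)   # lower 5 bits
    return row, column


if __name__ == "__main__":
    print(split_row_col(0b10011_00101))
```

In a DRAM chip the same two halves share one set of address pins and are latched in turn by RAS and CAS.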
SRAM
• Static Random Access Memory
• Very fast reads and writes
• Needs 6 transistors per cell, hence high cost and more area
• Does not need to be refreshed
• Low power consumption
• Implementation technology: CMOS
• Used to construct cache memory
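Figure 5.5: An example of a complementary metal-oxide-semiconductor (CMOS) memory cell. Transistors T3, T4, T5, and T6 form a latch with internal nodes X and Y, powered from V supply. Transistors T1 and T2, controlled by the word line, connect X and Y to the bit lines b and b'.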
DRAM
• Dynamic Random Access Memory
• Needs 1 transistor and 1 capacitor per cell
• Lower cost and more compact
• Each bit must be refreshed periodically
• Implementation technology: CMOS
• Used to construct main memory
Figure 5.6: A single-transistor dynamic memory (DRAM) cell. Transistor T, controlled by the word line, connects capacitor C to the bit line.
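Asynchronous DRAM
Figure 5.7: Internal organization of a 2M × 8 dynamic memory chip. The cell array is organized as 4096 × (512 × 8). The address pins carry A20-9 (row address) and A8-0 (column address) in turn: RAS loads the row address latch, which feeds the row decoder, and CAS loads the column address latch, which feeds the column decoder. The Sense/Write circuits connect to the data lines D7-D0. Control inputs are CS and R/W.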
Fast Page Mode
• Conventional DRAM requires that a row address and a column address be sent for each access.
• FPM sends the row address just once for many accesses to nearby memory locations, improving access time. That is, the row address is decoded once, and different column addresses are then decoded to access different bytes on the same row (see Example 5.1, page 5-54).
Extended Data Out (EDO) DRAM
• EDO DRAM is also called hyper page mode DRAM.
• EDO memory has modified timing circuits, so that one access to the memory can begin before the previous one has finished.
  (Note: conventional DRAM needs some delay between two consecutive accesses.)
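Synchronous DRAM (SDRAM)
Figure 5.8: Synchronous DRAM. Blocks: refresh counter, row address latch, row decoder, cell array, column address latch, column decoder, Read/Write circuits and latches, data input register, data output register, and mode register and timing control. Inputs: row/column address, Clock, RAS, CAS, R/W, and CS. The data input and data output registers connect to the data lines.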
SDRAM
• Supports burst operation
  - Automatic column address increment; that is, no external CAS cycle is needed to select each column address
• Interleaved memory
  - Contains two banks of memory internally instead of one
  - This allows the second bank to be "precharging" (RAS and CAS activation) while the first bank is transferring data
• Will replace older DRAM technologies
DDR SDRAM
• Double data rate SDRAM
  - Transfers data on both the rising and the falling edge of the clock
  - Thus doubles the memory bandwidth by transferring data twice per clock cycle
  - Standard SDRAM acts only on the rising edge of the clock
• DDR II
  - The driven clock runs at 1/2 the clock frequency of the I/O buffers
  - DDR: 100 MHz driven clock -> 100 MHz data buffers -> DDR applied -> 200 MHz final data frequency
  - DDR-II: 100 MHz driven clock -> 200 MHz data buffers -> DDR applied -> 400 MHz final data frequency
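The two frequency chains above reduce to simple arithmetic, sketched here in Python. The function names are hypothetical, and the 64-bit bus width used for the bandwidth figure is an assumption for illustration only; the slide does not give a bus width.

```python
def final_data_frequency_mhz(driven_clock_mhz, buffer_multiplier):
    """Final data frequency after double-data-rate signalling.

    buffer_multiplier is 1 for DDR (buffers run at the driven clock)
    and 2 for DDR-II (buffers run at twice the driven clock).
    """
    buffer_mhz = driven_clock_mhz * buffer_multiplier
    return buffer_mhz * 2  # one transfer on each clock edge


def peak_bandwidth_mb_per_s(data_frequency_mhz, bus_width_bits=64):
    """Peak transfer rate; the 64-bit default bus width is an assumption."""
    return data_frequency_mhz * bus_width_bits // 8


if __name__ == "__main__":
    for name, multiplier in (("DDR", 1), ("DDR-II", 2)):
        freq = final_data_frequency_mhz(100, multiplier)
        print(name, freq, "MHz,", peak_bandwidth_mb_per_s(freq), "MB/s peak")
```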
SIMM vs DIMM
• SIMM: Single In-line Memory Module
  - 30 pins (8-bit bus version)
  - 72 pins (wider bus, more address lines)
• DIMM: Dual In-line Memory Module
  - 168 pins
RAMBUS
• Developed by the Rambus company
• Makes a single chip act more like a memory system than a memory component
  - Each chip has interleaved memory and a high-speed interface
• RDRAM (1st generation)
  - Drops RAS/CAS, replacing them with a bus that allows other accesses over the bus between the sending of the address and the return of the data
  - Runs at a 300 MHz clock
• DRDRAM (2nd generation)
  - Direct RDRAM
  - Separate row- and column-command buses instead of the conventional multiplexing
  - Runs at a 400 MHz clock
• RIMM
  - 16 RDRAM
Other memory
• ROM
• PROM
• EPROM
• EEPROM
• Flash
  - Low power consumption
  - Used in portable systems such as PDAs, mobile phones, digital cameras, and MP3 players
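Memory hierarchy
Figure 5.13: Memory hierarchy. From top to bottom: processor registers, primary cache (L1), secondary cache (L2), main memory, and magnetic-disk secondary memory. Size increases toward the bottom; speed and cost per bit increase toward the top.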
Memory hierarchy
Level    Type          Size      Access time    Bandwidth               Managed by
Level 1  Registers     < 1 KB    0.25-0.5 ns    20,000-100,000 MB/sec   Compiler
Level 2  Cache         < 16 MB   0.5-25 ns      5,000-10,000 MB/sec     Hardware
Level 3  Main memory   < 16 GB   80-250 ns      1,000-5,000 MB/sec      OS
Level 4  Disk storage  > 100 GB  5,000,000 ns   20-150 MB/sec           OS/operator
Cache Terms
• Locality of reference
  - Temporal
  - Spatial
• Cache block (cache line)
• Replacement algorithm
• Read/write hit/miss
• Write-through
• Write-back (copy-back)
• Dirty bit / modified bit
• Valid bit
  - The valid bit is set every time a row is loaded into the cache on a cache miss, and can only be reset by the flush line.
Cache mapping functions
• Direct mapping
• Fully associative mapping
• Set-associative mapping
  - N-way associative mapping
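Direct mapping
Figure 5.15: Direct-mapped cache. Main memory has 4096 blocks (Block 0 to Block 4095); the cache has 128 blocks (Block 0 to Block 127), each stored with a tag. Main memory blocks 0, 128, 256, ... map to cache block 0, blocks 1, 129, 257, ... map to cache block 1, and so on up to blocks 127, 255, ..., 4095, which map to cache block 127. The main memory address is divided into a 5-bit tag, a 7-bit block field, and a 4-bit word field.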
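The 5/7/4 address split of Figure 5.15 can be tried out with a small Python sketch. The names `split_address` and `DirectMappedCache` are hypothetical, and the model tracks tags only (no data, no valid bits), so it illustrates the mapping rather than a real cache.

```python
TAG_BITS, BLOCK_BITS, WORD_BITS = 5, 7, 4  # field widths from Figure 5.15


def split_address(address):
    """Split a 16-bit main memory address into (tag, block, word)."""
    word = address & ((1 << WORD_BITS) - 1)
    block = (address >> WORD_BITS) & ((1 << BLOCK_BITS) - 1)
    tag = address >> (WORD_BITS + BLOCK_BITS)
    return tag, block, word


class DirectMappedCache:
    """Tag-only model of a 128-block direct-mapped cache."""

    def __init__(self):
        self.tags = [None] * (1 << BLOCK_BITS)  # None means empty

    def access(self, address):
        """Return True on a hit; on a miss, load the block and return False."""
        tag, block, _ = split_address(address)
        hit = self.tags[block] == tag
        self.tags[block] = tag
        return hit


if __name__ == "__main__":
    cache = DirectMappedCache()
    block_0, block_128 = 0 * 16, 128 * 16   # first word of each memory block
    for address in (block_0, block_0, block_128, block_0):
        print(split_address(address), cache.access(address))
```

Memory blocks 0 and 128 share cache block 0, so alternating between them misses every time even though the other 127 cache blocks are empty.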
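Fully associative mapping
Figure 5.16: Associative-mapped cache. Any of the 4096 main memory blocks (Block 0 to Block 4095) can be placed in any of the 128 cache blocks (Block 0 to Block 127), each stored with a tag. The main memory address is divided into a 12-bit tag and a 4-bit word field.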
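Set-associative mapping
Figure 5.17: Set-associative-mapped cache with two blocks per set. The 128 cache blocks are grouped into 64 sets (Set 0 to Set 63) of two blocks each, every block stored with a tag. Main memory blocks 0, 64, 128, ... map to Set 0, blocks 1, 65, 129, ... map to Set 1, and so on up to blocks 63, 127, ..., 4095, which map to Set 63. The main memory address is divided into a 6-bit tag, a 6-bit set field, and a 4-bit word field.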
Replacement algorithm
• LRU (least recently used): replace the block that has gone unused the longest
• Random
• First in, first out (FIFO): replace the oldest block
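As a minimal sketch of LRU, the Python class below models one set of an N-way cache and tracks only tags. The class name `LRUSet` is hypothetical, and real hardware usually approximates LRU with a few status bits per block rather than keeping an ordered list.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set with LRU replacement (tags only, no data)."""

    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()  # tag -> None, least recently used first

    def access(self, tag):
        """Return True on a hit, False on a miss (the block is then loaded)."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)      # now the most recently used
            return True
        if len(self.blocks) >= self.ways:
            self.blocks.popitem(last=False)   # evict the least recently used
        self.blocks[tag] = None
        return False


if __name__ == "__main__":
    two_way_set = LRUSet(ways=2)
    for tag in "ABACBA":
        print(tag, "hit" if two_way_set.access(tag) else "miss")
```

With the reference string A B A C B A, the hit on A protects it when C arrives, so B is evicted instead. FIFO would evict A at that point, because A was loaded first.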
68040 cache
• 4K data cache
• 4K instruction cache
• Contains 64 sets
• Every set contains 4 blocks
  - 4-way associative mapping
• 1 cache block contains 4 long words
• 1 valid bit per cache block
• 1 dirty bit per long word
• Write-back/write-through
• Random replacement
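Figure 5.23: Data cache organization in the 68040 microprocessor. The address is divided into a 22-bit tag, a 6-bit set field, and a 4-bit byte field; the example address has tag 000BF2 (hex), set 0, and byte 8. The cache has 64 sets (Set 0 to Set 63) of four blocks each (Block 0 to Block 3). Each block is stored with a tag, a valid bit (v), and dirty bits (d). The tag of the address is compared (=?) with the tags of the blocks in the selected set, shown in the example as 000BF2 and 0CA020: a match (Yes) is a hit, and no match (No) is a miss.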
ARM710T cache
• Only one cache for both data and instructions
• 4 KB cache
• 64 sets
• 1 set contains 4 blocks
  - 4-way associative mapping
• 1 cache block contains 4 words (32 bits each) = 16 bytes
• Write-through
• Random replacement
Pentium III cache
• L1 cache
  - 16 KB data cache: 4-way, write-back or write-through
  - 16 KB instruction cache: 2-way, no write strategy needed because it holds only code
• L2 cache
  - 512 KB
  - 4-way
  - Write-back or write-through
• Coppermine
  - L2 cache built into the CPU
  - 256 KB
  - 8-way
Pentium 4 cache
• L1 cache
  - 8 KB data cache
  - 4-way
  - Block contains 64 bytes
  - Write-through
• L2 cache within the CPU
  - 256 KB
  - 8-way
  - Block contains 128 bytes
  - Write-back
• L3 cache
  - Server-based CPUs
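Figure 5.24: Caches and external connections in the Pentium III processor. The processing unit contains the L1 instruction cache and the L1 data cache. The bus interface unit connects over the cache bus to the L2 cache, and over the system bus to the main memory and input/output.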
Memory Performance
• Every memory module has an address buffer register (ABR) and a data buffer register (DBR)
• Consecutive words in a single module
• Consecutive words in consecutive modules
• Interleaved memory
  - CPU references to consecutive memory addresses access multiple modules concurrently (the lower address bits select the module)
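The two address layouts can be compared with a short Python sketch. The function names are hypothetical, and the example assumes 4 modules and a 4-bit address purely to keep the output small.

```python
def low_order_interleave(address, module_bits):
    """Interleaved layout: the lower address bits select the module."""
    module = address & ((1 << module_bits) - 1)
    address_in_module = address >> module_bits
    return module, address_in_module


def high_order_select(address, module_bits, address_bits):
    """Non-interleaved layout: the upper address bits select the module."""
    offset_bits = address_bits - module_bits
    module = address >> offset_bits
    address_in_module = address & ((1 << offset_bits) - 1)
    return module, address_in_module


if __name__ == "__main__":
    # Assumed toy configuration: 4 modules (2 bits), 16 words (4-bit address).
    for address in range(8):
        print(address,
              low_order_interleave(address, 2),
              high_order_select(address, 2, 4))
```

With low-order interleaving, addresses 0 to 3 land in four different modules and can be accessed concurrently; with high-order selection they all fall in module 0.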
Calculate miss penalty
• See the examples on pp. 5-54 to 5-59
• Tave = hC + (1-h)M, where
  - h: hit rate
  - M: miss penalty
  - C: access time for cache
• Tave = h1C1 + (1-h1)h2C2 + (1-h1)(1-h2)M, where
  - h1: hit rate for L1 cache
  - C1: access time for L1 cache
  - h2: hit rate for L2 cache
  - C2: access time for L2 cache
  - M: access time for main memory
• Note: if h1 = h2 = 0.9, then the fraction of accesses that miss in both caches is (1-0.9)(1-0.9) = 1%. This means that with a two-level cache where each level has a 0.9 hit rate, only 1% of memory accesses pay the main memory penalty.
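The two formulas can be evaluated directly, as in the Python sketch below. The function names are hypothetical, and the timing values in the example (1, 10, 20, and 100 cycles) are made up for illustration; only the 0.9 hit rates come from the note above.

```python
def average_access_time(h, c, m):
    """Tave = hC + (1-h)M for a single cache level."""
    return h * c + (1 - h) * m


def average_access_time_two_level(h1, c1, h2, c2, m):
    """Tave = h1C1 + (1-h1)h2C2 + (1-h1)(1-h2)M for L1 and L2 caches."""
    return h1 * c1 + (1 - h1) * h2 * c2 + (1 - h1) * (1 - h2) * m


def main_memory_fraction(h1, h2):
    """Fraction of accesses that miss in both L1 and L2."""
    return (1 - h1) * (1 - h2)


if __name__ == "__main__":
    # Assumed timings in clock cycles, for illustration only.
    print(average_access_time(0.9, 1, 20))
    print(average_access_time_two_level(0.9, 1, 0.9, 10, 100))
    print(main_memory_fraction(0.9, 0.9))
```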
Other methods to reduce miss penalty
• Write buffer (improvement for write-through)
  - Built into the CPU
  - Writes go to the write buffer rather than to memory, so the CPU does not need to wait for the memory write
• Prefetch
  - The compiler inserts prefetch instructions (by analyzing the code)
• Lockup-free cache
  - Allows the data cache to continue to supply cache hits during a miss
  - Helpful for processors that support out-of-order completion (e.g. via Tomasulo's algorithm)
Virtual memory
• Virtual address (logical address)
• MMU (built into the CPU)
• Physical address
• Page table (in main memory)
• Page frame
• Address translation
• TLB
  - A cache built into the CPU that holds recently used address translations
• Page fault
• Replacement algorithm
  - LRU
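Figure 5.26: Virtual memory organization. The processor sends virtual addresses to the MMU, which sends physical addresses to the cache and to the main memory. Data moves between the processor, the cache, and the main memory, and between the main memory and disk storage by DMA transfer.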
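Figure 5.27: Virtual memory address translation. The virtual address from the processor consists of a virtual page number and an offset. The page table base register points to the start of the page table; adding (+) the virtual page number gives the address of one entry in the page table. Each entry holds control bits (valid bit, dirty bit, access rights of the program to the page) and the page frame in memory. The page frame points to the start of the physical page, and the offset points to a byte within that page; together they form the physical address in main memory.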
Intel IA-32 Processor’s Memory management
Intel IA-32 Page Translation
The entries in the page directory point to page tables, and the entries in a page table point to pages in physical memory. This paging method can be used to address up to 2^20 pages, which spans a linear address space of 2^32 bytes (4 GBytes).
To select the various table entries, the linear address is divided into three sections:
• Page-directory entry: Bits 22 through 31 provide an offset to an entry in the page directory. The selected entry provides the base physical address of a page table.
• Page-table entry: Bits 12 through 21 of the linear address provide an offset to an entry in the selected page table. This entry provides the base physical address of a page in physical memory.
• Page offset: Bits 0 through 11 provide an offset to a physical address in the page.
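The three-way split can be sketched in Python as follows. The function names and the dictionary-based tables are hypothetical stand-ins: a real page directory and page table are arrays of 32-bit entries in physical memory that also carry control bits, which this sketch ignores.

```python
def split_linear_address(linear):
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    directory_index = (linear >> 22) & 0x3FF   # bits 22 through 31
    table_index = (linear >> 12) & 0x3FF       # bits 12 through 21
    offset = linear & 0xFFF                    # bits 0 through 11
    return directory_index, table_index, offset


def translate(linear, page_directory):
    """Walk a toy two-level table and return the physical address.

    page_directory maps a directory index to a page table, and each page
    table maps a table index to the base physical address of a page.
    A missing entry raises KeyError, standing in for a page fault.
    """
    directory_index, table_index, offset = split_linear_address(linear)
    page_base = page_directory[directory_index][table_index]
    return page_base + offset


if __name__ == "__main__":
    # Hypothetical mapping: directory entry 1, table entry 3 -> page at 0x0008A000.
    toy_directory = {1: {3: 0x0008A000}}
    print(split_linear_address(0x00403025))
    print(hex(translate(0x00403025, toy_directory)))
```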
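Figure 5.28: Use of an associative-mapped TLB. The virtual address from the processor consists of a virtual page number and an offset. The virtual page number is compared (=?) with the virtual page numbers stored in the TLB; each TLB entry holds a virtual page number, control bits, and the page frame in memory. On a match (Yes, hit), the page frame is combined with the offset to form the physical address in main memory; otherwise (No) it is a miss. The TLB stores the virtual-to-physical address mappings that the CPU has just used.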
Figure 5.30: Organization of one surface of a hard disk, showing sector 0 of track 0, sector 0 of track 1, and sector 3 of track n.
Disk Access Time
• Seek time
• Rotation time (latency time)
• Transfer time
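A rough estimate of disk access time adds the three components, as in this Python sketch. The function names and the example drive parameters are hypothetical, and taking the average rotational latency as half a revolution is an assumption not stated on the slide.

```python
def average_rotational_latency_ms(rpm):
    """Average rotational latency, assumed to be half a revolution."""
    ms_per_revolution = 60_000 / rpm
    return 0.5 * ms_per_revolution


def transfer_time_ms(num_bytes, bytes_per_second):
    """Time to transfer num_bytes at a sustained rate."""
    return 1000 * num_bytes / bytes_per_second


def disk_access_time_ms(seek_ms, rpm, num_bytes, bytes_per_second):
    """Access time = seek time + rotational latency + transfer time."""
    return (seek_ms
            + average_rotational_latency_ms(rpm)
            + transfer_time_ms(num_bytes, bytes_per_second))


if __name__ == "__main__":
    # Made-up drive: 9 ms seek, 6000 rpm, 100 MB/s, reading 1 MB.
    print(disk_access_time_ms(9, 6000, 1_000_000, 100_000_000), "ms")
```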
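Figure 5.31: Disks connected to the system bus. The processor and the main memory sit on the system bus; a disk controller on the same bus controls two disk drives.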
RAID
• Redundant Array of Inexpensive Disks
• RAID0: data striping, no redundancy
  - Level 0 stripes data at the block level.
• RAID1: mirroring (shadowing)
• RAID01 (RAID0+1): mirrored stripes
• RAID2: error-correcting coding with Hamming code
  - Not a typical implementation and rarely used. Level 2 stripes data at the bit level rather than the block level.
• RAID3: bit-interleaved parity
  - Provides byte-level striping with a dedicated parity disk. Level 3, which cannot service simultaneous multiple requests, is also rarely used.
• RAID4: dedicated parity drive
  - A commonly used implementation of RAID. Level 4 provides block-level striping (like Level 0) with a parity disk. If a data disk fails, the parity data is used to create a replacement disk. A disadvantage of Level 4 is that the parity disk can create write bottlenecks.
• RAID5: block-interleaved distributed parity
  - Stripes data at the block level and also stripes the error-correction (parity) information across the disks. This results in excellent performance and good fault tolerance. Level 5 is one of the most popular implementations of RAID.
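The parity used by RAID levels 3 to 5 can be illustrated with bytewise XOR, as in the Python sketch below. The function names are hypothetical, and the sketch assumes equal-length blocks and exactly one failed disk; it says nothing about how a real controller lays out stripes.

```python
from functools import reduce


def parity_block(blocks):
    """Bytewise XOR of equal-length blocks, as stored on the parity disk."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_block(surviving_blocks, parity):
    """Recover the single missing data block from the survivors and parity."""
    return parity_block(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    d0, d1, d2 = b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"
    parity = parity_block([d0, d1, d2])
    # Pretend the disk holding d1 failed and rebuild its block.
    print(rebuild_block([d0, d2], parity) == d1)
```

XOR is its own inverse, so XORing the parity with every surviving block cancels them out and leaves the missing block.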
Compact Disc (CD)
• CD-ROM 1X: 150 KB/sec
• CD-ROM 40X: 150 x 40 = 6 MB/sec
• DVD (Digital Versatile Disk)
• DVD+R
  - A non-rewritable format, compatible with about 89% of all DVD players and most DVD-ROMs
• DVD+R/W
  - Has some "better" features than DVD-R/W, such as lossless linking and both CAV and CLV writing
• DVD-R
  - A non-rewritable format, compatible with about 93% of all DVD players and most DVD-ROMs
• DVD-R/W
  - The first DVD recording format released that was compatible with standalone DVD players
DVD Sizes
• DVD-5 holds around 4,700,000,000 bytes, which is 4.37 computer GB (where 1 kbyte is 1024 bytes). DVD+R/W and DVD-R/W support this format. Also called Single Sided Single Layered. This is the most common DVD media, often called 4.7 GB media.
• DVD-10 holds around 9,400,000,000 bytes, which is 8.75 computer GB. DVD+R/W and DVD-R/W support this format. Also called Double Sided Single Layered.
• DVD-9 holds around 8,540,000,000 bytes, which is 7.95 computer GB. DVD+R supports this format. Also called Single Sided Dual Layered. This media is called DVD+R9, DVD+R DL, or 8.5 GB media.
• DVD-18 holds around 17,080,000,000 bytes, which is 15.9 computer GB. DVD+R supports this format. Also called Double Sided Dual Layered.
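The "computer GB" figures follow from dividing the byte count by 2^30, as this short Python check shows. The function name and the dictionary are illustrative only; the byte counts are the ones listed above.

```python
def decimal_bytes_to_computer_gb(num_bytes):
    """Convert a byte count to "computer GB" (1 GB = 1024**3 bytes)."""
    return num_bytes / 1024**3


DVD_CAPACITY_BYTES = {
    "DVD-5": 4_700_000_000,
    "DVD-10": 9_400_000_000,
    "DVD-9": 8_540_000_000,
    "DVD-18": 17_080_000_000,
}

if __name__ == "__main__":
    for name, num_bytes in DVD_CAPACITY_BYTES.items():
        print(name, decimal_bytes_to_computer_gb(num_bytes))
```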
DVD write and read speeds
• Single-layer (4.7 GB) write speeds
  - 1x (CLV) = about 58 minutes
  - 2x (CLV) = about 29 minutes
  - 2.4x (CLV) = about 24 minutes
  - 4x (CLV) = about 14.5 minutes
  - 6x (CLV/ZCLV) = about 10-12 minutes
  - 8x (PCAV/ZCLV) = about 8-10 minutes
  - 12x (PCAV/ZCLV) = about 6.5-7.5 minutes
  - 16x (CAV/ZCLV) = about 6-7 minutes
• Dual/double-layer (8.5 GB) write speeds
  - 1x CLV = about 105 minutes
  - 2.4x CLV = about 44 minutes
  - 4x CLV = about 27 minutes
• Single-layer (4.7 GB) read speeds
  - 6x CAV (avg. ~4x): max 7.93 MB/s = ~14 minutes
  - 8x CAV (avg. ~6x): max 10.57 MB/s = ~10 minutes
  - 12x CAV (avg. ~8x): max 15.85 MB/s = ~7 minutes
  - 16x CAV (avg. ~12x): max 21.13 MB/s = ~5 minutes
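Figure 5.33: Organization of data on magnetic tape. Data is recorded 7 or 9 bits in parallel across the tape. A file consists of records separated by record gaps; files are delimited by file marks and separated by file gaps.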