Discrete Multi-Tone (DMT) Modulator-Demodulator Using the ADSP Blackfin Signal Processor
The World Leader in High Performance Signal Processing Solutions
Implementing SMP on Blackfin BF561
Graf Yang (杨明明), Oct 18, 2008
Agenda
• BF561 architecture
• Cache coherency solution
• Interrupt dispatch
• SMP status and applications
• SMP performance
• Limitations
BF561 architecture
BF561 architecture (cont.)
Block diagram
BF561 architecture (cont.)
Memory architecture
• L1 runs at core speed
  • L1 scratchpad SRAM: 4K
  • L1 instruction cache: 16K
  • L1 instruction SRAM: 16K
  • L1 data cache: 32K
  • L1 data SRAM: 32K
• L2 runs at 1/2 core speed
  • Data or instruction SRAM: 128K
  • Shared by CoreA/B
  • Cacheable (caching disabled in SMP)
BF561 architecture (cont.)
Compare to x86
                              BF561                   x86
Cache coherency               N/A                     Cache coherency protocols
Atomic instruction            N/A                     Lock# signal, Lock instruction prefix
Local interrupt controller    CEC                     LAPIC
System interrupt controller   SIC (SICA, SICB)        IOAPIC
Local timer                   Core timer              LAPIC timer/TSC
Peripheral timer              General purpose timer   HPET/8254 PIT
Inter-processor interrupt     SICB                    LAPIC
BF561 architecture (cont.)
How to boot CoreB
Cores   Booting Method                                     Starting Address          Settings
CoreA   Execute from 16-bit external memory (bypass)       0x2000 0000 (BANK0)       BMODE[1:0]#=00
CoreA   Boot from 8/16-bit external flash memory           0xEF00 0000 (BOOTROM)     BMODE[1:0]#=01
CoreA   Boot from SPI serial EEPROM (16-bit addressable)   0xEF00 0000 (BOOTROM)     BMODE[1:0]#=11
CoreB   Execute from L1 instruction memory                 0xFF60 0000 (L1 I-SRAM)   SICA_SYSCR[5]
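The CoreB entry lists SICA_SYSCR[5] as its boot setting. As a rough, host-runnable sketch of what CoreA might do once CoreB's code is in L1 instruction memory, the snippet below models the register as a plain variable. The macro name COREB_SRAM_INIT, the helper names, the register width, and the "clear the bit to release CoreB" polarity are assumptions of this sketch, not taken from the slides.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 5 of SICA_SYSCR, the CoreB boot setting named on the slide.
 * Macro name and "clear to release" polarity are assumed here. */
#define COREB_SRAM_INIT (1u << 5)

/* Stand-in for the memory-mapped register so the sketch runs on a host.
 * The initial value models CoreB being held after reset. */
static volatile uint16_t sica_syscr = COREB_SRAM_INIT;

/* Returns nonzero while CoreB is still held. */
static int coreb_is_held(void)
{
    return (sica_syscr & COREB_SRAM_INIT) != 0;
}

/* Called on CoreA after CoreB's code has been placed at its start address. */
static void release_coreb(void)
{
    sica_syscr = (uint16_t)(sica_syscr & ~COREB_SRAM_INIT);
}
```

On real hardware the variable would be replaced by an access to the actual register address, which this sketch deliberately leaves out.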
Cache coherency solution
• Why cache coherence
  • Jiffies, spin lock, semaphore, mutex, ...
Cache coherency solution (cont.)
• Cache policy
  • Main memory: write-through
  • Shared on-chip SRAM (L2 SRAM): not cacheable
• Global lock: protects atomic data
  • A special spin lock that resides in the shared on-chip SRAM (L2 SRAM)
  • Operating functions: _get_core_lock/_put_core_lock
  • Parameter: address of the atomic data
• Spin lock: based on the global lock
  • Invalidate the entire data cache if the same lock has been taken by another CPU
• Atomic ops: based on the global lock
  • Protects the atomic operations
• Memory barrier
  • Invalidate the entire data cache
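The scheme above can be sketched on a host machine roughly as follows. This is only an illustration under stated assumptions: C11 atomic_flag stands in for the lock word in L2 SRAM, invalidate_dcache() is a counter stub, and the names locked_add_return, smp_spinlock, smp_spin_lock, smp_spin_unlock and the last_owner field are hypothetical. Only the names _get_core_lock/_put_core_lock and their address parameter come from the slide; the address is accepted but unused here.

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in for the global lock word kept in uncached L2 SRAM. */
static atomic_flag core_lock = ATOMIC_FLAG_INIT;

static unsigned long dcache_invalidations;  /* counts simulated invalidates */

static void get_core_lock(volatile void *addr)
{
    (void)addr;  /* address of the protected data; unused in this sketch */
    while (atomic_flag_test_and_set_explicit(&core_lock, memory_order_acquire))
        ;        /* spin until the other core releases the global lock */
}

static void put_core_lock(volatile void *addr)
{
    (void)addr;
    atomic_flag_clear_explicit(&core_lock, memory_order_release);
}

static void invalidate_dcache(void)
{
    dcache_invalidations++;  /* real code would invalidate the L1 data cache */
}

/* Atomic op built on the global lock (hypothetical name). */
static int locked_add_return(volatile int *v, int delta)
{
    get_core_lock(v);
    int result = *v + delta;
    *v = result;
    put_core_lock(v);
    return result;
}

struct smp_spinlock {
    int locked;      /* 0 = free, 1 = held */
    int last_owner;  /* CPU that last held the lock, -1 if none */
};

/* Spin lock built on the global lock. If another CPU held this lock
 * last, data it protected may be stale in our cache, so invalidate. */
static void smp_spin_lock(struct smp_spinlock *l, int cpu)
{
    for (;;) {
        get_core_lock(l);
        if (!l->locked) {
            int stale = (l->last_owner != -1 && l->last_owner != cpu);
            l->locked = 1;
            l->last_owner = cpu;
            put_core_lock(l);
            if (stale)
                invalidate_dcache();
            return;
        }
        put_core_lock(l);  /* held by someone else: drop global lock, retry */
    }
}

static void smp_spin_unlock(struct smp_spinlock *l)
{
    get_core_lock(l);
    l->locked = 0;
    put_core_lock(l);
}
```

In this sketch a lock that is repeatedly taken by the same CPU never triggers an invalidate; only a hand-over between CPUs does.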
Interrupt dispatch
• Peripheral interrupts trigger both cores
• Two kinds of irq handlers
Interrupt dispatch (cont.)
• Time monotonicity problem
  • Using the two core timers makes time non-monotonic
  • Using a gptimer with handle_simple_irq() causes CoreB to get stuck
• Solution
  • Use general purpose timer0 instead of the core timers
  • Use handle_percpu_irq() instead of handle_simple_irq()
Interrupt dispatch (cont.)
• Inter-processor interrupt: SICB_SYSCR
  • Writing 1 to CA_supplement_int0 triggers an interrupt on CoreA
  • Writing 1 to CB_supplement_int0 triggers an interrupt on CoreB
  • The interrupt handler writes 1 to the relevant bit to clear the interrupt request
Inter-processor interrupt implementation
• Per-cpu message queue
• Per-cpu spin lock
• Per-cpu interrupt
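A minimal host-side sketch of how these three pieces might fit together is shown below. The SICB_SYSCR register is modelled as a plain pending-bits variable; the bit positions, queue length, and all function and struct names (ipi_send, ipi_handle, ipi_pending, ipi_queue) are assumptions made for illustration and are not taken from the slides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_CPUS   2
#define QUEUE_LEN 8

/* Simulated supplemental-interrupt request bits; positions are assumed. */
#define CA_SUPPLEMENT_INT0 (1u << 0)
#define CB_SUPPLEMENT_INT0 (1u << 1)
static uint16_t sicb_pending;

struct ipi_queue {
    int msgs[QUEUE_LEN];
    unsigned head, tail;   /* head: next to read, tail: next to write */
    atomic_flag lock;      /* per-CPU spin lock guarding this queue */
};

static struct ipi_queue queues[NR_CPUS] = {
    { .lock = ATOMIC_FLAG_INIT },
    { .lock = ATOMIC_FLAG_INIT },
};

static uint16_t ipi_bit(int cpu)
{
    return cpu == 0 ? CA_SUPPLEMENT_INT0 : CB_SUPPLEMENT_INT0;
}

static void queue_lock(struct ipi_queue *q)
{
    while (atomic_flag_test_and_set(&q->lock))
        ;  /* spin */
}

static void queue_unlock(struct ipi_queue *q)
{
    atomic_flag_clear(&q->lock);
}

static int ipi_pending(int cpu)
{
    return (sicb_pending & ipi_bit(cpu)) != 0;
}

/* Sender: enqueue a message for the target CPU, then raise its interrupt.
 * Returns 0 on success, -1 if the target's queue is full. */
static int ipi_send(int target_cpu, int msg)
{
    struct ipi_queue *q = &queues[target_cpu];
    int rc = -1;

    queue_lock(q);
    if (q->tail - q->head < QUEUE_LEN) {
        q->msgs[q->tail % QUEUE_LEN] = msg;
        q->tail++;
        rc = 0;
    }
    queue_unlock(q);
    if (rc == 0)
        sicb_pending |= ipi_bit(target_cpu);  /* models "write 1 to trigger" */
    return rc;
}

/* Handler on the receiving CPU: clear the request, then drain the queue.
 * Returns the number of messages copied into out[]. */
static int ipi_handle(int cpu, int out[QUEUE_LEN])
{
    struct ipi_queue *q = &queues[cpu];
    int n = 0;

    sicb_pending &= (uint16_t)~ipi_bit(cpu);  /* models "write 1 to clear" */
    queue_lock(q);
    while (q->head != q->tail) {
        out[n++] = q->msgs[q->head % QUEUE_LEN];
        q->head++;
    }
    queue_unlock(q);
    return n;
}
```

Clearing the request before draining means a message queued during the drain raises the interrupt again rather than being lost.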
SMP status and applications
• SMP status
  • 2008R1.5: svn://sources.blackfin.uclinux.org/svn/uclinux-dist/branches/2008R1/bfin_patch/smp_patch/
  • Trunk: svn://sources.blackfin.uclinux.org/svn/uclinux-dist/trunk/bfin_patch/smp_patch/
• Application: multi-task
  • Video encoder/decoder: codec1 on CoreB, codec2 on CoreA
  • VoIP: codec on CoreB, network stack on CoreA
SMP performance
• Whetstone test result
  • Test software: Whetstone
  • Test hardware: BF561, core clock 600 MHz, system clock 100 MHz
  • Test environment 1: UP
  • Test environment 2: SMP

Command Line            UP     SMP
whetstone               15s    15s
whetstone ; whetstone   30s    30s
whetstone & whetstone   30s    20s
• Performance analysis
  • Invalidate entire data cache: 79130 times in the Whetstone test
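As a worked reading of the concurrent run (whetstone & whetstone: 30s on UP, 20s on SMP), the speedup S and the efficiency E relative to an ideal factor of 2 on two cores come out as follows. The symbols S and E are introduced here for illustration only.

```latex
S = \frac{T_{\mathrm{UP}}}{T_{\mathrm{SMP}}} = \frac{30\,\mathrm{s}}{20\,\mathrm{s}} = 1.5,
\qquad
E = \frac{S}{2} = 0.75
```

The slide lists the data cache invalidation count alongside this result; how much of the gap to the ideal factor it accounts for is not stated.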
Limitations
• Store routines to L1 I-SRAM
• Store shared data to L1 D-SRAM
• User multi-threads running on different cores
Questions?
The End Thank you!