Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing
Transcript of Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing
![Page 1: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/1.jpg)
Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing
Jun-Dong Cho (조준동), Sungkyunkwan University
![Page 2: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/2.jpg)
Contents
• Reconfigurable platforms
• Software Defined Radio
• SW/HW co-design case studies
• SW/HW co-design tools
• Network on Chip
• Lab introduction
• Research proposal
![Page 3: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/3.jpg)
SoC and Customizable Platform Based-Design
[Figure: platform block diagram with Reconfigurable Hardware (Coarse Grain), ASIC 1, DSP, Reconfigurable Hardware (Fine Grain), ASIC 2, Controller/CPU, RAM/ROM/Flash, and an unlabeled "?" block]
![Page 4: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/4.jpg)
Semiconductor Revolutions: Makimoto's Wave
[Figure: Makimoto's wave timeline, 1957 to 2007 in ten-year steps, with labels TTL; LSI, MSI; µproc., memory; ASICs, accel's; FPGAs; coarse grain; soft CPUs. Annotation: hardware people, CS people, new breed needed]
![Page 5: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/5.jpg)
Abstract
• iSoC is an on-chip communication architecture intended to improve the scalability and flexibility of SoC design
• Dynamic configuration
• The regular, flexible structure of iSoC provides a predictable framework for modeling the traffic, power, speed, and area requirements of global communication
![Page 6: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/6.jpg)
IBM's CoreConnect
Bandwidth has been extended from the initial 32 bits up to 128 bits
![Page 7: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/7.jpg)
Sonics Smart Interconnect IP
![Page 8: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/8.jpg)
SMART (Sonics Methodology and Architecture for Rapid Time-to-Market)
• Plug-and-play on-chip communications network
• Packet-based
• 50 employees in a year
• Provides IP and a design environment; supports SoC design
• Alliance with Cadence
• SiliconBackplane III targets communications + media
![Page 9: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/9.jpg)
Nexperia Digital Video Platform
• Designing the initial platform, along with the pnx8500, wasn't quick and easy.
• It involved about 300 hardware, software and systems people working between 1999 and 2001, of which 60 were involved with hardware.
![Page 10: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/10.jpg)
Development Direction
• Expand communication bandwidth to keep up with the growth of multimedia application products and the large burst data transfers they require
• Dual-Core Architecture (ARM+DSP)
![Page 11: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/11.jpg)
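On-Chip Network Architecture
● Development of router/scheduler algorithms
● Network model design and verification using SystemC
● Design of core IP for star-type and mesh-type on-chip networks
● Design of master/slave network interfaces and a high-performance memory management interface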
![Page 12: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/12.jpg)
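Building an On-Chip-Network-Based SoC Design Platform and Design Environment
● Development of a distributed crossbar switch topology generation and IP mapping tool
● Development of an IP-to-mesh-tile mapping tool
● Development of a network topology generation tool based on analysis of data flow between IPs; construction of the SoC platform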
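As a rough illustration of what an IP-to-mesh-tile mapping step could look like, the sketch below places IP blocks on a mesh so that heavily communicating blocks end up close together. The greedy heuristic, the function names (`map_ips_to_mesh`, `weighted_hops`), and the traffic figures are assumptions made for this example; the slides do not describe the algorithm the actual tool uses.

```python
from itertools import product

def manhattan(a, b):
    """Hop distance between two (row, col) tiles on a mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def map_ips_to_mesh(traffic, rows, cols):
    """Greedily assign each IP to a tile of a rows x cols mesh.

    traffic: dict mapping (src_ip, dst_ip) -> data volume (hypothetical units).
    Returns a dict mapping ip -> (row, col).
    """
    ips = sorted({ip for pair in traffic for ip in pair})
    if len(ips) > rows * cols:
        raise ValueError("more IPs than tiles")

    # Place the busiest IPs first.
    load = {ip: 0 for ip in ips}
    for (src, dst), volume in traffic.items():
        load[src] += volume
        load[dst] += volume
    order = sorted(ips, key=lambda ip: (-load[ip], ip))

    free = list(product(range(rows), range(cols)))
    placement = {}
    for ip in order:
        def cost(tile):
            # Volume-weighted distance to the partners that are already placed.
            total = 0
            for (src, dst), volume in traffic.items():
                if src == ip and dst in placement:
                    total += volume * manhattan(tile, placement[dst])
                elif dst == ip and src in placement:
                    total += volume * manhattan(tile, placement[src])
            return total
        best = min(free, key=lambda tile: (cost(tile), tile))
        placement[ip] = best
        free.remove(best)
    return placement

def weighted_hops(traffic, placement):
    """Total volume x hop count, a simple proxy for interconnect load."""
    return sum(v * manhattan(placement[s], placement[d])
               for (s, d), v in traffic.items())

traffic = {("cpu", "mem"): 100, ("dsp", "mem"): 80, ("cpu", "dsp"): 10}
placement = map_ips_to_mesh(traffic, rows=2, cols=2)
print(placement, weighted_hops(traffic, placement))
```

A greedy pass like this is only a starting point; a real tool would likely refine the placement, for example by swapping tiles, and would account for link bandwidth as well as distance.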
![Page 13: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/13.jpg)
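Application Areas
- Supports protocols that guarantee QoS, making it suitable for real-time applications and applications that require high data bandwidth
- Multimedia SoCs, portable and communication terminals, Internet set-top boxes, game consoles, and the system-level chips needed to build network terminal products
- SoC design for high-volume multimedia applications such as high frame rate video and 3D graphics
- Building a platform-based design environment that combines the on-chip network core IP and design support tools into a single platform, and applying it to a variety of SoC designs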
![Page 14: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/14.jpg)
Recent Research Trends
• Intel's Reconfigurable Radio Architecture (mesh + nearest neighbor)
• Reconfigurable Baseband Processing, Picochip
• Portable Components using Containers for Heterogeneous Platforms, Mercury Computer Systems, Inc.
• A configurable platform: Altera Excalibur, Xilinx Virtex FPGA
• Adaptive Computing Machine, Quicksilver Tech.
• Mercury, Sky, Galileo, Tundra (crossbars, bridges)
• Virginia Tech's reconfigurable hardware
![Page 15: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/15.jpg)
66% of chips are not OK on first silicon (2004)
Mid-90s: 6 months late => 31% earnings loss
Today: 3 months late = $500M loss
![Page 16: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/16.jpg)
HIERARCHY OF PLATFORMS
![Page 17: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/17.jpg)
Full Application Platform
• Users design full applications on top of hardware and software architectures
• Nexperia
• Texas Instruments' OMAP multimedia platform
• Infineon's M-Gold 3G wireless platform
• Parthus' Bluetooth platforms
• ARM's PrimeXsys wireless platform
![Page 18: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/18.jpg)
Processor-centric platform
• Focuses on access to a configurable processor but doesn't model complete applications
• Improv Systems
• ARC
• Tensilica
• Triscend
![Page 19: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/19.jpg)
Communication-centric platform
• Provides an interconnect architecture but doesn't typically provide a processor or a full application
• Sonics' SiliconBackplane
• PalmChip's CoreFrame architectures
![Page 20: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/20.jpg)
fully programmable platform
• consisting of FPGA logic and a processor core
• Altera's Excalibur, Xilinx' Virtex-II Pro and Quicklogic's QuickMIPS
• Xilinx-IBM XBlue architecture
![Page 21: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/21.jpg)
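Five Tiers of SDR Solutions
• Tier 0: traditional hardware implementation
• Tier 1: SCR (software controlled radio). Control functions for multiple hardware elements are implemented in software
• Tier 2: SDR (software defined radio). Modulation and baseband processing are implemented in software, while multi-frequency RF is implemented in fixed-function hardware
  – Sand-Bridge (ARM + 4 DSPs)
• Tier 3: ISR (ideal software radio). Programmability is extended through an RF implementation that performs analog conversion at the antenna
• Tier 4: USR (ultimate software radio). In addition to digital processing capability, it provides fast switching between communication protocols (within a few milliseconds)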
![Page 22: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/22.jpg)
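Introduction
• Wireless processing systems need high throughput and a large amount of computation, yet operate under strict power constraints
• Reconfigurable SoC implementations seek higher performance through parallelism and make use of IP reuse
• Algorithm partitioning guided by performance prediction based on hot-spot bottlenecks (or traffic)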
![Page 23: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/23.jpg)
Introduction
• Scheduled interconnect
– Link utilizations are substantially smaller than for a bus, since communication is distributed and pipelined throughout the system.
– Eliminates the congestion caused by the bus and the header overhead present in dynamic routing.
• The Reconfigurable Architecture Workstation (RAW) project has re-examined static communication as a mechanism for general-purpose computing.
• A regular interconnect structure and static scheduling eliminate unnecessary interconnect switching
• Balancing the computational load across all cores improves performance
• Overhead of the configuration streams
– Configuration streams must be scheduled periodically along with the data
– Configuration streams use 4% of the bandwidth
• Depending on data content variation and the system operating environment, the core interfaces and the cores themselves are dynamically reconfigured into low-power mode
![Page 24: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/24.jpg)
Scheduled Communication
• A tiled architecture
• Each tile is a computational core, and the core interfaces together form the network
• The core interface supports heterogeneous processing that takes place on one or more tiles
• The system is connected using a statically scheduled mesh of interconnect
• Data moves between neighboring tiles through a communication pipeline, which allows a fast clock rate and time-division sharing of interconnect resources
• The ability to reconfigure cores and interconnect at run time enables dynamic power management
![Page 25: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/25.jpg)
Adaptive System on Chip
![Page 26: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/26.jpg)
Communication Interface
- Stream data that passes through a communication interface is scheduled for a specific communication clock cycle based on data link availability.
- The result of scheduling for each interface is a set of instructions for its associated interconnect memory.
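The idea of scheduling streams onto specific communication cycles can be sketched as follows. This is a minimal illustration, not the scheduler used by aSOC or iSoC: it assumes X-then-Y routing on the mesh, one word per link per cycle, a schedule that repeats every `period` cycles, and a first-fit search for a free start slot. All names (`xy_route`, `schedule_streams`) and the instruction tuple layout are hypothetical.

```python
def xy_route(src, dst):
    """Tiles visited from src to dst, moving along the column axis first, then rows."""
    (r, c), (r2, c2) = src, dst
    path = [(r, c)]
    while c != c2:
        c += 1 if c2 > c else -1
        path.append((r, c))
    while r != r2:
        r += 1 if r2 > r else -1
        path.append((r, c))
    return path

def schedule_streams(streams, period):
    """Assign each stream a start slot so that no link is used twice in a slot.

    streams: dict name -> (src_tile, dst_tile)
    Returns (starts, memories): starts maps stream -> start slot, and
    memories maps tile -> sorted list of (slot, stream, next_tile), standing in
    for the instructions held in that tile's interconnect memory.
    """
    busy = set()      # (from_tile, to_tile, slot) triples already reserved
    memories = {}
    starts = {}
    for name, (src, dst) in streams.items():
        path = xy_route(src, dst)
        hops = list(zip(path, path[1:]))
        for start in range(period):
            # Pipelined transfer: hop i happens one cycle after hop i - 1.
            slots = [(start + i) % period for i in range(len(hops))]
            if all((a, b, s) not in busy for (a, b), s in zip(hops, slots)):
                break
        else:
            raise RuntimeError(f"no free slot for stream {name}")
        starts[name] = start
        for (a, b), s in zip(hops, slots):
            busy.add((a, b, s))
            memories.setdefault(a, []).append((s, name, b))
    for program in memories.values():
        program.sort()
    return starts, memories

streams = {"a": ((0, 0), (0, 1)), "b": ((0, 0), (0, 2))}
starts, memories = schedule_streams(streams, period=4)
print(starts)
print(memories)
```

Here stream `b` is pushed to slot 1 because stream `a` already holds the link out of tile (0, 0) in slot 0; the per-tile lists show which stream each interface forwards in each cycle.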
![Page 27: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/27.jpg)
9-core and 16-core Mode
![Page 28: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/28.jpg)
Evaluation Methodology
![Page 29: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/29.jpg)
Performance of the Benchmarks
![Page 30: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/30.jpg)
Dynamic Power Management
• Dynamic power management saves power by lowering frequency, using separate clock domains, in response to run-time variation in data content
• In the DCT implementation, high-order bits whose computed values do not change are bypassed to eliminate switching
• Connections are made only for valid stream data, which eliminates unnecessary switching
• Prefetch many frames in an optimal-sized buffer [[email protected]]
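To make the high-order-bit bypass concrete, the sketch below counts bit transitions on a link for a short sequence of small signed values, first on a full 16-bit two's-complement bus and then when only the low-order bits that actually carry information are driven. The sample values, the 16-bit width, and the helper names are assumptions chosen for illustration; they are not taken from the slides.

```python
def toggles(words, width):
    """Count bit transitions between consecutive words on a `width`-bit bus."""
    mask = (1 << width) - 1
    total = 0
    prev = words[0] & mask
    for word in words[1:]:
        cur = word & mask
        total += bin(prev ^ cur).count("1")
        prev = cur
    return total

def needed_bits(words):
    """Smallest two's-complement width that can represent every word."""
    return max((w if w >= 0 else ~w).bit_length() + 1 for w in words)

# Small coefficients that alternate in sign, so the sign-extension bits
# flip on every transfer when the full bus width is driven.
samples = [3, -2, 1, -1, 2, -3]

full = toggles(samples, 16)
narrow_width = needed_bits(samples)
narrow = toggles(samples, narrow_width)
print(full, narrow_width, narrow)
```

Under these assumptions most of the switching on the full-width bus comes from sign-extension bits, which is the activity that bypassing the high-order bits avoids.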
![Page 31: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/31.jpg)
Dynamic Power Management
• Reconfigurable clock based system balancing creates an environment of just in time computing which can reduce overall power usage.
• Taking advantage of interconnect flexibility allows a system to dynamically change functionality and avoid unused computational units.
• Interconnect power consumption is low and the overhead due to configuration streams is under 10% for both bandwidth and power.
![Page 32: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/32.jpg)
Power Metric
• Based on network activity and HSPICE circuit simulation of the interconnect, the network power consumption (Pint) is:
[Equation image not included in the transcript]
• T: the number of tiles
• PIF/D: the overhead of instruction memory fetch and decode
• s: the stream number
• Nvs and Nivs: the numbers of valid and invalid transfers for stream s
• Ps: the power consumed in transferring 1 bit through stream s
![Page 33: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/33.jpg)
iSOC Compiler
• Divides applications into parts, each of which fits into a specific core.
• Determines data communications between the cores in a space-time fashion.
• Generates interconnect memory contents for each individual interface.
![Page 34: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/34.jpg)
References
• aSOC: A Scalable, Single-Chip Communications Architecture
– Jian Liang, Sriram Swaminathan, and Russell Tessier
– Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA 01003
– {jliang, tessier}@ecs.umass.edu
• Configurable Platforms With Dynamic Platform Management: An Efficient Alternative to Application-Specific System-on-Chips
– Krishna Sekar, Kanishka Lahiri, Sujit Dey
– [email protected] [email protected] [email protected]
– Dept. of ECE, UC San Diego, La Jolla, CA
– NEC Laboratories America, Princeton, NJ
![Page 35: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/35.jpg)
OMAP™ (Open Multimedia Application Platform)
• The OMAP architecture includes SW/OS support that can control clocking and idle modes across the entire platform.
• The dual-core architecture makes it possible to assign each task to the processor best suited to it.
![Page 36: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/36.jpg)
Memory vs Reused-IP
![Page 37: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/37.jpg)
ED2
• SMT (Simultaneous Multi-Threading): 20% speed-up and 24% power overhead [yin[email protected]], using PowerTimer, a PowerPC simulator
• Slow-down using DVS: 10% energy gain; scheduling: 15% increase in energy savings
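As a worked example of how such figures combine, the sketch below evaluates the SMT numbers above under an energy-delay-squared metric. It assumes that "ED2" on this slide denotes energy × delay², that a 20% speed-up means execution time scales by 1/1.2, and that a 24% power overhead means average power scales by 1.24. These readings are assumptions for illustration, and `ed2_ratio` is a hypothetical helper.

```python
def ed2_ratio(speedup, power_ratio):
    """Return (energy ratio, ED^2 ratio) of the new design relative to the baseline."""
    delay_ratio = 1.0 / speedup
    energy_ratio = power_ratio * delay_ratio      # E = P * D
    return energy_ratio, energy_ratio * delay_ratio ** 2

energy, ed2 = ed2_ratio(speedup=1.20, power_ratio=1.24)
print(f"energy ratio = {energy:.3f}, ED^2 ratio = {ed2:.3f}")
```

Under these assumptions energy per task rises by about 3%, while ED² falls to roughly 0.72 of the baseline, so a metric that weights delay heavily favors SMT even though it costs power.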
![Page 38: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/38.jpg)
Time-Space Exploration
• Enumerate all trade-offs and select the one with the most benefit.
• Branch and Bound method for estimating every SoC metric.
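A branch-and-bound search over implementation choices might be sketched as follows. The example picks one option per module to minimize total energy within a delay budget; the two metrics, the option table, and the function name `explore` are assumptions for illustration only, since the slides do not say which metrics or bounds are actually used.

```python
def explore(modules, delay_budget):
    """Pick one option per module, minimizing energy subject to a delay budget.

    modules: list of option lists; each option is (name, delay, energy).
    Returns (best_energy, chosen_names), or (None, None) if nothing fits.
    """
    n = len(modules)
    # Optimistic totals for modules i..n-1, used to prune branches early.
    min_energy = [0] * (n + 1)
    min_delay = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        min_energy[i] = min_energy[i + 1] + min(o[2] for o in modules[i])
        min_delay[i] = min_delay[i + 1] + min(o[1] for o in modules[i])

    best = [None, None]

    def branch(i, delay, energy, chosen):
        if delay + min_delay[i] > delay_budget:
            return                      # cannot meet the budget any more
        if best[0] is not None and energy + min_energy[i] >= best[0]:
            return                      # cannot beat the best solution so far
        if i == n:
            best[0], best[1] = energy, list(chosen)
            return
        # Try low-energy options first so good bounds are found early.
        for name, d, e in sorted(modules[i], key=lambda o: o[2]):
            chosen.append(name)
            branch(i + 1, delay + d, energy + e, chosen)
            chosen.pop()

    branch(0, 0, 0, [])
    return best[0], best[1]

# Hypothetical software/hardware options: (name, delay, energy)
modules = [
    [("fft_sw", 8, 2), ("fft_hw", 2, 5)],
    [("fir_sw", 6, 1), ("fir_hw", 1, 4)],
    [("dct_sw", 5, 2), ("dct_hw", 2, 3)],
]
print(explore(modules, delay_budget=10))
```

The two pruning tests are what distinguish this from plain enumeration: a branch is dropped as soon as even the most optimistic completion would miss the delay budget or fail to improve on the best energy found.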
![Page 39: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/39.jpg)
Jiang Xu and Wayne Wolf, Princeton University
• First decide on an architecture, and assign estimated requirements to unavailable modules.
• Adjust the requirements using performance analysis in a trial-and-error fashion.
• Based upon the requirements, purchase IP cores and design customized modules.
• Several iterations may be needed to reach a final design.
• It is very helpful if designers can get performance models of IP cores before buying them.
• Cadence Virtual Component Co-design (VCC)
![Page 40: Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing](https://reader036.fdocument.pub/reader036/viewer/2022070419/56815cf4550346895dcaf5f6/html5/thumbnails/40.jpg)
A Multimedia Embedded Chip