Processing while routing: a network-on-chip-based parallel system




S.R. Fernandes¹, B.C. Oliveira², M. Costa², I.S. Silva². IET Computers & Digital Techniques, 2009. Reporter: 陳健豪.



OUTLINE

Introduction

Related works

IPNoSys architecture

Results

Conclusions

Introduction

Technology integration has increased to the point where multi-core processor architectures are now a market reality.
Bus-based designs remain useful only while the number of cores in the processor is kept small.
Beyond that limit, more powerful interconnections, such as networks-on-chip (NoC), are needed.
However, a NoC requires more chip area and more power.

This paper proposes the IPNoSys system, in which the routers are responsible for executing operations in addition to routing packets.

Related works

NoC

IPNoSys architecture

The NoC is not just an interconnection mechanism; it also becomes an active element in the execution of applications.

Square 2D mesh topology
XY routing policy
Virtual-cut-through (VCT) and wormhole switching
Virtual channels
Credit-based flow control
Distributed arbitration and input buffering
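The XY routing policy listed above is dimension-ordered: a packet first travels along the X axis until its column matches the destination, then along the Y axis. The following is a minimal Python sketch, assuming routers are addressed by (x, y) coordinates and that NORTH means increasing y. The port names and the helpers `xy_next_hop` and `xy_path` are hypothetical and are not taken from the IPNoSys SystemC model.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a router in the mesh (assumed addressing)


def xy_next_hop(current: Coord, dest: Coord) -> str:
    """Return the output port chosen by dimension-ordered XY routing."""
    cx, cy = current
    dx, dy = dest
    # Correct the X coordinate first...
    if dx > cx:
        return "EAST"
    if dx < cx:
        return "WEST"
    # ...and only then the Y coordinate (NORTH = increasing y, an assumption).
    if dy > cy:
        return "NORTH"
    if dy < cy:
        return "SOUTH"
    return "LOCAL"  # arrived: deliver to the local port


def xy_path(src: Coord, dest: Coord) -> List[Coord]:
    """List every router visited from src to dest, both inclusive."""
    step = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}
    path = [src]
    current = src
    while (port := xy_next_hop(current, dest)) != "LOCAL":
        sx, sy = step[port]
        current = (current[0] + sx, current[1] + sy)
        path.append(current)
    return path


print(xy_path((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Because every packet turns at most once (from X to Y), plain XY routing cannot form cyclic channel dependencies in a mesh. This is one reason it is a common default policy.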

IPNoSys architecture

Each router includes an arithmetic logic unit (ALU) that lets it perform the logic and arithmetic operations most commonly found in applications.
Because it both routes and processes packets, the router is called a routing and processing unit (RPU).
The memory modules are accessed by memory access cores (MACs).
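The paper's actual packet format is not reproduced in this transcript, so the following Python sketch of "processing while routing" uses a hypothetical one. A packet carries a list of instructions (opcode, destination, source A, source B) together with its operand values. Each RPU on the path executes the head instruction with its ALU and then forwards the rest of the packet. The one-instruction-per-hop rule, the opcode set, and all names (`Packet`, `ALU_OPS`, `rpu_step`, `route_and_process`) are illustrative assumptions, not the IPNoSys design.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical instruction layout: (opcode, dest_var, src_a, src_b)
Instr = Tuple[str, str, str, str]

# Hypothetical ALU: a few common logic-arithmetic operations
ALU_OPS: Dict[str, Callable[[int, int], int]] = {
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
}


@dataclass
class Packet:
    instructions: List[Instr]                               # remaining work, head first
    operands: Dict[str, int] = field(default_factory=dict)  # values travelling with the packet


def rpu_step(router_id: int, packet: Packet, log: List[str]) -> None:
    """One router: execute the head instruction with the ALU, then forward."""
    if not packet.instructions:
        return  # nothing left to execute: the router only routes
    op, dst, a, b = packet.instructions.pop(0)
    packet.operands[dst] = ALU_OPS[op](packet.operands[a], packet.operands[b])
    log.append(f"router {router_id}: {op} -> {dst}={packet.operands[dst]}")


def route_and_process(path: List[int], packet: Packet) -> List[str]:
    """Send the packet along path, letting every router process it on the way."""
    log: List[str] = []
    for router_id in path:
        rpu_step(router_id, packet, log)
    return log


pkt = Packet(
    instructions=[("ADD", "s", "x", "y"), ("SUB", "d", "x", "y"), ("AND", "m", "s", "d")],
    operands={"x": 6, "y": 3},
)
for line in route_and_process([0, 1, 2], pkt):
    print(line)
# router 0: ADD -> s=9
# router 1: SUB -> d=3
# router 2: AND -> m=1
```

Carrying the operands inside the packet is what lets any router on the path compute without fetching data from memory. In this simplified model, the results simply travel onward with the packet.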


IPNoSys architecture

Spiral complement routing algorithm

IPNoSys architecture

Deadlock treatment:

The number of virtual channels required equals the number of times a packet must pass through the same physical channel in the same direction.
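The counting rule just stated can be sketched in Python. Each hop of a route is treated as a directed physical link (u -> v), and the required number of virtual channels is taken as the largest reuse count over all links. Taking the maximum over links is a reading of the rule, and the looping route below is hypothetical, not the one shown in Fig. 3.

```python
from collections import Counter
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) router position


def required_virtual_channels(route: List[Coord]) -> int:
    """Largest number of times any directed physical link (u -> v) is reused."""
    links = Counter(zip(route, route[1:]))  # each consecutive pair is one directed hop
    return max(links.values(), default=0)


# Hypothetical route that circles the same 2x2 loop three times
loop = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(required_virtual_channels(loop * 3 + [(0, 0)]))      # 3
print(required_virtual_channels([(0, 0), (1, 0), (2, 0)]))  # 1
```

Traversing the same link in the opposite direction counts as a different link here, because a mesh channel in each direction is a separate physical resource.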

In our case, the maximum is three times (Fig. 3).

Thus, the IPNoSys system handles deadlock through a solution called local execution.

IPNoSys architecture

Packet format

IPNoSys architecture

Routing and processing unit (RPU)


IPNoSys architecture

Memory access core (MAC)

The MACs, placed in the corners of the mesh, are responsible for reading packets from memory and injecting them into the NoC.

Results

Implemented in cycle-accurate SystemC
Different NoC dimensions

Three simulation cases:
Simple counter
DCT
RLE

Results

Simple counter: sequential and parallel execution

Compared with the sequential execution, the parallel execution on the IPNoSys system reduced the maximum number of performed instructions by around 80%.
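To make the figure concrete, the slide's 80% can be read as a relative reduction. This reading is an interpretation, since the slide gives no formula. Let I_seq and I_par be the maximum number of instructions performed in the sequential and parallel executions:

```latex
R = \frac{I_{\text{seq}} - I_{\text{par}}}{I_{\text{seq}}} \approx 0.8
\quad\Longrightarrow\quad
I_{\text{par}} \approx 0.2\, I_{\text{seq}}
```

Under this reading, the busiest part of the parallel version performs roughly one fifth as many instructions as the sequential version.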

Results

DCT

The 2D-DCT is widely used in image compression.
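For reference, the following is a plain-software Python sketch of the transform itself. It is an orthonormal DCT-II computed separably: first a 1-D DCT on every row, then on every column. It is not the packet-based IPNoSys implementation evaluated in the paper, and the function names are illustrative. The separable form also shows where data dependencies arise, because the column pass cannot start until the row pass has produced its results.

```python
import math
from typing import List

Matrix = List[List[float]]


def dct_1d(x: List[float]) -> List[float]:
    """Orthonormal DCT-II of a 1-D sequence (direct O(n^2) form)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        out.append(scale * s)
    return out


def dct_2d(block: Matrix) -> Matrix:
    """2-D DCT: a 1-D DCT on every row, then on every column of the result."""
    rows = [dct_1d(r) for r in block]             # row pass
    cols = [dct_1d(list(c)) for c in zip(*rows)]  # column pass depends on the row pass
    return [list(r) for r in zip(*cols)]          # transpose back to row-major order


block = [[10.0] * 8 for _ in range(8)]  # flat 8x8 block
coeffs = dct_2d(block)
print(round(coeffs[0][0], 6))  # 80.0 (all energy lands in the DC coefficient)
```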

The DCT application has many data dependencies, which is the worst case for instruction-level parallelism (ILP).

The memory required by IPNoSys increases slightly with more parallelism, because the amount of communication grows.

Results

RLE

Run-length encoding (RLE) is suited for compressing any type of data, regardless of its information content.

For example, an uncompressed string of 15 "A" characters would normally require 15 bytes to store: AAAAAAAAAAAAAAA.
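With RLE, such a run collapses into a count and a symbol. The Python sketch below assumes a common layout of one count byte followed by one symbol byte per run, with runs capped at 255 so the count fits in a byte. The slide does not say which RLE variant the IPNoSys benchmark uses, and the helper names are illustrative.

```python
from typing import List, Tuple

Run = Tuple[int, int]  # (run_length, byte_value)


def rle_encode(data: bytes) -> List[Run]:
    """Collapse each run of identical bytes into (run_length, byte)."""
    runs: List[Run] = []
    for b in data:  # iterating over bytes yields ints 0..255
        if runs and runs[-1][1] == b and runs[-1][0] < 255:
            runs[-1] = (runs[-1][0] + 1, b)  # extend the current run
        else:
            runs.append((1, b))              # start a new run
    return runs


def rle_decode(runs: List[Run]) -> bytes:
    """Expand (run_length, byte) pairs back into the original data."""
    return b"".join(bytes([b]) * n for n, b in runs)


def to_bytes(runs: List[Run]) -> bytes:
    """Serialize as a count byte followed by a symbol byte for each run."""
    return bytes(v for run in runs for v in run)


data = b"A" * 15
runs = rle_encode(data)
print(runs)                 # [(15, 65)]
print(to_bytes(runs))       # b'\x0fA'
print(len(to_bytes(runs)))  # 2
```

The same layout doubles the size of data with no repeated bytes ("ABC" becomes 6 bytes), so RLE pays off only when runs are common.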

This means that, on average, the number of packets decreases toward the end of its execution.

Detailed comparison (STORM vs. IPNoSys)

STORM:
Instances with one, two, four or 15 SPARC V8 processors
2D-mesh NoC with two, three, five and 16 routers, respectively
Cache-coherent, directory-based MP-SoC platform
XY routing scheme

Conclusions

This paper presented IPNoSys, an innovative NoC-based architecture that does not use traditional processors.
The architecture's execution capability is independent of the number of application instructions and of the NoC dimensions.
In DCT, the execution time on IPNoSys is 3.5 times shorter than in the best STORM case, which shows the efficiency of parallelism in this system.
In RLE, IPNoSys also performed better than STORM.