06111328

download 06111328

of 8

Transcript of 06111328

  • 7/29/2019 06111328

    1/8

    IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, VOL. 21, NO. 1, JANUARY 2013 239

    Power Optimization in Embedded Systems via Feedback Controlof Resource Allocation

    Martina Maggio, Henry Hoffmann, Marco D. Santambrogio, Anant Agarwal, and Alberto Leva

    Abstract Embedded systems often operate in so variable con-ditions that design can only be carried out for some worst-case scenario . This leads to over-provisioned resources, and unduepower consumption. Feedback control is an effective (and notyet fully explored) way to tailor resource usage online, therebymaking the system behave and consume as if it was optimized foreach speci c utilization case. A control-theoretical methodologyis here proposed to complement architecture design in a viewto said tailoring. Experimental results show that a so addressedarchitecture meets performance requirements, while consumingless power than any xed (i.e., uncontrolled) one capable of at-taining the same goals. Also, the methodology naturally inducescomputationally lightweight control laws.

    Index Terms Computing systems, control systems, embedded

    systems, power consumption, self-adaptive systems.

    I. I NTRODUCTION

    T HE ideas of embedded and dedicated device arenowadays less coincident than just a few years ago. Itcould make sense to design a dedicated chip for an old-fash-ioned mobile phone, while this is apparently meaningless for a modern smartphone [1]. Modern devices are not designedfor just one task. They host different applications, with variousrequirements, so that resource optimization is a key to meetrequirements without scarifying power. This scenario posesnontrivial design problems, tackled at different levels, fromenergy-ef cient code generation [2], [3] to consumption esti-mation [4], battery optimization [5] and more.

    Ideally, a device should activate a different subset of its ar-chitectural capabilities for any different load con guration, de- pending on the executing applications, and also on the partic-ular data processed [6]. This makes it impossible to design thearchitecture based on some general power consumption/perfor-mance trade-off: strictly, online architecture tailoring is needed,not only for any application, but for each run of it.

    This work proposes to do so by a novel use of feedback control, that makes an already designed architecture capable

    of adapting active features continuously, minimizing power consumption while preserving a prescribed performance

    Manuscript received September 14, 2011; revised November 15, 2011; ac-cepted November 18, 2011. Manuscript received in nal form November 19,2011. Date of publication December 21,2011; date of current version December 14, 2012. Recommended by Associate Editor Q. Wang.

    M. Maggio, M. D. Santambrogio, and A. Leva are with the Dipartimento diElettronica e Informazione, Politecnico di Milano, Milan 20133, Italy (e-mail:[email protected]).

    H. Hoffmannand A. Agarwalare with theComputer Science andArti cial In-telligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA02139 USA.

    Color versions of one or more of the gures in this paper are available onlineat http://ieeexplore.ieee.org.

    Digital Object Identi er 10.1109/TCST.2011.2177499

    level irrespective of the operating conditions. Peculiar of the presented research is the deliberate quest for extremelysimple control laws, which is particularly relevant when power constraints are strict. Notice that the proposed methodologycomplements classical design, and can be seamlessly appliedto existing devices, e.g., by means of software instrumentation.To demonstrate its potential, the methodology is here appliedstep by step to a relevant case study, and experimental resultsare commented.

    II. R ELATED WORKS AND MOTIVATION

    A vast literature exists on the use o f control theory in com- puting systems [7], including emb edded ones [8], [9]. Contribu-tions span from reliability [10] to security [11], software testing[12], scheduling [13], thread- pool management [14], and more.The l rouge of virtually all the documented research is that byclosing feedback loops around a computing (embedded) system,desired propertiessuch as a prescribed throughputcan beenforced online .

    However, as recently notic ed [15], control theory can domuch more, because several computing system functionalitiesare feedback controller s in nature. If one thus accepts not onlyto close control loops around existing systems, but also to

    design parts of said s ystems as controllers , a strong and unitaryframework is availa ble to carry out and formally assess thedesign [16], limiting the role of heuristic in the nal product.

    Also the importance of power optimization is testi ed by nu-merous research works. For example, [17] and [18] explore howto execute tasks an d turn off processors in homogeneous mul-tiprocessor systems. Other relevant works are multicore chipsthat manage reso urce allocation [19], take care of critical sec-tions [20] and o ptimize for power [6]. This research too showsa l rouge an almost ubiquitous focus on of ine design for op-timization. A lso in this case, more effective a use for feedback control can be envisaged: by controlling resource allocation on-line , an (alr eady designed) architecture can be tailored not onlyfor a given application, but for each speci c run of it.

    The work described here is therefore in some sense comple-mentary to those quoted. To the best of the authors knowledge,the literature results nearest to the presented research are therecent pa pers [21], [22]. The dynamical relocation of tasks pro- posed in [21] is however based on statically precomputed map- pings fr om processes to computing elements, while here staticdata ar e just the starting point for setting up the on-line resourcetailoring. Also, [22] relies on modi cations of the Linux kernel,which allows a ner control granularity but limits the applica- bility to devices with an operating system, while here only user space resource allocation is used, which can be done on any de-

    vice, with or without supporting software.

    1063-6536/$26.00 2011 IEEE

  • 7/29/2019 06111328

    2/8

    240 IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, VOL. 21, NO. 1, JANUARY 2013

    Fig. 1. Bene ts and limits of the proposed methodology on top of existing de-sign choices.

    The proposed approach opens more than one interesting per-spective also from an industrial standpoint. It allows to designmore general-purpose devices, over-dimensioning the hardwareand delegating to a intermediate software level its case-spe-ci c exploitation, with the certainty that only the necessary re-sources will be used. Correspondingly, already designed archi-tectures can be extended to new applications without undue power losses.

    III. PROPOSED METHODOLOGY

    A. Overview

    Suppose that a hardware architecture needs designing for a

    given purpose, and to minimize power consumption. Experi-ments are carried out to explorethe spectrum of operating condi-tions. Based on their results, no globally optimal design is found,i.e., some optimal designs are found for some sets of operatingconditions, but different from oneanother. Now, order those lo-cally optimal designs by capability. This leads to order the lo-cally optimal architectures by some index (e.g., the number of cores) for which a lower and an upper bound exist. Ap- parently, setting is suboptimal in performance, setting

    is suboptimal in power consumption, and there is no wayto select any intermediate value a priori .

    As anticipated, the methodology proposed here comes down-

    stream architecture design. Standard design proceduressee,e.g., [23], [24]are used up to the determination of , the soobtained architecture is realized, and then complemented withthe proposed online resource tailoring.

    Fig. 1 illustrates in abstracto how the proposed methodologyimpacts the problem. Performance degradation monotonicallydecreases with hardware potential, and is zero from a certaincapability on. Area and power cost, given the used technology,conversely increase with said potential. If a threshold is put onthe area cost, a certain level of performance degradation is to be accepted irrespective of any online action. Also a power costthreshold re ects into performance degradation, but this can beacted on online. The proposed methodology thereby breakstothe maximum possible extent, denoted with in Fig. 1thedependence of the power cost on the architecture potential.

    B. Methodology Steps

    A procedure to realize the proposed idea can be summarizedas follows. It is assumed that the hardware is already designed,and capable of covering the heaviest utilization case.

    1) De ne measurements to assess desired behavior andresource use qualitye.g., throughput and consumed powerand actions on said measured quantities, likeenabling or disabling cores, or scaling frequencyi.e., de- ne sensors and actuators. Denote by the number of sensors, each one providing a value in a set , and by the number of actuators, each one applying a value

    in a set . Notice that and can be intrinsicallyheterogeneous.2a) Perform some conveniently designed ( of ine ) tests toassess how a generic application can exploit the hardwarecapabilities in the absence of online tailoring. This pro-

    vides an uncontrolled hardware pro le (UHP) , de

    ned asfollows. Denoting by the vector of ac-tuator values, with thevector of the performance metrics, each belonging to aset and being obtained from (but not necessarilycoinciding with) the sensor outputs, and with a value of

    providing the worst value of , the UHP is a map

    where . Somehow contrary to intuition, thetests of this step are not strictly tied to any speci c appli-

    cation, rather designed on the sole basis of sensors, actu-ators, and performance indices. Based on experience, justsome broad assumptions on the applications are generallyenough to produce the UHP, and this happens only whensome values are not obtainable with objective measure-ments. The application reported later on, where the UHPis substantiated by (possible) Power-Optimal (PO) states ,will clarify.2b) In parallel with step 2a, i.e., here too off line, devise acontrol structure (CS) capable of regulating the hardwareso as to attain the desired level of performance. If the pre-vious steps were carried out thoroughly, the required CS is

    very often surprisingly simple. This is merit both in prac-tice, as time and resources are always an issue in embeddedsystems, and methodologically, as simple structures allowfor a systematic domain-speci c design, generally safer and more effective than mere heuristics.3) Based on the UHP, parametrize the CS. Notice that withthe proposed approach, should the hardware change, only step 2a) would need repeating. Also, step 3) naturally leadsto produce dynamic models for the (now controlled) hard-ware, the controller, and the closed loop. This allows to an-alyze and possibly simulate the overall system, assessingits behavior in a general manner with respect to any subse-quent use.

    In the next section, the proposed methodology is demon-strated by going through a signi cant case study.

  • 7/29/2019 06111328

    3/8

    MAGGIO et al. : POWER OPTIMIZATION IN EMBEDDED SYSTEMS VIA FEEDBACK CONTROL OF RESOURCE ALLOCATION 2 41

    IV. CASE STUDY

    A video encoder processes frames from a raw video stream,for ef cient transmission. Each frame can be encoded pro-ducing information for all pixels (I frames) or by difference,either with respect to the previous one (P frames) or with boththe previous and the subsequent one (B frames). Encoders thus

    encounter both strict power requirements (e.g., on handhelddevices) and high workload variability (e.g., from quasi-staticcases like a lecture, to high-variations ones like sport events).In all possible cases, to limit the need for buffering, a constantencoding rate is desired. For example, the National TelevisionSystem Committee (NTSC) recommends a standard rate of 30 frames per second. The chosen domain is therefore hard,as testi ed by previous works such as [25], whence the testrelevance. As a side remark, if multiple applications are to becontrolled, the methodology could be extended such as in [26].

    A. Step 1: Sensors and Actuators

    A free software library (x264) for encoding video streamsinto the H.264/MPEG-4 AVC format is used here, releasedwithin the GNU GPL license and widely used. By analyzingthe hardware, an upper bound to the resources amount neededto meet the requirement for every kind of video was found.This upper bound architecture, used in the following tests,has eight cores with a clock speed of 2394 MHz. Intuitively,since the design was carried out to cover the worst case, theamount of available resources is disproportionate for a varietyof real cases. To introduce some baseline for the followingcomparisons, also a lower bound was determined, i.e., an ar-chitecture where at least one of the video was encoded meeting

    the performance requirements, which has three cores and thesame maximum clock speed.

    To assess the system performances, three sensors are re-quired. The rst one measures the number of encoded frames per second, belonging to the set . To obtain such data,the video encoder application was instrumented by means of the Application Heartbeat framework [27]. The encoder wasmodi ed to signal the termination of each frame encodingvia an heartbeat, so that an external entity (in this case thecontroller) can retrieve that information and act accordingly.Also, the encoder was turned into a soft real-time application, by allowing it to drop frames if the speci ed frame rate is too

    low with respect to the targetin the following tests, below 25frames per second. The second sensor provides the amount of dropped frames, in the set , as output by the encoder. Thethird sensor measures the power consumed in the encoding, belonging to , with a WattsUp device [28].

    To intervene on the system, two actuators are used, namelythe number of cores and their clock speed. Denote with theset of possible number of cores currently allotted to the encoding process and with the set of frequencies of all the cores. Withthe hardware architecture used for the subsequent tests, isde ned as the set of integer values in the range . The ac-tuator is implemented through the taskset Unix command.Similarly, is a set of seven available frequencies, in the range

    MHz. This actuator is implemented through thecpufrequtils Linux package.

    TABLE ISUMMARY OF POWER -OPTIMAL (X) AND NON -POWER -OPTIMAL (-) STATES

    B. Step 2a: Data Collection

    The architecture is stressed with the PARSEC benchmark suite [29]. Consumption is measured for all the 56 power states produced by the possible combinations of number of cores andfrequency, and only some of the possible control actions arefound to result in a PO state.

    In detail, to nd the PO states, the following procedure is fol-lowed. A set of software applications that present a wide varietyof workloads are chosen, and instrumented within the Applica-tion Heartbeat framework. Recall that this is an off-line phase,thus employing a very large set is not detrimental, and one couldreally think of trying any possible application that the device isreasonably expected to host. The selected applications are thenexecuted in each of the possible con gurations as for available( xed number of) cores and frequency with different data, col-lecting power and performance measurements. Con gurationsare nally sorted based on their power consumption, and eachof them is included in the PO set if for at least one of the runs,

    any other con guration that consumes more power, also pro-duces a higher performance. In other words, PO states are thosecontrol input combinations for which in all the runs of all theapplications allotting less resources diminishes performance.

    With the given hardware, only 25 of the 56 possible statesare power-optimal. Surprisingly, none of the combinations of seven cores and any possible frequency is PO, while for exampleassigning eight cores is PO with any given frequency. Table Ireports the optimal power states found with the analysis.

    Anticipating what is explained in general later on, the con-trol rationale of PO states is the following. Each PO state is provided with an estimated speedup value. The controller pro-

    duces a speedup request, the nearest values to that request (onelower and one higher) are found in the UHP map, and a coupleof PO states is thus selected. Those two states are applied, the rst for a fraction of the subsequent sampling period and thesecond for the rest of the period, so that the time-weighed av-erage of their two speedups coincide with the request. Thus, thefeedback controller decides the amount of resources to allot,and the UHP map is used to do so by setting the system alwaysto a PO state. This in some sense decouples power from per-formance, although relying on off-line estimates: the controller operates like if it had just to allot a comprehensive resourceaffecting performance, and the off-line pro ling is used to se-lect the most power-ef cient input combination that has the de-sired effect. To estimate the speedup of a PO state, assumptionsare of course needed. Here, applications are supposed to scale

  • 7/29/2019 06111328

    4/8

    242 IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, VOL. 21, NO. 1, JANUARY 2013

    linearly as the frequency increases and to obtain a speedup of when cores are devoted to execute them, and

    is the total number of available cores. This is an experimen-tally determined relationship, suitable for the presented case, butin general one can conduct tests on the speci c software to beexecuted to provide application dependent speedup values.

    C. Step 2b: Control Design

    The desired level of performance is measured by the FrameEncoding Rate output, while the actuation mechanism is de-signed to select only control actions that result in PO states for any admissible speedup value. Control structuring thus simplymeans adding a feedback block acting on the speedup andhaving the frame rate as controlled variable, the set point being xed at 30 frames per second.

    To complete Step 3) it is necessary to choose a form for thecontrol law, which can here be done by applying some verysimple results of the discrete-time control theory for linear sys-tems. The controller will calculate the speedup to be applied(with core and frequency actuation) periodically. Denoting by

    the time index for said periodic calculation, and assumingthat the time span between two subsequent calculations is longenough for the applied speedup to exert all of its action on the plant, the relationship between the speedup and the frameencoding rate (fer) is expressed in the discrete time domain by

    (1)

    where is the possibly time varying application workload ,i.e., the (nominal) amount of time consumed between two sub-

    sequent frame encodings, and an exogenous disturbance ac-counting for any non nominal time behavior. Of course, is ingeneral unknown but the pro ling phase can easily provide areliable average and bounds for it. The decision was here takento treat not as a time varying quantity but as an unknown pa-rameter, denoted in the following by . From (1) the transfer function from to fer is obviously

    (2)

    being the transform of the speedup signal andthat of the frame encoding rate. To testify how simple control

    design can be with the proposed approach, which we recall to bea speci c aim of this research, a simple choice is made, namelythe deadbeat controller obtained by solving

    (3)

    with respect to , where is the transform of thedesired frame rate. This yields

    (4)

    In the time domain this corresponds to the control law

    (5)

    Of course a minimum applicable speedup and a maximum onedo exist and are determined in the pro ling phase, but the re-sulting control saturations can be easily managed by standardanti-windup.

    Since is just a nominal value for the workload, at leasta minimal robustness analysis is in order. Trivial compu-tations show that if the actual workload is expressed as

    , thereby introducing the unknown quantityas a multiplicative error, then the eigenvalue of the closed-loopsystem is . Requiring the magnitude of saideigenvalue to be less than the unity, one nds that closed-loopstability is preserved for any in the rangehence if the workload is not excessively overestimated sucha simple control law can effectively regulate the system de- spite acceptable variations . Incidentally, this justi esfrom a practical standpointthe idea of treating as a time-varying parameter. Strictly speaking, the system is to be considereda linear discrete-time switching one with state-independentswitching signal, and a deeper analysis should be done. How-ever, for such systems, if the dynamic matrix eigenvalues lie inthe unit circle for any value of the switching signal there surelyexist a nite dwell time ensuring stability [30]. Delving intoestimates of that time would stray from the scope of this paper,and is thus omitted.

    A nal point needs addressing, since the speedup values ob-tained with the pro ling phase are isolated points while (5) pro-duces an output covering the whole (continuous) rangefrom the minimum to the maximum speedup. To cope with this,denoting by the speedup computed from (5), observe that it is by construction possible to nd two vectors of actuator values

    and such that

    (6)

    Denoting by the time span between two subsequent controller interventions, a time division output (TDO) actuation mecha-nism can be applied. This means computing the amounts of time

    and in which to apply respectively the control vectorsand as

    (7)

    The TDO actuation mechanism allows translation of a set of discrete (speedup) control values into a continuous one, there-fore obtaining the computed control signal. Notice that TDO isstandard practice in many control domains, when quantized ac-tuation comes into play.

    D. Step 3: Parametrization

    In this particular case the parametrization phase simply con-sists of plugging the numbers coming from the pro ling phaseinto the control law (5) and into the actuation policy. This task can be simply realized by including a C header le, thereforethe controller is modular and there is no need to change its codeif the architecture changes.

  • 7/29/2019 06111328

    5/8

    MAGGIO et al. : POWER OPTIMIZATION IN EMBEDDED SYSTEMS VIA FEEDBACK CONTROL OF RESOURCE ALLOCATION 2 43

    Fig. 2. (left, upper plot) Frame encoding rate during part of a video encoding, with some xed architectures and the controlled one; (left, lower plot) control actionin the controlled case.

    V. EXPERIMENTAL R ESULTS

    First of all, the results of a single run are presented. The upper diagram of Fig. 2 shows the time behavior of the frame rateduring part of a video encoding, with some xed architectures,and the controlled one. The lower picture of Fig. 2 shows howthe actuators affect the system in the controlled case. The videoused for this test is old_town_cross_1080p.yuv 1 and, for a mean-ingful comparison, the xed architectures were selected as thosethat have similar capabilities with respect to the most frequentlyselected values of the vector of actuators in the controlled case.Among the many tests presented later on, this one shows an av-erage control quality.

    Notice that on average the power consumption of the con-trolled architecture is lower than all the xed ones except for theleast capable. Numbers for the least capable architecture (1 core)are reported just for the purpose of normalization, since de factoit would never be able to attain the required goals and thereforenever used in practice. Also, the maximum power of the con-trolled solution is not the minimum one, therefore proving thatin some part of the video encoding more power with respect toothers architectures is needed to attain the performance speci -cations and that feedback takes care of the matter correctly.

    Coming to a more extensive evaluation campaign, the systemwas tested with 16 different videos. Each video was encoded25 times, and the so obtained results were averaged for a sta-tistically meaningful analysis. Results are shown in Fig. 3. For each video, the encoding procedure was conducted with 1 en-abled core and the minimum frequency, to obtain the lowest the power consumption, and than repeated with different xed (un-controlled) architectures, namely with a number of cores from 3to 8, and the maximum frequency. The test was nally repeatedactivating the controller to adapt the number of cores on line.The top row of Fig. 3 shows the average frame encoding rate (or heart rate, according to the used framework terminology), whilethe bottom row provides reports the power consumption, nor-

    malized to the minimum value above. Note that the controller is1All the videos used for the experiments are available at http://xiph.org.

    almost invariantly able to maintain the encoding rate, while con-suming an amount of power that is comparable or lower to thatof the uncontrolled architecture best attaining that rate (that, it isworth stressing, is not the same architecture as the video varies).For example, for the video named old_town_cross_1080p.yuv ,the encoding rate with 3 cores is less than 30 frames per second,while the one with four cores is above the set point value. Thecontrolled solution provides a frame rate that is closer to the de-sired value, and the power it consumes is slightly less than thethree cores solution.

    In Fig. 4, the drop rate is brought in, producing a 3-D chartwhere the axes report the (average) normalized power, the(average) encoding frame rate, and the (average) drop rate

    per video. Each architecture produces 16 points in the chart,each one being the average over 25 runs of one of the 16videos. The black point nally identi es the optimum point,corresponding to zero drop rate, 30 frames per second, andthe lowest power consumption. As can be seen, a xed lownumber of cores is better as for power, but fails at maintainingthe frame and drop rates, while a xed high number of coresdrops no frames, but consumes more power and producesan excessive frame rate. On the other hand, the controlledarchitecture remains in the vicinity of the optimum point, andabove all its results are signi cantly nearer to one another (i.e.,more uniform for the various videos) than if a xed number of

    cores is used.A further analysis can be conducted on the data to verify theapproach validity, computing the distance between the point de- picted in Fig. 4 and the optimum point depicted in black, for each of the 16 videos.

    Table II contains the results, each row referring to an ar-chitecture. The rst two columns report the minimum and themaximum value of the mentioned distance, i.e., the valuesfor the videos that are encoded on average in the best andthe worst way by each architecture. This gives an idea of the performance range produced by each architecture. The thirdand fourth column give the average distance value and itsvariance, to synthetically appreciate its distribution. Noticethat the controlled solution produces in each case comparableranges with respect to the best uncontrolled one for that case,

  • 7/29/2019 06111328

    6/8

    244 IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, VOL. 21, NO. 1, JANUARY 2013

    Fig. 3. (top row) Average frame encoding (or heart) rate and (bottom row) average normalized power consumption of the x264 video encoder with 16 differentinput data over 25 different runs. The power consumption is normalized to its minimum value.

    Fig. 4. Summary of the x264 modi ed video encoder results. Average frame rate, percentage of dropped frames (as a quality measure) and average power con-

    sumption for the16 input data of Fig. 3. Thecontroller is able to meet theapplication goals (30frames persecond) while minimizing theaverage power consumption( -axes). The black dot represents the optimum point (minimum power consumption, performance guarantees and no frames dropped).

  • 7/29/2019 06111328

    7/8

    MAGGIO et al. : POWER OPTIMIZATION IN EMBEDDED SYSTEMS VIA FEEDBACK CONTROL OF RESOURCE ALLOCATION 2 45

    TABLE IISUMMARY OF DISTANCES BETWEEN THE VIDEO E NCODER POINTS AND THE

    OPTIMUM O NE AS IDENTIFIED IN FIG . 4

    but with a signi cantly lower variance. To get a visual idea of that, see how the red dots in Fig. 4 are nearer to one another

    than those in other colors.

    VI. CONCLUSION AND FUTURE WORK

    A methodology was presented to introduce feedback controlin the management of embedded system resources or, more pre-cisely, to structure such problems so that said introduction rele-gates heuristics in a region as small as possible, and not in the de-sign of the core control law. Devices endowed with the so ob-tained controllers are made capable of using only the necessaryresources in any reasonably expectable environmental condi-

    tion. Such devices can therefore be designed for the worst case,with the certainty that this will not lead to undue power con-sumption. Most important, thanks to the use of feedback controlin the way just recalled, and somehow contrary to several pro- posals in the literature, any involved entity can be characterizedand quanti ed, and both the results and their validity limits can be assessed formally . Also, the problem structuring induced bythe proposed methodology often allows the core control law to be extremely simple, like the deadbeat one used here. Appar-ently, this is a merit wherever time and resource constraints arerelevant.

    The potential of the proposed approach was demonstrated by

    its application to video encoding. The presented study showshow the controlled architecture not only uses just the necessaryresources for each application run , but also provides more uni-form performances with respect to all the uncontrolled ones.In the opinion of the authors, this work is a step forward inthe direction of ful lling the need for self adaptation that isemerging in the computing system community. Future researchwill further investigate the sensing and actuation issues hereencountered, in a view to ease the adoption of the proposedmethodology in a category of devices and applications as vast as possible. In addition, since even soft failures can be addressedwithin the proposed framework, techniques to deal with such problems will be developed. Finally, attention will always be paid to keep the architecture design and the subsequent intro-duction of control clearly separated.

    R EFERENCES

    [1] T. Hubbard, R. Lencevicius, E. Metz, and G. Raghavan, PerformanceValidation on Multicore Mobile Devices, in Veri ed Software: Theo-ries,Tools, Experiments , B.Meyer and J.Woodcock, Eds. NewYork:Springer-Verlag, 2008, pp. 413421.

    [2] V. Tiwari, S. Malik, and A. Wolfe, Power analysis of embedded soft-ware: A rst step towards software power minimization, IEEE Trans.Very Large Scale Integr. (VLSI) Syst. , vol. 2, no. 4, pp. 437445, Dec.1994.

    [3] A. Muttreja, A. Raghunathan, S. Ravi, and N. K. Jha, Automatedenergy/performance macromodeling of embedded software, IEEE Trans. Comput.-Aided Design Integr. Circuits Syst. , vol. 26, no. 3, pp.542552, Mar. 2007.

    [4] Y. Fei, S. Ravi, A. Raghunathan, and N. K. Jha, A hybrid energy-estimation technique for extensibleprocessors, IEEE Trans. Comput.- Aided Design Integr. Circuits Syst. , vol. 23, no. 5, pp. 652664, May2004.

    [5] K. Lahiri, A. Raghunathan, and S. Dey, Ef cient power pro ling for battery-driven embedded system design, IEEE Trans. Comput.-Aided Design Integr. Circuits Syst. , vol. 23, no. 6, pp. 919932, Jun. 2004.

    [6] R. Kumar, K. Farkas, N. Jouppi, P. Ranganathan, and D. Tullsen, Pro-cessor power reduction via single-isa heterogeneous multi-core archi-tectures, Comput. Arch. Lett. , vol. 2, no. 1, pp. 58, 2003.

    [7] J. Hellerstein, Achieving service rate objectives with decay usagescheduling, IEEE Trans. Softw. Eng. , vol. 19, no. 8, pp. 813825,Aug. 1993.

    [8] J. Hellerstein, Y. Diao, S. Parekh, and D. Tilbury , Feedback Control of Computing Systems . New York: Wiley, 2004.

    [9] G. Buttazzo, Research trends in real-time computing for embeddedsystems, ACM SIGBED Rev. , vol. 3, no. 3, pp. 110, 2006.

    [10] O. Kreidl and T. Frazier, Feedback control applied to survivability: Ahost-based autonomic defense system, IEEE Trans. Reliab. , vol. 53,no. 1, pp. 148166, Mar. 2004.

    [11] R. Dantu, J. Cangussu, and S. Patwardhan, Fast worm containmentusing feedback control, IEEE Trans. Depend. Secure Comput. , vol. 4,no. 2, pp. 119136, Apr. 2007.

    [12] M. Bayanand J. Cangussu,ACM, Automaticfeedback,control-based,stress and load testing, in Proc. ACM Symp. Appl. Comput. , 2008, pp.

    661666.[13] T. Cucinotta, F. Checconi, L. Abeni, and L. Palopoli, Self-tuningschedulers for legacy real-time applications, in Proc. 5th Euro. Conf.Comput. Syst. , 2010, pp. 5568.

    [14] J. Hellerstein,V. Morrison, andE. Eilebrecht,Applyingcontroltheoryin the real world: Experience with building a controller for the .netthread pool, SIGMETRICS Perform. Eval. Rev. , vol. 37, no. 3, pp.3842, 2009.

    [15] A. Leva and M. Maggio, Feedback process scheduling with simplediscrete-time control structures, IET Control Theory Appl. , vol. 4, no.11, pp. 23312342, 2010.

    [16] C. Karamanolis, M. Karlsson, and X. Zhu, USENIX Association, De-signing controllable computer systems, in Proc. 10th Conf. Hot TopicsOperat. Syst. , 2005, pp. 915.

    [17] R. Xu, D. Zhu, C. Rusu, R. Melhem, and D. Moss, Energy-ef cient policies for embedded clusters, in Proc. LCTES , 2005, pp. 110.

    [18] J. Chen, H. Hs, and T. Kuo, Leakage-aware energy-ef cient sched-uling of real-time tasks in multiprocessor systems, in Proc. RTAS ,2006, pp. 408417.

    [19] R. Bitirgen, E. Ipek, and J. Martinez, IEEE Computer Society,Coordinated management of multiple interacting resources in chipmultiprocessors: A machine learning approach, in Proc. 41st Annual IEEE/ACM Int. Symp. Microarch. (MICRO 41) , 2008, pp. 318329.

    [20] M. Suleman, O. Mutlu, M. Qureshi, and Y. Patt, Accelerating criticalsection execution with asymmetric multi-core architectures, in Proc. ASPLOS , 2009, pp. 253264.

    [21] A. Schranzhofer, J.-J. Chen, and L. Thiele, Dynamic power-awaremapping of applications onto heterogeneous MPSoC platforms, IEEE Trans. Ind. Inform. , vol. 6, no. 4, pp. 692707, Nov. 2010.

    [22] E. Bini, G. Buttazzo, J. Eker, S. Schorr, R. Guerra, G. Fohler, K.-E.Arzen, V. R. Segovia, and C. Scordino, Resource management onmulticore systems: The actors approach, IEEE Micro , vol. 31, no. 3, pp. 7281, May/Jun. 2011.

    [23] T. Lv, J. Xu, W. Wolf, I. Ozer, J. Henkel, and S. Chakradhar, Amethodology for architectural design of multimedia multiprocessor socs, IEEE Design Test Comput. , vol. 22, no. 1, pp. 1826, Jan. 2005.

  • 7/29/2019 06111328

    8/8

    246 IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, VOL. 21, NO. 1, JANUARY 2013

    [24] H. Hsu, J. Chen, and T. Kuo, Multiprocessor synthesis for periodichard real-time tasks under a given energy constraint, in Proc. Conf. Design, Autom., Test Euro. (DATE) , 2006, pp. 10611066.

    [25] M. Sha que, L. Bauer, and J. Henkel, Enbudget: A run-time adaptive predictive energy-budgeting scheme for energy-aware motion estima-tion in h.264/MPEG-4 AVC video encoder, in Proc. Design, Autom.Test Euro. Conf. Exhib. (DATE) , 2010, pp. 17251730.

    [26] M. Maggio, H. Hoffmann, A. Agarwal, and A. Leva, Control-theoret-ical cpu allocation:Design and implementationwith feedback control, presented at the 6th Int. Workshop Feedback Control ImplementationDesign Comput. Syst. Netw., New York, 2011.

    [27] H. Hoffmann, J. Eastep, M. Santambrogio, J. Miller, and A. Agarwal,ACM, Application heartbeats: A generic interface for specifying pro-gramperformanceand goalsin autonomous computingenvironments,in Proc. 7th Int. Conf. Autonomic Comput. (ICAC) , 2010, pp. 7988.

    [28] Watts up?, Denver, CO, Wattsup .net meter, 2010. [Online]. Avail-able: http://www.wattsupmeters.com/

    [29] C. Bienia, S. Kumar, J. Singh, and K. Li, The PARSEC benchmark suite: Characterization and architectural implications, in Proc. 17th Int. Conf. Paral. Arch. Compilation Techniques , 2008, pp. 7281.

    [30] J. Geromel and P. Colaneri, Stability and stabilization of discrete timeswitched systems, Int. J. Control , vol. 79, no. 7, pp. 719728, 2006.