
Autoware on Board: Enabling Autonomous Vehicles with Embedded Systems

Shinpei Kato∗‡§, Shota Tokunaga†, Yuya Maruyama†, Seiya Maeda†, Manato Hirabayashi‡, Yuki Kitsukawa‡, Abraham Monrroy‡, Tomohito Ando‡, Yusuke Fujii§, and Takuya Azumi†§¶

∗ Graduate School of Information Science and Technology, The University of Tokyo
† Graduate School of Engineering Science, Osaka University

‡ Graduate School of Informatics, Nagoya University
§ Tier IV, Inc.

¶ JST, PRESTO

Abstract—This paper presents Autoware on Board, a new profile of Autoware, especially designed to enable autonomous vehicles with embedded systems. Autoware is a popular open-source software project that provides a complete set of self-driving modules, including localization, detection, prediction, planning, and control. We customize and extend the software stack of Autoware to accommodate embedded computing capabilities. In particular, we use DRIVE PX2 as a reference computing platform, which is manufactured by NVIDIA Corporation for development of autonomous vehicles, and evaluate the performance of Autoware on ARM-based embedded processing cores and Tegra-based embedded graphics processing units (GPUs). Given that low-power CPUs are often preferred over high-performance GPUs, from the functional safety point of view, this paper focuses on the application of Autoware on ARM cores rather than Tegra ones. However, some Autoware modules still need to be executed on the Tegra cores to achieve load balancing and real-time processing. The experimental results show that the execution latency imposed on the DRIVE PX2 platform is capped at about three times as much as that on a high-end laptop computer. We believe that this observed computing performance is even acceptable for real-world production of autonomous vehicles in certain scenarios.

1. Introduction

Autonomous vehicles are becoming the next core of mobility infrastructures. They will improve our quality of life and social welfare significantly in the near future. This paradigm shift has encouraged automotive manufacturers to develop production-quality autonomous vehicles and even plan their mass production. Electronics and semiconductor manufacturers, as well as technology suppliers, are also eager to participate in strategic cross-modal alliances with automotive manufacturers. This alliance trend is due to the fact that applications of autonomous vehicles are truly diverse. Autonomous individually owned vehicles could reduce traffic accidents and mitigate traffic jams. Autonomous mobility-as-a-service vehicles, on the other hand, could provide more efficient logistics and passenger services. There are many other potential markets for autonomous vehicles, and technology transfer from academia to industry is now truly needed.

Common functional pieces of autonomous vehicles often fall into sensing, computing, and actuation. Examples of sensing devices or sensors include cameras, laser scanners (a.k.a. LiDAR), milliwave radars, and GNSS/IMU. Using sensor data, autonomous vehicles perform localization, detection, prediction, planning, and control, which are core modules of computing intelligence. Classical algorithms favor rule-based systems and pattern recognition, while recent data-driven approaches leverage machine learning and deep neural networks. More efficient approaches and algorithms are still extensively studied today. Actuation of autonomous vehicles, finally, requires a by-wire controller to manipulate steering (angular velocity) and throttle (linear velocity). Towards real-world production, a secure gateway must be installed between the original CAN-bus controllers and the new by-wire controller to protect primitive vehicle manipulation from cyber-attacks. These technical challenges of autonomous vehicles indeed represent the domain of cyber-physical systems.

Despite the well-studied overall architecture and individual algorithms, not much is known about the details of computation for autonomous vehicles. In particular, there are few case studies on embedded systems concerning real problems of real vehicles. This is largely due to the fact that common libraries and platforms have not been provided for the community. To overcome this circumstance, we have developed open-source software, called Autoware [1], [2], which includes a rich set of software packages and libraries required for autonomous vehicles. Autoware is now widely used for research and development, and is probably the largest open-source project for self-driving technology.

Contribution: This paper aims at building Autoware with embedded systems while clarifying its performance in real environments. To this end, we customize and extend Autoware to accommodate the ARM-based embedded processing cores and Tegra-based embedded graphics processing units (GPUs) of DRIVE PX2, which is manufactured by NVIDIA Corporation for autonomous vehicles. To the best of our knowledge, this is the first work that clarifies the performance of a complete software stack for autonomous vehicles with embedded systems.


2018 9th ACM/IEEE International Conference on Cyber-Physical Systems

0-7695-6378-3/18/$31.00 ©2018 IEEE    DOI 10.1109/ICCPS.2018.00035


This paper is more interested in ARM cores than Tegra cores, because low-power CPUs are often preferred to high-performance GPUs from the functional safety point of view. It is true that GPUs are a more promising solution to break through the trade-off between power and performance in autonomous vehicles. We will therefore explore the advantages of GPUs for Autoware in more detail in future work.

Organization: The rest of this paper is organized as follows. Section 2 describes the system model of Autoware, including the system stack and underlying basic software. Section 3 presents the details of Autoware, with a particular emphasis on the modules for which we made a significant implementation effort on DRIVE PX2. Section 4 evaluates the performance of Autoware on Board using DRIVE PX2. Related work is discussed in Section 5, and this paper concludes in Section 6.

2. System Model

Autonomous vehicles as cyber-physical systems can be abstracted into sensing, computing, and actuation modules, as shown in Figure 1. Sensing devices, such as laser scanners (LiDAR) and cameras, are typically used for self-driving in urban areas. Actuation modules handle steering and stroking; their twist control commands are typically generated by the path following module. Computation is a major component of self-driving technology. Scene recognition, for instance, requires the localization, detection, and prediction modules, whereas path planning is handled by mission-based and motion-based modules. Each module employs its own set of algorithms. The modules implemented in Autoware are discussed in Section 3.

Figure 1 shows the basic control and data flow for an autonomous vehicle. Sensors record environmental information that serves as input data for the artificial intelligence core. 3D maps are becoming commonplace for self-driving systems, particularly in urban areas, as a complement to the planning data available from sensors. External data sources can improve the accuracy of localization and detection without increasing the complexity of the vehicle's algorithms. Artificial intelligence cores typically output values for angular and linear velocities, which serve as commands for steering and stroking, respectively.

2.1. System Stack

We designed a complete software stack for autonomous vehicles, implemented with open-source software, as shown in Figure 2. The self-driving technology discussed herein is intended for vehicles driven in urban areas rather than on freeways or highways. We contributed to the development of Autoware, a popular open-source software project developed for autonomous vehicles intended for use in urban areas. Autoware is based on the Robot Operating System (ROS) and other well-established open-source software libraries, as shown in Figure 2. Section 2.2 provides a brief overview of ROS. The Point Cloud Library (PCL) [3] is mainly used to manage LiDAR scans and 3D mapping data, in addition to performing data-filtering and visualization functions.


Figure 1. Basic control and data flow of autonomous vehicles.

CUDA [4] is a programming framework developed by NVIDIA for general-purpose computing on GPUs (GPGPU). To handle the computation-intensive tasks involved in self-driving, GPUs running CUDA are a promising solution, though this paper does not focus on them. Caffe [5], [6] is a deep learning framework designed with expression, speed, and modularity in mind. OpenCV [7] is a popular computer vision library for image processing.

Autoware uses such tools to compile a rich set of software packages, including sensing, perception, decision making, planning, and control modules. Many drive-by-wire vehicles can be transformed into autonomous vehicles by installing Autoware. In several countries, successful implementations of Autoware have been tested on autonomous vehicles driven for long distances in urban areas. Recently, automotive manufacturers and suppliers have begun using Autoware as a baseline for prototype autonomous vehicles.

This section briefly introduces the hardware components that are generally installed in autonomous vehicles. Commercially available vehicles require minor modifications for a successful application of our software stack. These modifications enable us to control the vehicles using external computers through a secure gateway and to install sensors on the vehicles. The vehicle, computers, and sensors can be connected by a Controller Area Network (CAN) bus, Ethernet, and/or USB 3.0. However, the specifications of the sensors and computers required for a particular application depend closely on the specific functional requirements of the autonomous vehicle. Autoware supports a range of sensors; we chose Velodyne HDL-32e LiDAR scanners and PointGrey Grasshopper3 cameras for our prototype system. Autoware also supports a range of processors. Many users run Autoware on desktops and laptops. The NVIDIA DRIVE PX2 is used herein, though our previous study ported Autoware to another embedded computing platform, the Kalray Massively Parallel Processor Array (MPPA) [8].

2.2. Robot Operating System (ROS)

As mentioned earlier, Autoware is based on ROS [9], [10], which is a component-based middleware framework developed for robotics research.

Figure 2. Complete software stack for autonomous vehicles using Autoware. [Diagram layers: Application (Sensing, Perception, Decision, Planning, Control); Runtime (ROS, PCL, CUDA, Caffe, OpenCV, Driver Interface); OS (Real-Time Linux, Real-Time OS); Hardware (Many/Multi Cores, GPU, FPGAs, ECUs, Camera, LiDAR, GNSS, IMU, CAN, Ether, ADAS, Self-driving Unit)]

ROS is designed to enhance the modularity of robot applications at a fine-grained level and is suitable for distributed systems, while allowing for efficient development. Since autonomous vehicles require many software packages, ROS provides a strong foundation for Autoware's development.

In ROS, the software is abstracted as nodes and topics. The nodes represent individual component modules, whereas the topics mediate inputs and outputs between nodes, as illustrated in Figure 3. ROS nodes are usually standard programs written in C++. They can interface with any other software libraries that are installed.

Communication among nodes follows a publish/subscribe model. This model is a strong approach for modular development. In this model, nodes communicate by passing messages via a topic. A message has a simple structure (almost identical to structs in C) and is defined in .msg files. Nodes identify the content of a message by the topic name. When a node publishes a message to a topic, another node subscribes to the topic and uses the message. For example, in Figure 3, the “Camera Driver” node sends messages to the “Images” topic. The messages in the topic are received by the “Traffic Light Recognition” and “Pedestrian Detection” nodes. Topics are managed using first-in, first-out queues when accessed by multiple nodes simultaneously. At the same time, ROS nodes can launch several threads implicitly; however, issues with real-time processing must be addressed.
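As an illustration of this publish/subscribe model, the following minimal sketch shows a roscpp node that publishes to and subscribes from a topic; the node, topic, and message names are illustrative and are not taken from the Autoware code base (in practice the publisher and subscriber would usually live in separate nodes).

// Minimal publisher/subscriber sketch using roscpp (illustrative names only).
#include <ros/ros.h>
#include <std_msgs/String.h>

// Callback invoked whenever a message arrives on the subscribed topic.
void infoCallback(const std_msgs::String::ConstPtr& msg) {
  ROS_INFO("received: %s", msg->data.c_str());
}

int main(int argc, char** argv) {
  ros::init(argc, argv, "pubsub_example");
  ros::NodeHandle nh;

  // Publisher and subscriber bound to the same topic name.
  ros::Publisher pub = nh.advertise<std_msgs::String>("images_info", 10);
  ros::Subscriber sub = nh.subscribe("images_info", 10, infoCallback);

  ros::Rate rate(10);  // publish at 10 Hz
  while (ros::ok()) {
    std_msgs::String msg;
    msg.data = "frame metadata";
    pub.publish(msg);
    ros::spinOnce();   // dispatch pending callbacks
    rate.sleep();
  }
  return 0;
}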

ROS also includes an integrated visualization tool, RViz, and a data-driven simulation tool, ROSBAG. Figure 4 (a) shows examples of visualizations produced by RViz for perception tasks in Autoware. The RViz viewer is useful for monitoring the status of tasks. ROSBAG is a set of tools for recording and playing back ROS topics. It provides development environments for testing self-driving algorithms without hardware, thereby making development processes more efficient.
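For instance, recorded topics can also be read directly from code for offline testing; the following sketch, assuming the standard rosbag C++ API and an illustrative bag file and topic name, iterates over recorded point cloud messages.

// Reading recorded point clouds from a ROSBAG file (illustrative file/topic names).
#include <rosbag/bag.h>
#include <rosbag/view.h>
#include <sensor_msgs/PointCloud2.h>
#include <string>
#include <vector>

int main() {
  rosbag::Bag bag;
  bag.open("drive_log.bag", rosbag::bagmode::Read);   // hypothetical recording

  std::vector<std::string> topics = {"/points_raw"};  // assumed LiDAR topic name
  rosbag::View view(bag, rosbag::TopicQuery(topics));

  for (const rosbag::MessageInstance& m : view) {
    sensor_msgs::PointCloud2::ConstPtr scan = m.instantiate<sensor_msgs::PointCloud2>();
    if (scan) {
      // Feed the scan into the algorithm under test, e.g., localization or clustering.
    }
  }
  bag.close();
  return 0;
}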

Figure 3. The publish/subscribe model in ROS.

3. Autoware

This section describes the software stack of Autoware as highlighted in Figure 2, with an emphasis on the extensions required to adapt it to the DRIVE PX2 platform. The complete Autoware system can be examined in its project repository [1]. It is important to note that Autoware is designed for autonomous vehicles driven in urban areas. Therefore, additional modules might be required to adapt it for freeways or highways. Discussions regarding the accuracy and optimization of each algorithm implemented in Autoware are beyond the scope of this paper.

3.1. Sensing

Autoware mainly recognizes road environments with the help of LiDAR scanners and cameras. LiDAR scanners measure the distance to objects by illuminating a target with pulsed lasers and measuring the time of the reflected pulses. Point cloud data from LiDAR scanners can be used to generate digital 3D representations of the scanned objects. Cameras are predominantly used to recognize traffic lights and extract additional features of the scanned objects.

To achieve real-time processing, Autoware filters and pre-processes the raw point cloud data obtained from the LiDAR scanners. As a case in point, voxel grid filtering is required to localize, map, and detect objects in the point cloud. This kind of filtering downsamples the point cloud, replacing each group of points contained in a cubic lattice grid with its centroid. Given this downsampled point cloud, the Normal Distributions Transform (NDT) algorithm [11] can obtain the relative position between two point clouds [12]. Autoware takes advantage of this result to obtain an accurate position of the LiDAR scanner mounted on the vehicle relative to the 3D map.
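As a concrete illustration of this voxel grid filtering step, the following sketch downsamples a point cloud with PCL's VoxelGrid filter; the 2.0 m leaf size mirrors the value used later in the evaluation, and the input is assumed to be a raw LiDAR scan. This is a minimal sketch, not Autoware's voxel_grid_filter node itself.

// Voxel grid downsampling of a LiDAR scan with PCL (leaf size is illustrative).
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/filters/voxel_grid.h>

pcl::PointCloud<pcl::PointXYZ>::Ptr
downsampleScan(const pcl::PointCloud<pcl::PointXYZ>::ConstPtr& raw_scan) {
  pcl::PointCloud<pcl::PointXYZ>::Ptr filtered(new pcl::PointCloud<pcl::PointXYZ>);

  pcl::VoxelGrid<pcl::PointXYZ> voxel;
  voxel.setInputCloud(raw_scan);
  voxel.setLeafSize(2.0f, 2.0f, 2.0f);  // 2.0 m voxels; each voxel is replaced by its centroid
  voxel.filter(*filtered);

  return filtered;
}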

Data from other sensors, such as radars, GNSS, and IMU, can be used to refine the localization, detection, and mapping. Radars are already available in commercial vehicles and are often used for ADAS safety applications. GNSS and IMU sensors can be coupled with gyroscope sensors and odometers to improve the positioning information.



Figure 4. Autoware's self-driving packages: (a) RViz visualization with high-definition and high-resolution geographical information, (b) NDT scan matching localization using a 3D map and a 3D LiDAR scan, (c) deep learning based object detection based on the You only look once (Yolo2) algorithm, (d) projection of the 3D point-cloud data onto an image, (e) fusion of calibrated point clouds and images, (f) traffic light positions extracted from the 3D map, (g) traffic light recognition with the region of interest marked, (h) trajectory planning in a parking lot using hybrid-state A* search, (i) trajectory planning for object avoidance using hybrid-state A* search, (j) trajectory generation for object avoidance using the state-lattice algorithm, and (k) steering angular velocity calculations for path following using the pure pursuit algorithm.

3.2. Computing

Computing intelligence is an essential component of autonomous vehicles. Using sensor data and 3D maps, the vehicle can compute the final trajectory and communicate with the actuation modules. This section explains the core modules used for perception, decision making, and planning.

3.2.1. Perception. Safety in autonomous vehicles is a high-priority issue. Therefore, the perception modules must calculate the accurate position of the ego-vehicle inside a 3D map, and recognize objects in the surrounding scene as well as the status of traffic signals.

Localization: The localization module is one of the crucial parts of the self-driving system. Many other perception modules rely on the location of the vehicle. The localization algorithms implemented in Autoware exploit scan matching between 3D maps and LiDAR scans. The average error is on the order of a few centimeters. Furthermore, localization is also required to create 3D maps. This approach is often referred to as Simultaneous Localization And Mapping (SLAM). To create a map using this strategy, the matching process is performed against the previous LiDAR scan, instead of a 3D map. Once the transformation has been obtained, the current scan is transformed into the coordinate system of the previous one, and the point clouds are summed up. This is repeated continuously while building the map.

Autoware mainly uses the NDT [11] algorithm for localization. This is because the computational cost of the NDT algorithm is not dominated by the map size, thus enabling the deployment of high-definition and high-resolution 3D maps at larger scales.

Autoware also supports other localization and mapping algorithms, such as the Iterative Closest Point (ICP) algorithm [13]. This allows Autoware users to select the best-suited algorithm for their applications.
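As a rough sketch of how such scan-to-map matching can be set up, the following code uses PCL's NormalDistributionsTransform to align a downsampled scan against a 3D map. The 1 m resolution mirrors Table 3, but the remaining parameter values are illustrative assumptions rather than the exact configuration of Autoware's ndt_matching node.

// NDT scan-to-map alignment with PCL (parameter values are illustrative).
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/registration/ndt.h>
#include <Eigen/Core>

Eigen::Matrix4f localizeWithNdt(
    const pcl::PointCloud<pcl::PointXYZ>::Ptr& filtered_scan,
    const pcl::PointCloud<pcl::PointXYZ>::Ptr& map,
    const Eigen::Matrix4f& initial_guess) {
  pcl::NormalDistributionsTransform<pcl::PointXYZ, pcl::PointXYZ> ndt;
  ndt.setTransformationEpsilon(0.01);  // convergence threshold
  ndt.setStepSize(0.1);                // maximum step length of the line search
  ndt.setResolution(1.0);              // 1 m voxel resolution of the NDT grid
  ndt.setMaximumIterations(30);

  ndt.setInputSource(filtered_scan);   // downsampled LiDAR scan
  ndt.setInputTarget(map);             // 3D map point cloud

  pcl::PointCloud<pcl::PointXYZ> aligned;
  ndt.align(aligned, initial_guess);   // e.g., previous pose propagated forward

  // The final transformation gives the scanner pose relative to the map.
  return ndt.getFinalTransformation();
}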

Detection: Autonomous vehicles must detect the surrounding objects, such as other vehicles, pedestrians, and traffic signals. Having this information readily available, we can follow traffic rules and avoid accidents.


Autoware supports deep learning [14], [15] and traditional machine learning approaches [16] for object detection, using Caffe and OpenCV. Figure 4 (c) shows the application of the Single Shot MultiBox Detector (SSD) [14] and the You only look once (Yolo2) [15] algorithms. Both are based on fully convolutional neural network (CNN) architectures designed to provide fast computation. As previously mentioned, Autoware also includes a pattern recognition algorithm based on Deformable Part Models (DPM) [16], which searches for and scores the histogram of oriented gradients features of target objects in 2D images [17].

Perception in Autoware is mainly performed using point cloud data obtained from LiDAR scanners. The point cloud is pre-processed and segmented using the nearest neighbors algorithm [18]. This process calculates the Euclidean distance between points according to a distance threshold in 3D space. Once the point cloud is clustered, the distance between surrounding objects and the ego-vehicle can be calculated. Moreover, by using the distance of each cluster in conjunction with the classification results of the 2D image processing algorithms, objects can be tracked over time to improve the perception information.
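A minimal sketch of this Euclidean clustering step with PCL is shown below; the tolerance and cluster size limits are illustrative assumptions rather than Autoware's exact parameters.

// Euclidean clustering of a pre-filtered point cloud with PCL (parameters are illustrative).
#include <vector>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/search/kdtree.h>
#include <pcl/segmentation/extract_clusters.h>

std::vector<pcl::PointIndices>
clusterObjects(const pcl::PointCloud<pcl::PointXYZ>::Ptr& cloud) {
  // k-d tree used for the nearest neighbor searches.
  pcl::search::KdTree<pcl::PointXYZ>::Ptr tree(new pcl::search::KdTree<pcl::PointXYZ>);
  tree->setInputCloud(cloud);

  std::vector<pcl::PointIndices> cluster_indices;
  pcl::EuclideanClusterExtraction<pcl::PointXYZ> ec;
  ec.setClusterTolerance(0.5);   // 0.5 m distance threshold between neighboring points
  ec.setMinClusterSize(20);      // discard tiny clusters (noise)
  ec.setMaxClusterSize(100000);
  ec.setSearchMethod(tree);
  ec.setInputCloud(cloud);
  ec.extract(cluster_indices);   // each entry holds the point indices of one object

  return cluster_indices;
}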

Assuming that localization in a 3D map is precise, Autoware implements an elegant scheme for traffic light recognition. Knowing the current position of the ego-vehicle in the map, and having an extrinsically and intrinsically calibrated camera and LiDAR scanner, the 3D map features can be projected onto the image captured by the front camera of the vehicle, using the rigid-body transformations across the camera, the LiDAR scanner, and the 3D map. Using this strategy, the region of interest (ROI) for traffic lights can be limited, reducing the computation time and increasing the accuracy of the traffic light state classifiers at the same time. Figures 4 (f) and (g) depict the result of this approach. In general, traffic light recognition is considered to be one of the most difficult problems associated with autonomous vehicles. Nonetheless, Autoware is able to accurately recognize traffic lights using 3D map features and a highly reliable localization algorithm.
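The projection itself reduces to applying the camera's extrinsic and intrinsic calibration to 3D map points. A hedged sketch using OpenCV's projectPoints is given below; the calibration inputs are assumed to come from an offline camera and LiDAR calibration, and the names are illustrative.

// Projecting 3D map features (e.g., traffic light positions) into the camera image.
// rvec/tvec (extrinsics) and K/dist (intrinsics) are assumed to come from an offline
// camera-LiDAR-map calibration; names are illustrative.
#include <vector>
#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>

std::vector<cv::Point2f> projectMapFeatures(
    const std::vector<cv::Point3f>& map_points,  // 3D features in the map/LiDAR frame
    const cv::Mat& rvec,   // rotation (Rodrigues vector) from that frame to the camera frame
    const cv::Mat& tvec,   // translation from that frame to the camera frame
    const cv::Mat& K,      // 3x3 camera intrinsic matrix
    const cv::Mat& dist) { // distortion coefficients
  std::vector<cv::Point2f> image_points;
  cv::projectPoints(map_points, rvec, tvec, K, dist, image_points);
  // image_points now bound the ROI in which the traffic light state classifier is run.
  return image_points;
}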

Prediction: Object detection algorithms are computed on each synchronized camera frame and point cloud scan. These results must be associated over time, so that the trajectories of moving objects can be estimated for the mission and motion planning modules.

In Autoware, Kalman filters and particle filters are employed to solve these problems. The Kalman filters are used under the assumption that the ego-vehicle is driving at a constant velocity while tracking moving objects [19]. The computational cost of the Kalman filters is lightweight, thereby making them suitable for real-time processing. Particle filters, in contrast, can be applied to non-linear tracking scenarios, which are appropriate for realistic driving [20]. Autoware uses both Kalman filters and particle filters, depending on the given scenario. They are used to track objects in both the 2D image plane and the 3D point cloud space.
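For the Kalman-filter-based tracking, a minimal constant-velocity sketch using OpenCV's KalmanFilter is shown below; the state layout (position and velocity in the image plane) and the noise covariances are illustrative assumptions, not Autoware's tracker configuration.

// Constant-velocity Kalman filter for tracking one detection in the image plane
// (state: [x, y, vx, vy], measurement: [x, y]); noise values are illustrative.
#include <opencv2/core.hpp>
#include <opencv2/video/tracking.hpp>

cv::KalmanFilter makeConstantVelocityTracker(float dt) {
  cv::KalmanFilter kf(4, 2, 0);
  kf.transitionMatrix = (cv::Mat_<float>(4, 4) <<
      1, 0, dt, 0,
      0, 1, 0, dt,
      0, 0, 1,  0,
      0, 0, 0,  1);
  cv::setIdentity(kf.measurementMatrix);                        // measure x and y only
  cv::setIdentity(kf.processNoiseCov, cv::Scalar::all(1e-2));
  cv::setIdentity(kf.measurementNoiseCov, cv::Scalar::all(1e-1));
  cv::setIdentity(kf.errorCovPost, cv::Scalar::all(1.0));
  return kf;
}

// Per frame: predict, then correct with the associated detection center.
cv::Point2f track(cv::KalmanFilter& kf, const cv::Point2f& detection) {
  kf.predict();
  cv::Mat measurement = (cv::Mat_<float>(2, 1) << detection.x, detection.y);
  cv::Mat corrected = kf.correct(measurement);
  return cv::Point2f(corrected.at<float>(0), corrected.at<float>(1));
}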

The detected and tracked objects in the 2D image can be combined with the clustered and tracked objects obtained from the LiDAR scanners; this idea is also known as “sensor fusion” with projection and re-projection. The sensor fusion parameters are determined with the calibrated LiDAR scanners and cameras.

The point cloud can be projected onto the 2D image, thereby adding depth information to the image. Figures 4 (d) and (e) show the result of calculating bounding boxes for clustered 3D point cloud objects projected onto the camera image.

Another supported approach consists of back-projecting the objects detected and tracked in the image onto the 3D point cloud coordinates, using the same extrinsic parameters. These re-projected positions can partially be used to determine the motion and mission plans.

3.2.2. Decision. Once the obstacles and traffic signals are detected, the trajectories of other moving objects can be estimated. The mission planning and decision making modules use these estimated results to determine an appropriate direction in which the ego-vehicle should move.

Autoware implements an intelligent state machine to understand, forecast, and make decisions in response to the road status. Moreover, Autoware also allows persons in the ego-vehicle to supervise the automation, overriding the state determined by the planning module.

3.2.3. Planning. The planning module generates trajectories following the output of the decision-making module. Path planning can be classified into mission and motion planning. Autoware plans global trajectories based on the current location and the given destination. Local trajectories, on the other hand, are generated by the motion planning module along the global trajectories. Available algorithms include graph-search methods such as hybrid-state A* [21], and trajectory generation algorithms [22] such as lattice-based methods [23]. A motion planning module suited to the decision-making module is selected according to the road scenario.

Mission planning: Depending on the driving states, such as lane changes, merges, and passing, the mission planner uses a rule-based system to determine path trajectories. Navigation from the current position to the destination is also included as part of mission planning. The high-definition 3D map information contains static road features that can be used for this mission planning function. In more complex scenarios, such as parking and rerouting, persons in the ego-vehicle could supervise the mission capability. In either case, once the global path is generated, the motion planning module is launched to make a plan for local trajectories.

The basic policy of the mission planner is to follow the center lines of the lanes over the route generated by the map navigation system, using the high-definition 3D map information. The mission planner changes lanes only when the ego-vehicle needs to pass a preceding vehicle, or approaches an intersection followed by a turn.

Motion planning: The motion planner is in charge of generating locally feasible trajectories along the given global trajectories, taking into account the vehicle states, the drivable areas indicated by the 3D map, the surrounding objects, the traffic rules, and the desired goal.

When planning in unstructured environments, such as parking lots, graph-search algorithms such as A* [24] and hybrid-state A* [21] are used to find the minimum-cost path to the goal. Figures 4 (h) and (i) show these methods in action. Although graph-search algorithms require considerable computing capabilities, they can analyze even complex scenarios. On the other hand, when navigating in structured environments, such as roads and traffic lanes, vertices are likely to be dense and irregularly distributed, which constrains the options for feasible trajectories. For this reason, Autoware preferably uses spatiotemporal lattice-based algorithms to adapt motion plans to urban environments [25]. State-of-the-art research has also encouraged the implementation of these algorithms in previous autonomous vehicles [26]. Trajectories for obstacle avoidance and lane changes must be calculated in real time. The lattice-based algorithms [27], [25] are generally affordable for real-time processing. Figure 4 (j) outlines the trajectories generated by the lattice planner; one trajectory is selected by cost evaluation.

3.3. Actuation

Once local trajectories are determined, autonomous vehicles need to follow them.

Path following: The pure pursuit algorithm [28] is used to generate the actuation commands for the ego-vehicle. This method breaks down the path into multiple waypoints, which are discrete representations of the path. During every control cycle, the algorithm searches for the next closest waypoint in the ego-vehicle's heading direction. The waypoint is searched for within a threshold distance. This reduces the abrupt changes in angle that might occur when turning or following a deviating path. As shown in Figure 4 (k), the velocity and angle of the following movement are set to the values that will move the vehicle to the selected waypoint along a predefined curvature.
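The geometric core of pure pursuit can be summarized in a few lines: given the selected waypoint expressed in the vehicle frame, the curvature of the arc reaching it is 2y/L^2, and the commanded angular velocity is the linear velocity times that curvature. The following sketch is a simplified illustration of this calculation, not Autoware's waypoint_follower implementation.

// Pure pursuit command computation (simplified illustration).
// The target waypoint is expressed in the vehicle frame: x forward, y to the left.
#include <cmath>

struct TwistCommand {
  double linear_velocity;   // [m/s]
  double angular_velocity;  // [rad/s]
};

TwistCommand purePursuit(double target_x, double target_y, double desired_velocity) {
  // Lookahead distance L to the selected waypoint.
  const double lookahead = std::sqrt(target_x * target_x + target_y * target_y);

  // Curvature of the circular arc that passes through the vehicle and the waypoint.
  const double curvature =
      (lookahead > 1e-6) ? 2.0 * target_y / (lookahead * lookahead) : 0.0;

  TwistCommand cmd;
  cmd.linear_velocity = desired_velocity;
  cmd.angular_velocity = desired_velocity * curvature;  // omega = v * kappa
  return cmd;
}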

Vehicle control: The target waypoint is continuously updated until the goal is reached. In other words, the ego-vehicle continues following these waypoints until it reaches the user-specified destination. However, if the steering and throttle of the ego-vehicle are not successfully controlled in accordance with the velocity and angle output produced by the pure pursuit algorithm, the ego-vehicle can temporarily deviate from the planned trajectory. To actuate the steering and throttle based on the velocity and angle commands, a PID controller is often applied. The parameters of the PID controller highly depend on the given vehicle platform. In scenarios where the parameters are not correctly tuned, the PID controller will not be able to control the vehicle stably. Even worse, if localization is not accurate, this could cause gaps between the vehicle state and the outputs of the pure pursuit algorithm. In extreme cases, the ego-vehicle could collide with unexpected obstacles. To cope with such scenarios, the path following module must ensure a minimum distance between the ego-vehicle and detected obstacles, overriding the planned trajectory. The motion planner presented in this paper indeed reacts to this scenario by updating the waypoints in consideration of obstacles in the heading lane area.
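A textbook discrete PID loop of the kind referred to above is sketched here for the velocity channel; the gains and sample time are placeholders that would have to be tuned per vehicle platform, as noted above.

// Discrete PID controller for tracking a commanded velocity (gains are placeholders).
struct PidController {
  double kp, ki, kd;      // proportional, integral, derivative gains
  double integral = 0.0;
  double prev_error = 0.0;

  // error = commanded value - measured value; dt = control period [s].
  double step(double error, double dt) {
    integral += error * dt;
    const double derivative = (error - prev_error) / dt;
    prev_error = error;
    return kp * error + ki * integral + kd * derivative;
  }
};

// Example: convert the commanded velocity into a throttle actuation value.
//   PidController throttle_pid{0.8, 0.1, 0.05};
//   double throttle = throttle_pid.step(cmd_velocity - measured_velocity, 0.02);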

TABLE 1. INFORMATION OF EACH EVALUATED NODE AND ITS TOPICS.

Package          | Published topic               | Subscribed topic
Points localizer | Current vehicle pose          | Filtered 3D point-cloud data
Clustering       | Cluster information           | Filtered 3D point-cloud data
Image detector   | Objects detected in the image | Image captured from a camera

4. Evaluation

This section provides a performance evaluation of Autoware on the DRIVE PX2 platform, as compared to that on a high-end computer. DRIVE PX2 is currently one of the most computationally capable embedded multicore platforms integrated with high-performance GPUs. We believe that the performance results obtained using the DRIVE PX2 platform are a highly useful contribution to the future development of autonomous vehicles.

The minimum set of perception packages required to operate autonomous vehicles includes the localization, clustering, and detection packages. The functional nodes incorporated in these packages incur non-trivial computational costs. Due to space constraints, performance evaluations of the planning and control modules are omitted. Consequently, we evaluate the performance of the packages highlighted in Figure 5.

We define the end-to-end time as the total run-to-completion time taken for a specific given set of packages. By measuring this end-to-end time, we can also evaluate whether the tested computing platforms meet the timing requirements of sensing and actuation. The functional nodes highlighted in red in Figure 5 are those measured as the end-to-end time in this evaluation. Note that a series of measurements was conducted using ROSBAG recordings of real data obtained from LiDAR scanners and cameras mounted on our autonomous vehicle. In particular, we used a Velodyne VLP-16 LiDAR scanner and a PointGrey Grasshopper3 camera to obtain point cloud and image data, respectively. The LiDAR scanner produces about 30,000 points at 10 Hz, while the camera captures a three-channel color image of 960x604 pixels at 20 Hz.

4.1. Experimental Setup

Experiments were conducted using Autoware v1.5.1, CUDA 8.0, and ROS Kinetic on two machines: DRIVE PX2 (AutoChauffeur) [29] and a high-end laptop computer with a GPU. The DRIVE PX2 platform integrates two NVIDIA TEGRA X2 processors [30] (referred to as TEGRA A and TEGRA B henceforth) and a single GPU. Table 2 summarizes the specifications of the DRIVE PX2 platform and the laptop computer.

Figure 5. A package configuration diagram of Autoware. Some packages include multiple nodes.

TABLE 2. EXPERIMENTAL SETUP.

                          | CPU                  |       |           | Discrete GPU |         | Memory
                          | Model name           | Cores | Frequency | Architecture | Cores   |
The laptop computer       | Intel Core i7-6700K  | 4     | 4.0 GHz   | Maxwell      | 2,048   | 32 GB
DRIVE PX2 (AutoChauffeur) | 2× ARMv8 Denver 2.0  | 2× 2  | 2.00 GHz  | 2× Pascal    | 2× 1152 | 2× 8 GB
                          | 2× ARMv8 Cortex-A57  | 2× 4  | 2.00 GHz  |              |         |

4.2. Measurement

The execution time for each tested package and the end-to-end time were measured as the difference between the start and end times of the corresponding process, which are acquired by the timespec_get function. The precision of this function is on the order of nanoseconds. The overhead of the function calls was also considered. Each measurement was repeated 1,000 times, and the average performance was obtained.
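The measurement method described above can be reproduced with a few lines of C++; the sketch below wraps timespec_get (C11/C++17) around a placeholder processing step standing in for the work a package performs between subscription and publication.

// Measuring the execution time of one processing step with timespec_get.
#include <cstdio>
#include <ctime>

static double toSeconds(const std::timespec& ts) {
  return static_cast<double>(ts.tv_sec) + static_cast<double>(ts.tv_nsec) * 1e-9;
}

void processOnce() { /* placeholder for the package's work between subscribe and publish */ }

int main() {
  std::timespec start, end;
  std::timespec_get(&start, TIME_UTC);  // timestamp taken at subscription
  processOnce();
  std::timespec_get(&end, TIME_UTC);    // timestamp taken when the result is published

  const double elapsed_ms = (toSeconds(end) - toSeconds(start)) * 1e3;
  std::printf("execution time: %.3f ms\n", elapsed_ms);
  return 0;
}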

4.2.1. Execution time of target processes. The execution time for each tested package was obtained as the difference in time between the subscription to the topic shown in Table 1 and the publication of the computation result. In these measurements, only the tested package and a minimum set of necessary nodes were executed to avoid additional overhead and system load. The measurements conducted on the laptop computer executed all the relevant processes on the on-board CPU and GPU, whereas those conducted on the DRIVE PX2 platform distributed the relevant processes in such a way that the tested package was executed on TEGRA A and the remaining nodes were executed on TEGRA B.

Points localizer: The points localizer package was implemented using the NDT algorithm, as described in Section 3.1. The points localizer calculates the position and orientation of the autonomous vehicle using point cloud data downsampled by the voxel grid filter.

The execution times, when using point cloud data extending to 50, 100, and 200 meters around the autonomous vehicle, are shown in Figure 6. The horizontal axis of each figure represents the leaf size, which is the length of the side of the voxels. The result shows that larger leaf sizes reduce the execution time on both the DRIVE PX2 platform and the laptop computer. The execution time is also shorter when the point cloud data are filtered down to only those points located close to the LiDAR scanner position. These results indicate that the execution time decreases as the number of voxels to be processed decreases.

However, if the leaf size is too large, the computation of the NDT algorithm does not converge, and as a result, the execution time can become very long due to many trials of scan matching that never converge. This means that localization is only reliable within a certain threshold of resolution for the point cloud data. Therefore, the execution time and the accuracy of localization must be balanced with appropriate parameters under practical requirements.

Clustering: The clustering package performs segmentation of objects using point cloud data. Points above the vehicle and below the ground are ignored by specifying the parameters clip_min_height and clip_max_height. In this evaluation, we set clip_min_height = -1.3 and clip_max_height = 0.5.

Figure 6. Execution time of the points localizer package: (1) using point cloud data up to 50 meters, (2) using point cloud data up to 100 meters, and (3) using point cloud data up to 200 meters. Each panel plots execution time [ms] against leaf size [m] (1.5 to 3.0) for the laptop computer and the DRIVE PX2 platform.

Figure 7. The execution time of the clustering package for leaf sizes of 0.1, 0.2, and 0.3 meters, on the laptop computer and the DRIVE PX2 platform, each with and without the GPU.

Figure 8. The execution time of the image detector package on the laptop computer and the DRIVE PX2 platform, each with and without the GPU.

We tested two cases: the clustering package executed with and without the GPU. Measurement results with leaf sizes of 0.1, 0.2, and 0.3 meters are shown in Figure 7. On the DRIVE PX2 platform, the execution time decreased as the leaf size increased, similarly to the case of the laptop computer. Moreover, one can see that the presence of the GPU helped to reduce the execution time. When the laptop computer was used, the execution time was not influenced significantly by the presence of the GPU. We conjecture that the overhead of using the GPU was equivalent to the extra computation time required without it. Therefore, in some cases, the GPU is not really needed to reduce the execution time. The parameters of the clustering package must also be configured appropriately, because the accuracy of clustering and the execution time have a trade-off relation.

Image detector: The image detector package is based on the SSD algorithm, as described in Section 3.2.1. The measurement results with and without the GPU are shown in Figure 8. Without the GPU, the SSD algorithm required a longer execution time, meaning that the GPU is truly useful for this algorithm. The execution time on the DRIVE PX2 platform was approximately three times as much as that on the laptop computer. The SSD algorithm was executed using the 1,152 cores on the DRIVE PX2 platform, whereas the laptop computer could use up to 2,048 cores. This indicates that the difference in the number of cores between the two platforms also contributed to the difference in performance. Therefore, the execution time could be improved by distributing the computing workload across the TEGRA A and B processors of the DRIVE PX2 platform. The same is true of other state-of-the-art algorithms, such as Yolo2 [15]: their execution time on the DRIVE PX2 platform could be improved by load balancing.

4.2.2. End-to-end time. To measure the end-to-end time, the packages that are necessary for our autonomous vehicle were launched at the beginning. The execution times of a specific package set were measured and summed up. In these measurements, the leaf size of the points localizer was set to 2.0 meters, and point cloud data extending up to 200 meters ahead were used. These conditions represent a workload model similar to that required in a real self-driving scenario. As for measurements on the DRIVE PX2 platform, the image detector package and the clustering package were launched on TEGRA A, while the other packages were launched on TEGRA B to distribute the computational load.



Figure 9. End-to-end time.

TABLE 3. CONFIGURATION OF EACH NODE.

Node              | Parameter                            | Value
voxel grid filter | Voxel Leaf Size                      | 2 m
                  | Measurement Range                    | 200 m
ndt matching      | Resolution                           | 1 m
lane rule         | Acceleration                         | 1 m/s^2
lane select       | Distance Threshold to Neighbor Lanes | 5 m
velocity set      | Deceleration                         | 0.7 m/s^2
                  | Velocity Change Limit                | 10 km/h
waypoint follower | Lookahead Ratio                      | 2 m/s
                  | Minimum Lookahead Distance           | 6 m/s
twist filter      | Lateral Accel Limit                  | 5 m/s^2

The measured end-to-end times are shown in Figure 9. The worst execution time observed on the DRIVE PX2 platform was about three times as much as that on the laptop computer. Our observation is that this difference is caused by the performance of the CPU cores rather than the GPUs.

Based on the above results, we may conclude that the DRIVE PX2 platform provides an acceptable level of performance for Autoware installed in real autonomous vehicles, if the parameters of each functional node are configured appropriately. A demonstration movie of our autonomous vehicle powered by the DRIVE PX2 platform, with the configuration of Table 3, can be found at the following hyperlink: https://youtu.be/zujGfJcZCpQ.

5. Related Work

Self-driving technology has made a significant leap in both theory and practice since the prototype autonomous vehicles shown off in the DARPA Urban Challenge projects [26], [31], [32], [33], [34]. A common feature of these prototype autonomous vehicles was the wide use of 3D laser scanners in addition to traditional cameras, radars, and GNSS/IMU. Today, many industrial companies aiming for urban-area self-driving have followed the achievements of the DARPA Urban Challenge projects [35], [36], [37], [38], [39]. Autoware is also expected to be deployed on autonomous vehicles in urban areas, rather than on freeways and highways, and thus its design concept has been inspired by the DARPA Urban Challenge projects.

In terms of open-source software contributions, Apollo [40] and PolySync [41] can be compared to Autoware. Apollo originated from Baidu's alliance; it mainly depends on high-precision GNSS localization and 3D perception. PolySync, on the other hand, provides an integrated development environment for the backend framework of self-driving technology. These two projects are likely to assume a specific configuration of platforms and algorithms. Autoware differs in that it is designed and implemented for more general configurations, where users can choose platforms and algorithms according to their requirements. Autoware's generality and openness have allowed its integration as the base of the online self-driving education program presented by Udacity [38]. It has also helped to produce autonomous vehicle platforms such as the one created by AutonomouStuff [42].

So far, none of the above alternative contributions have succeeded in revealing their technology with embedded systems. To the best of our knowledge, this paper is the first contribution that disclosed the complete software stack of autonomous vehicles on embedded systems and quantified its performance using real hardware.

6. Conclusion

In this paper, we have presented Autoware on Board, a new profile of Autoware using embedded systems for autonomous vehicles. We clarified the integration feasibility of Autoware on Board using the DRIVE PX2 platform. We also highlighted our implementation effort to accommodate the ARM-based embedded processing cores of the DRIVE PX2 platform, whereas only a few functional nodes of Autoware are offloaded to the Tegra-based GPUs. The experimental results demonstrated that the latency of an individual Autoware module imposed on the DRIVE PX2 platform is maintained on the order of milliseconds. Compared to a high-end laptop computer, the latency was, at most, threefold.

The source code of Autoware on Board implemented for this paper has been merged into the master branch of the Autoware GitHub repository [1]. To the best of our knowledge, this is the first contribution that achieved a publicly available embedded system profile for autonomous vehicles with a quantified performance evaluation.

In future work, we plan to improve the design and implementation of Autoware for real-world production. We are also interested in its application to real-world services. To that end, this work can be optimized for further low-power and vehicular-grade platforms. Particular examples include the next generation of the DRIVE PX2 platform, which is expected to deliver TFLOPS-order performance under 30 W of power, and an alternative solution without GPUs, called Kalray MPPA, which presumably achieves TFLOPS performance under 20 W of power using many-core technology.

Acknowledgments

This work was partially supported by JST CREST Grant Number JPMJCR1683 and JST PRESTO Grant Number JPMJPR1751, Japan.

References

[1] “Autoware: Open-source software for urban autonomous driving,” https://github.com/CPFL/Autoware, accessed: 2018-02-14.

[2] S. Kato, E. Takeuchi, Y. Ishiguro, Y. Ninomiya, K. Takeda, and T. Hamada, “An Open Approach to Autonomous Vehicles,” IEEE Micro, vol. 35, no. 6, pp. 60–68, 2015.

[3] “Point Cloud Library (PCL),” http://pointclouds.org/, accessed: 2018-02-14.

[4] “Compute Unified Device Architecture (CUDA),” https://developer.nvidia.com/cudazone, accessed: 2018-02-14.

[5] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding,” arXiv preprint arXiv:1408.5093, 2014.

[6] “Caffe,” http://caffe.berkeleyvision.org/, accessed: 2018-02-14.

[7] “OpenCV,” http://opencv.org/, accessed: 2018-02-14.

[8] Y. Maruyama, S. Kato, and T. Azumi, “Exploring Scalable Data Allocation and Parallel Computing on NoC-based Embedded Many Cores,” in Proc. of IEEE International Conference on Computer Design (ICCD), 2017.

[9] M. Quigley, K. Conley, B. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, and A. Y. Ng, “ROS: an open-source Robot Operating System,” in Proc. of IEEE International Conference on Robotics and Automation Workshop on Open Source Software, vol. 3, no. 3.2, 2009, p. 5.

[10] “ROS.org,” http://www.ros.org/, accessed: 2018-02-14.

[11] P. Biber and W. Straßer, “The normal distributions transform: A new approach to laser scan matching,” in Proc. of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), vol. 3, 2003, pp. 2743–2748.

[12] M. Magnusson, “The three-dimensional normal-distributions transform: an efficient representation for registration, surface analysis, and loop detection,” Ph.D. dissertation, Örebro universitet, 2009.

[13] P. J. Besl and N. McKay, “A method for registration of 3-D shapes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, pp. 239–256, 1992.

[14] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “SSD: Single shot multibox detector,” in Proc. of European Conference on Computer Vision (ECCV), 2016.

[15] J. Redmon and A. Farhadi, “YOLO9000: better, faster, stronger,” CoRR, vol. abs/1612.08242, 2016, accessed: 2018-02-14. [Online]. Available: http://arxiv.org/abs/1612.08242

[16] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, “Object detection with discriminatively trained part-based models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 9, pp. 1627–1645, 2010.

[17] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proc. of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, 2005, pp. 886–893.

[18] R. B. Rusu, “Semantic 3D object maps for everyday manipulation in human living environments,” October 2009.

[19] R. E. Kalman et al., “A new approach to linear filtering and prediction problems,” Journal of Basic Engineering, vol. 82, no. 1, pp. 35–45, 1960.

[20] M. S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, “A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking,” IEEE Transactions on Signal Processing, vol. 50, no. 2, pp. 174–188, 2002.

[21] D. Dolgov, S. Thrun, M. Montemerlo, and J. Diebel, “Path planning for autonomous vehicles in unknown semi-structured environments,” The International Journal of Robotics Research, vol. 29, no. 5, pp. 485–501, 2010.

[22] B. Nagy and A. Kelly, “Trajectory generation for car-like robots using cubic curvature polynomials,” Field and Service Robots, vol. 11, 2001.

[23] H. Darweesh, E. Takeuchi, K. Takeda, Y. Ninomiya, A. Sujiwo, L. Y. Morales, N. Akai, T. Tomizawa, and S. Kato, “Open source integrated planner for autonomous navigation in highly dynamic environments,” Journal of Robotics and Mechatronics, vol. 29, no. 4, pp. 668–684, 2017.

[24] P. E. Hart, N. J. Nilsson, and B. Raphael, “A formal basis for the heuristic determination of minimum cost paths,” IEEE Transactions on Systems Science and Cybernetics, vol. 4, no. 2, pp. 100–107, 1968.

[25] M. McNaughton, C. Urmson, J. M. Dolan, and J.-W. Lee, “Motion planning for autonomous driving with a conformal spatiotemporal lattice,” in Proc. of IEEE International Conference on Robotics and Automation (ICRA), 2011, pp. 4889–4895.

[26] C. Urmson, J. Anhalt, D. Bagnell, C. Baker, R. Bittner, M. Clark, J. Dolan, D. Duggins, T. Galatali, C. Geyer et al., “Autonomous driving in urban environments: Boss and the Urban Challenge,” Journal of Field Robotics, vol. 25, no. 8, pp. 425–466, 2008.

[27] M. Pivtoraiko, R. A. Knepper, and A. Kelly, “Differentially constrained mobile robot motion planning in state lattices,” Journal of Field Robotics, vol. 26, no. 3, pp. 308–333, 2009.

[28] R. C. Coulter, “Implementation of the pure pursuit path tracking algorithm,” Carnegie Mellon University, Robotics Institute, Pittsburgh, PA, Tech. Rep., 1992.

[29] “NVIDIA: DRIVE PX2,” http://www.nvidia.com/object/drive-px.html, accessed: 2018-02-14.

[30] “NVIDIA: Parker,” https://blogs.nvidia.com/blog/2016/08/22/parker-for-self-driving-cars, accessed: 2018-02-14.

[31] M. Montemerlo, J. Becker, S. Bhat, H. Dahlkamp, D. Dolgov, S. Ettinger, D. Haehnel, T. Hilden, G. Hoffmann, B. Huhnke et al., “Junior: The Stanford entry in the Urban Challenge,” Journal of Field Robotics, vol. 25, no. 9, pp. 569–597, 2008.

[32] J. Leonard, D. Barrett, J. How, S. Teller, M. Antone, S. Campbell, A. Epstein, G. Fiore, L. Fletcher, E. Frazzoli et al., “Team MIT Urban Challenge technical report,” 2007.

[33] M. Buehler, K. Iagnemma, and S. Singh, The DARPA Urban Challenge: autonomous vehicles in city traffic. Springer, 2009, vol. 56, ch. The MIT - Cornell Collision and Why it Happened, pp. 509–548.

[34] M. Buehler, K. Iagnemma, and S. Singh, Eds., The DARPA Urban Challenge: autonomous vehicles in city traffic. Springer, 2009, vol. 56, ch. A Perception-Driven Autonomous Urban Vehicle, pp. 163–230.

[35] “Waymo,” https://waymo.com/, accessed: 2018-02-14.

[36] “Toyota Research Institute,” http://www.tri.global/toyota-research-institute-releases-video-guardian-and-chauffeur, accessed: 2018-02-14.

[37] “Uber,” https://www.uber.com/cities/pittsburgh/self-driving-ubers/,accessed: 2018-02-14.

[38] “Udacity,” https://www.udacity.com/self-driving-car, accessed: 2018-02-14.

[39] “Apollo,” http://apollo.auto/, accessed: 2018-02-14.

[40] “Apollo: an open autonomous driving platform,” https://github.com/ApolloAuto/apollo.

[41] “PolySync: tools and software infrastructure at the center of autonomous vehicle development,” https://polysync.io/, accessed: 2018-02-14.

[42] “AutonomouStuff,” https://autonomoustuff.com/, accessed: 2018-02-14.
