Real-Time and Non-Real-Time Integration of MATLAB and Simulink into DLR's New Robotics Software Framework aRDx
Berthold Bäuml
Autonomous Learning Robots Lab, Institute of Robotics and Mechatronics
German Aerospace Center (DLR)
“Agile Justin”: An Advanced Robotic System
sensing
  • stereo cameras (2 MPixel / 25 Hz)
  • RGB-D sensor (0.5 MPixel / 33 Hz)
  • torque sensors (all DOF, 1 kHz)
  • tactile skin on hands (3000 taxels / 750 Hz)
  • IMU (6D, 500 Hz)
acting
  • 53 DOF = 8 (platform) + 19 (torso) + 26 (hands)
  • torque control over all DOF
  • 1 kHz, <3 ms latency, <100 µs jitter
computing
  • 4x Core i7 quad-core (onboard)
  • CPU cluster with 64 cores
  • GPGPU cluster with 16 NVIDIA K20
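The control-loop figures above (1 kHz cycle, <100 µs jitter) can be checked against a recorded trace of cycle timestamps. The following is a minimal sketch in C. It assumes that jitter means the largest absolute deviation of a measured cycle period from the nominal period; the slides do not define the term, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Largest absolute deviation (in microseconds) of the measured cycle
 * period from the nominal period. Timestamps are in microseconds.
 * Assumption: this is the jitter definition used for illustration;
 * the slides do not specify one. */
static int64_t max_jitter_us(const int64_t *stamps_us, size_t n,
                             int64_t nominal_period_us)
{
    assert(stamps_us != NULL && n >= 2);
    int64_t worst = 0;
    for (size_t i = 1; i < n; ++i) {
        int64_t dev = (stamps_us[i] - stamps_us[i - 1]) - nominal_period_us;
        if (dev < 0)
            dev = -dev;
        if (dev > worst)
            worst = dev;
    }
    return worst;
}

/* Returns 1 if the trace stays strictly below the given jitter bound. */
static int meets_jitter_bound(const int64_t *stamps_us, size_t n,
                              int64_t nominal_period_us, int64_t bound_us)
{
    return max_jitter_us(stamps_us, n, nominal_period_us) < bound_us;
}
```

For a 1 kHz loop the nominal period is 1000 µs, so a call such as `meets_jitter_bound(stamps, n, 1000, 100)` tests a trace against the bound stated on the slide.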
“Playing Balls”: Perception and Action at the Limits
“Mars Habitat”: Dexterous Mobile Manipulation
Planning and Object Recognition
Tactile Skin
[Diagram: software components of Agile Justin (camera, IMU, skin, Kinect, pose filter, combine/decombine, skin recognition, object recognition, planner, task scheduler, simulated torso, simulated mobile platform, simulated hands, motion coordinator, fused 3D modeller, ball tracker, iPad drive interface), distributed over onboard Linux, onboard QNX, the planning cluster and the object recognition GPU cluster]
Software Architecture for Advanced Robotic Systems
• communicating components central abstraction in robotic frameworks
• packet/message based communication
• dynamic connecting and disconnecting of components
• concurrent, parallel execution
• distributed over network of computing resources
• components decoupled
• as modules for team development
• as process for robustness at runtime
• popular examples: ROS, YARP, OROCOS
Software Architecture for Advanced Robotic Systems
domain | low level driver, joint controller | robot controller, sensor preprocessing | world modeling, path planning | “AI” (logic planner, cognitive model)
computer | microcontroller/FPGA | realtime PC (QNX) | CPU/GPGPU cluster (Linux) | cluster, internet server cloud
communication | hardware bus (SPI, I2C, …) | hard realtime, distributed, QoS, up to 1kHz, 5MB | “optimal” transport, distributed, QoS, ~10Hz, up to 1GB | “fast” transport, distributed, <10Hz, <10MB
data | bits/bytes & simple structs | nested static structs & arrays | nested static structs & dynamic arrays | flexible, recursive data types & program snips
language | hard realtime, small footprint, HDL | hard realtime, efficient, parallel | efficient on large data, parallel, OOP | parallel, high level (functional, declarative, …)
Software Architecture for Advanced Robotic Systems
big challenge: wide range of software domains
ROS (Robot Operating System, www.ros.org)
• quasi-standard in robotics
• communication
  • no hard realtime
  • no QoS
  • not optimal (e.g., copy-once for inter-process)
  • nested structs and dynamic 1D arrays
  • but no multidimensional arrays and beyond (e.g., recursive types)
• languages
  • C/C++ for low level, planning/modelling
  • Python as “high level” language
    • not compiled, no parallel execution
    • -> bad performance (50x slower than C)
    • -> a lot of higher level functionality has to be implemented in C/C++
  • bindings to high level languages (diverse Lisps and Prolog, …)
[Domain table as on the previous slide, with the ROS coverage added]
language | C/C++ | Python, Lisp, Prolog
ROS | ROS-core
aRDx (Agile Robot Development “Next Generation”)
• communication
  • no single stack can fulfill all demands -> two stacks!
  • highly performant hard realtime stack
    • minimal latency and jitter, detailed QoS
    • optimal transport (zero-copy for IPC, copy-once to each host)
    • nested structs and static multidimensional arrays
  • high level stack, directly in a modern programming language
    • automatic serialization of
      • arbitrarily complex (recursive) data types
      • code (sending closures)
    • (almost) optimal transport (copy-once to each VM on each host)
• language
  • C/C++ for low level
  • Racket (http://racket-lang.org): modern language from the Lisp family
    • functional, compiled, parallel (to some extent)
    • only 5x slower than C (compared to Python’s 50x slower than C)
    • “programmable programming language” -> OOP, DSLs, Prolog, …
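The idea behind "zero-copy for IPC" can be illustrated with a small packet pool in which the sender writes a packet directly into a slot and the receiver reads it in place. This is a sketch only and not aRDx's implementation: the type and function names are hypothetical, and the pool is single-threaded for brevity (a real inter-process transport would place the slots in shared memory and synchronize access).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOT_COUNT 4
#define SLOT_BYTES 1024

/* Hypothetical packet pool: sender and receiver share the slots, so a
 * packet is written once and read in place (no copy on hand-over). */
typedef struct {
    unsigned char data[SLOT_COUNT][SLOT_BYTES];
    size_t length[SLOT_COUNT];
    size_t head; /* next slot to write */
    size_t tail; /* next slot to read */
    size_t used; /* slots published but not yet released */
} packet_pool;

static void pool_init(packet_pool *p) { memset(p, 0, sizeof *p); }

/* Sender: get a writable slot, or NULL if the pool is full. */
static unsigned char *pool_acquire(packet_pool *p)
{
    return (p->used == SLOT_COUNT) ? NULL : p->data[p->head];
}

/* Sender: publish the slot returned by the last pool_acquire(). */
static void pool_publish(packet_pool *p, size_t length)
{
    assert(p->used < SLOT_COUNT && length <= SLOT_BYTES);
    p->length[p->head] = length;
    p->head = (p->head + 1) % SLOT_COUNT;
    p->used++;
}

/* Receiver: read the oldest packet in place, or NULL if none is pending. */
static const unsigned char *pool_receive(const packet_pool *p, size_t *length)
{
    if (p->used == 0)
        return NULL;
    *length = p->length[p->tail];
    return p->data[p->tail];
}

/* Receiver: hand the oldest slot back to the sender. */
static void pool_release(packet_pool *p)
{
    assert(p->used > 0);
    p->tail = (p->tail + 1) % SLOT_COUNT;
    p->used--;
}
```

The pointer returned by `pool_receive` is the same address the sender wrote to, so the hand-over cost does not depend on the packet size.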
A Comparison of the Raw Communication Performance of Robotic Frameworks
[Plots: mean round-trip time [s] over packet size [byte], log-log scale, for aRDx, aRD, Orocos, ROS, ROS (fixed) and YARP. Columns: process, host, distributed. Top row: 1 client; bottom row: 20 clients. Inset in the 1-client host plot: round-trip time over pause [s].]
Fig. 5. Results of the stress test benchmark for 1 (top) and 20 (bottom) clients and for the three domains (columns). Each plot shows the mean round-trip time (averaged over some 100 runs) over the packet size for the various frameworks. Please be aware of the log-log scaling of the plots. The performance of aRDx is almost always the best, most dramatically for the host domain, where no other framework can provide zero-copy semantics. Only for small packet sizes (up to 1KB), where the transfer time is dominated by the constant overhead of a framework, is aRDx beaten by aRD’s minimalistic implementation, and in the 1-client case with large packets YARP is about 10% faster, presumably due to a slightly more clever configuration of the TCP sockets. In the 20-client case aRDx beats all other frameworks in the distributed domain by a factor of 2 because it has to transfer the packets sent from the master to the remote client only once and, hence, in each round of the test only 2+20 instead of 20+20 packets have to be transmitted over the GigE network. The increased constant overhead of aRDx for the host compared to the process domain is about 5x and, hence, close to the theoretically expected 6x due to the indirect communication through the daemon. Interestingly, although aRDx needs quite complex logic to provide zero-copy semantics in the host domain, its constant overhead is still 4x smaller than that of all other frameworks (except aRD). In what follows we discuss some features and quirks of the other frameworks that we came across. All these frameworks scale very well and roughly linearly with the number of clients. For the process domain YARP can provide zero-copy semantics.
In this domain ROS with its nodelets was also expected to show constant transfer times but could do so only after we fixed the implementation (labeled ROS fixed); standard ROS (labeled ROS) completely initializes the memory of newly constructed packets, hence the transfer time has to scale with the packet size. For the host and the distributed domain YARP and ROS perform very similarly, as both communicate over TCP sockets (side note: for YARP, because of instabilities, we could not use the potentially more efficient multicast and shared memory modes). In case of the host domain and large packets (> 1MB) they even almost reach the performance of the shared memory based transport of aRD, showing that the Linux loopback sockets are very efficient. In all tests the performance of Orocos was worst, although we always tried the optimal parameters. We suspect that this is due to the additional abstraction layer with ACE/TAO in its communication stack. For ROS we found another severe quirk in the host and distributed domain for packet sizes of 10KB to 100KB: there the round-trip time increases dramatically, by 100x. A further analysis showed that this effect disappears completely when adding a pause of at least 100ms between each round of the test (see the inset in the 1-client plot depicting the round-trip time over the pause time for 1KB packet). This means ROS is not really stress resistant.
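The packet counts quoted in the caption for the 20-client distributed case (20+20 versus 2+20) follow from simple arithmetic, sketched below in C. The function name is hypothetical, and reading the "2" as two remote hosts, each receiving a single copy of the master's packet, is our interpretation; the caption does not state the number of hosts.

```c
#include <assert.h>

/* Packets crossing the network in one round of the stress test:
 * the master sends one packet to every client and every client replies.
 * copy_once_per_host != 0 models a transport that sends each outgoing
 * packet only once per remote host (assumed reading of the caption). */
static unsigned packets_per_round(unsigned clients, unsigned remote_hosts,
                                  int copy_once_per_host)
{
    assert(remote_hosts > 0 && remote_hosts <= clients);
    unsigned outgoing = copy_once_per_host ? remote_hosts : clients;
    unsigned replies = clients;
    return outgoing + replies;
}
```

With 20 clients on 2 remote hosts this gives 40 packets per round without copy-once and 22 with it, a ratio of about 1.8, which is consistent with the reported factor of roughly 2.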
… with 100 clients running in the kHz range. Even for the distributed domain the worst-case round-trip latencies are no longer than 500 µs.
IV. CONCLUSIONS
We presented the design considerations and implementation details of the new highly performant, realtime capable, minimalistic and simple communication layer of our aRDx software framework. In an in-depth benchmark on Linux of the raw communication performance of aRDx and the popular robotic software frameworks ROS, YARP, Orocos and aRD, it was shown that aRDx performs excellently in both extreme performance aspects, namely latency and bandwidth, and in part dramatically outperforms the other frameworks. In addition, due to the “stress” character of our tests we could uncover a number of severe quirks in all other frameworks.
Running on QNX, aRDx provides hard realtime performance even for distributed applications.
aRDx is already successfully in use on our advanced and complex humanoid robot Agile Justin. In future publications we will describe its other, high level parts, like the dynamic and flexible but less performant communication layer or the advanced mechanisms for startup and shutdown of large distributed applications.
[Domain table as on the previous slides, with the aRDx coverage added]
language | C/C++ | Racket
aRDx | rt-stack | Racket-stack
• Simulink
• Simulink Coder -> HIL (Hardware in the Loop)
• HDL Coder
• Stateflow
• Control System Toolbox
• Signal Processing Toolbox
• Image Processing Toolbox
• Optimization Toolbox
• Computer Vision System Toolbox
• Symbolic Math Toolbox
• SimMechanics, Sim…
domain | low level driver, joint controller | robot controller, sensor preprocessing | system modeling, data analysis | world modeling, path planning | “AI” (logic planner, cognitive model)
MathWorks Toolchain has much to offer for a wide range of domains!
• model-based design of complex controllers
• automatic realtime (!) code generation -> HIL (Hardware-In-the-Loop)
• signal and image processing
• thorough numerics
• data analysis & visualization
• simulation of complex mechatronic systems
• …
domain | low level driver, joint controller | robot controller, sensor preprocessing | system modeling, data analysis | world modeling, path planning | “AI” (logic planner, cognitive model)
computer | microcontroller/FPGA | realtime PC (QNX) | workstation (Linux) | CPU/GPGPU cluster (Linux) | cluster, internet server cloud
communication | hardware bus (SPI, I2C, …) | hard realtime, distributed, QoS, up to 1kHz, 5MB | data logging | “optimal” transport, distributed, QoS, ~10Hz, up to 1GB | “fast” transport, distributed, <10Hz, <10MB
data | bits/bytes & simple structs | nested static structs & arrays | numerical data (structs and arrays) | nested static structs & dynamic arrays | flexible, recursive data types & program snips
language | hard realtime, small footprint, HDL | hard realtime, efficient, parallel | scripting, numerics & computer algebra | efficient on large data, parallel, OOP | parallel, high level (functional, declarative, …)
language | C/C++ | Racket
aRDx | rt-stack | Racket-stack
MathWorks Toolchain | Simulink/Coder | Matlab
• problem: how to bring the MathWorks Toolchain to a robotic system with >50 DOF & distributed computation?
• -> binding to aRDx
Simulink - aRDx Binding
features
• automatic S-Function block generation
• packets with 1D arrays of basic types (future: multi-arrays and nested structs)
• supports interpreted Simulink and generated code on realtime target
• channel specification in S-Function parameter
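robot-sfun.rkt
    #lang racket/base
    (require generator/simulink)
    (require robot-packet)  ; packet definitions (see robot-packets.rkt)

    ;; generate the S-Function bridging aRDx channels and Simulink signals
    (ardx-simulink-bridge
      #:ardx->simulink robot_state_packet
      #:simulink->ardx robot_control_packet)

robot-packets.rkt
    #lang generator/idesc

    ;; robot state: status word plus 19 joint angles and 19 joint torques
    (define-gstruct robot_state_packet
      ([status gint]
       [angles (garray gdouble 19)]
       [torques (garray gdouble 19)]))

    ;; control command: 19 joint torques
    (define-gstruct robot_control_packet
      ([torques (garray gdouble 19)]))

generated files: robot-sfun.c, robot-packets.h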
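[Diagram: racket generates the C sources, mex builds the S-Function block robot-sfun with the signals status (1), angles (19), torques (19) from robot_state_packet and torques (19) for robot_control_packet; Simulink Coder (+ model) produces the realtime executable]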
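To make the packet layout concrete, the two packet definitions from robot-packets.rkt could correspond to C structs along the following lines. This is a hypothetical sketch, not the actual content of the generated robot-packets.h: the mapping of `gint` to `int32_t` and of `gdouble` to `double` is an assumption, and `compute_control` is an illustrative stand-in for whatever the Simulink model computes between the state input and the control output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PACKET_DOF 19

/* Hypothetical C counterparts of robot_state_packet and
 * robot_control_packet. Assumption: gint -> int32_t, gdouble -> double;
 * the generator's real mapping is not given on the slides. */
typedef struct {
    int32_t status;
    double angles[PACKET_DOF];
    double torques[PACKET_DOF];
} robot_state_packet;

typedef struct {
    double torques[PACKET_DOF];
} robot_control_packet;

/* Illustrative consumer: a purely proportional torque law,
 * torque = gain * (target angle - measured angle), per joint. */
static void compute_control(const robot_state_packet *state,
                            const double *target_angles, double gain,
                            robot_control_packet *control)
{
    assert(state != NULL && target_angles != NULL && control != NULL);
    for (size_t i = 0; i < PACKET_DOF; ++i)
        control->torques[i] = gain * (target_angles[i] - state->angles[i]);
}
```

Fixed-size arrays keep the packet size known at compile time, which matches the "nested structs and static multidimensional arrays" restriction of the realtime stack.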
[Diagram: distributed application with the components vision, world model, camera device, GUI, EC path planning, input device, 3D viewer, monitoring, configuration, state machine, and torso, arm and hand devices, running on Linux, VxWorks, Linux/Windows and QNX hosts; the realtime part (CPU 1, CPU 2) is developed in an edit/compile/debug cycle with Stateflow and interpreted Simulink]
Matlab - Racket Binding
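    import racket.*
    require('my-module')

    v = to_racket([1 2 3 4], 'vector')          % explicit conversion to a Racket vector
    sv = racket('scale', 0.2, v)                % call a Racket function
    robots_id = racket('make-robots', 5, sv)    % reference to a Racket object
    robots = from_racket(robots_id)             % explicit conversion to Matlab data
    q = robots(3).arm.joint_angle(2)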
[Diagram: Matlab and the Racket VM exchange command and result over aRDx channels; complex data is exchanged as MAT-files on a RAM disk]
features
• require (load) any Racket module
• obeys Matlab’s lexical scope
• call any Racket function or macro
• transparent conversion of basic types
• references to any Racket objects
• explicit conversion to/from Racket for arbitrarily nested arrays, structs
implementation
• Racket process for each Matlab process
• command/result (basic) by aRDx channels
• complex Matlab data by MAT-files
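my-module.rkt
    ;; multiply every element of vector v by s
    (define (scale s v)
      (vector-map (lambda (e) (* e s)) v))

    ;; build a vector of num robots (dot and Robot are not shown on the slide)
    (define (make-robots num v)
      (for/vector ([i num])
        (dot (Robot) 'arm 'joint-angle v)))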
Example: Automatic Calibration of a Multisensorial Upper Body of a Humanoid Robot
• Simulink model for HIL control of full humanoid
• Matlab script for flow control of the calibration
• during recording (commanding robot movements, start/stop of recorders, ..)
• batch processing of recorded data
• Matlab for processing data and optimization
• Image Processing Toolbox (marker detection)
• Signal Processing Toolbox (noise filtering)
• Matlab based MTK - Manifold ToolKit (openslam.org/MTK.html) (model fit)
• Optimization Toolbox (finding optimal robot configurations)
-> whole application can be implemented in Matlab/Simulink universe
• fast development
• dramatically less error-prone than, e.g., C/C++
• numerically rock-solid
Conclusions
• advanced robotic systems span wide range of software domains “from hardware driver to artificial intelligence”
• DLR’s new aRDx software framework tackles this with
• two communication stacks: static & realtime — flexible & fast
• Racket as language
• MathWorks Toolchain could cover large part of domains
• tight binding of MathWorks Toolchain to aRDx allows use on advanced robotics systems (>50DOF, distributed computing, …)
• Simulink & Coder to aRDx: hard realtime
• Matlab to Racket: flexible, all functionality accessible
• MathWorks Toolchain and aRDx perfect fit esp. in research:
• rapid prototyping
• even students can work with complex robots because they can often stay completely in the MathWorks universe (known from their studies)
"It shows off … capabilities of what's arguably … the most, capable dual-armed mobile humanoid robots in existence.” (IEEE Spectrum 2014)
Best Video Award ICRA 2014
“aRDx/Racket & MathWorks Tools inside”