Software Data Planes: You Can’t Always Spin to Win · 2020. 3. 6. · Software Data Planes: You...
Transcript of Software Data Planes: You Can’t Always Spin to Win · 2020. 3. 6. · Software Data Planes: You...
This work is supported by the Semiconductor Research Corporation (SRC) and DARPA
@ADA_Centeradacenter.org
Software Data Planes:You Can’t Always Spin to Win
Hossein Golestani, Amirhossein Mirhosseini, Thomas F. WenischUniversity of Michigan
ACM Symposium on Cloud Computing (SoCC)November 22, 2019
• Virtual μs-scale computing era
• Service objectives
• High throughput
• Low average/tail latency
Software Data Planes: You Can’t Always Spin to Win
* Image credits: Mellanox, Intel
High-speed I/O*
Microservices
What’s Up in the Cloud?
Network function
virtualizationI/O virtualization
Server #1
Address
Translation
Routing
Server #2
Firewall
Load
Balancing
VM #1 VM #n…
2
• Then vs. now
• Kernel-bypass architectures (just a handful)
Software Data Planes: You Can’t Always Spin to Win
Andromeda [NSDI’18]
Arrakis [OSDI’14]
IX [OSDI’14]
mTCP [NSDI’14]
ReFlex [ASPLOS’17]
Shenango [NSDI’19]
Shinjuku [NSDI’19]
Snap [SOSP’19]
ZygOS [SOSP’17]
Kernel
CPU
CPU
…
I/O
I/O
…User app
CPU … CPU
Kernel
I/O … I/O
User app
Software Stacks: Under Revision
3
• Key mechanisms
• User-level shared queues
• Spin-polling cores
• Fast notification by cache coherence write signals
• Widely adopted in industry
Software Data Planes: You Can’t Always Spin to Win
SPDKSTORAGE PERFORMANCE DEVELOPMENT KIT
I/O
Software Data Planes
4
• An easy-to-use and fast model for communication and signaling
• But far from ideal, especially when scaled
• We show that spin-based data planes:
• Perform more work when there is less
• Are not scalable to many cores
• Are not scalable to many queues
• Are not well-suited for shared queues
Software Data Planes: You Can’t Always Spin to Win
Spin-polling: Not a Panacea
5
• Introduction to Software Data Planes
• Methodology
• Characterization of Software Data Plane Challenges
• Solution Directions
• Conclusion
Software Data Planes: You Can’t Always Spin to Win
Outline
6
• Setup• DPDK-based applications
• Skylake cores
• 100GbE Mellanox NIC
• Experiments
Inefficiencies of spin-polling
Lack of queue scalability
Impracticality of queue sharing
Software Data Planes: You Can’t Always Spin to Win
Methodology
7
1
2
3
• Polling “tax”• Body of poll loop
• Useless polling on idle queues (possibly causing cache misses)
• Affects throughput scalability with cores
Software Data Planes: You Can’t Always Spin to Win
While forever:
For each RX queue:
Read packets from RX queue;
If there are any packets:
Route packets using LPM*;
Send packets to TX queue(s);
* LPM: Longest Prefix Match
Inefficiencies of Spin-polling
Polling tax can be 20-28% of total CPU cycles even in 100% load
Core
… NIC
Port 1
… NIC
Port 2
8
(1)
(2)
(3)
(4)
(5)
(6)
1
2
3
• IPC (Instructions Per Cycle) of routing core at varying loads
Software Data Planes: You Can’t Always Spin to Win
IPC != Useful Work
0.00
0.25
0.50
0.75
1.00
1.25
1.50
1.75
2.00
2.25
2.50
2.75
0 5 10 15 20 25 30
IPC
of ro
uting c
ore
Routing throughput (Mpps)
1 queue
4 queues
8 queues
9
1
2
3
IPC decreases as load increases, resulting in
energy inefficiency, fast aging, and severe co-runner interference
• More (useless) instructions executed in lighter traffic
• Co-running:• Matrix mult
• Spin-based routing (0-100% load)
• Executed on:• SMT cores of a physical CPU
• Different physical CPUs
Software Data Planes: You Can’t Always Spin to Win
Effect on SMT Co-runner
Useless spinning wastes execution resources of an SMT co-runner
2.24
1.56 1.54
0.0
0.5
1.0
1.5
2.0
2.5
Not collocated CollocatedRouting-0%
CollocatedRouting-100%
IPC
of m
atr
ix m
ult
10
1
2
3
• Traffic flows spread among multiple queues
• Limited size of CPU caches: a performance antagonist
• Experiment• Forwarding packets by a single core
• Scaling up the number of queues
Software Data Planes: You Can’t Always Spin to Win
Lack of Queue Scalability
Core
… NIC
Port 1
… NIC
Port 2
11
1
2
3
• Round-trip latency of packet forwarding
• Light traffic (minimal queuing delay)
Software Data Planes: You Can’t Always Spin to Win
Latency is severely affected as queue heads fall out of L1/L2 caches
Effect on Latency
0
5
10
15
20
25
0 64 128 192 256 320 384 448 512
Avera
ge late
ncy (
μs)
Number of queues
12
1
2
3
• Balanced traffic: Passing through all queues
• Unbalanced traffic: Passing through only one queue
Software Data Planes: You Can’t Always Spin to Win
Effect on Peak Throughput
13
Cache misses not interleaved with transmits
severely hurt peak throughput in unbalanced traffic
0
5
10
15
20
25
30
35
40
0 64 128 192 256 320 384 448 512
Thro
ughput
(Mpps)
Total number of queues
Balanced
Unbalanced
1
2
3
• (a) Scale-out vs. (b) Scale-up queuing (shared queue)
• Scale-up queuing
• Strong theoretical merits
• Synchronization disadvantage
Software Data Planes: You Can’t Always Spin to Win
Scale-up Queuing Is Impractical
14
(a) (b)
Core
1
Core
n
……
Core
1
Core
n
…
1
2
3
• Processing hiccups cause head-of-line (HoL) blocking in scale-out
• Round-trip latency with 10 parallel cores(a) No hiccups
(b) 1μs processing hiccupwith 1% probability
Software Data Planes: You Can’t Always Spin to Win
Although effective in avoiding HoL blocking,
spin-polling in scale-up queuing saturates at lower loads
Scale-out vs. Scale-up
15
0
50
100
150
200
250
300
350
400
0 20 40 60
Avera
ge late
ncy (
μs)
Throughput (Mpps)
Scale-out
Scale-up0
50
100
150
200
250
300
350
400
0 20 40 60A
vera
ge late
ncy (
μs)
Throughput (Mpps)
Scale-out
Scale-up
(a) (b)
1
2
3
Software Data Planes: You Can’t Always Spin to Win
Future Data Planes
16
• QWAIT, a multi-address monitoring scheme• Inspired by x86 MWAIT
• Avoids polling tax, useless polling, and disruption to SMT co-runners
• Needs hardware support
• Programming model similar to select-case in Go
Software Data Planes: You Can’t Always Spin to Win
Solution Direction(s)
QWAIT (queue_set):
case queue_1:
process_queue_1();
…
case queue_n:
process_queue_n();
17
• Key mechanisms of software data planes• User-level shared queues
• Spin-polling cores
• Although easy-to-use and low-latency, software data planes have deficiencies, especially when scaled
• Using DPDK, we quantified these deficiencies:• Incurring polling overhead and useless work
• Not scalable to many cores/queues
• Not well-suited for scale-up queuing
Software Data Planes: You Can’t Always Spin to Win
Conclusion
18
Thank you!
Software Data Planes: You Can’t Always Spin to Win
Q & A
19