OS support for (flexible) high-speed networking
Herbert Bos, Vrije Universiteit Amsterdam
• monitoring
• intrusion detection
• packet processing (NATs, firewalls, etc.)
• network processors
• multiple concurrent apps
• optical infrastructure
we’re hurting because of
● hardware problems
● software problems
● language problems
● lack of flexibility
● network speed may grow too fast anyway
● multiple apps problem
What are we building?
different flows, different barriers
uspace
kernel
if (pkt.proto == TCP) then
    if (http) then scan webattacks;
    else if (smtp) scan spam;
    fi
    scan viruses;
else if (pkt.proto == UDP) then
    if (NFS traffic) then mem = statistics(pkt);
    else scan SLAMMER;
    fi
fi
• reconfigure when load changes
• dynamic optical infrastructure:
  - configure at startup time
  - construct datapaths
reduce copying
● FFPF avoids both ‘horizontal’ and ‘vertical’ copies
  - no ‘vertical’ copies
  - no ‘horizontal’ copies within flow group
  - more than ‘just filtering’ in kernel (e.g., statistics)
minimise copying, context switching
flowgraph = DAG
combine multiple requests
● control/data plane
● different copying regimes
● different buffer regimes
● endianness
[diagram: software structure – applications in userspace access FFPF through MAPI, libpcap, or the FFPF toolkit (plus new languages); FFPF itself spans the userspace-kernel boundary and the host-NIC boundary down to the ixp2xxx]
Software structure
current status
● working for Linux PC with
  - NICs
  - IXP1200s and IXP2400s (plugged in)
  - remote IXP2850 (with poetic license)
● rudimentary support for distributed packet processor
● speed comparable to tailored solutions
Thank you!
http://ffpf.sourceforge.net/
bridging barriers
Three flavours of packet processing on IXP & host
• Regular (on-demand copy): copy only when needed
  - may be slow depending on access pattern
• Zero copy: never copy
  - may be slow depending on access pattern
• Copy once: copy always
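The trade-offs between the three flavours can be sketched as three accessor styles. A minimal illustration in C; the function names and fixed packet size are ours, not FFPF’s actual API:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PKT_LEN 64

typedef struct { unsigned char data[PKT_LEN]; } packet;

/* zero copy: hand out a pointer into the shared buffer; never copy.
 * Fast, but repeated access to uncached device memory may be slow. */
static const unsigned char *access_zero_copy(const packet *p) {
    return p->data;
}

/* copy once: always copy the whole packet into private memory up
 * front; pay the copy cost once, then every access is cheap. */
static void access_copy_once(const packet *p, unsigned char *priv) {
    memcpy(priv, p->data, PKT_LEN);
}

/* regular (on-demand): touch only the bytes actually needed;
 * cost depends entirely on the access pattern. */
static unsigned char access_on_demand(const packet *p, size_t off) {
    return p->data[off];
}
```

Which flavour wins depends on how much of each packet the filter actually inspects, which is why FFPF offers all three.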
combining filters
(dev,expr=/dev/ixp0) > [ [ (BPF,expr=“…”) > (PKTCOUNT) ] | (FPL2,expr=“…”) ] > (PKTCOUNT)
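Once parsed, such an expression forms a DAG of connected elements: ‘>’ connects an element to its successor, ‘|’ creates parallel branches. A hedged sketch of one way to represent and evaluate it; all types and names here are illustrative, not the FFPF toolkit’s:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NEXT 4

/* one flowgraph element: a filter plus its successors */
struct node {
    int (*filter)(const unsigned char *pkt, size_t len);  /* 1 = accept */
    struct node *next[MAX_NEXT];
    int nnext;
    unsigned long count;          /* doubles as a PKTCOUNT element */
};

/* pass a packet to an element; accepted packets fan out along the DAG */
static void feed(struct node *n, const unsigned char *pkt, size_t len) {
    if (!n->filter(pkt, len))
        return;                   /* packet does not match this element */
    n->count++;
    for (int i = 0; i < n->nnext; i++)
        feed(n->next[i], pkt, len);
}

/* two toy filters for the sketch */
static int accept_all(const unsigned char *pkt, size_t len) {
    (void)pkt; (void)len;
    return 1;
}
static int is_tcp(const unsigned char *pkt, size_t len) {
    return len > 9 && pkt[9] == 6;    /* IP protocol field, 6 = TCP */
}
```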
Example 1: writing applications

int main(int argc, char **argv) {
    void *opaque_handle;
    int count, returnval;

    /* pass the string to the subsystem; it initializes and
       returns an opaque pointer */
    opaque_handle = ffpf_open(argv[1], strlen(argv[1]));
    if (start_listener(asynchronous_listener, NULL)) {
        printf("WARNING: spawn failed\n");
        ffpf_close(opaque_handle);
        return -1;
    }
    count = ffpf_process(FFPF_PROCESS_FOREVER);
    stop_listener(1);
    returnval = ffpf_close(opaque_handle);
    printf("processed %d packets\n", count);
    return returnval;
}
Example 1: writing applications (continued)

void *asynchronous_listener(void *arg) {
    int i;
    while (!should_stop()) {
        sleep(5);
        for (i = 0; i < export_len; i++) {
            printf("flowgrabber %d: read %d, written %d\n", i,
                   get_stats_read(exported[i]),
                   get_stats_processed(exported[i]));
            while ((pkt = get_packet_next(exported[i]))) {
                dump_pkt(pkt);
            }
        }
    }
    return NULL;
}
Example 2: FPL-2
• new packet processing language: FPL-2
• for IXP, kernel and userspace
• simple, efficient and flexible
• simple example: filter all web traffic
IF ((PKT.IP_PROTO == PROTO_TCP) && (PKT.TCP_PORT == 80)) THEN RETURN 1; FI
• more complex example: count pkts in all TCP flows
IF (PKT.IP_PROTO == PROTO_TCP) THEN
    R[0] = Hash[14, 12, 1024];
    M[ R[0] ]++;
FI
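The fragment hashes 12 bytes starting at offset 14 into 1024 buckets and bumps a per-flow counter in scratch memory. An equivalent C sketch; the hash function and the assumption that the protocol field sits at offset 9 are ours, not FPL-2’s definition:

```c
#include <assert.h>
#include <stddef.h>

#define NBUCKETS 1024
static unsigned long M[NBUCKETS];   /* per-flow counters (the MBuf) */

/* illustrative hash over `len` bytes starting at `off`,
   folded into NBUCKETS buckets, like FPL-2's Hash[off,len,buckets] */
static unsigned flow_hash(const unsigned char *pkt, size_t off, size_t len) {
    unsigned h = 0;
    for (size_t i = 0; i < len; i++)
        h = h * 31 + pkt[off + i];
    return h % NBUCKETS;
}

/* mirrors: IF (PKT.IP_PROTO == PROTO_TCP) THEN M[Hash[14,12,1024]]++ FI
   (assumes the protocol byte is at offset 9, i.e. an IP header) */
static void count_flow(const unsigned char *pkt) {
    if (pkt[9] == 6)                /* PROTO_TCP */
        M[flow_hash(pkt, 14, 12)]++;
}
```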
FPL-2
● all common arithmetic and bitwise operations
● all common logical ops
● all common integer types
  - for packet
  - for buffer (useful for keeping state!)
● statements
  - Hash
  - External functions
    ● to call hardware implementations
    ● to call fast C implementations
  - If … then … else
  - For … break; … Rof
  - Return
Example application: dynamic ports

// R[0] stores no. of dynports found (initially 0)
IF (PKT.IP_PROTO == PROTO_TCP) THEN
    IF (PKT.TCP_DPORT == 554) THEN
        M[R[0]] = EXTERN("GetDynTCPDPortFromRTSP", 0, 0);
        R[0]++;
    ELSE
        // compare pkt’s dst port to all ports in array – if match, return pkt
        FOR (R[1] = 0; R[1] < R[0]; R[1]++)
            IF (PKT.TCP_DPORT == M[ R[1] ]) THEN
                RETURN TRUE;
            FI
        ROF
    FI
FI
RETURN FALSE;
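A C sketch of the same logic, assuming we already know the packet is TCP. `get_dyn_port` is a hypothetical stand-in for the EXTERN call; a real version would parse the RTSP exchange for the negotiated port:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PORTS 64
static unsigned short dyn_ports[MAX_PORTS]; /* plays the role of M[] */
static size_t ndyn;                         /* plays the role of R[0] */

/* hypothetical stand-in for EXTERN("GetDynTCPDPortFromRTSP",...);
 * returns a fixed port purely for the sketch */
static unsigned short get_dyn_port(const unsigned char *pkt) {
    (void)pkt;
    return 6970;
}

/* true if this TCP packet's dst port matches a learned dynamic port */
static bool match_dynamic(const unsigned char *pkt, unsigned short dport) {
    if (dport == 554) {                 /* RTSP control connection */
        if (ndyn < MAX_PORTS)
            dyn_ports[ndyn++] = get_dyn_port(pkt);
        return false;
    }
    for (size_t i = 0; i < ndyn; i++)   /* the FOR … ROF scan */
        if (dport == dyn_ports[i])
            return true;
    return false;
}
```

The point of the example is that state learned from one packet (the RTSP control stream) steers the classification of later packets, which plain stateless filters such as BPF cannot express.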
efficient
● reduced copying and context switches
● sharing data
● flowgraphs: sharing computations
“push filtering tasks as far down the processing hierarchy as possible”
Network Monitoring
● Increasingly important
  - traffic characterisation, security, traffic engineering, SLAs, billing, etc.
● Existing solutions:
  - designed for slow networks or traffic engineering/QoS
  - not very flexible
● We’re hurting because of
  - hardware (bus, memory)
  - software (copies, context switches)
● approach:
  - process at lowest possible level
  - minimise copying
  - minimise context switching
  - freedom at the bottom
● demand for solution:
  - scales to high link rates
  - scales in no. of apps
  - flexible
[figure: spread of SAPPHIRE in 30 minutes]
generalised notion of flow
Flow: “a stream of packets that match arbitrary user criteria”
Flowgraph
[diagram: example flowgraphs built from elements such as (eth0), (TCP), (UDP), (IP), (HTTP), (RTSP), (RTP), (TCP SYN), (UID 0), (bytecount), “contains worm”, UDP with CodeRed]
(device,eth0) | (device,eth1) -> (sampler,2) -> (FPL-2,”..”) | (BPF,”..”) -> (bytecount)
(device,eth0) -> (sampler,2) -> (BPF,”..”) -> (packetcount)

Extensible
✔ modular framework
✔ language agnostic
✔ plug-in filters
Buffers
● PacketBuf
  - circular buffer with N fixed-size slots
  - each slot large enough to hold a packet
● IndexBuf
  - circular buffer with N slots
  - contains classification result + pointer
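A minimal sketch of the two rings with a single writer; the slot size, field names, and free-running counters are illustrative assumptions, not FFPF’s actual layout:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N 8             /* number of slots */
#define SLOT 1536       /* large enough to hold a packet */

/* PacketBuf: N fixed-size slots holding raw packets */
static unsigned char packet_buf[N][SLOT];

/* IndexBuf entry: classification result + pointer into PacketBuf */
struct index_slot {
    unsigned classification;
    unsigned char *pkt;
};
static struct index_slot index_buf[N];

static unsigned W, R;   /* free-running write / read counters */

static int buf_put(const unsigned char *pkt, size_t len, unsigned cls) {
    if (W - R == N || len > SLOT)
        return -1;                      /* ring full (or oversized pkt) */
    memcpy(packet_buf[W % N], pkt, len);
    index_buf[W % N].classification = cls;
    index_buf[W % N].pkt = packet_buf[W % N];
    W++;
    return 0;
}

static struct index_slot *buf_get(void) {
    if (R == W)
        return NULL;                    /* nothing to read */
    return &index_buf[R++ % N];
}
```

Keeping the classification result next to the packet pointer is what lets userspace consume results without re-running the filter.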
Buffer management: what to do if the writer catches up with the slowest reader?
● slow reader preference
  - drop new packets (traditional way of dealing with this)
  - overall speed determined by slowest reader
● fast reader preference
  - overwrite existing packets
  - application responsible for keeping up
    ● can check that packets have been overwritten
  - different drop rates for different apps
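Under fast reader preference, a reader can detect that it was lapped by comparing its own position against the writer’s. An illustrative sketch with a single writer and free-running counters (not FFPF’s actual code):

```c
#include <assert.h>

#define N 4
static int slots[N];
static unsigned W;          /* free-running write counter (one writer) */

/* fast reader preference: the writer never blocks and
 * silently overwrites the oldest slot when the ring is full */
static void put(int v) {
    slots[W % N] = v;
    W++;
}

/* each reader keeps its own counter; returns 1 on success, 0 when
 * nothing is new, -1 when the writer lapped us (packets were lost) */
static int get(unsigned *r, int *out) {
    if (*r == W)
        return 0;
    if (W - *r > N) {       /* overwritten: resync to oldest survivor */
        *r = W - N;
        return -1;
    }
    *out = slots[*r % N];
    (*r)++;
    return 1;
}
```

Each application pays only for its own slowness: a lagging reader sees a -1 “packets overwritten” signal while fast readers lose nothing.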
Languages
● FFPF is language neutral
● Currently we support:
  - BPF
  - C
  - OKE Cyclone
  - FPL
● FPL:
  • simple to use
  • compiles to optimised native code
  • resource limited (e.g., restricted FOR loop)
  • access to persistent storage (scratch memory)
  • calls to external functions (e.g., fast C functions or hardware assists)
  • compiler for uspace, kspace, and nspace (ixp1200)
IF (PKT.IP_PROTO == PROTO_TCP) THEN
    // reg.0 = hash over flow fields
    R[0] = Hash(14, 12, 1024)
    // increment pkt counter at this location in MBuf
    MEM[ R[0] ]++
FI
packet sources
● currently three kinds implemented
  - netfilter
  - net_if_rx()
  - IXP1200
● implementation on IXPs: NIC-FIX
  - bottom of the processing hierarchy
  - eliminates mem & bus bottlenecks
Network Processors
● “programmable NIC”
● copy regimes at the host-NIC boundary: zero copy, copy once, on-demand copy
Performance results
● pkt loss: FFPF < 0.5%, LSF 2-3%
● pkt loss: FFPF 10-15%, LSF 64-75%
[figure: Copy Strategies – % of packets processed (reference, drop, accept) for regular copy, copy once, and zero copy]
Performance
Summary
● concept of ‘flow’ generalised
● copying and context switching minimised
● processing in kernel/NIC: complex programs + ‘pipes’
● FPL: FFPF Packet Language, fast + flexible
● persistent storage: flow-specific state
● authorisation + third-party code: any user
● flow groups: applications sharing packet buffers
microbenchmarks
we’re hurting because of
● hardware problems
● software problems
● language problems
● lack of flexibility
● network speed may grow too fast anyway
● multiple apps problem
• kernel: too little, too often
• vertical + horizontal copies
• context switching
• popular ones: not expressive
• expressive ones: not popular
• same for speed
• no way to mix and match
• heterogeneous hardware
• programmable cards, routers, etc.
• no easy way to combine
• built on abstractions that are too high-level
• hard to use as ‘generic’ packet processor
• 10-40 Gbps: what will you do in 10 ns?
• beyond capacity of a single processor