PFQ@ 10th Italian Networking Workshop (Bormio)

Running Monitoring Applications on Accelerated Capture Engines
Nicola Bonelli
N. Bonelli, R.G. Garroppo, L. Gazzarrini, S. Giordano, G. Procissi, F. Russo, G. Volpi


Running monitoring applications on accelerated capture engines

Transcript of PFQ@ 10th Italian Networking Workshop (Bormio)

Page 1: PFQ@ 10th Italian Networking Workshop (Bormio)

Running Monitoring Applications on Accelerated Capture Engines

Nicola Bonelli

N. Bonelli, R.G. Garroppo, L. Gazzarrini, S. Giordano, G. Procissi, F. Russo, G. Volpi

Page 2: PFQ@ 10th Italian Networking Workshop (Bormio)

Agenda

• Capture engines overview
• What's new in PFQ (2.0)
• Accelerated pcap library
  – PF_RING, PF_RING+DNA, NETMAP, PFQ
• Pcap-perf: a tool for benchmarking pcap apps
• Experimental results

Page 3: PFQ@ 10th Italian Networking Workshop (Bormio)

Speed matters…

Page 4: PFQ@ 10th Italian Networking Workshop (Bormio)

Accelerated Capture Engine

• Linux provides a default capture engine: the PF_PACKET socket
• For speed reasons, other capture engines emerged:
  – 2004: PF_RING
    • designed for single core, better performance than the PF_PACKET of the time
  – 2011: PFQ
    • first to address multi-core architectures and multi-queue NICs (Best Paper Award @PAM2012)
  – 2012: PF_RING-DNA
    • accelerated drivers (Intel)
  – 2012: NetMap
    • accelerated drivers (Intel, Broadcom) (Best Paper Award @Usenix ATC'12)

Page 5: PFQ@ 10th Italian Networking Workshop (Bormio)

… but what happens on these tracks?

Page 6: PFQ@ 10th Italian Networking Workshop (Bormio)

What's new in PFQ 2.0

• From capture engine to monitoring framework…
• Improved performance
  – ~14.8 Mpps single user-space thread
• Improved features:
  – compliant with a plethora of NICs: pfq-omatic
  – monitoring groups and classes
  – in-kernel extensible engine for packet steering: dispatching, copying, cloning, filtering
  – native bindings: C, C++11, Haskell (more to come)
  – per-group filtering: BPF, vlan (un-tagging)
  – pcap library
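The idea of steering packets to the members of a monitoring group can be illustrated with a small hash-based dispatcher. This is only a sketch of the general technique, not PFQ's in-kernel code: the function name `steer_by_ipv4_addr`, the mixing constants, and the choice of a symmetric hash (so that both directions of a conversation reach the same consumer) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical illustration of hash-based steering (not PFQ's actual code):
// pick one of `consumers` sockets from the IPv4 source and destination
// addresses of a packet.
inline std::size_t steer_by_ipv4_addr(std::uint32_t src,
                                      std::uint32_t dst,
                                      std::size_t consumers)
{
    assert(consumers > 0);

    // XOR is symmetric, so (src, dst) and (dst, src) yield the same key.
    std::uint32_t key = src ^ dst;

    // Cheap integer mixing to spread similar addresses across consumers.
    key ^= key >> 16;
    key *= 0x45d9f3bu;
    key ^= key >> 16;

    return key % consumers;
}
```

With four consumers, `steer_by_ipv4_addr(a, b, 4)` and `steer_by_ipv4_addr(b, a, 4)` return the same index, so a per-flow analysis never has to share state between threads.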

Page 7: PFQ@ 10th Italian Networking Workshop (Bormio)

Feature comparison

Feature | PF_PACKET | PF_RING 5.x | PF_RING-DNA | NETMAP - 0813 | PFQ 2.0
NIC | * | *, PF-AWARE (Intel, Broadcom) | only Intel 1/10G | Intel 1/10G, forcedeth | * accelerated
Driver compat. | * | yes, non accel. | no | no | yes, dynamic
multi-core | - | Hardware (RSS) | Hardware (RSS) | Hardware (RSS) | Hw RSS + soft
multi-queue | yes (poor) | yes | yes | yes | yes
native binding | C | C | C | C | C, C++11, Haskell, Java, Python
groups | - | - | - | - | yes
class | - | - | - | - | yes
concurrent mon. | yes | yes | commercial ? | - | yes
clustering | - | yes | - | - | yes (MT, group)
steering | - | - | commercial | - | yes (MT, group)
STM state | - | - | - | - | work in progress

Page 8: PFQ@ 10th Italian Networking Workshop (Bormio)

Feature comparison

Feature | PF_PACKET | PF_RING 5.x | PF_RING-DNA | NETMAP - 0813 | PFQ 2.0
Pcap library | yes | yes | yes | buggy/incomplete | yes
BPF (filters) | yes (MT) | yes (MT) | yes (user-space) | - | yes (MT, group)
vlan filters | - | yes | yes (hw Intel) | - | yes (MT, group)
vlan untagging | - | - | - | - | yes (MT, soft)
Intel hw filters | - | yes | yes | - | no
bloom filters | - | - | - | - | work in progress

Page 9: PFQ@ 10th Italian Networking Workshop (Bormio)

Accelerated PCAP library

• The pcap library is the de facto standard interface for packet capture
• Accelerated capture engines provide their own pcap library:
  – Both PF_RING and PF_RING-DNA provide a complete accelerated version
  – NetMap provides experimental and incomplete pcap support
    • BPF is missing
• PFQ provides a complete implementation
  – PFQ C-API mapped over the pcap interface wherever possible, implemented as environment variables otherwise
  – Clustering is enabled by specifying multiple NICs in colon-separated fashion, steering by means of the PFQ_STEER variable

PFQ_GROUP=10 PFQ_STEER=ipv4-addr tcpdump -n -i eth2:eth3
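To make the convention above concrete, here is a minimal sketch of how a pcap-style wrapper could interpret a colon-separated device specification and pick up settings from the environment. It is not taken from the PFQ sources: the helper names `split_devices` and `env_or` are hypothetical, and only `PFQ_GROUP`, `PFQ_STEER` and the `eth2:eth3` syntax come from the slide.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Split a colon-separated device specification such as "eth2:eth3"
// into individual interface names (empty fields are skipped).
inline std::vector<std::string> split_devices(const std::string& spec)
{
    std::vector<std::string> devices;
    std::istringstream in(spec);
    std::string name;
    while (std::getline(in, name, ':'))
        if (!name.empty())
            devices.push_back(name);
    return devices;
}

// Read an environment variable (e.g. "PFQ_GROUP" or "PFQ_STEER"),
// falling back to a default value when it is not set.
inline std::string env_or(const char* name, const std::string& fallback)
{
    assert(name != nullptr);
    const char* value = std::getenv(name);
    return value ? std::string(value) : fallback;
}
```

For the command line shown above, `split_devices("eth2:eth3")` yields the two interfaces to cluster, while `env_or("PFQ_STEER", "")` would return `ipv4-addr`. The point of this design is that unmodified pcap applications such as tcpdump need no new command-line options.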

Page 10: PFQ@ 10th Italian Networking Workshop (Bormio)

Pcap-perf

• Pcap-perf is a C++11 application designed for benchmarking capture engines through pcap interfaces
• Supports multiple threads, BPF filters and plug-ins:

plug-in | kind
Null | packet counter
IP checksum | light CPU computation
MD5 | CPU computation
SHA256 | heavy CPU computation
Bloom Filter | memory (linear)
Protocol Classification | memory tree
TCP/UDP flow counter | memory (std::unordered_set)
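As an illustration of the memory-bound kind of plug-in listed above, the following sketch counts distinct TCP/UDP flows with a `std::unordered_set`. It is not the pcap-perf source: the `FlowKey` layout, the hash combiner and the `FlowCounter` interface are assumptions, and header parsing is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>

// Hypothetical 5-tuple identifying a TCP/UDP flow.
struct FlowKey {
    std::uint32_t src_ip;
    std::uint32_t dst_ip;
    std::uint16_t src_port;
    std::uint16_t dst_port;
    std::uint8_t  proto;

    bool operator==(const FlowKey& o) const
    {
        return src_ip == o.src_ip && dst_ip == o.dst_ip &&
               src_port == o.src_port && dst_port == o.dst_port &&
               proto == o.proto;
    }
};

// Hash functor: pack the fields into two 64-bit words and combine them.
struct FlowKeyHash {
    std::size_t operator()(const FlowKey& k) const
    {
        const std::uint64_t addrs = (std::uint64_t(k.src_ip) << 32) | k.dst_ip;
        const std::uint64_t ports = (std::uint64_t(k.src_port) << 24) |
                                    (std::uint64_t(k.dst_port) << 8) | k.proto;
        const std::size_t h = std::hash<std::uint64_t>{}(addrs);
        const std::size_t p = std::hash<std::uint64_t>{}(ports);
        return h ^ (p + 0x9e3779b97f4a7c15ULL + (h << 6) + (h >> 2));
    }
};

// Counts distinct flows; every packet costs one hash-table lookup.
class FlowCounter {
public:
    void on_packet(const FlowKey& key) { flows_.insert(key); }
    std::size_t flows() const { return flows_.size(); }

private:
    std::unordered_set<FlowKey, FlowKeyHash> flows_;
};
```

Feeding the same 5-tuple twice leaves `flows()` unchanged. With randomized addresses, most packets insert a new key, which is why a plug-in of this kind stresses memory rather than the CPU.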

Page 11: PFQ@ 10th Italian Networking Workshop (Bormio)

Test-bed and measurements

• Intel Xeon X5650 6 cores @ 2.67 GHz, 16 GB RAM + Intel 82599 10G (Debian Wheezy)
• Accelerated drivers
  – PF_RING: ixgbe 3.11.33 PF_RING-aware
  – PF_RING-DNA: ixgbe 3.10.16-DNA driver
  – Netmap: ixgbe driver shipped with the netmap package
  – PFQ: Intel ixgbe 3.11.33 vanilla, recompiled through pfq-omatic
• Best interrupt affinity (MSI-X)
  – 4 or 5 kernel threads (NAPI) bound to fixed cores (RSS), 1 or 2 user-space threads bound to other core(s)
• Traffic is generated with randomized IP addresses, 64/128-byte UDP packets
  – using both PF_DIRECT and PF_RING-DNA

[Figure: test-bed diagram; labels: mascara, monsters, 10 Gb link]
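For reference, the packet rates that a 10 Gb link can carry at these packet sizes follow from simple arithmetic. The sketch below assumes that the quoted sizes are Ethernet frame sizes including the 4-byte FCS, and adds the 20 bytes each frame occupies on the wire beyond that (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap). The function name is hypothetical.

```cpp
#include <cassert>

// Theoretical maximum packet rate on an Ethernet link.
// Assumption: frame_bytes includes the 4-byte FCS. Each frame also
// occupies 20 extra bytes on the wire:
// 7 (preamble) + 1 (start-of-frame delimiter) + 12 (inter-frame gap).
inline double max_packets_per_second(double link_bits_per_second,
                                     unsigned frame_bytes)
{
    assert(frame_bytes > 0);
    const unsigned wire_bytes = frame_bytes + 20;
    return link_bits_per_second / (8.0 * wire_bytes);
}
```

Under these assumptions, `max_packets_per_second(10e9, 64)` is about 14.88 million packets per second and `max_packets_per_second(10e9, 128)` is about 8.45 million, which is consistent with the ~14.8 Mpps and ~8 Mpps figures that appear elsewhere in the slides.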

Page 12: PFQ@ 10th Italian Networking Workshop (Bormio)

Counting packets is useless

(native speed)

    uint64_t counter = 0;
    for (;;)
    {
        counter++;
    }

Page 13: PFQ@ 10th Italian Networking Workshop (Bormio)

1 thread user-space (Intel 10G)

Page 14: PFQ@ 10th Italian Networking Workshop (Bormio)

pcap library

Page 15: PFQ@ 10th Italian Networking Workshop (Bormio)

Pcap library, 1 thread counter

Page 16: PFQ@ 10th Italian Networking Workshop (Bormio)

Pcap, 1 thread counter, BPF=udp

Page 17: PFQ@ 10th Italian Networking Workshop (Bormio)

Pcap, 1 thread counter, BPF=http || udp

Page 18: PFQ@ 10th Italian Networking Workshop (Bormio)

pcap-perf

Page 19: PFQ@ 10th Italian Networking Workshop (Bormio)

pcap-perf

Page 20: PFQ@ 10th Italian Networking Workshop (Bormio)

pcap-perf with BPF = udp

Page 21: PFQ@ 10th Italian Networking Workshop (Bormio)

pcap-perf (2 threads)

Page 22: PFQ@ 10th Italian Networking Workshop (Bormio)

tcpdump  

Page 23: PFQ@ 10th Italian Networking Workshop (Bormio)

tcpdump -s 64 -i dev -w /ramdisk/dump.pcap ([email protected])

Page 24: PFQ@ 10th Italian Networking Workshop (Bormio)

tcpdump -s 138 -i dev -w /ramdisk/dump.pcap (100M@~8Mpps)

Page 25: PFQ@ 10th Italian Networking Workshop (Bormio)

tcpdump -i dev -w /ramdisk/dump.pcap vlan (5 Gbps)

Page 26: PFQ@ 10th Italian Networking Workshop (Bormio)

tcpdump -i dev -w /ramdisk/dump.pcap ip host 192.168.0.10 (voip call)

Page 27: PFQ@ 10th Italian Networking Workshop (Bormio)

Thanks for the attention!

[email protected]