OpenStack monitoring - Unidata S.p.A. Case Report

OpenStack and Monitoring Unidata S.p.A. case report Davide Guerri - Unidata S.p.A. - [email protected]

Transcript of OpenStack monitoring - Unidata S.p.A. Case Report

  • 1. OpenStack and Monitoring - Unidata S.p.A. case report - Davide Guerri - Unidata S.p.A. - [email protected]
  • 2. Agenda
      - What is Unidata S.p.A.?
      - (Cloud) monitoring
      - OpenStack Monitoring
      - Unidata case report
  • 3. Unidata S.p.A.
      - established in 1985
      - pioneer of microcomputer technology in Italy
      - today one of the most important ISPs
      - PoP at NaMeX, MiX, AMS-IX
      - large fiber infrastructure (Rome and province of Rome)
      - a large number of WiFi installations (based on the OpenWISP project), also for the Italian PA (public administration)
      - institutional partners
          - AIIP - first Italian ISPs Association - founder and member, 1995
          - NaMeX - Internet exchange and interconnection point - founder and member, 1995
      - strong vocation for innovation (making significant investments in R&D)
  • 4. Unidata S.p.A.
      - since 2012 - public and private cloud services
      - UniCloud [3] - yep, it's OpenStack! ;-)
      - Folsom release
      - Full access to OpenStack API (SSL)
      - IPv6 enabled
  • 5. Cloud Monitoring
  • 6. Puppies vs Cattle
  • 7. Puppies vs Cattle
      - a (crude) analogy that describes the most appropriate use of the cloud paradigm
      - "The servers in today's data center are like puppies - they've got names and when they get sick, everything grinds to a halt while you nurse them back to health" -- Joshua McKenty, co-founder of Piston Cloud
      - treat servers like cattle
          - a single server should be easily replaceable
          - it should be possible to (seamlessly) increment or decrement their number for a given application
  • 8. Puppies vs Cattle
      - ...not only for VMs...
      - it also makes sense for bare metal
      - this also changes something for monitoring, doesn't it?
  • 9. Cloud Monitoring
      - for cloud monitoring we've got two points of view
      - operators
          - infrastructural monitoring
      - end users
          - cloud infrastructural resources (IaaS) monitoring (e.g. cloud servers monitoring)
          - cloud services monitoring (SaaS/PaaS)
  • 10. Cloud Monitoring
      - in both cases: what to monitor? and with what purpose?
          - availability - for proactively fixing anomalies
          - efficiency - for (proactive) capacity planning
      - what is needed?
          - alerting systems
          - instantaneous measures
          - historical data
  • 11. OpenStack Monitoring
  • 12. OpenStack Monitoring
      - as of today (Grizzly release) there is no integrated and ready-to-use monitoring system [1]
      - what about Ceilometer?
          - general purpose measurement collector
  • 13. OpenStack Monitoring
      - Healthnmon (uses Ceilometer) [2]
          - inventory management
          - alerts and notifications
          - utilization data (CPU, RAM, network, storage) for guests and hosts
  • 14. ...meanwhile...
      - those who already offer cloud services based on OpenStack had to develop (semi-) ad-hoc solutions
      - OpenStack is massively scalable...
      - ...so the monitoring system should be scalable too
      - the good news is that we have all the ingredients, and they are free and open source ;-)
  • 15. What to monitor?
      - load average / CPUs / RAM / swap / disk & network usage
      - alerts based on absolute (and relative) thresholds
      - health of storage resources
      - log analysis
      - system integrity checks
  • 16. What to monitor?
      - OpenStack specific services
      - availability and logs of the following
          - nova-*
          - glance-*
          - cinder-*
          - keystone
          - horizon
          - misc (dnsmasq, swift, rabbitmq)
  • 17. Unidata S.p.A. case report
  • 18. UniCloud
      - UniCloud logical architecture - public cloud infrastructure
  • 19. Monitoring - Operator p.o.v.
  • 20. UniCloud Monitoring
      - Zenoss Core, for infrastructural monitoring
          - open source (GPLv2)
          - SNMP and network protocol monitoring of applications, servers and network devices
          - auto-discovery / auto-modeling
              - crucial for automation (puppies vs cattle)
              - just add the SNMP agent to the configuration of new nodes (e.g. with Puppet)
  • 21. UniCloud - Zenoss Core
      - Web UI with events and infrastructure summary
      - historical data browsing
      - customizable reports
      - real-time email or user-defined alerts
      - simple integration with an SMS gateway
  • 22. UniCloud Monitoring
      - OpenStack/Systems logs
          - swatch - email alerts for errors/anomalies
          - logwatch - daily system status review
      - system integrity (and security)
          - smartmontools - health of hard drives with email notifications
          - rkhunter - daily system status analysis and alerting, if needed
          - arpwatch - real-time ARP monitoring (detection of duplicate IPs)
  • 23. Monitoring - User p.o.v.
  • 24. UniCloud Monitoring
      - ad hoc monitoring system based on the OpenStack API
      - Collectd [5]
          - collects, transfers and stores performance data of computers and network equipment
          - modular architecture
              - we used the RRD, LibVirt, and network plugins
          - free and open source (GPLv2)
          - we wrote a patch for the LibVirt plugin - included since version 5.2 [6]
  • 25. UniCloud Monitoring
      - Front-end WEB-UI
          - RoR (written from scratch)
          - OpenStack ActiveResource - Ruby binding for the OpenStack API by Unidata S.p.A. [7]
  • 26. UniCloud Monitoring
      - hypervisors
          - acquire raw data from LibVirt (localhost)
          - send structured data to the collector
      - collector
          - receives data from the network
          - (efficiently) writes RRD files
      - RoR application
          - establishes a mapping between OpenStack cloud instances and RRD files (via API)
          - renders performance graphs to fulfill user requests (instances and timespans)
  • 27. UniCloud Monitoring
      - What gets monitored?
      - all the measurements that the collectd LibVirt plugin makes available
          - for each vCPU - utilization rate (%)
          - for each network interface - pps, bps and eps (in+out)
          - for each disk - bps and ops (read+write)
              - with extra volumes from nova-volume (or cinder)
  • 28. UniCloud Monitoring
      - Does it scale?
          - collectd is not a new product... it has proven itself to be very reliable and scalable
          - it's possible to use multiple collectors for HA (using multicast) or LB
      - puppies vs cattle?
          - automatic discovery of new cloud instances
          - collectd installation and configuration should be done through a configuration management system (e.g. Puppet)
  • 29. UniCloud Monitoring
      - Collectd configuration example (/etc/collectd/collectd.conf)
          - Collector
          - Hypervisors
  • 30. Some screenshots
  • 31. Some screenshots
  • 32. Some screenshots
  • 33. Thank you for your attention. Questions?
  • 34. References
      - [1] OpenStack official programs https://wiki.openstack.org/wiki/Programs
      - [2] Ceilometer and Healthnmon https://wiki.openstack.org/wiki/Ceilometer/CeilometerAndHealthnmon
      - [3] UniCloud http://unicloud.it
      - [4] Zenoss http://zenoss.com
      - [5] Collectd http://collectd.org
      - [6] Collectd 5.2 changelog https://collectd.org/wiki/index.php/Version_5.2
      - [7] OpenStack ActiveResource https://github.com/Unidata-SpA/openstack_activeresource
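
The slides describe a front end that maps OpenStack cloud instances to the RRD files written by the collector, but the transcript does not show how that mapping is done. The following is only a minimal sketch of one way it could work, written in Python rather than the Ruby on Rails used by the actual application. It assumes that collectd's RRD output follows the usual `<base>/<host>/<plugin>/<type>-<instance>.rrd` layout, that the LibVirt plugin reports each guest under its instance identifier, and that the base directory is `/var/lib/collectd/rrd`. The function names, the base directory, the identifier `abc-123` and the device names are all hypothetical, and in a real deployment the vCPU, interface and disk details would come from the OpenStack API.

```python
from pathlib import PurePosixPath

# Assumed collectd data directory; the slides do not state the real path.
RRD_BASE_DIR = PurePosixPath("/var/lib/collectd/rrd")

# Assumed plugin directory name for data produced by the LibVirt plugin.
PLUGIN_DIR = "libvirt"


def rrd_path(instance_id, metric_type, type_instance=None, base_dir=RRD_BASE_DIR):
    """Build the (assumed) RRD file path for one metric of one cloud instance."""
    name = metric_type if type_instance is None else f"{metric_type}-{type_instance}"
    return base_dir / instance_id / PLUGIN_DIR / f"{name}.rrd"


def instance_rrd_files(instance_id, vcpus, interfaces, disks):
    """List the RRD files expected for an instance.

    Covers the measurements named in the slides: per-vCPU utilization,
    per-interface packets/bytes/errors, and per-disk bytes/operations.
    """
    paths = [rrd_path(instance_id, "virt_vcpu", str(n)) for n in range(vcpus)]
    for iface in interfaces:
        for metric in ("if_packets", "if_octets", "if_errors"):
            paths.append(rrd_path(instance_id, metric, iface))
    for disk in disks:
        for metric in ("disk_octets", "disk_ops"):
            paths.append(rrd_path(instance_id, metric, disk))
    return paths


if __name__ == "__main__":
    # Hypothetical instance with 2 vCPUs, one interface and one disk.
    for path in instance_rrd_files("abc-123", 2, ["vnet0"], ["vda"]):
        print(path)
```

Keeping the path construction in one small function means that a different collectd data directory or host naming scheme only requires a change in one place, while the graph-rendering code can keep asking for "all files of this instance".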