Big Data in Production Environments
Transcript of Big Data in Production Environments
Copyright by Brockhaus GmbH, all rights reserved, unauthorized reproduction prohibited
Proposal for applying modern concepts
of data storage and analytics to production data
Current situation
Current situation
Just the Numbers
✔ Approx. 270,000 sensors in one installation (AUDI Györ), but only 17,000 sensors are currently tracked
✔ Many 'unsynchronized' control desks, or rather their data
✔ A lot of duplicated data (because of the 'home-grown' failover/replication concept)
✔ No historical data (because the amount of data is overwhelming and can't be handled)
✔ Problems with scalability
Current situation
Outdated technologies
✔ Trend Server is developed in Delphi: Who develops in that?
✔ Microsoft SQL Server: not fast enough
✔ Technology breaks between the different systems
Bottlenecks
✔ Queries are slow for more than 750k events
✔ No more than 7,500 CSV files
✔ CSV files and SQL Server are used for the same tasks
Current situation
Scalability and fault tolerance
✔ Only a few sensors can be saved
✔ IOM synchronization problems
✔ Buffered data is saved with the same timestamp
✔ Different IOMs save the same data with different timestamps
No integration / standalone application
✔ Data cannot be accessed from every location (control desk)
✔ Data cannot be recorded in case of failure
Big data and NoSQL
Why the relational model
… sometimes isn't enough
✔ Can't handle extremely large amounts of data (in the extreme case, 15 petabytes at the Government of India)
✔ Hard to scale (esp. scaling out, i.e. adding nodes to handle the load)
✔ Hard to deal with 'unstructured' data due to strict data model
✔ The valuable transactional model is sometimes overkill
Dealing with data
… an awful lot of data
… petabytes
… and even beyond that
plus NoSQL
Dealing with data
Characteristics and value proposition
✔ Big Data gains momentum (data generates value)
➢ High data velocity
➢ Data variety
➢ Data volume
➢ Data complexity
✔ Continuous availability
✔ Data location independence
✔ Flexible data model (schemaless databases)
✔ Improved architecture and enhanced analytics
Problems with NoSQL
Document-oriented: MongoDB, CouchDB
Column store: Bigtable, HBase
Key-value: Cassandra, DynamoDB, Azure Table Storage, Riak, BerkeleyDB
Graph: Neo4J
Many players, several concepts, no one-size-fits-all approach and no standards
Why Cassandra?
Because people with the same problems have chosen it ...
“I can create a Cassandra cluster in any region of the world in 10 minutes. When marketing decides we want to move into a certain part of the world, we’re ready.”
Why Cassandra
Scalability
✔ Add nodes to scale
✔ Millions of operations
✔ Low latency in read/write operations
Why Cassandra
Availability
✔ Created to be distributed
✔ Resistant and flexible to failures
✔ Different data centers (probably in different parts of the world)
Why Cassandra
Replication
Why Cassandra
Sometimes things go wrong:
✔ Hardware fails
✔ Bugs
✔ Power outages
✔ Natural disaster
and then...
✔ Fast node recovery
✔ Auto-Balancing when a node fails
✔ Transparent to the client
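How a replicated ring keeps serving reads when a node fails can be sketched in a few lines of plain Python. This is a simplified illustration, not Cassandra's actual implementation: the class `Ring`, the node names, the use of MD5 as token function and the single token per node are all assumptions made for the example.

```python
from bisect import bisect_right
from hashlib import md5


def token(key: str) -> int:
    """Map a key to a position on the ring (MD5 chosen only for illustration)."""
    return int(md5(key.encode()).hexdigest(), 16)


class Ring:
    """Hypothetical token ring with one token per node."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        self.ring = sorted((token(n), n) for n in nodes)
        self.down = set()  # nodes currently marked as failed

    def replicas(self, key):
        """Walk clockwise from the key's token and take the next nodes."""
        start = bisect_right(self.ring, (token(key), ""))
        count = min(self.replication_factor, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]

    def read_from(self, key):
        """Return the first replica that is still up, or None if all are down."""
        return next((n for n in self.replicas(key) if n not in self.down), None)


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("1234ABCD")
ring.down.add(owners[0])              # simulate a hardware failure
fallback = ring.read_from("1234ABCD")  # the next replica answers instead
```

The caller only asks for a key; which replica answers is decided inside `read_from`, which is the sense in which a failure stays transparent to the client.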
Why Cassandra
Easy to use
✔ Large ecosystem
✔ Well documented
✔ Full Java support
✔ SQL-like syntax
INSERT INTO sensor_by_day (sensor_id, date, event_time, value)
VALUES ('1234ABCD', '2013-04-03', '2013-04-03 07:01:00', '72F');
Time Series in Cassandra
Cassandra can store up to 2 billion columns per row, but if we're storing data every millisecond, a single row wouldn't even hold a month's worth of data (2 billion milliseconds is roughly 23 days).
The solution is a pattern called row partitioning: adding data (for example, the date) to the row key limits the number of columns stored per row for each device.
Almost no limits!
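The arithmetic behind this limit, and the effect of a day bucket in the row key, can be checked with a short sketch. The key layout `sensor_id:YYYY-MM-DD` and the helper name `row_key` are illustrative assumptions, not an existing schema.

```python
from datetime import datetime, timezone

MAX_COLUMNS_PER_ROW = 2_000_000_000  # limit quoted on the slide
COLUMNS_PER_DAY = 24 * 60 * 60 * 1000  # one column per millisecond

# Without partitioning: days until a single row per sensor is full.
days_until_full = MAX_COLUMNS_PER_ROW / COLUMNS_PER_DAY


def row_key(sensor_id: str, event_time: datetime) -> str:
    """Hypothetical row key: sensor id plus the UTC day of the event.

    Expects a timezone-aware datetime.
    """
    return f"{sensor_id}:{event_time.astimezone(timezone.utc):%Y-%m-%d}"


example_key = row_key("1234ABCD", datetime(2013, 4, 3, 7, 1, tzinfo=timezone.utc))
print(f"unpartitioned row is full after {days_until_full:.1f} days")
print(f"columns per daily row: {COLUMNS_PER_DAY:,}")
print(example_key)
```

With one row per sensor and day, each row holds at most 86.4 million columns, far below the 2 billion limit.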
Data analysis goals
✔ Low latency (interactive) queries on historical data: enable faster decisions
✔ Low latency queries on live data (streaming): enable decisions on real-time data
✔ Sophisticated data processing: enable “better” decisions (e.g. anomaly detection, trend analysis)
Spark ecosystem
Well integrated with Cassandra and includes:
✔ SQL-like interface
✔ Machine learning: Algorithms that can learn from data, used for predictions (predictive maintenance: exploit patterns found in historical and transactional data to identify risks and opportunities)
✔ Streaming: Processing of real-time streaming data, e.g. with sliding windows
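The sliding-window idea can be illustrated without any cluster. The sketch below is plain Python and does not use Spark's API; the function name `sliding_average`, the window length and the sample readings are assumptions for the example.

```python
from collections import deque


def sliding_average(readings, window_seconds):
    """Yield (timestamp, mean of the values seen in the last window_seconds).

    Assumes readings arrive ordered by time and window_seconds > 0.
    """
    window = deque()
    total = 0.0
    for ts, value in readings:
        window.append((ts, value))
        total += value
        # Drop readings that have fallen out of the window.
        while window[0][0] <= ts - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        yield ts, total / len(window)


readings = [(0, 10.0), (1, 20.0), (2, 30.0), (3, 40.0)]
averages = list(sliding_average(readings, window_seconds=2))
```

Each incoming reading updates the running total in constant time, so the average is available as soon as the value arrives rather than after a batch query.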
Use Cases
Use Case
✔ Data from the oven is collected
✔ Cassandra stores the data sequentially
✔ TrendPage reads the data sequentially for faster graph creation
Use Case
Data Model to support queries
✔ Store data per oven
✔ Store time series in order: first to last
Queries needed
✔ Get all data for one oven
✔ Get data for a single date and time
✔ Get data for a range of dates and times
“Cassandra is really good for time-series data because you can write one column for each period in your series and then query across a range of time using sub-string matching.
This is best done using columns for each period rather than rows, as you get huge IO efficiency wins from loading only a single row per query.”
– MyDrive Telemetry (15 billion records on average)
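The three queries listed for this use case map naturally onto one time-ordered row per oven. The following in-memory stand-in shows the access pattern only; the class `OvenSeries`, its method names and the sample values are hypothetical, and values are assumed to be numeric.

```python
from bisect import bisect_left, bisect_right, insort


class OvenSeries:
    """In-memory stand-in for one wide row per oven, columns sorted by time."""

    def __init__(self):
        self._rows = {}  # oven_id -> sorted list of (timestamp, value)

    def write(self, oven_id, timestamp, value):
        insort(self._rows.setdefault(oven_id, []), (timestamp, value))

    def all_for(self, oven_id):
        """Get all data for one oven, first to last."""
        return list(self._rows.get(oven_id, []))

    def at(self, oven_id, timestamp):
        """Get the value for a single date and time, or None."""
        row = self._rows.get(oven_id, [])
        i = bisect_left(row, (timestamp,))
        if i < len(row) and row[i][0] == timestamp:
            return row[i][1]
        return None

    def between(self, oven_id, start, end):
        """Get data for a range of dates and times (both ends inclusive)."""
        row = self._rows.get(oven_id, [])
        lo = bisect_left(row, (start,))
        hi = bisect_right(row, (end, float("inf")))
        return row[lo:hi]


series = OvenSeries()
series.write("oven-1", "2013-04-03 07:02:00", 73.0)
series.write("oven-1", "2013-04-03 07:01:00", 72.0)
series.write("oven-1", "2013-04-03 07:03:00", 74.0)
morning = series.between("oven-1", "2013-04-03 07:01:00", "2013-04-03 07:02:00")
```

Because the entries are kept in time order, a range query is a contiguous slice of a single row, which is the IO advantage the quote describes.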
Time Series in Cassandra
The data model
✔ Row Key is Time Identifier
✔ Column Values are Events
✔ Column Values are Measurements
✔ Rows Can be Very Wide
1 s Schema
✔ Faster data storage in the database
1 min Schema
✔ Avoids network overload
✔ Data can be compressed (prior to sending)
✔ Extra data like min, max, avg can be calculated before the data is stored
✔ Increases data retrieval speed
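The pre-aggregation step of the 1 min schema can be sketched as follows. The function name `aggregate_per_minute`, the record layout and the sample readings are assumptions for illustration; timestamps are taken to be whole seconds since some epoch.

```python
from collections import defaultdict
from statistics import mean


def aggregate_per_minute(samples):
    """Group (epoch_seconds, value) samples into one summary record per minute."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % 60].append(value)  # key: start of the minute
    return {
        minute: {
            "min": min(values),
            "max": max(values),
            "avg": mean(values),
            "count": len(values),
        }
        for minute, values in sorted(buckets.items())
    }


samples = [(0, 70.0), (1, 72.0), (59, 74.0), (60, 80.0), (61, 82.0)]
summary = aggregate_per_minute(samples)
```

Five one-second readings collapse into two records here; at a full 60 readings per minute the number of records sent over the network drops by the same factor.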
Architectural options
Architectural Options
Unreplicated databases
Architectural Options
Redundant and replicated databases
Architectural Options
Replicated databases plus analytics
What is next
Discussion
Enough propaganda ... Get in touch!
Contact information:
Brockhaus Consulting GmbH
Gustav Stresemann Ring 1
D - 65189 Wiesbaden
Germany
Phone: +49-611-97774-332
Fax: +49-611-97774-432
Web: www.brockhaus-gruppe.de
Mail: [email protected]