B42-BICTIB (1)
Transcript of B42-BICTIB (1)
-
8/13/2019 B42-BICTIB (1)
1/30
Submitted By:Vartika Jhindal (12609067)
Nikita Shukla (12609090)
-
8/13/2019 B42-BICTIB (1)
2/30
2B I G D A T A F O R D U M M I E S
-
8/13/2019 B42-BICTIB (1)
3/30
THE EVOLUTION OF DATA MANAGEMENTBig data is defined as any kind of data source that has at
least three shared characteristics:
Extremely large volumes of data
Extremely high velocity of data
Extremely wide variety of data
3B I G D A T A F O R D U M M I E S
-
8/13/2019 B42-BICTIB (1)
4/30
UNDERSTANDING THE WAVES OF MANAGINGDATA
4B I G D A T A F O R D U M M I E S
-
8/13/2019 B42-BICTIB (1)
5/30
The relational database management system (RDBMS) that imposed
structure and a method for improving performance
The relational model added a level of abstraction so that it was easier for
programmers to satisfy the growing business demands to extract value fromdata.
It filled a growing need to help companies better organize their data and be
able to compare transactions from one geography to another
It helped business managers who wanted to be able to examine
information such as inventory and compare it to customer order
information for decision-making purposes.
Wave1: Creating manageable data structure
The relational model offered an ecosystem of tools from a large number
of emerging software companies.
5B I G D A T A F O R D U M M I E S
-
8/13/2019 B42-BICTIB (1)
6/30
How would companies be able to transform their traditional data
management approaches to handle the expanding volume of
unstructured data elements?
Companies began to store unstructured data, vendors began to addcapabilities such as BLOBs (binary large objects).
Enter the object database management system (ODBMS).
6B I G D A T A F O R D U M M I E S
-
8/13/2019 B42-BICTIB (1)
7/30
Wave 2: Web and Content management
Enterprise Content Management systems evolved in the 1980s to provide
businesses with the capability to better manage unstructured data, mostlydocuments. In the 1990s with the rise of the web, organizations wanted to
move beyond documents and store and manage web content, images, audio,
and video.
Wave 3: Managing big data
Organizations typically would compromise by storing snapshots or subsets
of important information because the cost of storage and processing
limitations prohibited them from storing everything they wanted to
analyze.
7B I G D A T A F O R D U M M I E S
-
8/13/2019 B42-BICTIB (1)
8/30
DEFINING BIG DATABig data is not a single technology but a combination of old and new
technologies that helps companies gain actionable insight.
Big data is typically broken down by three characteristics:
Volume: How much data
Velocity: How fast that data is processed
Variety: The various types of data
8B I G D A T A F O R D U M M I E S
-
8/13/2019 B42-BICTIB (1)
9/30
BUILDING A SUCCESSFUL BIG DATAMANAGEMENT ARCHITECTURE
Beginning with capture, organize, integrate, analyze and act
Setting the architectural foundation
Performance matters
Traditional and advanced analytics
9B I G D A T A F O R D U M M I E S
-
8/13/2019 B42-BICTIB (1)
10/30
BEGINNING WITH CAPTURE ORGANIZEINTEGRATE ANALYZE AND ACT
Capture
Organize
IntegrateAnalyze
Act
10B I G D A T A F O R D U M M I E S
-
8/13/2019 B42-BICTIB (1)
11/30
SETTING THE ARCHITECTURAL FOUNDATIONHow much data will my organization need to manage today and in the
future?
How often will my organization need to manage data in real time or near
real time?
How much risk can my organization afford? Is my industry subject to strictsecurity, compliance, and governance requirements?
How important is speed to my need to manage data?
How certain or precise does the data need to be?
11B I G D A T A F O R D U M M I E S
-
8/13/2019 B42-BICTIB (1)
12/30
BIG DATA ARCHITECTURE
12B I G D A T A F O R D U M M I E S
-
8/13/2019 B42-BICTIB (1)
13/30
ALL THESE OPERATIONAL DATA SOURCES HAVESEVERAL CHARACTERISTICS IN COMMON
They represent systems of record that keep track of the critical data
required for real-time, day-to-day operation of the business.
They are continually updated based on transactions happening within
business units and from the web.
For these sources to provide an accurate representation of the business,
they must blend structured and unstructured data.
These systems also must be able to scale to support thousands of users on
a consistent basis.
13B I G D A T A F O R D U M M I E S
-
8/13/2019 B42-BICTIB (1)
14/30
PERFORMANCE MATTERSYour data architecture also needs to perform in concert with yourorganizations supporting infrastructure. Performance might also
determine the kind of database you would use.
Organizing data services and tools
Map Reduce, Hadoop, and Big Table
14B I G D A T A F O R D U M M I E S
-
8/13/2019 B42-BICTIB (1)
15/30
TRADITIONAL AND ADVANCED ANALYTICS
What does your business now do with all the data in all its forms totry to make sense of it for the business?
Analytical data warehouses and data marts
Big data analytics
Reporting and visualization
Big data applications
15B I G D A T A F O R D U M M I E S
-
8/13/2019 B42-BICTIB (1)
16/30
THE BIG DATA JOURNEYCompanies have always had to deal with lots of data in lots of forms.
The change that big data brings is what you can do with that
information.
If you have the right technology in place, you can use big data to
anticipate and solve business problems and react to opportunities.
With big data, you can analyze data patterns to change everything,
from the way you manage cities, prevent failures, conduct experiments,
manage traffic, improve customer satisfaction, or enhance product
quality.
16B I G D A T A F O R D U M M I E S
-
8/13/2019 B42-BICTIB (1)
17/30
17B I G D A T A F O R D U M M I E S
-
8/13/2019 B42-BICTIB (1)
18/30
STRUCTURED DATA
The term structured data generally refers to data that has adefined length and format.
This include numbers, dates, and groups of words and numbers
called strings
This kind of data accounts for about 20 percent of the data that
is out there.
Examples of structured data are (CRM) data, operational (ERP)
data, and financial data.
18B I G D A T A F O R D U M M I E S
-
8/13/2019 B42-BICTIB (1)
19/30
SOURCES OF DATAThe evolution of technology provides newer sources of structured
data being producedoften in real time and in large volumes.The sources of data are divided into two categories:
19
Data created by
machine Without human
intervention
Computer-or machine-
generated
Data generated by
human In interaction with
computers
Human-generated
B I G D A T A F O R D U M M I E S
-
8/13/2019 B42-BICTIB (1)
20/30
SOURCES OF BIG STRUCTURED DATASources ofstructured
data
Machinegenerated
Sensor dataWeblog
dataPoint of
sale dataFinancial
data
Humangenerated
Input dataClick-
screen dataGaming
related data
20B I G D A T A F O R D U M M I E S
-
8/13/2019 B42-BICTIB (1)
21/30
SOURCES OF BIG STRUCTURED DATAMachine-generated structured data can include:
Sensor data-Companies are interested in this for supply chainmanagement and inventory control.
Weblog data- When servers, applications, networks, and so onoperate, they capture all kinds of data about their activity.
Point of sale data- When the cashier swipes the bar code ofany product that you are purchasing, all that data associated
with the product is generated.
Financial data- Financial systems are operated based on
predefined rules that automate processes.
21B I G D A T A F O R D U M M I E S
-
8/13/2019 B42-BICTIB (1)
22/30
SOURCES OF BIG STRUCTURED DATAStructured human-generated data might include:
Input data- This is any piece of data that a human might inputinto a computer, such as name, age, income, and so on.
Click-stream data- Data is generated every time you click alink on a website.
Gaming related data- This can be useful in understandinghow end users move through a gaming portfolio.
22B I G D A T A F O R D U M M I E S
-
8/13/2019 B42-BICTIB (1)
23/30
UNSTRUCTURED DATA Unstructured data is data that does not follow a specified
format.
If 20 percent of the data available to enterprises is structured
data, the other 80 percent is unstructured.
The technology didnt really support doing much with it except
storing it or analyzing it manually.
23B I G D A T A F O R D U M M I E S
-
8/13/2019 B42-BICTIB (1)
24/30
SOURCES OF BIG UNSTRUCTURED DATASources of
unstructureddata
Machinegenerated
Satelliteimages
Scientific dataPhotographsand videos
Radar data
Humangenerated
Internal text Social media Mobile dataWebsitecontent
24B I G D A T A F O R D U M M I E S
-
8/13/2019 B42-BICTIB (1)
25/30
SOURCES OF BIG UNSTRUCTURED DATAMachine-generated unstructured data include:
Satellite images-This includes weather data or the data thatthe government captures in its satellite surveillance imagery.
Scientific data- This includes seismic imagery, atmospheric
data, and high energy physics. Photographs and videos- This includes security, surveillance,
and traffic video.
Radar or sonar data- This includes vehicular, meteorological,and oceanographic seismic profiles.
25B I G D A T A F O R D U M M I E S
-
8/13/2019 B42-BICTIB (1)
26/30
SOURCES OF BIG UNSTRUCTURED DATAHuman-generated unstructured data include:
Text internal to your company- Enterprise informationactually represents a large percent of the text information
in the world today.
Social media data- This data is generated from the social
media platforms such as YouTube, Facebook, etc. Mobile data- This includes data such as text messages
and location information.
Website content- This comes from any site deliveringunstructured content, like YouTube, Flickr, or Instagram.
26B I G D A T A F O R D U M M I E S
-
8/13/2019 B42-BICTIB (1)
27/30
REAL TIME REQUIREMENT Real-time approach is most relevant when the answer to a
problem is time sensitive and business critical.
The following are examples of when a company wants to use this
real-time data to gain a quick advantage:
Monitoring for an exception with a new piece of information Monitoring news feeds and social media to determine events
Changing your ad placement during a big sporting event
Providing a coupon to a customer
27B I G D A T A F O R D U M M I E S
-
8/13/2019 B42-BICTIB (1)
28/30
SYSTEMS RE L TIME C P BILITYFew things to consider regarding a systems capability to ingest
data, process it, and analyze it in real time:
Low latency
Scalability
Versatility
Native format
28B I G D A T A F O R D U M M I E S
-
8/13/2019 B42-BICTIB (1)
29/30
INTEGRATING DATA TYPES INTO A BIG DATAENVIRONMENT
It is necessary to integrate different sources of data.
This data may be coming from all internal, external, or both the
sources.
No business value would be there if we deal with a variety of
data sources as a set of disconnected silos of information. Components need to include are:
Connectors
metadata
29B I G D A T A F O R D U M M I E S
-
8/13/2019 B42-BICTIB (1)
30/30
30B I G D A T A F O R D U M M I E S