Oracle Enterprise Architecture
Oracle NoSQL DB: Technical Analysis and Evolution, and How to Bridge from SQL to NoSQL
Software. Hardware. Complete
Oracle Technical Architecture
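Big Data Technologies
• Transactional big data: orders, logistics, financial services, etc.
• Behavioral/interaction big data: user logs, phone records, GPS tracks, electricity meter billing, sensor data
• Unstructured data processing: microblogs, public opinion analysis, search engines, etc.
• Big data storage: traffic video, satellite imagery, itemized user billing records, user favorites, weather data, etc.
• Real-time big data: stock data, real-time trading, flight scheduling, traffic information, etc.
Big data integration, big data sharing, big data development, big data mining, big data business
RDBMS
Hadoop / NoSQL
NLP / Search Engine
Distributed File System
Streaming / Real Time Computing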
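NoSQL Trends
• Since 2010, NoSQL has been adopted at scale by major mainstream websites.
• NoSQL coexists with relational databases and outperforms them in specific scenarios.
• Oracle NoSQL DB is an excellent product that meets enterprise requirements.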
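Use Cases for Oracle DB and Oracle NoSQL DB
• Simple data management
• Horizontally distributed data, always online
• High-concurrency real-time reads and writes
• Low-cost hardware
ERP, EAM, Inventory Control, Accounting & Payroll, Process Mgmt, Business Analytics, CRM, …
Driver
Application
Real-time event processing, distributed applications, online games, …
Mobile data management
Sensor data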
Classification of NoSQL Databases and Mainstream Products
Key-value: Oracle NoSQL DB*, Voldemort*, Tokyo Cabinet, Redis, Riak, CitrusLeaf, GenieDB*, Amazon Dynamo*, Google LevelDB
Column-oriented: Cassandra, HBase, HyperTable, Google BigTable
Document: MongoDB, CouchDB, RavenDB, Clusterpoint Server, ThruDB, Terrastore, RaptorDB, SisoDB, SimpleDB
Graph: OrientDB, GraphDB, Neo4J, Infinite Graph, AllegroGraph
(*) Built on top of Berkeley DB
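Storage Choices: HDFS / NoSQL / Relational Database
HDFS | Oracle NoSQL DB | Oracle Database
File system | Database | Database
Parallel scans | Indexed storage | Indexes, caching, in-memory technology
Horizontally scalable | Horizontally scalable | Vertically/horizontally scalable
No structure | Simple data structures | Complex data structures, rich SQL processing
Large-volume writes | Large-volume random reads and writes | High-performance OLTP transaction processing
Batch processing | Real-time, high-concurrency, application-specific scenarios | General-purpose SQL platform, multiple applications, ODBC/JDBC interfaces
Simple functionality | High-speed get/put operations, flexible configuration | Security, backup/recovery, data lifecycle management, XML, etc.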
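Oracle NoSQL Database: Distributed Big Data Storage
[Diagram: storage nodes in the East, West, and Central regions; two applications, each with a NoSQL driver, issuing read, delete, read, and update operations]
• Distributed, high-performance, high-concurrency database
• Simple key-value programming model
• Scales out easily as data grows
• No single point of failure, high-speed reads and writes, transactional guarantees
• Comprehensive data integration and management solution
• Supports multi-data-center deployment
• Reliable commercial software and support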
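Oracle NoSQL DB Key-Value Data Model
Key-value pair | RDBMS
Major key: /Larry/Ellison
Minor key: /SYS
Value: $800M
Value: $500M
Value: … …
• An innovative major key + minor key + value KV model
• Both keys and values are user-definable
• The hash of the major key serves as the "index"
• Efficient reads and writes through the Java API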
Oracle NoSQL Database
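To make the major key + minor key + value model concrete, here is a minimal in-memory sketch. It is not the Oracle NoSQL DB Java API: the class `TinyKVStore`, its method names, the partition count, the use of MD5, and the second minor key `HW` are all assumptions chosen for illustration. It shows one idea from the slide: only the major key is hashed, so every minor key under the same major key lands in the same partition.

```python
import hashlib

class TinyKVStore:
    """Toy store: records are addressed by (major key, minor key)."""

    def __init__(self, n_partitions=4):
        # One dict per partition: {(major, minor): value}
        self.partitions = [dict() for _ in range(n_partitions)]

    def partition_of(self, major):
        # Hash only the major key path; the minor key is ignored on purpose.
        digest = hashlib.md5("/".join(major).encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % len(self.partitions)

    def put(self, major, minor, value):
        self.partitions[self.partition_of(major)][(major, minor)] = value

    def get(self, major, minor):
        return self.partitions[self.partition_of(major)].get((major, minor))

    def multi_get(self, major):
        # All records sharing a major key live in a single partition.
        part = self.partitions[self.partition_of(major)]
        return {k[1]: v for k, v in part.items() if k[0] == major}

store = TinyKVStore()
larry = ("Larry", "Ellison")          # major key /Larry/Ellison from the slide
store.put(larry, ("SYS",), "$800M")   # minor key /SYS from the slide
store.put(larry, ("HW",), "$500M")    # hypothetical second minor key
print(store.get(larry, ("SYS",)))
print(store.multi_get(larry))
```

Because a lookup by major key touches a single partition, fetching all minor keys of one entity stays a local operation in this sketch.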
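Oracle NoSQL DB Data Model and Schema: Major Keys and Minor Keys
{ ("4c2209f9f3924d31102bd84a"), "name" : "David" }
{ ("4c2209fef3924d31102bd84b"), "x" : 3 }
{ ("4c220a42f3924d31102bd856"), "x" : 4, "j" : 1 }
Oracle NoSQL DB Data Modeling: JSON Data Format
Why use Avro?
• Compact, efficient data serialization
• Widely used in Hadoop
Schema model
• Based on Avro JSON definitions; the data model can be created with DDL
• Supports serialization to and deserialization from JSON
Schema changes
• Modifying a schema is simple and convenient
• Schemas can be version-controlled, transparently to the front end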
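The schema points above can be sketched without the Avro library itself. The snippet below writes an Avro-style record schema as JSON and applies the one resolution rule that makes versioning transparent to readers: a field missing from an old record is filled from the reader schema's default. The record name `Person`, the added `age` field, and the helper `resolve` are hypothetical; real Avro schema resolution also covers type promotion, aliases, and binary encoding, none of which is modeled here.

```python
import json

# Version 1 of a hypothetical schema, written as Avro-style JSON.
SCHEMA_V1 = json.loads("""
{"type": "record", "name": "Person",
 "fields": [{"name": "name", "type": "string"}]}
""")

# Version 2 adds a field with a default so old records stay readable.
SCHEMA_V2 = json.loads("""
{"type": "record", "name": "Person",
 "fields": [{"name": "name", "type": "string"},
            {"name": "age", "type": "int", "default": -1}]}
""")

def resolve(record, reader_schema):
    """Project a stored record onto the reader schema, filling defaults."""
    out = {}
    for field in reader_schema["fields"]:
        if field["name"] in record:
            out[field["name"]] = record[field["name"]]
        elif "default" in field:
            out[field["name"]] = field["default"]
        else:
            raise ValueError("no value or default for field " + field["name"])
    return out

old_record = json.loads('{"name": "David"}')   # written under SCHEMA_V1
print(resolve(old_record, SCHEMA_V2))          # age comes from the default
new_record = {"name": "David", "age": 8}       # written under SCHEMA_V2
print(resolve(new_record, SCHEMA_V1))          # the extra field is dropped
```

Giving every added field a default is what lets old and new records coexist in this sketch; a field added without a default would make old records unreadable under the new schema.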
> db.c1.find( { age : { $ne : 7 } } );
{ "_id" : ObjectId("4fb4af89afa87dc1bed94331"), "age" : 8, "length_1" : 30 }
{ "_id" : ObjectId("4fb4af8cafa87dc1bed94332"), "age" : 6, "length_1" : 30 }
> j = { name : "David" };
{ "name" : "David" }
> db.things.save(j);
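Oracle NoSQL DB: Rebalancing Storage Utilization Across Nodes
• Automatically moves data from nodes with high storage utilization to nodes with lower utilization
• Shard data and the replication factor stay unchanged
Improves performance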
Master-1 Master-2 Master-3
Represents a partition
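The rebalancing idea can be illustrated with a toy greedy loop. This is not Oracle NoSQL DB's actual algorithm, which the slide does not describe. The node names reuse the Master-1 to Master-3 labels, while the partition ids, the partition sizes, and the `rebalance` function are made up. The sketch moves whole partitions from the fullest node to the emptiest one until no move would narrow the gap, and it never creates, deletes, or resizes a partition.

```python
def rebalance(nodes):
    """Greedy sketch: move partitions from the fullest to the emptiest node.

    nodes maps node name -> {partition id: size} and is mutated in place.
    Returns the list of moves as (partition id, source, target).
    """
    moves = []
    while True:
        used = {name: sum(parts.values()) for name, parts in nodes.items()}
        src = max(used, key=used.get)
        dst = min(used, key=used.get)
        gap = used[src] - used[dst]
        # A move narrows the gap only if the partition is smaller than it.
        movable = [p for p, size in nodes[src].items() if 0 < size < gap]
        if not movable:
            return moves
        part = max(movable, key=lambda p: nodes[src][p])
        nodes[dst][part] = nodes[src].pop(part)
        moves.append((part, src, dst))

# Hypothetical cluster: Master-1 holds far more data than the others.
cluster = {
    "Master-1": {"p1": 40, "p2": 30, "p3": 20},
    "Master-2": {"p4": 10},
    "Master-3": {"p5": 20},
}
for move in rebalance(cluster):
    print(move)
print({name: sum(parts.values()) for name, parts in cluster.items()})
```

Each move strictly reduces the imbalance between the two nodes involved, so the loop always terminates; a production system would also weigh the cost of moving data and the placement of replicas.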
Oracle NoSQL DB RDF Capabilities
• Unified content metadata for federated resources
• Validate semantic and structural consistency
Social media analysis
Analyze social relations using curated metadata
- Blogs, wikis, video
- Calendars, IM, voice
Semantic metadata layer
Find related content & relations by navigating connected entities
"Reason" across entities
Text mining and entity analysis
Loading Data from Oracle NoSQL DB into Oracle Database via OLH
[Diagram: Input (NoSQL): an Application with a NoSQL Driver and partitions labeled D and R. Splits per partition (Split 0,1; Split 2,3; Split n) each feed a Map task that uses kvclient.jar, followed by Sort and Reduce tasks (Reduce 0, Reduce 1) that write over JDBC. Output: Database (Copy, Merge), accessed through SQL, OBIEE, and Endeca.]
Oracle NoSQL DB External Table Data Integration
Oracle NoSQL DB Enterprise Edition
Oracle Database accesses NoSQL Database data through SQL
[Diagram, steps numbered 1 to 5: Database with External Table Metadata in the Data Dictionary; External Table Queries; Access Driver; Data Formatter Layer; NoSQL Driver; Configuration File/table; publish; *.dat Files; CSV Output with rows of the form Column1 | Column2 | ColumnN]
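The Data Formatter Layer in the diagram turns key-value records into delimited rows that an external table can read. A rough sketch of that step follows. The function `format_rows`, the sample records, the column list, and the choice of `|` as the delimiter (borrowed from the column layout on the slide) are assumptions for illustration, not the product's formatter interface.

```python
import csv
import io

def format_rows(records, columns, delimiter="|"):
    """Render key-value records as delimited text, one row per record.

    records: iterable of (key, value dict); columns: value fields to emit.
    Missing fields become empty strings.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    for key, value in records:
        writer.writerow([key] + [value.get(col, "") for col in columns])
    return buf.getvalue()

sample = [
    ("/Larry/Ellison", {"dept": "SYS", "amount": "$800M"}),
    ("/Jane/Doe", {"dept": "HW"}),   # hypothetical record with no amount
]
text = format_rows(sample, ["dept", "amount"])
print(text)
```

Using the `csv` module rather than manual string joins means a value that happens to contain the delimiter is quoted instead of silently splitting into an extra column.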
Oracle NoSQL Performance: Mixed Read and Write (YCSB)
• 1.25M ops/sec
• 2 billion records
• 2 TB of data
• 95% read, 5% update
• Low latency
• High scalability
Oracle NoSQL Performance: Write (YCSB)
• 226K ops/sec
• 2 billion records
• Low latency
• Highly scalable
[Chart: Insert Throughput. X axis: Cluster Size, with points at 6 (2x3), 12 (4x3), 24 (8x3), and 30 (10x3). Y axes: Throughput (ops/sec), 0 to 250,000, and Average Latency (ms), 0 to 4. Series: Throughput (insert/sec) and Write Latency (ms).]
• 1.6 billion records
• 94K insert/sec
• 25K read/update/sec
• Low latency
• Linear scalability
Benchmark - Disk
NoSQL Use Cases
Internet-scale transaction processing
High-velocity, high-volume, high-variety data capture with low information density
Web browsing, shopping carts, CDR processing, sensor data capture
Last-mile content delivery
Guaranteed low-latency lookups for end customers
Advertising, product recommendations, online catalogs, social media
Real-time event processing
Real-time events trigger rules that perform low-latency lookups
Medical monitoring, factory automation, oil & gas, geolocation
NoSQL for Fraud Scoring
Financial Services coordinated theft prevention
Objectives
• Combine data sources for complex scoring
• Detect and alert analysts with low latency
• Handle seasonal bursts in transaction volume
Solution
• Oracle Coherence cluster for real-time transaction object management
• Oracle NoSQL Database for fraud model and customer profile management
• Oracle Database for statistics and fraud modeling-related data
Benefits
• Simple data model, flexible transactions
• Scalable, low-latency data management
• Easy configuration and administration
• Enterprise support
[Diagram labels: Application, Data Ingestion, Transaction Authorization Processor, NoSQL DB Driver]
NoSQL for Customer Experience Management
Brand enhancement and loyalty enrichment
Objectives
• Centralized view of customer data within a federated database environment
• Dynamic customer influence tactics
Solution
• Oracle NoSQL Database as the central repository of metadata for customer activity, scheduling, and "next generation experience" events
• Oracle Database for financial data, reservation, and property management
Benefits
• Simple, flexible data format
• Highly scalable with predictable performance
• Enterprise support, technology commitment, and roadmap
[Diagram labels: NoSQL DB Driver, Event Scheduling Application, Staff & End Customers, Customer Profiles, Customer Care & End Customers, Reservation Systems]
NoSQL for Online Advertising
Platform for real-time marketing
Objectives
• Effective segmented advertising platform
• Improve revenue by increasing granularity of market segmentation
Solution
• Oracle NoSQL Database for cookie management and ad content lookup
• Oracle Database and Hadoop/MapReduce for market segmentation analysis, ad generation, and recommendation
• Oracle Database for complex analytics
Benefits
• Ease of management and administration
• Scalability and predictable performance
• Integrated storage and processing technologies
• Enterprise support
[Diagram labels: Multi-Dimensional Reporting, NoSQL DB Driver, Advertising Svr, Acquire, Analyze, Prepare, Content Delivery, Business Users, End Customers]
NoSQL for Social Online Betting
Objectives
• Scalable in-play sports betting platform
• Increase new business revenue
• Improve operational efficiency
Solution
• Match in-play bets with incoming events
• Promote interaction between customers
• Scale system with customers and events
• Feeds MySQL database for revenue tracking and operational reporting
"Oracle NoSQL Database enabled the rapid, scalable processing of incoming XML, ensuring high available and guaranteed event ordering."
James Anthony, Chief Technology Officer, Passoker
[Diagram labels: NoSQL DB, MySQL, Accounting & Operations, Event Capture & Store, Customers, Real-Time, In-Play Sports Betting Providers, XML App]
NoSQL for Scalable PaaS
Challenge
• Provide special-purpose application server services to financial institutions
• Provide cost-competitive subscriptions
Solution
• Oracle NoSQL Database for unstructured data capture and application object persistence
• Oracle Database for business analytics and insight into the data collected
Benefits
• Low-latency application object persistence
• Flexible data format and serialization techniques
• Highly reliable data store that can scale as the number of hosted apps and app objects grows
[Diagram labels: NoSQL DB Driver, Application]
NoSQL for Oracle Communications Mgmt
Challenge
• Improve billing and revenue management
• Calculate charges for any service combo
• Provide scalable CDR processing
Solution
• Coherence cluster for real-time event rating
• NoSQL database for rated event persistence and consumption by downstream systems
• Coherence memory optimization using NoSQL database for out-of-band data
[Diagram labels: NoSQL DB Driver, Application, Data Ingestion, OCM Rated Events Processor]
Challenge
Solution
Benefits
Extend Coherence data caching to disk
Manage growth in data volume, 400M customers
Handle extreme TXN volumes with low latency
Always on and highly available