SpatialHadoop: A MapReduce Framework for Spatial Data. Presenter: Zhao Yuliang (赵郁亮), 2015-8-3. ICDE 2015.
SpatialHadoop: A MapReduce Framework
for Spatial Data
Presenter: Zhao Yuliang (赵郁亮), 2015-8-3, ICDE 2015
Executive Summary
• Propose a full-fledged MapReduce framework with native support for spatial data.
• Propose a new system architecture with four layers: language, operations, MapReduce, and storage.
• SpatialHadoop achieves orders of magnitude better performance than Hadoop for spatial data processing.
Outline
• Introduction
• Related work
• SpatialHadoop Architecture
• Experiments
Introduction
• The amount of spatial data has exploded, produced by devices such as smartphones, satellites, and medical devices.
• Hadoop has been adopted as a solution for scalable processing of huge datasets in many applications, e.g., machine learning, graph processing, and behavioral simulations.
• ESRI has released ‘GIS Tools on Hadoop’.
Introduction
• Parallel-Secondo
• MD-HBase
• Hadoop-GIS
• SpatialHadoop
Related work
• Specific spatial operations
R-tree construction
Range query
kNN query
All NN query
• Systems
Hadoop-GIS
MD-HBase
Parallel-Secondo
SpatialHadoop Architecture
SpatialHadoop Architecture
• Language Layer (Pigeon)
Data types
Spatial functions
kNN query
SpatialHadoop Architecture
• Storage Layer (Indexing): Existing techniques for spatial indexing in Hadoop
1) Build only
2) Custom on-the-fly indexing
3) Indexing in HDFS
SpatialHadoop Architecture
• Storage Layer (Indexing): Overview of indexing in SpatialHadoop
SpatialHadoop Architecture
Index Building
1) Partitioning
Step 1: Number of partitions
Step 2: Partition boundaries
Step 3: Physical partitioning
2) Local Indexing
3) Global Indexing
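The three index-building phases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SpatialHadoop's code: the partition-count formula with a 20% overhead factor, the uniform grid used for boundaries, and every function name here are assumptions made for the example. Records are rectangles `(x1, y1, x2, y2)`, and a record that overlaps several cells is replicated to each of them.

```python
import math
from collections import defaultdict

def num_partitions(input_bytes, block_bytes, overhead=0.2):
    """Step 1: enough partitions that each one fits in a single block.
    The overhead factor (assumed 0.2) leaves room for replication and the local index."""
    return math.ceil(input_bytes * (1 + overhead) / block_bytes)

def grid_boundaries(mbr, n):
    """Step 2: split the space mbr=(x1, y1, x2, y2) into a uniform grid of at least n cells."""
    x1, y1, x2, y2 = mbr
    cols = math.ceil(math.sqrt(n))
    rows = math.ceil(n / cols)
    w, h = (x2 - x1) / cols, (y2 - y1) / rows
    return [(x1 + c * w, y1 + r * h, x1 + (c + 1) * w, y1 + (r + 1) * h)
            for r in range(rows) for c in range(cols)]

def overlaps(a, b):
    """True if rectangles a and b intersect (closed intervals)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def partition(records, cells):
    """Step 3: physical partitioning; a record is written to every cell it overlaps."""
    parts = defaultdict(list)
    for rec in records:
        for i, cell in enumerate(cells):
            if overlaps(rec, cell):
                parts[i].append(rec)
    return parts

def build_index(records, mbr, input_bytes, block_bytes):
    cells = grid_boundaries(mbr, num_partitions(input_bytes, block_bytes))
    parts = partition(records, cells)
    # Local indexing: a sorted list stands in for a per-partition index structure.
    local = {i: sorted(recs) for i, recs in parts.items()}
    # Global indexing: partition id -> partition boundary, small enough to keep in memory.
    global_index = {i: cells[i] for i in local}
    return global_index, local

records = [(1, 1, 2, 2), (8, 8, 9, 9), (4, 4, 6, 6)]
global_index, local = build_index(records, (0, 0, 10, 10), input_bytes=300, block_bytes=100)
```

In this toy run the space is cut into a 2×2 grid, and the record `(4, 4, 6, 6)` straddles the centre, so it is stored in all four partitions.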
SpatialHadoop Architecture
Grid file
SpatialHadoop Architecture
R-tree
SpatialHadoop Architecture
R+-tree
SpatialHadoop Architecture
• MapReduce Layer
SpatialHadoop Architecture
• Operations Layer: Range Query, kNN
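A rough sketch of how a global index can prune work for these two operations, written in plain Python rather than as MapReduce jobs. The data layout (a dict of partition boundaries and a dict of point lists), the function names, and the assumption that each point is stored in exactly one partition are all illustrative choices for this example. The kNN routine follows a three-phase idea: an initial answer from the partition containing the query point, a check of whether the search circle crosses that partition's boundary, and a refinement over the neighbouring partitions it reaches.

```python
import heapq
import math

def overlaps(a, b):
    """True if rectangles a and b, each (x1, y1, x2, y2), intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def range_query(global_index, partitions, query):
    """Return all points inside the query rectangle, scanning only overlapping partitions."""
    results = []
    for pid, mbr in global_index.items():
        if not overlaps(mbr, query):  # filter step: prune partitions outside the range
            continue
        results.extend(p for p in partitions[pid]
                       if query[0] <= p[0] <= query[2] and query[1] <= p[1] <= query[3])
    return results

def knn(global_index, partitions, q, k):
    """Return the k points nearest to q; assumes q lies inside some partition."""
    def dist(p):
        return math.dist(p, q)
    # Phase 1: initial answer from the partition whose boundary contains q.
    home = next(pid for pid, m in global_index.items()
                if m[0] <= q[0] <= m[2] and m[1] <= q[1] <= m[3])
    best = heapq.nsmallest(k, partitions[home], key=dist)
    # Phase 2: correctness check using the bounding box of the search circle.
    radius = dist(best[-1]) if best and len(best) == k else math.inf
    box = (q[0] - radius, q[1] - radius, q[0] + radius, q[1] + radius)
    # Phase 3: refine with every other partition the circle may reach.
    candidates = list(best)
    for pid, mbr in global_index.items():
        if pid != home and overlaps(mbr, box):
            candidates.extend(partitions[pid])
    return heapq.nsmallest(k, candidates, key=dist)

GLOBAL_INDEX = {0: (0, 0, 5, 10), 1: (5, 0, 10, 10)}
PARTITIONS = {0: [(1, 1), (4, 5), (2, 8)], 1: [(6, 5), (9, 9)]}
in_range = range_query(GLOBAL_INDEX, PARTITIONS, (3, 4, 7, 6))
nearest = knn(GLOBAL_INDEX, PARTITIONS, (4.5, 5), 2)
```

For the query point `(4.5, 5)`, the home partition alone would answer with `(4, 5)` and `(2, 8)`; the refinement phase replaces `(2, 8)` with the closer point `(6, 5)` from the neighbouring partition.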
SpatialHadoop Architecture
• Operations Layer: Spatial Join
Step 1: Global join
Step 2: Local join
Step 3: Duplicate avoidance
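The three join steps can be illustrated with a small single-machine sketch. It assumes both inputs are already partitioned, with records replicated to every partition they overlap, and it uses a reference-point test for duplicate avoidance: a pair is reported only by the partition pair whose shared region contains the lower-left corner of the two records' intersection. The nested-loop local join, the half-open containment test, and all names are assumptions made for this example.

```python
def intersection(a, b):
    """Intersection of rectangles a and b as (x1, y1, x2, y2), or None if disjoint."""
    r = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return r if r[0] <= r[2] and r[1] <= r[3] else None

def contains_point(cell, x, y):
    """Half-open test, so a point on a shared edge belongs to exactly one cell."""
    return cell[0] <= x < cell[2] and cell[1] <= y < cell[3]

def spatial_join(index_r, parts_r, index_s, parts_s):
    out = []
    # Step 1, global join: find pairs of partitions whose boundaries overlap.
    for i, cell_r in index_r.items():
        for j, cell_s in index_s.items():
            region = intersection(cell_r, cell_s)
            if region is None:
                continue
            # Step 2, local join: compare records of the two partitions (nested loop here).
            for r in parts_r[i]:
                for s in parts_s[j]:
                    common = intersection(r, s)
                    if common is None:
                        continue
                    # Step 3, duplicate avoidance: keep the pair only where the
                    # reference point (corner of the intersection) falls in this region.
                    if contains_point(region, common[0], common[1]):
                        out.append((r, s))
    return out

A, B = (0, 0, 5, 10), (5, 0, 10, 10)
r1, r2 = (4, 4, 6, 6), (1, 1, 2, 2)
s1, s2 = (4.5, 4.5, 7, 7), (8, 8, 9, 9)
joined = spatial_join({0: A, 1: B}, {0: [r1, r2], 1: [r1]},
                      {0: A, 1: B}, {0: [s1], 1: [s1, s2]})
```

Here `r1` and `s1` both straddle the boundary between the two cells, so the local join finds the pair in all four partition combinations, but the reference-point test lets only one of them report it.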
Experiments
• DataSet
TIGER: spatial features in the US such as streets and rivers (60 GB)
OSM: OpenStreetMap (60 GB)
NASA: 120 billion records (4.6 TB)
SYNTH: 2 billion records (128 GB, uniform distribution)
• Experiment Environment
Amazon EC2 cluster of up to 100 nodes
Hadoop 1.2.1 on Java 1.6
Experiments
• Evaluation: Range Query
Experiments
• Evaluation: Range Query
Experiments
• Evaluation: kNN
Experiments
• Evaluation: Spatial Join
Experiments
• Evaluation: Index Creation