The Hive SQL Compilation Process
Uploaded by chen-chun · Category: Technology
![Page 1: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/1.jpg)
The Hive SQL Compilation Process
Monday, 30 December, 13
![Page 2: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/2.jpg)
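Contents
1. How MapReduce implements Join, Group By, and Distinct
2. How SQL is translated into MapReduce
(1) Antlr && ASTTree
(2) QueryBlock, the basic building block of SQL
(3) Operator, the logical operator
(4) Logical optimizer
(5) Converting the OperatorTree into MapReduce jobs
(6) Physical optimizer and how MapJoin works
3. How to read a Hive execution plan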
![Page 6: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/6.jpg)
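Join

select u.name, o.orderid from order o join user u on o.uid = u.uid;

Input table user:

| uid | name |
|-----|------|
| 1 | apple |
| 2 | orange |

Input table order:

| uid | orderid |
|-----|---------|
| 1 | 1001 |
| 1 | 1002 |
| 2 | 1003 |

Map: the key is the join column uid; the value is <table tag, column>, where tag 1 marks rows from user and tag 2 marks rows from order.

| source | key | value |
|--------|-----|-------|
| user | 1 | <1,apple> |
| user | 2 | <1,orange> |
| order | 1 | <2,1001> |
| order | 1 | <2,1002> |
| order | 2 | <2,1003> |

Shuffle/Sort: rows with the same key are routed to the same reducer.

| reducer | key | value |
|---------|-----|-------|
| 1 | 1 | <1,apple> |
| 1 | 1 | <2,1001> |
| 1 | 1 | <2,1002> |
| 2 | 2 | <1,orange> |
| 2 | 2 | <2,1003> |

Reduce:

| name | orderid |
|------|---------|
| apple | 1001 |
| apple | 1002 |
| orange | 1003 |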
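The Map, Shuffle/Sort, and Reduce steps of this join can be simulated in a few lines of plain Python. This is only a minimal in-memory sketch of the idea, not Hive's implementation; the function names `map_phase`, `shuffle_sort`, and `reduce_phase` are made up for illustration, and the data is the sample data from the slide.

```python
from collections import defaultdict

# Sample rows from the slide: user(uid, name) and order(uid, orderid).
users = [(1, "apple"), (2, "orange")]
orders = [(1, 1001), (1, 1002), (2, 1003)]

USER_TAG, ORDER_TAG = 1, 2  # table tags carried in the map output value


def map_phase(users, orders):
    """Emit (join key, (tag, column)) pairs for both tables."""
    for uid, name in users:
        yield uid, (USER_TAG, name)
    for uid, orderid in orders:
        yield uid, (ORDER_TAG, orderid)


def shuffle_sort(pairs):
    """Group values by key; within a key, rows with the lower tag come first."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {
        key: sorted(values, key=lambda v: v[0])  # stable sort on the tag only
        for key, values in sorted(groups.items())
    }


def reduce_phase(groups):
    """For each key, pair every user row with every order row."""
    for _uid, values in groups.items():
        names = [col for tag, col in values if tag == USER_TAG]
        orderids = [col for tag, col in values if tag == ORDER_TAG]
        for name in names:
            for orderid in orderids:
                yield name, orderid


result = list(reduce_phase(shuffle_sort(map_phase(users, orders))))
print(result)
```

The tag is what lets a single reducer tell which table each value came from after the two inputs have been merged under the same key.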
![Page 10: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/10.jpg)
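Group By

select rank, isonline, count(*) from city group by rank, isonline;

Input: the city table, read as two splits.

Split 1:

| rank | isonline |
|------|----------|
| A | 1 |
| A | 1 |

Split 2:

| rank | isonline |
|------|----------|
| A | 1 |
| B | 0 |

Map: the key is <rank, isonline>; the value is the partial count after map-side aggregation.

| split | key | value |
|-------|-----|-------|
| 1 | <A, 1> | 2 |
| 2 | <A, 1> | 1 |
| 2 | <B, 0> | 1 |

Shuffle/Sort:

| reducer | key | value |
|---------|-----|-------|
| 1 | <A, 1> | 2 |
| 1 | <A, 1> | 1 |
| 2 | <B, 0> | 1 |

Reduce:

| rank | isonline | value |
|------|----------|-------|
| A | 1 | 3 |
| B | 0 | 1 |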
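The same flow can be sketched in Python, assuming each mapper pre-aggregates its own split before the shuffle, as the partial count of 2 in the Map output suggests. The function names are illustrative only and do not correspond to Hive classes.

```python
from collections import Counter, defaultdict

# The two input splits of the city table from the slide: (rank, isonline).
splits = [
    [("A", 1), ("A", 1)],
    [("A", 1), ("B", 0)],
]


def map_phase(split):
    """Map-side partial aggregation: one (key, partial count) pair per group."""
    return list(Counter(split).items())


def shuffle_sort(map_outputs):
    """Collect the partial counts of each key from all mappers."""
    groups = defaultdict(list)
    for output in map_outputs:
        for key, partial in output:
            groups[key].append(partial)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Sum the partial counts of each key."""
    return {key: sum(partials) for key, partials in groups.items()}


map_outputs = [map_phase(split) for split in splits]
counts = reduce_phase(shuffle_sort(map_outputs))
print(counts)
```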
![Page 14: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/14.jpg)
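Distinct

select dealid, count(distinct uid) num from order group by dealid;

Input: the order table, read as two splits.

Split 1:

| uid | dealid |
|-----|--------|
| 1 | 1001 |
| 2 | 1002 |
| 2 | 1001 |

Split 2:

| uid | dealid |
|-----|--------|
| 1 | 1002 |
| 1 | 1002 |
| 2 | 1001 |

Map: the key is <dealid, uid>; the partition key is dealid only.

| split | key | value | partition key |
|-------|-----|-------|---------------|
| 1 | <1001, 1> | 1 | 1001 |
| 1 | <1002, 2> | 1 | 1002 |
| 1 | <1001, 2> | 1 | 1001 |
| 2 | <1002, 1> | 2 | 1002 |
| 2 | <1001, 2> | 1 | 1001 |

Shuffle/Sort:

| reducer | key | value |
|---------|-----|-------|
| 1 | <1001, 1> | 1 |
| 1 | <1001, 2> | 1 |
| 1 | <1001, 2> | 1 |
| 2 | <1002, 1> | 2 |
| 2 | <1002, 2> | 1 |

Reduce:

| dealid | num |
|--------|-----|
| 1001 | 2 |
| 1002 | 2 |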
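Because records are sorted on the full key <dealid, uid> but partitioned on dealid alone, each reducer receives the uids of a dealid in sorted order and can count distinct values by comparing each key with the previous one. The Python sketch below illustrates that idea under simplifying assumptions: the modulo partitioner and the function names are hypothetical, and map-side aggregation is left out.

```python
# Rows of the order table from the slide: (uid, dealid), both splits combined.
rows = [(1, 1001), (2, 1002), (2, 1001), (1, 1002), (1, 1002), (2, 1001)]


def map_phase(rows):
    """Emit the composite key <dealid, uid> with a dummy value."""
    for uid, dealid in rows:
        yield (dealid, uid), 1


def partition(key, num_reducers):
    """Hypothetical partitioner: route by dealid only, ignoring uid."""
    dealid, _uid = key
    return dealid % num_reducers


def reduce_phase(sorted_keys):
    """Count distinct uids per dealid by comparing each key with the previous one."""
    result = {}
    last_key = None
    for key in sorted_keys:
        if key != last_key:
            dealid, _uid = key
            result[dealid] = result.get(dealid, 0) + 1
            last_key = key
    return result


num_reducers = 2
buckets = [[] for _ in range(num_reducers)]
for key, _value in map_phase(rows):
    buckets[partition(key, num_reducers)].append(key)

num = {}
for bucket in buckets:
    num.update(reduce_phase(sorted(bucket)))
print(num)
```

No hash set of uids is needed in the reducer, since duplicates are always adjacent after sorting.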
![Page 17: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/17.jpg)
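Distinct

select dealid, count(distinct uid), count(distinct date) from order group by dealid;

| uid | dealid | date |
|-----|--------|------|
| 1 | 1001 | 1101 |
| 2 | 1001 | 1101 |
| 2 | 1001 | 1102 |

Implementation 1. Map: the key is <dealid, uid, date>; the partition key is dealid.

| key | value | partition key |
|-----|-------|---------------|
| <1001,1,1101> | 1 | 1001 |
| <1001,2,1101> | 1 | 1001 |
| <1001,2,1102> | 1 | 1001 |

With this key layout, the Reduce stage has to deduplicate uid and date separately in memory.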
![Page 20: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/20.jpg)
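Distinct

select dealid, count(distinct uid), count(distinct date) from order group by dealid;

| uid | dealid | date |
|-----|--------|------|
| 1 | 1001 | 1101 |
| 2 | 1001 | 1101 |
| 2 | 1001 | 1102 |

Implementation 2. Map emits one record per distinct column for every input row. The key is <dealid, tag, column value>, where tag 0 marks a uid and tag 1 marks a date; the partition key is dealid.

| key | value | partition key |
|-----|-------|---------------|
| <1001,0,1> | 1 | 1001 |
| <1001,1,1101> | 1 | 1001 |
| <1001,0,2> | 1 | 1001 |
| <1001,1,1101> | 1 | 1001 |
| <1001,0,2> | 1 | 1001 |
| <1001,1,1102> | 1 | 1001 |

The Reduce stage only needs to record lastDealid, lastTag, lastuid, and lastDate.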
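A small Python sketch of the tagged-key approach follows. It is an illustration rather than Hive's code: the single variable `last_key` stands in for the lastDealid, lastTag, lastuid, and lastDate fields mentioned above, and the function names are assumptions.

```python
# Rows of the order table from the slide: (uid, dealid, date).
rows = [(1, 1001, 1101), (2, 1001, 1101), (2, 1001, 1102)]

UID_TAG, DATE_TAG = 0, 1  # tag values as shown in the map output keys


def map_phase(rows):
    """Emit one tagged key per distinct column for every input row."""
    for uid, dealid, date in rows:
        yield (dealid, UID_TAG, uid)
        yield (dealid, DATE_TAG, date)


def reduce_phase(sorted_keys):
    """Stream over the sorted keys, comparing each key with the previous one."""
    result = {}  # dealid -> [distinct uid count, distinct date count]
    last_key = None
    for key in sorted_keys:
        dealid, tag, _value = key
        counts = result.setdefault(dealid, [0, 0])
        if key != last_key:
            counts[tag] += 1
            last_key = key
    return result


result = reduce_phase(sorted(map_phase(rows)))
print(result)
```

The trade-off is visible in `map_phase`: the shuffle carries one record per distinct column per row, in exchange for a reducer that needs no in-memory sets.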
![Page 23: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/23.jpg)
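Compile Workflow

| Stage | Input | Output |
|-------|-------|--------|
| Parser | HiveQL | ASTTree |
| SemanticAnalyzer | ASTTree | QB |
| Logical Plan Gen | QB | OperatorTree |
| Logical Optimizer | OperatorTree | OperatorTree |
| Physical Plan Gen | OperatorTree | TaskTree |
| Physical Optimizer | TaskTree | TaskTree |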
![Page 25: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/25.jpg)
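Antlr
• Antlr is a language recognition tool
• It can be used to build domain-specific languages
• You only need to write a grammar file that defines the lexical and syntactic rewrite rules; Antlr then generates the code for lexical analysis and parsing, and through embedded actions or tree rules it also supports semantic analysis and intermediate code generation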
![Page 26: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/26.jpg)
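AST Tree
To process an expression further and evaluate its result, Antlr offers two options. First, embed actions directly in the grammar file by adding code fragments. Second, use Antlr's abstract syntax tree grammar: while parsing, convert the user input into an intermediate representation, the abstract syntax tree, and then perform the computation while traversing that tree.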
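The second option, building a tree first and computing while traversing it, can be illustrated without Antlr. The sketch below uses Python's built-in `ast` module as a stand-in for an Antlr-generated parser, which is an assumption made purely for illustration; the `evaluate` function and the `BINARY_OPS` table are not part of any Antlr API.

```python
import ast
import operator

# Map AST operator node types to the functions that implement them.
BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(node):
    """Evaluate an arithmetic expression by walking its syntax tree."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.BinOp):
        func = BINARY_OPS[type(node.op)]
        return func(evaluate(node.left), evaluate(node.right))
    raise ValueError(f"unsupported node: {type(node).__name__}")


tree = ast.parse("1 + 2 * 3", mode="eval")  # parsing step: text -> tree
value = evaluate(tree)                      # traversal step: tree -> result
print(value)
```

Hive follows the same separation: the parser only produces the AST, and later phases walk that tree.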
![Page 27: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/27.jpg)
Example SQL
![Page 30: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/30.jpg)
Sub Query
[Figure: AST diagram of the example SQL with regions marked 1 and 2]
![Page 31: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/31.jpg)
From => AST
[Figure: AST diagram, region 1.1]
![Page 32: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/32.jpg)
From => AST (continued)
[Figure: AST diagram, region 1.1]
![Page 33: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/33.jpg)
Select => AST
[Figure: AST diagram, region 1.2]
![Page 34: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/34.jpg)
Select => AST (continued)
[Figure: AST diagram, region 1.2]
![Page 35: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/35.jpg)
Where
[Figure: WHERE clause of the example SQL, region 1.3]
![Page 36: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/36.jpg)
Where => AST
[Figure: AST diagram, region 1.3]
![Page 39: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/39.jpg)
QueryBlock
• QueryBlock: the basic building block of a SQL statement. It has three parts: the input source, the computation, and the output.
• Generating QueryBlocks from the AST Tree means finding every basic unit in the abstract syntax tree and the relationships between those units. One QB object is created per basic unit, and the different operations of that unit become different properties of the QB object.
![Page 40: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/40.jpg)
QueryBlock
Annotations on the slide's diagram:
• Mapping between table names and aliases
• Subqueries
• QBExpr is meant to express the relationships between QBs, but so far only Union is implemented
• Join ASTTree
• key='insclause-i', value=ASTTree
![Page 46: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/46.jpg)
QueryBlock
• Records the metadata of the tables
![Page 56: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/56.jpg)
AST Tree => QB
Pre-order traversal of the AST Tree in SemanticAnalyzer#doPhase1:
1. TOK_QUERY > create a QB object and recurse into the child nodes
2. TOK_FROM > QB#aliasToTabs.put(alias, tabname); QB#aliases.put(alias, tabname); QBParseInfo#aliasToSrc.put(alias.toLowerCase(), ast);
3. TOK_INSERT > recurse into the child nodes
4. TOK_DESTINATION > QBParseInfo#nameToDest.put("insclause-i", astnode)
5. TOK_SELECT > QBParseInfo#destToSelExpr.put("insclause-i", astnode); destToAggregationExprs.put("insclause-i", astnode); destToDistinctFuncExprs.put("insclause-i", astnode);
6. TOK_WHERE > QBParseInfo#destToWhereExpr.put("insclause-i", ast);
Result for the example SQL: QB1 with the subquery QB2 beneath it (QB1 \ QB2).
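The traversal can be pictured with a toy model in Python. Everything here is a simplified assumption for illustration: the AST is a nested dictionary, the `QB` class keeps only a few of the maps named above, the alias `t` is invented, and the real SemanticAnalyzer#doPhase1 handles many more token types.

```python
class QB:
    """Toy query block holding a few of the maps named on the slide."""

    def __init__(self):
        self.alias_to_tabs = {}       # plays the role of QB#aliasToTabs
        self.alias_to_subq = {}       # plays the role of QB#aliasToSubq
        self.name_to_dest = {}        # QBParseInfo#nameToDest
        self.dest_to_sel_expr = {}    # QBParseInfo#destToSelExpr
        self.dest_to_where_expr = {}  # QBParseInfo#destToWhereExpr


def node(token, *children, **attrs):
    """Build a toy AST node (hypothetical representation)."""
    return {"token": token, "children": list(children), **attrs}


def do_phase1(ast_node, qb, dest=None):
    """Pre-order walk: handle the node itself, then recurse into its children."""
    token = ast_node["token"]
    if token == "TOK_FROM":
        for src in ast_node["children"]:
            if src["token"] == "TOK_SUBQUERY":
                sub_qb = QB()  # a nested query gets its own QB
                do_phase1(src["children"][0], sub_qb)
                qb.alias_to_subq[src["alias"]] = sub_qb
            else:  # TOK_TABREF
                qb.alias_to_tabs[src["alias"]] = src["table"]
        return
    if token == "TOK_INSERT":
        dest = f"insclause-{len(qb.name_to_dest)}"
    elif token == "TOK_DESTINATION":
        qb.name_to_dest[dest] = ast_node
    elif token == "TOK_SELECT":
        qb.dest_to_sel_expr[dest] = ast_node
    elif token == "TOK_WHERE":
        qb.dest_to_where_expr[dest] = ast_node
    for child in ast_node["children"]:
        do_phase1(child, qb, dest)


inner = node(
    "TOK_QUERY",
    node("TOK_FROM", node("TOK_TABREF", alias="p", table="fact.orderpayment")),
    node("TOK_INSERT", node("TOK_DESTINATION"), node("TOK_SELECT"), node("TOK_WHERE")),
)
outer = node(
    "TOK_QUERY",
    node("TOK_FROM", node("TOK_SUBQUERY", inner, alias="t")),
    node("TOK_INSERT", node("TOK_DESTINATION"), node("TOK_SELECT")),
)

qb1 = QB()
do_phase1(outer, qb1)
qb2 = qb1.alias_to_subq["t"]
print(qb2.alias_to_tabs, sorted(qb2.dest_to_where_expr))
```

After the walk, `qb1` plays the role of QB1 and `qb2` the role of the nested QB2.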
![Page 58: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/58.jpg)
Operator
• A logical operator performs one specific function in the Map stage or the Reduce stage.
• Common Operators include TableScanOperator, SelectOperator, FilterOperator, JoinOperator, GroupByOperator, and ReduceSinkOperator.
• Each Map or Reduce stage consists of one OperatorTree.
• Processing is streaming: once an Operator has finished computing a row, it passes the row to its childOperator.
• Some Operators are terminal operators (TerminalOperator) that mark the end of a Map or Reduce stage. For example, FileSinkOperator writes data to a file and marks the end of the current stage.
• ReduceSinkOperator can appear only in the Map stage. It serializes the combination of map-side fields into the Reduce key/value and the partition key.
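The row-at-a-time hand-off between Operators can be sketched as follows. The class names echo the Hive operators, but this is an illustrative model in Python rather than Hive's Java classes, and the `process`, `forward`, and `add_child` methods are assumptions of the sketch.

```python
class Operator:
    """Base class: process one row, then forward it to the child operators."""

    def __init__(self):
        self.children = []

    def add_child(self, op):
        self.children.append(op)
        return op  # returned so that calls can be chained

    def forward(self, row):
        for child in self.children:
            child.process(row)

    def process(self, row):
        raise NotImplementedError


class TableScan(Operator):
    def process(self, row):
        self.forward(row)


class Filter(Operator):
    def __init__(self, predicate):
        super().__init__()
        self.predicate = predicate

    def process(self, row):
        if self.predicate(row):
            self.forward(row)


class Select(Operator):
    def __init__(self, columns):
        super().__init__()
        self.columns = columns

    def process(self, row):
        self.forward({col: row[col] for col in self.columns})


class FileSink(Operator):
    """Terminal operator: collects rows instead of forwarding them."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def process(self, row):
        self.rows.append(row)


ts = TableScan()
sink = (
    ts.add_child(Filter(lambda r: r["uid"] == 1))
    .add_child(Select(["orderid"]))
    .add_child(FileSink())
)
for row in [
    {"uid": 1, "orderid": 1001},
    {"uid": 1, "orderid": 1002},
    {"uid": 2, "orderid": 1003},
]:
    ts.process(row)
print(sink.rows)
```

Each row travels through the whole chain before the next row is read, so no operator has to buffer the full input.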
![Page 59: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/59.jpg)
Operator
• RowSchema describes the output fields of an Operator.
• inputObjInspector and outputObjInspector parse the input and output fields.
• After an Operator processes a row, Hive renumbers the fields; colExprMap is used by the LogicalOptimizer to trace back the field names.
• All parameters an Operator needs at run time are stored in its OperatorDesc. The OperatorDesc is serialized to HDFS before the job is submitted, then read from HDFS and deserialized before the MR task runs.
• The Map-stage OperatorTree is stored on HDFS at Job.getConf("hive.exec.plan") + "/map.xml"
![Page 60: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/60.jpg)
QB => Operator Tree
In-order traversal of the QB: SemanticAnalyzer#genPlan(QB qb)
SemanticAnalyzer#genPlan
1. QB#aliasToSubq => recursive call to genPlan()
2. QB#aliasToTabs => TableScanOperator
3. QBParseInfo#joinExpr => QBJoinTree => ReduceSinkOperator + JoinOperator
SemanticAnalyzer#genBodyPlan
4. QBParseInfo#destToWhereExpr => FilterOperator
5. QBParseInfo#destToGroupby => ReduceSinkOperator + GroupByOperator
6. QBParseInfo#destToOrderby => ReduceSinkOperator + ExtractOperator
7. ...
![Page 61: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/61.jpg)
QB2 : aliasToTabs => TableScanOperator
QB#aliasToTabs {du=dim.user, c=detail.usersequence_client, p=fact.orderpayment}
TableScanOperator("dim.user") TS[0]
TableScanOperator("detail.usersequence_client") TS[1]
TableScanOperator("fact.orderpayment") TS[2]
![Page 62: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/62.jpg)
QBJoinTree
![Page 65: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/65.jpg)
QB2 : QBParseInfo#joinExpr => QBJoinTree

Pre-order traversal of joinExpr generates the QBJoinTree.

Step 1 (labeled QB2 on the slide):

      p
     / \
    c   p

Step 2 (labeled QB1 on the slide):

        base
        /  \
       p    du
      / \
     c   p
![Page 69: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/69.jpg)
QB2 : QBJoinTree => RS + JOIN

Pre-order traversal of the QBJoinTree. TS=TableScanOperator, RS=ReduceSinkOperator, JOIN=JoinOperator

QBJoinTree:

        base
        /  \
       p    du
      / \
     c   p

Operators generated for the join of c and p:

    TS[c]   TS[p]
      |       |
    RS[3]   RS[4]
       \     /
       JOIN[5]
![Page 73: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/73.jpg)
QB2 : QBJoinTree => RS + JOIN

Pre-order traversal of the QBJoinTree. TS=TableScanOperator, RS=ReduceSinkOperator, JOIN=JoinOperator

QBJoinTree:

        base
        /  \
       p    du
      / \
     c   p

Operators generated once the join with du is added:

    TS[c]   TS[p]
      |       |
    RS[3]   RS[4]
       \     /
       JOIN[5]   TS[du]
          |        |
        RS[6]    RS[7]
           \      /
           JOIN[8]
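The numbering in this operator tree can be reproduced with a short recursive walk over the join tree. The sketch below is a simplified model, not Hive's genJoinPlan code: the join tree is a nested tuple, operators are plain strings, and the counter starts at 3 on the assumption that ids 0 to 2 were already taken by the three TableScanOperators.

```python
import itertools


def gen_join_plan(join_tree, scans, ids, ops):
    """Return the name of the operator that produces this subtree's rows.

    join_tree is either a table alias (leaf) or a (left, right) pair.
    Each generated operator is appended to ops as (name, parent names).
    """
    if isinstance(join_tree, str):
        return scans[join_tree]
    left, right = join_tree
    left_src = gen_join_plan(left, scans, ids, ops)
    right_src = gen_join_plan(right, scans, ids, ops)
    left_rs = f"RS[{next(ids)}]"
    right_rs = f"RS[{next(ids)}]"
    join = f"JOIN[{next(ids)}]"
    ops.append((left_rs, [left_src]))
    ops.append((right_rs, [right_src]))
    ops.append((join, [left_rs, right_rs]))
    return join


scans = {"c": "TS[c]", "p": "TS[p]", "du": "TS[du]"}
join_tree = (("c", "p"), "du")  # (c join p) join du

ops = []
root = gen_join_plan(join_tree, scans, itertools.count(3), ops)
for name, parents in ops:
    print(name, "<-", ", ".join(parents))
```

Every join input passes through its own ReduceSinkOperator, which is why the output of JOIN[5] feeds RS[6] before it reaches JOIN[8].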
![Page 77: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/77.jpg)
QB2 : genBodyPlan

QBParseInfo#destToWhereExpr > FilterOperator. FIL=FilterOperator, SEL=SelectOperator

    TS[c]   TS[p]
      |       |
    RS[3]   RS[4]
       \     /
       JOIN[5]   TS[du]
          |        |
        RS[6]    RS[7]
           \      /
           JOIN[8]

JOIN[8] | FIL[9] | SEL[10]
![Page 82: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/82.jpg)
QB1: genBodyPlan
QBParseInfo#destToGroupby > ReduceSinkOperator + GroupByOperator
Legend: GBY = GroupByOperator
The QB2 tree ends in JOIN[8] → FIL[9] → SEL[10]. For the GROUP BY of QB1, operators are appended below SEL[10] one step at a time:
SEL[10] → SEL[11] → GBY[12] (HashMode aggregation)
GBY[12] → RS[13]
RS[13] → GBY[14]
![Page 83: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/83.jpg)
QB1: genPostGroupByBodyPlan
Legend: FS = FileSinkOperator
QB2:
TS[c] → RS[3] → JOIN[5] ← RS[4] ← TS[p]
JOIN[5] → RS[6] → JOIN[8] ← RS[7] ← TS[du]
JOIN[8] → FIL[9] → SEL[10]
QB1:
SEL[11] → GBY[12] → RS[13] → GBY[14] → SEL[15] → SEL[16] → FS[17]
![Page 84: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/84.jpg)
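Contents
1. How MapReduce implements Join, Group By, and Distinct
2. How SQL is translated into MapReduce
(1) Antlr && ASTTree
(2) QueryBlock, the basic building unit of SQL
(3) Logical operators (Operator)
(4) Logical-layer optimizers
(5) Converting the OperatorTree into MapReduce jobs
(6) Physical-layer optimizers: how MapJoin works
3. Hive execution plans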
![Page 87: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/87.jpg)
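Logical Optimizer
Each logical optimizer transforms the OperatorTree. The optimizers serve two goals:
1) Do as much as possible in one job, or merge jobs
2) Reduce the amount of shuffled data, or even skip the Reduce phase

| Goal | Name | Purpose |
|---|---|---|
| 2) | PredicatePushDown | Evaluate predicates earlier |
| | ColumnPruner | Column pruning |
| 2) | GroupByOptimizer | Map-side aggregation |
| 1) | ReduceSinkDeDuplication | Merge reduces that share the same partition/sort key in a linear OperatorTree |
| 1) | CorrelationOptimizer | Use correlations within the query to merge correlated jobs (HIVE-2206) |
| 2) | SimpleFetchOptimizer | Optimize aggregate queries that have no GroupBy expression |
| 2) | MapJoinProcessor | MapJoin; a hint is available |
| 2) | BucketMapJoinOptimizer | BucketMapJoin |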
![Page 88: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/88.jpg)
PredicatePushDown
Predicates are evaluated as early as possible.
QB2 before:
TS[c] → RS[3] → JOIN[5] ← RS[4] ← TS[p]
JOIN[5] → RS[6] → JOIN[8] ← RS[7] ← TS[du]
JOIN[8] → FIL[9] → SEL[10]
QB2 after (FIL[9] below JOIN[8] is gone; a new FIL[18] sits directly under TS[p]):
TS[p] → FIL[18] → RS[4] → JOIN[5] ← RS[3] ← TS[c]
JOIN[5] → RS[6] → JOIN[8] ← RS[7] ← TS[du]
JOIN[8] → SEL[10]
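The rewrite on this slide can be sketched in a few lines of Python. This is only an illustration of the tree transformation, not Hive's PredicatePushDown code. The plan is modeled as a hypothetical dictionary from each operator to the operator that consumes its output, and the sketch assumes the predicate in FIL[9] refers only to table p (the slide does not show the predicate itself).

```python
def push_down_filter(child_of, old_filter, scan, new_filter):
    """Remove old_filter from the plan and insert new_filter right below scan.

    child_of maps each operator name to the operator that consumes its output.
    """
    plan = dict(child_of)
    # Bypass the old filter: whoever fed it now feeds its child directly.
    after_filter = plan.pop(old_filter)
    for op, child in list(plan.items()):
        if child == old_filter:
            plan[op] = after_filter
    # Splice the new filter between the table scan and its former child.
    plan[new_filter] = plan[scan]
    plan[scan] = new_filter
    return plan


before = {
    "TS[c]": "RS[3]", "TS[p]": "RS[4]",
    "RS[3]": "JOIN[5]", "RS[4]": "JOIN[5]",
    "JOIN[5]": "RS[6]", "TS[du]": "RS[7]",
    "RS[6]": "JOIN[8]", "RS[7]": "JOIN[8]",
    "JOIN[8]": "FIL[9]", "FIL[9]": "SEL[10]",
}
after = push_down_filter(before, "FIL[9]", "TS[p]", "FIL[18]")
```

After the call, JOIN[8] feeds SEL[10] directly and TS[p] feeds FIL[18], which matches the "after" tree on the slide. Rows of p are filtered before the shuffle, so less data reaches RS[4].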
![Page 89: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/89.jpg)
NonBlockingOpDeDupProc
Merges adjacent SEL-SEL or FIL-FIL pairs into a single operator.
QB1 before:
SEL[11] → GBY[12] → RS[13] → GBY[14] → SEL[15] → SEL[16] → FS[17]
QB1 after:
GBY[12] → RS[13] → GBY[14] → SEL[15] → FS[17]
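A minimal sketch of this merge on a linear chain follows. It is not the Hive processor: a real merge has to combine the expressions of the two operators, while this version only keeps the first operator of each run. It also assumes that SEL[11] disappears because it is merged into the preceding SEL[10] of QB2, which is consistent with the plan shown on later slides. The helper names are hypothetical.

```python
import re

MERGEABLE = {"SEL", "FIL"}  # the non-blocking operator types named on the slide


def op_type(op):
    """'SEL[15]' -> 'SEL'."""
    return re.match(r"[A-Z]+", op).group()


def dedup_non_blocking(chain):
    """Collapse each run of adjacent SEL-SEL or FIL-FIL into its first operator."""
    result = []
    for op in chain:
        same_as_previous = result and op_type(result[-1]) == op_type(op)
        if same_as_previous and op_type(op) in MERGEABLE:
            continue  # merged into the previous operator of the same type
        result.append(op)
    return result


chain = ["SEL[10]", "SEL[11]", "GBY[12]", "RS[13]",
         "GBY[14]", "SEL[15]", "SEL[16]", "FS[17]"]
merged = dedup_non_blocking(chain)
```

Adjacent GBY operators would be left alone, because GBY is not in the mergeable set.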
![Page 96: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/96.jpg)
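ReduceSinkDeDuplication
Merges two ReduceSink operators (RS) that are connected in a linear chain.
from (select key, value from src group by key, value) s select s.key group by s.key;
Before (two jobs):
Stage-1: TS → SEL → GBY → RS → GBY → SEL → GBY → FS
Stage-2: TS → RS → GBY → SEL → FS
Keys of the parent RS (pRS) and the child RS (cRS):

| | key | partition key |
|---|---|---|
| pRS | key, value | key, value |
| cRS | key | key |

Conditions for merging:
pRS key fully contains cRS key, with the same sort order
pRS partition key fully contains cRS partition key
Whether the two jobs use the same number of reducers (numReduce) is also checked.
After (one job):
TS → SEL → GBY → RS → GBY → SEL → GBY → FS
The merged RS uses key: key, value and partition key: key.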
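The two containment conditions can be written down as a small check. The sketch below is an interpretation of the slide, not Hive's ReduceSinkDeDuplication code: it reads "contains, with the same sort order" as "the cRS key is a prefix of the pRS key", and it ignores the reducer-count check. The dictionary layout and function names are hypothetical.

```python
def is_prefix(short, long):
    """True when the columns in short are the leading columns of long."""
    return list(long[:len(short)]) == list(short)


def try_merge(p_rs, c_rs):
    """Return the merged RS description, or None when the conditions fail.

    Each RS is a dict with 'key' (ordered sort columns) and 'partition' columns.
    """
    keys_ok = is_prefix(c_rs["key"], p_rs["key"])  # contained, same order
    parts_ok = set(c_rs["partition"]) <= set(p_rs["partition"])
    if not (keys_ok and parts_ok):
        return None
    # Sort on the wider parent key, partition on the narrower child key.
    return {"key": list(p_rs["key"]), "partition": list(c_rs["partition"])}


p_rs = {"key": ["key", "value"], "partition": ["key", "value"]}
c_rs = {"key": ["key"], "partition": ["key"]}
merged = try_merge(p_rs, c_rs)
```

Partitioning on the narrower key sends all rows with the same `key` to one reducer, and sorting on `key, value` keeps both groupings contiguous, so both GROUP BY steps can run in a single reduce phase.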
![Page 97: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/97.jpg)
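Contents
1. How MapReduce implements Join, Group By, and Distinct
2. How SQL is translated into MapReduce
(1) Antlr && ASTTree
(2) QueryBlock, the basic building unit of SQL
(3) Logical operators (Operator)
(4) Logical-layer optimizers
(5) Converting the OperatorTree into MapReduce jobs
(6) Physical-layer optimizers: how MapJoin works
3. Hive execution plans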
![Page 98: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/98.jpg)
MapReduceCompiler
• Generate a MoveTask for the output table
• Starting from one of the root nodes of the OperatorTree, traverse downward depth-first
• A ReduceSinkOperator marks the boundary between Map and Reduce, and the boundary between jobs
• Traverse the other root nodes; when a JoinOperator is encountered, merge the MapReduceTasks
• Generate a StatTask to update the metadata
• Cut the operator links between Map and Reduce
![Page 99: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/99.jpg)
R0: generate MoveTask & FetchTask
QB1:
GBY[12] → RS[13] → GBY[14] → SEL[15] → FS[17]
MapredLockWork[Stage-0]
Stage-0: Move Operator
![Page 101: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/101.jpg)
Begin Walk
toWalk[] {TS[c], TS[du], TS[p]}, opStack {}
QB2:
TS[p] → FIL[18] → RS[4] → JOIN[5] ← RS[3] ← TS[c]
JOIN[5] → RS[6] → JOIN[8] ← RS[7] ← TS[du]
JOIN[8] → SEL[10]
![Page 102: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/102.jpg)
Begin Walk
TS[p] is taken from toWalk and pushed onto opStack:
toWalk[] {TS[c], TS[du]}, opStack {TS[p]}
QB2:
TS[p] → FIL[18] → RS[4] → JOIN[5] ← RS[3] ← TS[c]
JOIN[5] → RS[6] → JOIN[8] ← RS[7] ← TS[du]
JOIN[8] → SEL[10]
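The bookkeeping with toWalk and opStack can be sketched as a depth-first walk that reports the root-to-node stack at every operator. This is a simplified model rather than Hive's graph walker: the `children` map, the `visited` set, and the rule of stopping at operators already reached from an earlier root are assumptions made for the illustration.

```python
def walk(root, children, visited, on_visit):
    """Depth-first walk from root, reporting the operator stack at each node."""
    op_stack = []

    def visit(op):
        if op in visited:
            return  # already handled while walking an earlier root
        visited.add(op)
        op_stack.append(op)
        on_visit(list(op_stack))  # a rule dispatcher would look at this stack
        for child in children.get(op, []):
            visit(child)
        op_stack.pop()

    visit(root)


children = {
    "TS[p]": ["FIL[18]"], "FIL[18]": ["RS[4]"], "RS[4]": ["JOIN[5]"],
    "TS[c]": ["RS[3]"], "RS[3]": ["JOIN[5]"], "JOIN[5]": ["RS[6]"],
    "TS[du]": ["RS[7]"], "RS[7]": ["JOIN[8]"], "RS[6]": ["JOIN[8]"],
    "JOIN[8]": ["SEL[10]"],
}
to_walk = ["TS[c]", "TS[du]", "TS[p]"]
visited, stacks = set(), []
while to_walk:
    # pop() takes TS[p] first, then TS[du], then TS[c], as on the slides
    walk(to_walk.pop(), children, visited, stacks.append)
```

The first walk descends from TS[p] through both joins down to SEL[10]. The walks from TS[du] and TS[c] stop at RS[7] and RS[3], because the joins below them have already been visited.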
![Page 106: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/106.jpg)
R1: GenMRTableScan1
toWalk[] {TS[du], TS[c]}, opStack {TS[p]}
Rule pattern: "".join([t + "%" for t in opStack]) matches "TS%"
QB2:
TS[p] → FIL[18] → RS[4] → JOIN[5] ← RS[3] ← TS[c]
JOIN[5] → RS[6] → JOIN[8] ← RS[7] ← TS[du]
JOIN[8] → SEL[10]
Result: Stage-1 MapRedTask is created for this tree.
![Page 110: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/110.jpg)
R2: GenMRRedSink1
toWalk[] {TS[du], TS[c]}, opStack {TS[p], FIL[18], RS[4]}
Rule pattern: "".join([t + "%" for t in opStack]) matches "TS%.*RS%"
QB2:
TS[p] → FIL[18] → RS[4] → JOIN[5] ← RS[3] ← TS[c]
JOIN[5] → RS[6] → JOIN[8] ← RS[7] ← TS[du]
JOIN[8] → SEL[10]
Result: RS[4] is the Map/Reduce boundary, and the tree is labeled with Stage-1 MapTask and Stage-1 ReduceTask.
![Page 116: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/116.jpg)
R3: GenMRRedSink2
toWalk[] {TS[du], TS[c]}, opStack {TS[p], FIL[18], RS[4], JOIN[5], RS[6]}
Rule pattern: "".join([t + "%" for t in opStack]) matches "RS%.*RS%"
Tree with Stage-1 MapTask and Stage-1 ReduceTask already assigned:
TS[p] → FIL[18] → RS[4] → JOIN[5] ← RS[3] ← TS[c]
JOIN[5] → RS[6] → JOIN[8] ← RS[7] ← TS[du]
JOIN[8] → SEL[10]
RS[6] is a second ReduceSink on the path, so JOIN[5] stays in Stage-1 and RS[6] starts Stage-2.
splitPlan cuts the tree between JOIN[5] and RS[6]:
MR[Stage-1]: TS[p] → FIL[18] → RS[4] → JOIN[5] → FS[19]
MR[Stage-2]: TS[20] → RS[6] → JOIN[8] → SEL[10]
The intermediate data is materialized in a temporary file on HDFS.
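The cut performed by splitPlan can be illustrated on a linear chain of operator names. This is a sketch of the effect shown on the slide, not the Hive method: the function signature is hypothetical, and the ids 19 and 20 are passed in explicitly instead of being allocated by the compiler.

```python
def split_plan(chain, second_rs, fs_id, ts_id):
    """Cut a linear operator chain just before second_rs.

    The first job ends in a new FileSink that writes a temporary file; the
    second job starts with a new TableScan that reads the file back.
    """
    cut = chain.index(second_rs)
    first_job = chain[:cut] + [f"FS[{fs_id}]"]
    second_job = [f"TS[{ts_id}]"] + chain[cut:]
    return first_job, second_job


chain = ["TS[p]", "FIL[18]", "RS[4]", "JOIN[5]", "RS[6]", "JOIN[8]", "SEL[10]"]
stage1, stage2 = split_plan(chain, "RS[6]", fs_id=19, ts_id=20)
```

FS[19] and TS[20] stand for the two ends of the temporary HDFS file that connects the two jobs.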
![Page 121: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/121.jpg)
R3: GenMRRedSink2
toWalk[] {TS[du], TS[c]}, opStack {TS[p], FIL[18], RS[4], JOIN[5], RS[6], JOIN[8], SEL[10], GBY[12], RS[13]}
Rule pattern: "".join([t + "%" for t in opStack]) matches "RS%.*RS%"
Stage-2 before the split:
TS[20] → RS[6] → JOIN[8] → SEL[10] → GBY[12] → RS[13] → GBY[14] → SEL[15] → FS[17]
RS[13] starts Stage-3. splitPlan cuts the tree between GBY[12] and RS[13]:
MR[Stage-2]: TS[20] → RS[6] → JOIN[8] → SEL[10] → GBY[12] → FS[21]
MR[Stage-3]: TS[22] → RS[13] → GBY[14] → SEL[15] → FS[17]
![Page 124: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/124.jpg)
R4: GenMRFileSink1
toWalk[] {TS[du], TS[c]}, opStack {TS[p], FIL[18], RS[4], JOIN[5], RS[6], JOIN[8], SEL[10], GBY[12], RS[13], GBY[14], SEL[15], FS[17]}
Rule pattern: "".join([t + "%" for t in opStack]) matches "FS%"
Before: MR[Stage-1] → MR[Stage-2] → MR[Stage-3], with MoveWork[Stage-0] not yet attached
After: MR[Stage-1] → MR[Stage-2] → MR[Stage-3] → MoveWork[Stage-0] → StatsWork[Stage-4]
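The four rules seen so far (R1 to R4) are written on the slides as a comparison of the joined operator names with a pattern. Taken literally, an equality test cannot hold for patterns such as "TS%.*RS%", so the sketch below treats the right-hand side as a regular expression. It tries successively longer suffixes of opStack and picks the rule with the shortest match. That tie-breaking choice is an assumption of this sketch, and the function names are hypothetical.

```python
import re

RULES = {
    "R1": "TS%",       # GenMRTableScan1
    "R2": "TS%.*RS%",  # GenMRRedSink1
    "R3": "RS%.*RS%",  # GenMRRedSink2
    "R4": "FS%",       # GenMRFileSink1
}


def op_name(op):
    """'RS[4]' -> 'RS'."""
    return op.split("[")[0]


def cost(pattern, op_stack):
    """Length of the shortest stack suffix whose joined names match, else -1."""
    joined = ""
    for op in reversed(op_stack):
        joined = op_name(op) + "%" + joined
        if re.fullmatch(pattern, joined):
            return len(joined)
    return -1


def dispatch(op_stack):
    """Pick the matching rule with the smallest cost, or None if none match."""
    costs = {rule: cost(pattern, op_stack) for rule, pattern in RULES.items()}
    matched = {rule: c for rule, c in costs.items() if c >= 0}
    return min(matched, key=matched.get) if matched else None


path = ["TS[p]", "FIL[18]", "RS[4]", "JOIN[5]", "RS[6]", "JOIN[8]",
        "SEL[10]", "GBY[12]", "RS[13]", "GBY[14]", "SEL[15]", "FS[17]"]
fired = [dispatch(path[:n]) for n in range(1, len(path) + 1)]
```

With the stack ending in RS[6], both "TS%.*RS%" and "RS%.*RS%" match some suffix, and the shorter match selects R3, which agrees with the slides. Operators such as FIL[18] or JOIN[5] match no rule and yield `None`.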
![Page 126: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/126.jpg)
Begin Walk
The walk restarts from the next root with a cleared stack (opStack.clear()):
toWalk[] {TS[c], TS[du]}, opStack {}
TS[du] → RS[7] → JOIN[8] → SEL[10] → GBY[12] → FS[21]
![Page 129: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/129.jpg)
R1: GenMRTableScan1
toWalk[] {TS[c]}, opStack {TS[du]}
Rule pattern: "".join([t + "%" for t in opStack]) matches "TS%"
TS[du] → RS[7] → JOIN[8] → SEL[10] → GBY[12] → FS[21]
Result: Stage-5 MapTask is created for this path.
![Page 134: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/134.jpg)
R2: GenMRRedSink1
toWalk[] {TS[c]}, opStack {TS[du], RS[7]}
Rule pattern: "".join([t + "%" for t in opStack]) matches "TS%.*RS%"
Stage-5 MapTask and Stage-5 ReduceTask cover:
TS[du] → RS[7] → JOIN[8] → SEL[10] → GBY[12] → FS[21]
JOIN[8] already belongs to MR[Stage-2], so the Stage-5 map work is merged into it (merge map work).
MR[Stage-2] before the merge:
TS[20] → RS[6] → JOIN[8] → SEL[10]
MR[Stage-2] after the merge:
TS[20] → RS[6] → JOIN[8] ← RS[7] ← TS[du]
JOIN[8] → SEL[10] → GBY[12] → FS[21]
![Page 136: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/136.jpg)
Begin Walk
The walk restarts from the last root with a cleared stack (opStack.clear()):
toWalk[] {TS[c]}, opStack {}
TS[c] → RS[3] → JOIN[5] → FS[19]
![Page 137: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/137.jpg)
R1: GenMRTableScan1
toWalk[] {}, opStack {TS[c]}
Rule pattern: "".join([t + "%" for t in opStack]) matches "TS%"
TS[c] → RS[3] → JOIN[5] → FS[19]
Result: Stage-6 MapRedTask is created for this path.
![Page 138: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/138.jpg)
R2: GenMRRedSink1
toWalk[] {}, opStack {TS[c], RS[3]}
Rule pattern: "".join([t + "%" for t in opStack]) matches "TS%.*RS%"
Stage-6 MapRedTask, with Stage-6 MapWork and Stage-6 RedWork, covers:
TS[c] → RS[3] → JOIN[5] → FS[19]
JOIN[5] already belongs to MR[Stage-1], so the Stage-6 map work is merged into it (merge map work).
MR[Stage-1] before the merge:
TS[p] → FIL[18] → RS[4] → JOIN[5] → FS[19]
MR[Stage-1] after the merge:
TS[p] → FIL[18] → RS[4] → JOIN[5] ← RS[3] ← TS[c]
JOIN[5] → FS[19]
![Page 141: Hive sql的编译过程](https://reader031.fdocument.pub/reader031/viewer/2022013105/5481d3f5b47959e20c8b45f9/html5/thumbnails/141.jpg)
breakTaskTree
Before:
MR[Stage-1]: TS[p] → FIL[18] → RS[4] → JOIN[5] ← RS[3] ← TS[c]; JOIN[5] → FS[19]
MR[Stage-2]: TS[20] → RS[6] → JOIN[8] ← RS[7] ← TS[du]; JOIN[8] → SEL[10] → GBY[12] → FS[21]
MR[Stage-3]: TS[22] → RS[13] → GBY[14] → SEL[15] → FS[17]
After (the link below each RS is cut, separating map from reduce):
MR[Stage-1] map: TS[p] → FIL[18] → RS[4]; TS[c] → RS[3]
MR[Stage-1] reduce: JOIN[5] → FS[19]
MR[Stage-2] map: TS[20] → RS[6]; TS[du] → RS[7]
MR[Stage-2] reduce: JOIN[8] → SEL[10] → GBY[12] → FS[21]
MR[Stage-3] map: TS[22] → RS[13]
MR[Stage-3] reduce: GBY[14] → SEL[15] → FS[17]
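The effect of breakTaskTree on one root-to-sink path can be sketched as a split at the ReduceSink. This is an illustration under the assumption that, after splitPlan, every path inside a stage contains at most one RS. The function below is hypothetical and is not the Hive method of the same name.

```python
def break_task_tree(path):
    """Split one root-to-sink operator path at its ReduceSink.

    Operators up to and including the RS run in the map phase; everything
    after it runs in the reduce phase.
    """
    for i, op in enumerate(path):
        if op.startswith("RS["):
            return path[: i + 1], path[i + 1:]
    return path, []  # no ReduceSink on the path: map-only


stage3 = ["TS[22]", "RS[13]", "GBY[14]", "SEL[15]", "FS[17]"]
map_ops, reduce_ops = break_task_tree(stage3)
```

For Stage-1 and Stage-2 the same split applies to each of the two paths that lead into the join, and both paths share the same reduce side.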
Logical Plan => Physical Plan

Logical plan (OperatorTree):
  TS[c] -> RS[3]
  TS[p] -> FIL[18] -> RS[4]
  RS[3], RS[4] -> JOIN[5] -> RS[6]
  TS[du] -> RS[7]
  RS[6], RS[7] -> JOIN[8] -> SEL[10] -> GBY[12] -> RS[13] -> GBY[14] -> SEL[15] -> FS[17]

Physical plan (MapReduce stages):
MR[Stage-1]
  map:    TS[c] -> RS[3]; TS[p] -> FIL[18] -> RS[4]
  reduce: JOIN[5] -> FS[19]
MR[Stage-2]
  map:    TS[20] -> RS[6]; TS[du] -> RS[7]
  reduce: JOIN[8] -> SEL[10] -> GBY[12] -> FS[21]
MR[Stage-3]
  map:    TS[22] -> RS[13]
  reduce: GBY[14] -> SEL[15] -> FS[17]

Task tree:
  MR[Stage-1] (JOIN[5]) -> MR[Stage-2] (JOIN[8], GBY[12]) -> MR[Stage-3] (GBY[14]) -> MoveWork[Stage-0] -> StatsWork[Stage-4]

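The split of the logical plan into three MR stages can be reproduced with a small sketch. This is not Hive's implementation; it only illustrates the idea that every ReduceSink (RS) ends a map phase, and that an RS fed by a reduce-side operator forces a new stage joined by a temporary file (FS[19]/TS[20] and FS[21]/TS[22] on the slide). The function names, the edge-list encoding, and the rule of counting RS operators along a path are assumptions made for illustration.

```python
from collections import defaultdict

# Logical operator tree from the slide, as (parent, child) edges.
EDGES = [
    ("TS[c]", "RS[3]"), ("TS[p]", "FIL[18]"), ("FIL[18]", "RS[4]"),
    ("RS[3]", "JOIN[5]"), ("RS[4]", "JOIN[5]"), ("JOIN[5]", "RS[6]"),
    ("TS[du]", "RS[7]"), ("RS[6]", "JOIN[8]"), ("RS[7]", "JOIN[8]"),
    ("JOIN[8]", "SEL[10]"), ("SEL[10]", "GBY[12]"), ("GBY[12]", "RS[13]"),
    ("RS[13]", "GBY[14]"), ("GBY[14]", "SEL[15]"), ("SEL[15]", "FS[17]"),
]


def is_rs(op):
    return op.startswith("RS[")


def split_into_stages(edges):
    parents, children, ops = defaultdict(list), defaultdict(list), []
    for parent, child in edges:
        parents[child].append(parent)
        children[parent].append(child)
        for op in (parent, child):
            if op not in ops:
                ops.append(op)

    depth = {}

    def shuffles_before(op):
        # Number of RS operators a row has passed through before reaching op.
        if op not in depth:
            depth[op] = max(
                (shuffles_before(p) + (1 if is_rs(p) else 0) for p in parents[op]),
                default=0,
            )
        return depth[op]

    def stage_of(op):
        if is_rs(op):
            return shuffles_before(children[op][0])  # stage whose reducer it feeds
        if shuffles_before(op) > 0:
            return shuffles_before(op)  # reduce-side operator
        # Map-side operator: follow its chain down to the RS (map-only chain -> stage 1).
        return stage_of(children[op][0]) if children[op] else 1

    stages = defaultdict(lambda: {"map": [], "reduce": []})
    cuts = []
    for op in ops:
        side = "map" if is_rs(op) or shuffles_before(op) == 0 else "reduce"
        stages[stage_of(op)][side].append(op)
        for parent in parents[op]:
            if is_rs(op) and shuffles_before(parent) > 0:
                cuts.append((parent, op))  # a temporary FS/TS pair is needed here
    return dict(stages), cuts


stages, cuts = split_into_stages(EDGES)
for number in sorted(stages):
    print(f"MR[Stage-{number}]", stages[number])
print("cuts:", cuts)
```

The two reported cuts, after JOIN[5] and after GBY[12], are the places where the slide shows a FileSink at the end of one stage and a TableScan at the start of the next.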
Contents
1. How MapReduce implements Join, Group By, and Distinct
2. How SQL is translated into MapReduce
(1) Antlr && ASTTree
(2) QueryBlock, the basic building block of a SQL query
(3) Logical operators (Operator)
(4) Logical optimizer
(5) Translating the OperatorTree into MapReduce jobs
(6) Physical optimizer
3. Hive execution plans

Physical Optimizer

| Name | Purpose |
| --- | --- |
| CommonJoinResolver + MapJoinResolver | MapJoin |
| SortMergeJoinResolver | Works together with bucketing; similar to merge sort |
| SamplingOptimizer | Parallel order by |
| Vectorizer | HIVE-4160 |

MapJoin

[Diagram: a MapReduce Local Task reads the Small Table Data, builds HashTable Files, and uploads the files to the Distributed Cache. In the MapJoin Task, every Mapper loads the HashTable Files from the Distributed Cache and joins the Records of the Big Table Data against them.]

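The flow in the MapJoin diagram can be sketched in a few lines, using the user and order tables from the Join example at the start of the deck. This is a single-process illustration, not Hive code: `build_hash_table` stands in for the local task that produces the hash table files, `map_join` stands in for one mapper that streams big-table records, and both names are hypothetical.

```python
from collections import defaultdict

# Tables from the Join example earlier in the deck.
USERS = [{"uid": 1, "name": "apple"}, {"uid": 2, "name": "orange"}]  # small table
ORDERS = [  # big table
    {"uid": 1, "orderid": 1001},
    {"uid": 1, "orderid": 1002},
    {"uid": 2, "orderid": 1003},
]


def build_hash_table(small_rows, key):
    """Local task: load the whole small table into an in-memory hash table."""
    table = defaultdict(list)
    for row in small_rows:
        table[row[key]].append(row)
    return table


def map_join(big_rows, hash_table, key):
    """Mapper: stream big-table records and probe the hash table; no reduce phase."""
    for row in big_rows:
        for match in hash_table.get(row[key], []):
            yield {**match, **row}


hash_table = build_hash_table(USERS, "uid")
for joined in map_join(ORDERS, hash_table, "uid"):
    print(joined["name"], joined["orderid"])
```

Because every mapper holds the complete small table, the join needs no shuffle. That is also why the small side has to fit in memory.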
CommonJoinResolver

[Diagram: between Task A and Task C sits a Conditional Task with two branches. One branch runs the MapJoin LocalTask, which is memory bound, followed by the MapJoinTask. The other branch is the CommonJoinTask, which runs as a backup task.]

CommonJoinResolver

Task tree:
  MR[Stage-1] (JOIN[5]) -> MR[Stage-2] (JOIN[8], GBY[12]) -> MR[Stage-3] (GBY[14]) -> MoveWork[Stage-0] -> StatsWork[Stage-4]

• Traverse the Task Tree depth-first.
• Find each JoinOperator and compare the data sizes of its left and right tables.
• Small table + big table => MapJoinTask
• Small or big table + intermediate table => ConditionalTask

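The decision rule in the bullets above can be sketched as a depth-first walk over a toy task tree. Everything here is an illustrative assumption rather than Hive's code: the `Task` and `JoinInput` classes, the byte sizes, and the `SMALL_TABLE_BYTES` threshold (Hive exposes a comparable limit as the `hive.mapjoin.smalltable.filesize` setting). An intermediate table is modelled with `size=None`, presumably the reason the choice has to be deferred to a ConditionalTask.

```python
from dataclasses import dataclass, field
from typing import List, Optional

SMALL_TABLE_BYTES = 25_000_000  # hypothetical threshold, for illustration only


@dataclass
class JoinInput:
    name: str
    size: Optional[int]  # None: intermediate result, size unknown at compile time


@dataclass
class Task:
    name: str
    join_inputs: List[JoinInput] = field(default_factory=list)
    children: List["Task"] = field(default_factory=list)


def decide(task):
    """Apply the slide's rule to one task."""
    if not task.join_inputs:
        return "unchanged"
    sizes = [inp.size for inp in task.join_inputs]
    if any(size is None for size in sizes):
        return "ConditionalTask"  # an intermediate table is involved
    small = [size for size in sizes if size <= SMALL_TABLE_BYTES]
    if len(small) >= len(sizes) - 1:
        return "MapJoinTask"  # every input except at most one is small
    return "CommonJoinTask"


def resolve(root):
    """Depth-first traversal of the task tree, recording one decision per task."""
    decisions, stack = {}, [root]
    while stack:
        task = stack.pop()
        decisions[task.name] = decide(task)
        stack.extend(reversed(task.children))
    return decisions


# Task tree from the slide; the sizes are made up.
stage3 = Task("Stage-3")
stage2 = Task("Stage-2", [JoinInput("$INTNAME", None), JoinInput("du", 5 * 10**9)], [stage3])
stage1 = Task("Stage-1", [JoinInput("c", 10**7), JoinInput("p", 10**10)], [stage2])
print(resolve(stage1))
```

With these made-up sizes the first join becomes a MapJoinTask and the second a ConditionalTask, matching the transformed task tree shown later in the deck.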
CommonJoinResolver

One input of MR[Stage-2] is taken as the big table. The stage is deep-copied, and the copy is converted into a map-only MapJoin whose LocalWork fetches the other input.

MR[Stage-2]
  map:    TS[20] -> RS[6]; TS[du] -> RS[7]
  reduce: JOIN[8] -> SEL[10] -> GBY[12] -> FS[21]

deepCopy =>

MR[Stage-7]
  map:    TS[23] -> RS[24]; TS[25] -> RS[26]
  reduce: JOIN[34] -> SEL[35] -> GBY[36] -> FS[37]

=>

MRTask[Stage-7] (Map Only MR)
  LocalWork: FetchWork[$INTNAME]
  map: TS[23], TS[25] -> MAPJOIN[44] -> SEL[35] -> GBY[36] -> FS[37]

CommonJoinResolver

The same conversion, with the other input of MR[Stage-2] taken as the big table:

MR[Stage-2]
  map:    TS[20] -> RS[6]; TS[du] -> RS[7]
  reduce: JOIN[8] -> SEL[10] -> GBY[12] -> FS[21]

... deepCopy =>

MRTask[Stage-8] (Map Only MR)
  LocalWork: FetchWork[du]
  map: TS[45], TS[47] -> MAPJOIN[66] -> SEL[57] -> GBY[36] -> FS[37]

CommonJoinResolver

Task tree before:
  MR[Stage-1] (JOIN[5]) -> MR[Stage-2] (JOIN[8], GBY[12]) -> MR[Stage-3] (GBY[14]) -> MoveWork[Stage-0] -> StatsWork[Stage-4]

Task tree after:
  MR[Stage-10] (MAPJOIN) -> ConditionalTask[Stage-9]
  ConditionalTask[Stage-9] -> one of: MR[Stage-7] (MAPJOIN), MR[Stage-8] (MAPJOIN), MR[Stage-2] (JOIN)
  MR[Stage-7], MR[Stage-8], MR[Stage-2] -> MR[Stage-3] -> MoveWork[Stage-0] -> StatsWork[Stage-4]

Which of the three branches runs is decided at run time.

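How a ConditionalTask might pick its branch at run time can be sketched as below. This is an assumption-laden illustration, not Hive's resolver: the function name, the size numbers, and the threshold are hypothetical. The only facts taken from the slides are the three branches and which input each MapJoin branch reads through its LocalWork.

```python
def resolve_conditional(input_sizes, mapjoin_branches, fallback, threshold):
    """Pick the MapJoin branch whose non-big inputs are smallest and fit the threshold.

    input_sizes: actual input sizes in bytes, known once upstream stages finished.
    mapjoin_branches: stage name -> the input that branch treats as the big table.
    fallback: the common-join stage to run when no MapJoin branch qualifies.
    """
    best = None
    for stage, big_table in mapjoin_branches.items():
        small_total = sum(size for name, size in input_sizes.items() if name != big_table)
        if small_total <= threshold and (best is None or small_total < best[1]):
            best = (stage, small_total)
    return best[0] if best else fallback


# Branches from the slides: Stage-7 loads $INTNAME locally (du is the big table),
# Stage-8 loads du locally ($INTNAME is the big table), Stage-2 is the common join.
BRANCHES = {"Stage-7": "du", "Stage-8": "$INTNAME"}

print(resolve_conditional({"$INTNAME": 1_000, "du": 10**9}, BRANCHES, "Stage-2", 25_000_000))
print(resolve_conditional({"$INTNAME": 10**9, "du": 10**9}, BRANCHES, "Stage-2", 25_000_000))
```

The first call selects Stage-7 because the intermediate result turned out to be tiny. The second falls back to the common join in Stage-2 because neither input fits the limit.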
MapJoinResolver

• Traverse the Task Tree and split every MapReduceTask that has local work into two tasks.

Before:
  MRTask[Stage-10]: FetchWork[c] + MRWork

After:
  MRTask[Stage-13]: FetchWork[c] + HashTableSinkOperator
    -> MRTask[Stage-10]: MRWork

MapJoinResolver

Task tree before:
  MR[Stage-10] (MAPJOIN) -> ConditionalTask[Stage-9]
  ConditionalTask[Stage-9] -> one of: MR[Stage-7] (MAPJOIN), MR[Stage-8] (MAPJOIN), MR[Stage-2] (JOIN)
  MR[Stage-7], MR[Stage-8], MR[Stage-2] -> MR[Stage-3] -> MoveWork[Stage-0] -> StatsWork[Stage-4]

Task tree after:
  Lock[Stage-13] -> MR[Stage-10] (MAPJOIN) -> ConditionalTask[Stage-9]
  ConditionalTask[Stage-9] -> one of:
    Lock[Stage-11] -> MR[Stage-7] (MAPJOIN)
    Lock[Stage-12] -> MR[Stage-8] (MAPJOIN)
    MR[Stage-2] (JOIN)
  MR[Stage-7], MR[Stage-8], MR[Stage-2] -> MR[Stage-3] -> MoveWork[Stage-0] -> StatsWork[Stage-4]

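Review: how SQL is translated

1. Antlr defines the SQL grammar and performs lexical and syntactic analysis, turning the SQL into an abstract syntax tree (AST Tree).
2. Traverse the AST Tree and abstract out QueryBlocks, the basic building blocks of a query.
3. Traverse the QueryBlocks and translate them into execution logic, the OperatorTree.
4. The logical optimizer transforms the OperatorTree, merging ReduceSink operators to reduce the amount of shuffled data.
5. Traverse the OperatorTree and translate it into MapReduce tasks.
6. The physical optimizer transforms the MapReduce tasks, generating Conditional Tasks that check at run time whether a join can be converted to a MapJoin.
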
Contents
1. How MapReduce implements Join, Group By, and Distinct
2. How SQL is translated into MapReduce
(1) Antlr && ASTTree
(2) QueryBlock, the basic building block of a SQL query
(3) Logical operators (Operator)
(4) Logical optimizer
(5) Translating the OperatorTree into MapReduce jobs
(6) Physical optimizer: how MapJoin works
3. Hive execution plans

Execution plan
• AST (abstract syntax tree)
• Stage Dependency
• MapReduce Plan

Stage Dependency

Stage-11 depends on stages: Stage-14 , consists of Stage-15, Stage-16, Stage-4

Stage-11 is a ConditionalTask: it runs exactly one of Stage-15, Stage-16, and Stage-4. In a join plan like this one, the ConditionalTask exists to decide at run time whether the join can be converted to a MapJoin. Stage-4 is the common join; Stage-15 and Stage-16 are the two possible MapJoin variants.

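Lines in the Stage Dependency section follow a regular shape, so they are easy to pick apart when reading a long plan. The helper below is a minimal sketch: the function name and the returned dictionary keys are invented for illustration, and it assumes the two line shapes seen in EXPLAIN output, the one on this slide and `Stage-N is a root stage`.

```python
import re


def parse_stage_line(line):
    """Split one Stage Dependency line into the stage, its parents, and its branches."""
    head, _, tail = line.partition("consists of")
    names = re.findall(r"Stage-\d+", head)
    return {
        "stage": names[0],
        "depends_on": names[1:],  # stages that must finish first
        "consists_of": re.findall(r"Stage-\d+", tail),  # ConditionalTask branches
    }


line = "Stage-11 depends on stages: Stage-14 , consists of Stage-15, Stage-16, Stage-4"
print(parse_stage_line(line))
print(parse_stage_line("Stage-14 is a root stage"))
```

A non-empty `consists_of` list is the quickest way to spot a ConditionalTask in a plan.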
MapReduce Plan
• A ReduceSinkOperator can appear only in the Map phase, and it marks the end of the Map phase.
• Its fields are combined into the reduce key and value.
• sort order: ascending by id, then ascending by name.
• partition key: the hash of the partition key determines which reducer a row is sent to.
• tag: identifies the table; in a Join it tells which source table a row came from.

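The ReduceSink fields listed above can be made concrete with a toy model of the map output and the shuffle, reusing the user and order tables from the Join example at the start of the deck. This is only a sketch: the function names are hypothetical, `zlib.crc32` is a stand-in for whatever hash the engine actually uses, and the tags 0 and 1 are chosen for illustration.

```python
import zlib
from collections import defaultdict

USERS = [{"uid": 1, "name": "apple"}, {"uid": 2, "name": "orange"}]
ORDERS = [
    {"uid": 1, "orderid": 1001},
    {"uid": 1, "orderid": 1002},
    {"uid": 2, "orderid": 1003},
]


def reduce_sink(rows, key_cols, value_cols, tag, num_reducers):
    """Emit (partition, key, tag, value) for every input row."""
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        value = tuple(row[c] for c in value_cols)
        # The partition key equals the reduce key here; its hash picks the reducer.
        partition = zlib.crc32(repr(key).encode()) % num_reducers
        yield partition, key, tag, value


def shuffle(emitted):
    """Group records by reducer, then sort each group by key and tag."""
    by_reducer = defaultdict(list)
    for partition, key, tag, value in emitted:
        by_reducer[partition].append((key, tag, value))
    for records in by_reducer.values():
        records.sort(key=lambda record: (record[0], record[1]))
    return dict(by_reducer)


emitted = list(reduce_sink(USERS, ["uid"], ["name"], 0, 2))
emitted += list(reduce_sink(ORDERS, ["uid"], ["orderid"], 1, 2))
for reducer, records in sorted(shuffle(emitted).items()):
    print(reducer, records)
```

All rows with the same uid reach the same reducer, and within a key the tag keeps rows of one table together, which is what lets the reduce-side JOIN tell the two sources apart.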
MapReduce Plan
• After each Operator finishes its computation it renames the fields as _col + i; Map output fields are written as KEY._col + i or VALUE._col + i.
• In KEY._col1:0._col0, the "0." is the tag attached to the distinct field.
• mode: the aggregation mode, one of COMPLETE, PARTIAL1, PARTIAL2, PARTIALS, FINAL, HASH, MERGEPARTIAL.

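Two of the modes listed above commonly appear as a pair in a plan: a map-side Group By in hash mode followed by a reduce-side Group By in mergepartial mode. The sketch below shows that split for a simple count. The function names and the sample rows are made up, and it covers only this one pair of modes.

```python
from collections import Counter


def gby_hash(rows, key):
    """Map side (HASH): pre-aggregate in an in-memory hash table, emit partial counts."""
    partial = Counter()
    for row in rows:
        partial[row[key]] += 1
    return dict(partial)


def gby_mergepartial(partials):
    """Reduce side (MERGEPARTIAL): merge the partial counts from all mappers."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts
    return dict(total)


# Two hypothetical mappers, each seeing part of the input.
mapper_1 = gby_hash([{"uid": 1}, {"uid": 1}, {"uid": 2}], "uid")
mapper_2 = gby_hash([{"uid": 2}, {"uid": 3}], "uid")
print(gby_mergepartial([mapper_1, mapper_2]))
```

Pre-aggregating on the map side means each mapper ships one record per key instead of one per row, which reduces the data that has to be shuffled.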
MapReduce Plan
• condition expression: lists the fields that each of the two joined tables contributes.
• Position of Big Table: indicates that the table with tag=1 is the larger table.

Thanks && QA