[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から by NTT 小沢健史
Copyright©2014 NTT corp. All Rights Reserved.
Apache Hadoop - What's next? - @ db tech showcase 2014
Tsuyoshi Ozawa ([email protected])
About me
• Tsuyoshi Ozawa
• Researcher & Engineer @ NTT. Twitter: @oza_x86_64
• A Hadoop developer
  • Merged patches – 53 patches!
• Author of "Hadoop 徹底入門 2nd Edition", Chapter 22 (YARN)
Quiz!!
Quiz
Does Hadoop have a SPoF?
Quiz
All master nodes in Hadoop can run in highly-available mode
Quiz
Is Hadoop only for MapReduce?
Quiz
Hadoop is not only for MapReduce,
but also for Spark/Tez/Storm, and so on…
Agenda
• Current status of Hadoop - New features since Hadoop 2 -
  • HDFS
    • No SPoF with NameNode HA + JournalNode
    • Scaling out the NameNode with NameNode Federation
  • YARN
    • Resource management with YARN
    • No SPoF with ResourceManager HA
  • MapReduce
    • No SPoF with ApplicationMaster restart
• What's next? - Coming features in the 2.6 release -
  • HDFS
    • Heterogeneous Storage
    • Memory as a storage tier
  • YARN
    • Label-based scheduling
    • RM HA Phase 2
HDFS IN HADOOP 2
NameNode with JournalNode
• Once upon a time, the NameNode was a SPoF
• In Hadoop 2, the NameNode has a Quorum Journal Manager
  • Replication is done by a Paxos-based protocol
See also:
http://blog.cloudera.com/blog/2012/10/quorum-based-journaling-in-cdh4-1/
[Figure: a NameNode with a Quorum JournalManager writing edits to three JournalNodes, each backed by a local disk]
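For reference, a minimal hdfs-site.xml sketch for NameNode HA with the Quorum Journal Manager; the nameservice name and JournalNode hostnames here are hypothetical placeholders:

```xml
<!-- Sketch: two NameNodes in one nameservice, sharing edits via three JournalNodes -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>
</property>
```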
NameNode Federation
• Once upon a time, NameNode scalability was limited by its memory
• In Hadoop 2, the NameNode has a Federation feature
  • Metadata is distributed per namespace
Figures from:
https://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/Federation.html
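A federated deployment is configured by listing multiple nameservices and giving each its own NameNode address; a minimal sketch with hypothetical hostnames:

```xml
<!-- Sketch: two independent namespaces served by two NameNodes -->
<property>
  <name>dfs.nameservices</name>
  <value>ns1,ns2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns1</name>
  <value>nn-host1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns2</name>
  <value>nn-host2:8020</value>
</property>
```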
RESOURCE MANAGEMENT IN HADOOP 2
What's YARN?
• A generic resource management framework
  • YARN = Yet Another Resource Negotiator
  • Proposed by Arun C Murthy in 2011
• Container-level resource management
  • A container is a more generic unit of resource than a slot
• Separates the JobTracker's roles:
  • Job scheduling / resource management / isolation
  • Task scheduling
[Figure: MRv1 architecture (JobTracker + TaskTrackers with map/reduce slots) vs. MRv2/YARN architecture (ResourceManager + NodeManagers running generic containers that host Impala, Spark, and MRv2 masters)]
Why YARN? (Use case)
• Running various processing frameworks on the same cluster
  • Batch processing with MapReduce
  • Interactive queries with Impala
  • Interactive deep analytics (e.g. machine learning) with Spark
[Figure: MRv2/Tez (periodic long batch queries), Impala (interactive aggregation queries), and Spark (interactive machine-learning queries) all running on YARN over HDFS]
Why YARN? (Technical reason)
• More effective resource management for multiple processing frameworks
  • It is difficult to use the entire cluster's resources without thrashing
  • Cannot move *real* big data out of HDFS/S3
[Figure: separate masters for MapReduce and Impala scheduling jobs onto the same slaves; because each framework has its own scheduler, the jobs thrash]
MRv1 Architecture
• Resources are managed by the JobTracker
  • Job-level scheduling
  • Resource management
[Figure: a MapReduce master and an Impala master each scheduling onto slaves with fixed map/reduce slots; each scheduler only knows its own resource usage]
YARN Architecture
• Idea
  • One global resource manager (ResourceManager)
  • A common resource pool for all frameworks (NodeManagers and containers)
  • A scheduler for each framework (AppMaster)
[Figure: 1. a client submits a job to the ResourceManager; 2. the RM launches the framework's master in a container; 3. the master launches its slaves in containers across the NodeManagers]
YARN and Mesos
YARN
• An AppMaster is launched for each job
• More scalability
• Higher latency
• One container per request
• One master per job
Mesos
• An AppMaster is launched for each application (framework)
• Less scalability
• Lower latency
• A bundle of containers per request
• One master per framework
Policy/Philosophy is different
[Figure: YARN's ResourceManager launching one master per job on the NodeManagers vs. Mesos' ResourceMaster with one long-lived master per framework]
YARN Eco-system
• MapReduce
  • Of course, it works
• DAG-style processing frameworks
  • Spark on YARN
  • Hive on Tez on YARN
• Interactive query
  • Impala on YARN (via Llama)
• Users
  • Yahoo!
  • LinkedIn
  • Hadoop 2 @ Twitter: http://www.slideshare.net/Hadoop_Summit/t-235p210-cvijayarenuv2
YARN COMPONENTS
ResourceManager (RM)
• The master node of YARN
• Role
  • Accepting requests from:
    1. ApplicationMasters, for allocating containers
    2. Clients, for submitting jobs
  • Managing cluster resources
    • Job-level scheduling
  • Container management
    • Launching application-level masters (e.g. for MapReduce)
[Figure: 1. a client submits a job; 2. the RM launches the job's master in a container; 3. the master sends container allocation requests to the RM; 4. the RM forwards container allocation requests to the NodeManagers]
NodeManager (NM)
• The slave node of YARN
• Role
  • Accepting requests from the RM
  • Monitoring the local machine and reporting to the RM
    • Health check
  • Managing local resources
[Figure: 1. a client or master requests containers from the ResourceManager; 2. the RM allocates containers; 3. the NodeManager launches them; 4. container information (host, port, etc.) is returned; the NM also sends periodic health checks to the RM via heartbeat]
ApplicationMaster (AM)
• The master of an application (e.g. the master of MapReduce, Tez, Spark, etc.)
  • Runs in a container
• Roles
  • Getting containers from the ResourceManager
  • Application-level scheduling
    • How many Map tasks run, and where?
    • When will reduce tasks be launched?
[Figure: the Master of MapReduce, running in a container on a NodeManager, requests containers from the ResourceManager (1) and receives a list of allocated containers (2)]
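To make the request/allocate loop concrete, here is a toy Python model of the AM–RM interaction described above. This is an illustrative simulation only, not the real YARN API (the real Java client API is `AMRMClient`); the class and method names are hypothetical:

```python
# Toy model of the ApplicationMaster <-> ResourceManager protocol.
# NOT real YARN code: ResourceManager/AppMaster here are illustrative stand-ins.

class ResourceManager:
    """Hands out containers from a fixed pool of slave nodes."""
    def __init__(self, nodes):
        self.free = list(nodes)  # e.g. ["node1", "node2", ...]

    def allocate(self, num_containers):
        # Grant up to num_containers from the free pool (step 1 -> step 2)
        granted, self.free = self.free[:num_containers], self.free[num_containers:]
        return granted

class AppMaster:
    """Requests containers and does application-level scheduling."""
    def __init__(self, rm):
        self.rm = rm

    def run_tasks(self, tasks):
        assignments = {}
        pending = list(tasks)
        while pending:
            containers = self.rm.allocate(len(pending))
            if not containers:
                break  # cluster full; a real AM would wait and re-request
            for host in containers:
                # Application-level scheduling: pick which task goes where
                assignments[pending.pop(0)] = host
        return assignments

rm = ResourceManager(["node1", "node2", "node3"])
am = AppMaster(rm)
print(am.run_tasks(["map1", "map2"]))  # -> {'map1': 'node1', 'map2': 'node2'}
```

The split of responsibilities mirrors the slide: the RM only tracks the global pool, while the AM decides which task runs in which granted container.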
RESOURCE MANAGER HA
ResourceManager High Availability
• What happens when the ResourceManager fails?
  • New jobs cannot be submitted
• NOTE:
  • Already-launched apps continue to run
  • AppMaster recovery is done by each framework (e.g. MRv2)
[Figure: a client can no longer submit jobs to the failed ResourceManager, while each running job continues on the NodeManagers]
ResourceManager High Availability
• Approach
  • Storing RM state in ZooKeeper
  • Automatic failover by the EmbeddedElector
  • Manual failover by RMHAUtils
  • NodeManagers use a local RMProxy to access the RMs
[Figure: 1. the active RM stores all state into the RMStateStore on ZooKeeper; 2. the active RM fails; 3. the EmbeddedElector detects the failure and the standby node becomes active; 4. failover; 5. the new active RM loads state from the RMStateStore]
CAPACITY PLANNING ON YARN
Resource definition on NodeManager
• Define resources with XML (etc/hadoop/yarn-site.xml)

<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>

[Figure: a NodeManager advertising 8 CPU cores and 8 GB of memory]
Container allocation on ResourceManager
• The RM accepts a container request and sends it to an NM, but the request can be rewritten:
  • Small requests are rounded up to minimum-allocation-mb
  • Large requests are rounded down to maximum-allocation-mb

<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>

[Figure: a master requests a 512 MB container; the ResourceManager rounds the request up and asks a NodeManager for 1024 MB]
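The rewriting rule can be sketched in a few lines of Python. `normalize_request` is a hypothetical helper, not YARN code; the in-between rounding to a multiple of the minimum is an assumption about the scheduler's behavior, not stated on the slide:

```python
def normalize_request(requested_mb, min_mb=1024, max_mb=8192):
    """Sketch of how the RM rewrites a container memory request.

    Per the slide: requests below minimum-allocation-mb are rounded up
    to it, and requests above maximum-allocation-mb are rounded down.
    In between, this sketch assumes rounding up to the next multiple of
    the minimum allocation.
    """
    if requested_mb <= min_mb:
        return min_mb
    if requested_mb >= max_mb:
        return max_mb
    # Round up to the next multiple of min_mb
    return ((requested_mb + min_mb - 1) // min_mb) * min_mb

print(normalize_request(512))    # -> 1024 (rounded up to the minimum)
print(normalize_request(1500))   # -> 2048 (next multiple of 1024)
print(normalize_request(10000))  # -> 8192 (capped at the maximum)
```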
Container allocation at the framework side
• Define how much resource MapTasks and ReduceTasks use
  • MapReduce: etc/hadoop/mapred-site.xml

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value>
</property>

[Figure: on an 8-core / 8 GB NodeManager, the master asks for containers for map tasks with 1024 MB of memory and 1 CPU core each]
WHAT’S NEXT? – HDFS -
Heterogeneous Storages for HDFS Phase 2
• HDFS-2832, HDFS-5682
• Handling various storage types in HDFS
  • SSD, memory, disk, and so on
• Setting quotas per storage type, e.g.:
  • Setting the SSD quota on /home/user1 to 10 TB
  • Setting the SSD quota on /home/user2 to 10 TB
  • Not configuring any SSD quota on the remaining user directories (i.e. leaving them at the defaults)

<configuration>
  ...
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>[DISK]/mnt/sdc2/,[DISK]/mnt/sdd2,[SSD]/mnt/sde2</value>
  </property>
  ...
</configuration>
Support memory as a storage medium
• HDFS-5851
• Introducing an explicit "cache" layer in HDFS
  • Discardable Distributed Memory (DDM)
  • Applications can accelerate their processing by using memory
  • Discardable Memory and Materialized Queries is one example
• Differences between RDD and DDM
  • Multi-tenancy aware
  • Handles data in the processing layer or in the storage layer
And, more!
• Archival storage
  • HDFS-6584
• Transparent encryption
  • HDFS-6134
WHAT’S NEXT? – YARN -
Support for rolling upgrades in YARN
• Non-stop YARN updating (YARN-666)
  • NodeManagers, ResourceManager, applications
• Before 2.6.0
  • Restarting the RM -> the RM restarts all AMs -> all jobs restart
  • Restarting NMs -> the NMs are removed from the cluster -> containers are restarted!
• After 2.6.0
  • Restarting the RM -> AMs continue to run
  • Restarting an NM -> the NM restores its state from local data
[Figure: the ResourceManager and NodeManagers can be restarted while masters and slaves keep running in their containers]
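The 2.6.0 behavior above is driven by recovery settings in yarn-site.xml; a minimal sketch (the recovery directory path is a hypothetical example):

```xml
<!-- Sketch: work-preserving RM restart and NM state recovery -->
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.recovery.dir</name>
  <value>/var/hadoop/yarn-nm-recovery</value>
</property>
```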
YARN reservation-subsystem
• Now we can run various subsystems on YARN
  • Interactive query engines: Spark, Impala, …
  • Batch processing engines: MapReduce, Tez, …
• Problem
  • Interactive query engines allocate resources at the same time – this can delay the daily batch
• Time-based reservation scheduling
  • 8:00am – 6:00pm: allocating resources for Impala
  • 6:00pm – 0:00am: allocating resources for MapReduce
[Figure: timeline from 8:00am to 0:00am – allocation for the interactive query engine during the day, batch processing for the next day overnight]
Support for admin-specified labels in YARN
• YARN-796
• Handling heterogeneous machines in one YARN cluster
  • GPU cluster
  • High-memory cluster
  • 40 Gbps network cluster
• Labeling them and scheduling based on labels
  • Admins can add/remove labels via yarn rmadmin commands
[Figure: a client submits a job "on GPU!"; the ResourceManager schedules it onto the NodeManagers labeled GPU rather than those labeled 40G network]
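As a sketch, label management through the yarn rmadmin CLI looks roughly like the following; the node name is hypothetical, and the exact -replaceLabelsOnNode argument syntax has varied between releases, so check the docs for your version:

```shell
# Register labels with the ResourceManager
yarn rmadmin -addToClusterNodeLabels "gpu,highmem"

# Attach a label to a specific NodeManager (syntax varies by version)
yarn rmadmin -replaceLabelsOnNode "gpu-node01:45454,gpu"

# Remove a label from the cluster
yarn rmadmin -removeFromClusterNodeLabels "highmem"
```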
And, more!
• Timeline service security
  • YARN-1935
• Minimal support for running long-running services on YARN
  • YARN-896
• Support for an automatic, shared cache for YARN application artifacts
  • YARN-1492
• And more!
  • Please check the wiki: http://wiki.apache.org/hadoop/Roadmap
Summary
• Hadoop 2 is evolving rapidly
  • I would appreciate it if you can catch up via this presentation!
• New components since v2
  • HDFS
    • Quorum Journal Manager
    • NameNode Federation
  • YARN
    • ResourceManager / NodeManager / ApplicationMaster
• New features in 2.6:
  • Discardable memory store on HDFS, and so on
  • Rolling upgrades, labels for heterogeneous clusters on YARN, the reservation system, and so on…
• Questions or feedback -> [email protected]
• Issues -> https://issues.apache.org/jira/browse/{HDFS,YARN,HADOOP,MAPREDUCE}
• YARN-666
• https://www.youtube.com/watch?v=O4Q73e2ua9Y&feature=youtu.be
• http://www.slideshare.net/Hadoop_Summit/t-145p230avavilapalli-mac