123545395-hadoop

  • 7/30/2019 123545395-hadoop

    1/17

    Integrating HADOOP with Eclipse

    on a Virtual Machine

    Moheeb Alwarsh

    January 26, 2012

    Kent State University

    Outline

    Installing VirtualBox

    Importing Virtual OS to VirtualBox

    Live Demo

    Installing VirtualBox

    VirtualBox download location

    https://www.virtualbox.org/wiki/Downloads

    Windows Installation

    Run the executable file "VirtualBox-4.1.8-75467-Win.exe" and follow the instructions

    Mac OS (http://download.virtualbox.org/virtualbox/4.1.8/VirtualBox-4.1.8-75467-OSX.dmg)

    Run the dmg file and follow the instructions

    Installing VirtualBox

    Linux Prerequisites

    Qt 4.4.0 or higher

    SDL 1.2.7

    dkms

    Download Link (https://www.virtualbox.org/wiki/Linux_Downloads)

    Select the appropriate package for your Linux distribution

    amd64 means a 64-bit OS (Intel or AMD); i386 means a 32-bit OS

    CentOS and Fedora

    yum install dkms

    rpm -ivh VirtualBox-4.1-4.1.8_75467_rhel5-1.i386.rpm

    rpm -ivh 4.1.8/VirtualBox-4.1-4.1.8_75467_fedora16-1.i686.rpm

    Installing VirtualBox

    Ubuntu

    sudo apt-get install dkms

    sudo dpkg -i VirtualBox-3.2_4.1.8_Ubuntu_karmic_i386.deb

    Linux users: make sure your user is a member of the VirtualBox group (vboxusers) if it was not added there by default. This user will be used to run VirtualBox.

    https://www.virtualbox.org/wiki/Downloads
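
The group-membership note above can be checked from a terminal. The sketch below is one possible way to do it, not part of the original slides: has_group is a hypothetical helper name, and vboxusers is the group that the VirtualBox Linux packages create. It only prints a suggestion and changes nothing.

```shell
# has_group "<space-separated group list>" <group>
# Succeeds when <group> appears as a whole word in the list.
has_group() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# id -nG prints the current user's groups; id -un prints the user name.
if has_group "$(id -nG)" vboxusers; then
  echo "$(id -un) is already in vboxusers"
else
  echo "not in vboxusers; as root run: usermod -aG vboxusers $(id -un)"
fi
```

A new group membership normally takes effect only after the user logs out and back in.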

    Importing Virtual OS to VirtualBox

    Download the virtual OS images from the CS network: Node1.ova (Node2.ova and Node3.ova are optional)

    ftp://131.123.39.73/

    Run VirtualBox (from the Linux command line, run "VirtualBox")

    Click File > Import Appliance, click Choose, select the downloaded file (Node1.ova), then click Next > Import

    Repeat the import process for Node2 and Node3 if you want to use master and slave nodes
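
The same import can also be done without the GUI. The sketch below assumes VBoxManage (the command-line tool installed with VirtualBox) is on the PATH and that the .ova files are in the current directory. import_appliances is a hypothetical helper name, not something from the slides.

```shell
# import_appliances FILE.ova...
# Imports each appliance that exists on disk; reports and skips the rest.
import_appliances() {
  for ova in "$@"; do
    if [ -f "$ova" ]; then
      VBoxManage import "$ova"
    else
      echo "skipping $ova: file not found" >&2
    fi
  done
}
```

For example, "import_appliances Node1.ova Node2.ova Node3.ova" imports whichever of the three files were downloaded.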

    Importing Virtual OS to VirtualBox

    If your machine has 2 GB of RAM, click on RAM and reduce the size to 750 MB for Node1 and 250 MB for Node2. (Note: leave at least 1 GB for the host machine, and don't run Node3 if you have 2 GB or less.)
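
If the appliances are already imported, the memory sizes can also be changed from the command line. This is a sketch under assumptions: the imported VMs are assumed to be named Node1 and Node2, the 750 MB figure is read as the Node1 value (the slide names only Node2), and memory_for and shrink_nodes are hypothetical helper names.

```shell
# memory_for VM: RAM in MB on a 2 GB host, following the slide
# (750 MB is read here as the Node1 value; the slide names only Node2).
memory_for() {
  case "$1" in
    Node1) echo 750 ;;
    Node2) echo 250 ;;
    *) echo "no size given for $1" >&2; return 1 ;;
  esac
}

# Applies the sizes with VBoxManage; the VMs must be powered off.
shrink_nodes() {
  for vm in Node1 Node2; do
    VBoxManage modifyvm "$vm" --memory "$(memory_for "$vm")"
  done
}
```

Together the two guests take 1000 MB, which leaves roughly 1 GB of a 2 GB machine for the host.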

    Running Virtual OS

    Start Node2 and Node3 before starting Node1 if you decided to use slave nodes. Node1 will start the TaskTracker and DataNode on the slave nodes if those nodes are running. (Note: add node3 to Node1:/opt/hadoop/conf/slaves if you want to use Node3.)
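
The slaves file lists one host name per line, so adding Node3 amounts to appending one line on Node1. A minimal sketch follows. add_slave is a hypothetical helper name, and writing to /opt/hadoop/conf/slaves may require suitable permissions on the VM.

```shell
# add_slave HOST FILE
# Appends HOST to FILE unless an identical line is already present.
add_slave() {
  host=$1
  file=$2
  grep -qxF "$host" "$file" 2>/dev/null || echo "$host" >> "$file"
}
```

On Node1 this would be called as "add_slave node3 /opt/hadoop/conf/slaves". Running it twice leaves a single node3 entry.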

    Note: Start the nodes sequentially and wait until you see the logon screen for each node before starting the next
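
The start order described above (slaves first, Node1 last, one at a time) could be scripted on the host roughly as follows. The VM names Node1 to Node3 are assumed to match the imported appliances, and boot_order and start_nodes are hypothetical helper names. The script cannot detect the logon screen itself, so it waits for the user to confirm.

```shell
# boot_order N: VM start order when N slave nodes (0, 1 or 2) are used.
# Slaves come first; the master (Node1) is always last.
boot_order() {
  case "$1" in
    0) echo "Node1" ;;
    1) echo "Node2 Node1" ;;
    2) echo "Node2 Node3 Node1" ;;
    *) echo "usage: boot_order 0|1|2" >&2; return 1 ;;
  esac
}

# start_nodes N: starts each VM in order and pauses between them.
start_nodes() {
  for vm in $(boot_order "$1"); do
    VBoxManage startvm "$vm" --type gui
    printf 'Press Enter once %s shows its logon screen: ' "$vm"
    read -r reply
  done
}
```

For example, "start_nodes 1" starts Node2, waits for confirmation, then starts Node1.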

    Running Virtual OS

    Username: hadoop

    Password: hadoop1123

    Root: start a terminal as the hadoop user and run: sudo su

    Password: hadoop1123

    Running Virtual OS

    Run the "jps" command. If you see fewer than these 6 processes:

    SecondaryNameNode

    JobTracker

    Jps

    NameNode

    TaskTracker

    DataNode

    then run this command:

    ./hadoop.sh
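
Counting processes by eye is easy to get wrong, so the check above can be sketched as a small filter over the jps output. missing_daemons is a hypothetical helper name. The five daemon names are the ones listed on the slide (Jps itself is the sixth process).

```shell
# The five Hadoop daemons from the slide; Jps itself is the sixth process.
EXPECTED="NameNode SecondaryNameNode DataNode JobTracker TaskTracker"

# Reads `jps` output ("<pid> <name>" per line) on stdin and prints
# each expected daemon that is not running, one per line.
missing_daemons() {
  running=$(awk '{print $2}')
  for d in $EXPECTED; do
    echo "$running" | grep -qx "$d" || echo "$d"
  done
}
```

Running "jps | missing_daemons" prints nothing when all daemons are up. Any name it prints is a reason to run ./hadoop.sh.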

    Start Eclipse when you finish.

    To shut down all machines, run this command:

    sudo ./shutdown.sh

    Note: add node3 to the script if you use it

    Running Eclipse

    Once you start Eclipse, you will see DFS Locations, which contains the Hadoop files. In this location you can view, upload, download, and delete files, and create or delete directories using the Eclipse GUI.

    The second part is your Java files, which will be executed on HADOOP.

    Executing WordCount.java on HADOOP

    To execute the WordCount example, right-click on WordCount.java > Run As > Run on Hadoop

    Click on HADOOP local Server > Finish

    HADOOP Execution Output

    You can monitor the execution output on Eclipse's Console

    WordCount.java Output

    Right click on Hadoop Local server and click on Refresh to see the output

    directory.

    Live Demo

    References

    http://www.eclipse.org/

    http://hadoop.apache.org/

    https://www.virtualbox.org

    Questions