Installation of Cluster using Rocks
Vadivelan, Vighnesh
Date: 7/04/2010
Department of Aerospace Engineering
Indian Institute of Technology Bombay
Hardware configuration
                 Hyper                        Hyperx Headnode              Hyperx NAS
Model            HP DL 140 G3                 HP DL 160 G5                 HP DL 160 G5
Processor        Intel Xeon 5150 @ 2.66 GHz   Intel Xeon E5430 @ 2.66 GHz  Intel Xeon E5430 @ 2.66 GHz
Processors       2                            2                            2
CPU cores        2                            4                            4
Cache size       4096 KB                      6144 KB                      6144 KB
RAM              4040864 kB                   8182224 kB                   8182224 kB
Hard disk        60 GB                        160 GB                       2 TB
Pre-installation knowledge
Supported hardware – Hyper will not support Rocks-5.2 and Hyperx will not support Rocks-4.3
eth0 and eth1 – check from the catalog which port is public and which is private. Rocks expects eth0 → private and eth1 → public
Cable connections – between node and Ethernet switch, and between Ethernet switches
OS compatibility – check that the OS to be installed is compatible with software such as the PGI compilers
Availability of the required roll CDs or DVDs for the Rocks installation
Source: http://www.rocksclusters.org/wordpress
Rolls
Base – contains basic utilities for starting the cluster installation
Bio – a collection of some of the most common bio-informatics tools
Ganglia – installs and configures the Ganglia cluster monitoring system
HPC – includes the OpenMPI, MPICH and MPICH2 packages
Kernel – includes all the kernel utilities
OS – contains the operating system utilities (CentOS 5.1 for Rocks 5.2)
Area 51 – contains utilities and services used to analyze the integrity of the files and the kernel on your cluster
Torque – contains the Torque/Maui job scheduling and queueing tools
Web-server – required for putting the cluster on the public internet for monitoring purposes
Service-pack – contains all the bug fixes for the Rocks version
PGI – contains all the PGI compiler packages
Intel-Developer – contains all the Intel compiler packages
Hyperx Cluster Layout
Building of Rocks cluster: The beginning
Download the ISO images of the rolls from the Rocks web site
Burn the ISO image of the Jumbo Roll package onto a DVD
Burn the ISO images of the additional required rolls onto CDs
Mount the Jumbo Roll DVD in the DVD drive of the node that will act as the frontend of the cluster, or connect an external USB DVD drive to the frontend if there is no onboard DVD drive
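Before burning, it can be worth verifying each downloaded ISO against the .md5 checksum published alongside it on the Rocks site. A minimal sketch, assuming each checksum was saved as <name>.iso.md5 next to its ISO (the naming is our assumption, not something Rocks mandates):

```shell
#!/bin/sh
# Verify every downloaded roll ISO against its saved checksum file.
# Assumes each checksum was saved as <name>.iso.md5 next to the ISO.
for sum in *.iso.md5; do
    [ -e "$sum" ] || continue            # no checksum files present; nothing to do
    md5sum -c "$sum" || echo "WARNING: checksum failed for $sum" >&2
done
```

A failed line here means the download should be repeated before wasting a disc.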
Building of Rocks cluster: Frontend installation
Boot the frontend node. The boot order in the BIOS should be set to CD/DVD drive first
After the frontend boots up, the following screen will be displayed
If you used the onboard CD/DVD drive, type 'build'
If you used an external CD/DVD drive, type 'build driverload=usb-storage'
Note: The above screen remains for only a few seconds. If you miss typing these commands, the node will be installed as a compute node and you will have to reboot the frontend and reinstall it as a frontend
Building of Rocks cluster: Frontend installation
Soon the following screen will be displayed
From this screen, you'll select your rolls
In this procedure, we'll only be using CD media, so we'll only be clicking on the 'CD/DVD-based Roll' button
Click the 'CD/DVD-based Roll' button
Building of Rocks cluster: Frontend installation
The CD tray will eject and you will see this screen
Put your first roll in the CD tray (for the first roll, since the Jumbo Roll DVD is already in the tray, simply push the tray back in).
Click the 'Continue' button.
Building of Rocks cluster: Frontend installation
The rolls will be discovered and the following screen will be displayed
Select the desired rolls listed earlier and click 'Submit'
Building of Rocks cluster: Frontend installation
Network settings for private (eth0)
Hyperx : 192.168.2.1/255.255.0.0
Building of Rocks cluster: Frontend installation
Network settings for public (eth1)
Hyperx : 10.101.2.3/255.255.255.0
Building of Rocks cluster: Frontend installation
Network settings (gateway / DNS server)
Hyperx : 10.101.250.1 / 10.200.1.11
Building of Rocks cluster: Frontend installation
Head node partitioning
By default, auto-partitioning will be used
Building of Rocks cluster: Frontend installation
Manual partitioning done for Hyperx:
/ : 20 GB
swap : 8 GB
/var : 4 GB
/export : remaining space
Building of Rocks cluster: Frontend installation
Installation Process Starts
Building of Rocks cluster: Frontend disk partitions
$ df -h
Filesystem                        Size  Used Avail Use% Mounted on
/dev/sda1                          19G  6.4G   12G  36% /
/dev/sda4                         107G  5.8G   96G   6% /export
/dev/sda3                         3.8G  429M  3.2G  12% /var
tmpfs                             4.0G     0  4.0G   0% /dev/shm
tmpfs                             2.0G   16M  1.9G   1% /var/lib/ganglia/rrds
nas-0-0:/export/data1             1.3T   78G  1.2T   7% /export/home
Building of Rocks cluster: Client installation
Login as root on the frontend and type 'insert-ethers'
PXE-boot the node that will act as the first compute node of the cluster
For compute node installation, select 'Compute'
Building of Rocks cluster: Client installation
After the frontend accepts the DHCP request from the compute node, communication between the frontend and the compute node is established and the following screen will be displayed
During this process, the frontend detects the MAC address of the compute node being installed
Building of Rocks cluster: Client installation
After discovering the MAC address, the frontend will allot a hostname and IP address to that compute node
The frontend names the first compute node detected compute-0-0 and continues with compute-0-1, compute-0-2 and so on
The image shows kickstart started on the compute node (which means the installation process has started)
Building of Rocks cluster: Client installation
The installation process starts and the following screen will be displayed on the compute node
For each compute node installation, follow the steps presented in slides 20 to 23
After all the compute nodes in rack 0 are installed, type 'insert-ethers -rack=1 -rank=0' to install the compute nodes in rack 1. This will start naming the compute nodes in rack 1 as compute-1-0 and so on
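The compute-<rack>-<rank> naming scheme above can be sketched as follows; the 2-rack, 4-nodes-per-rack layout here is purely illustrative, not the actual Hyperx layout:

```shell
#!/bin/sh
# Print the hostnames insert-ethers would hand out for a hypothetical
# cluster with 2 racks of 4 compute nodes each (compute-<rack>-<rank>).
for rack in 0 1; do
    for rank in 0 1 2 3; do
        echo "compute-${rack}-${rank}"
    done
done
```

This is the same pattern the queue ACLs later rely on (e.g. compute-0-0, compute-1-9).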
Building of Rocks cluster: NAS installation
Login as root on the frontend and type 'insert-ethers'
PXE-boot the node that will act as the NAS node of the cluster
The frontend names the first NAS node detected nas-0-0 and so on, the same scheme as for compute nodes
For I/O node installation, select 'NAS Appliance'
Building of Rocks cluster: NAS configuration
All installation steps for the NAS are similar to those for a compute node, except for the manual partitioning of the NAS disk space
For the Hyperx NAS node, manual partitioning:
/ : 20 GB
swap : 8 GB
/var : 4 GB
/export : remaining space
On the NAS the default home directory is /export/data1
Building of Rocks cluster: NAS partitions
$ df -h
Filesystem                        Size  Used Avail Use% Mounted on
/dev/cciss/c0d0p4                  19G  3.4G   15G  19% /
/dev/mapper/VolGroup00-LogVol00   1.3T   78G  1.2T   7% /export/data1
/dev/cciss/c0d0p2                 1.9G   90M  1.8G   5% /boot
tmpfs                             4.0G     0  4.0G   0% /dev/shm
Building of Rocks cluster: NAS NFS configuration
After the NAS node boots up, carry out the following steps to set the NAS up as the NFS server of the cluster:
Edit the file /etc/exports on the NAS node and add the following line:
/export/data1 192.168.2.0/255.255.255.0(rw,no_root_squash,sync)
Edit the file /etc/fstab on the head node and add the following line:
nas-0-0:/export/data1 /export/home nfs defaults 0 0
Run the command
#mount -a
These steps mount the /export/data1 directory of the NAS node on the /export/home directory of every other node of the cluster with a private IP address in the range 192.168.2.0 to 192.168.2.255
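A quick way to confirm the export actually came up on a node is to look for it in the mount table. The helper below is a sketch (the function name is ours; the NAS hostname and path follow the slides):

```shell
#!/bin/sh
# check_nas_mount [mounts-file]
# Succeeds if the NAS export from the slides appears in the given
# mount table (defaults to /proc/mounts on a live node).
check_nas_mount() {
    grep -qs 'nas-0-0:/export/data1' "${1:-/proc/mounts}"
}
```

On a node, `check_nas_mount || mount -a` would remount the export if the entry is missing.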
Building of Rocks cluster: NAS imp. directories
$ ls -l /export/
total 4
drwxr-xr-x 25 root        root        4096 Apr 17 09:47 data1
$ ls -l /export/data1/
total 128
drwx------ 18 amjad       amjad        4096 Mar 30 09:52 amjad
-rw-------  1 root        root         9216 Apr 17 09:47 aquota.group
-rw-------  1 root        root         9216 Apr 17 09:52 aquota.user
drwx------  7 asgerali    asgerali     4096 Apr  2 11:56 asgerali
drwx------  6 avinash     avinash      4096 Apr 17 10:00 avinash
drwx------  6 ayan        ayan         4096 Jan 23 11:05 ayan
drwx------ 12 bharat      bharat       4096 Apr 16 13:38 bharat
drwx------ 10 halbe       halbe        4096 Apr 11 21:57 halbe
drwx------ 12 krish       krish        4096 Mar 28 12:23 krish
drwx------  2 root        root        16384 Jan  5 22:05 lost+found
drwx------ 25 nileshjrane nileshjrane  4096 Apr 18 22:11 nileshjrane
drwx------ 11 nitin       nitin        4096 Mar 29 00:58 nitin
drwx------ 23 pankaj      pankaj       4096 Apr 16 13:27 pankaj
drwx------  4 prasham     prasham      4096 Jan 28 21:08 prasham
Building of Rocks cluster: frontend imp. directories
$ ls -l /export/
total 4
drwxr-xr-x  9 root    root     4096 Apr 16 11:53 apps
drwxr-xr-x  4 biouser root     4096 Jan  6 18:19 bio
drwxr-xr-x 25 root    root     4096 Apr 17 09:47 home
drwx------  2 root    root    16384 Jan  6 17:40 lost+found
drwxr-xr-x  3 root    root     4096 Jun 18  2009 rocks
drwxr-xr-x  3 root    root     4096 Jan  6 18:08 site-roll
$ ls -l /export/home/
total 128
drwx------ 18 amjad    amjad    4096 Mar 30 09:52 amjad
-rw-------  1 root     root     9216 Apr 17 09:47 aquota.group
-rw-------  1 root     root     9216 Apr 17 09:52 aquota.user
drwx------  7 asgerali asgerali 4096 Apr  2 11:56 asgerali
$ ls -l /export/apps/
total 20
drwxr-xr-x 3 root root 4096 Jan 21 00:25 modules
drwxr-xr-x 3 root root 4096 Apr 16 11:54 mpich2
drwxr-xr-x 4 root root 4096 Jan 13 00:47 old.pgi-9.0.4
drwxr-xr-x 3 root root 4096 Feb 17 09:50 openmpi
drwxr-xr-x 4 root root 4096 Mar 11 15:47 pgi-7.2.4
Adding users and setting user home directories
To create a user account, carry out the following steps as root on the frontend:
First run the commands
#rocks set attr Info_HomeDirSrv nas-0-0.local
#rocks set attr Info_HomeDirLoc /export/data1
Add the user with the command
#adduser <username>
After the user has been added, synchronise the user information across all nodes of the cluster by running
#rocks sync users
To remove a user account, run the commands:
#umount /export/home/<username> (as root on the frontend)
#userdel <username> (as root on the NAS)
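The creation steps above can be bundled into a small helper run as root on the frontend; this is a sketch (the function name is ours, and it assumes the two rocks attributes were already set as described):

```shell
#!/bin/sh
# add_cluster_user <username> -- hypothetical wrapper around the
# user-creation steps from the slides (run as root on the frontend).
add_cluster_user() {
    user="$1"
    if [ -z "$user" ]; then
        echo "usage: add_cluster_user <username>" >&2
        return 1
    fi
    adduser "$user" || return 1   # create the account on the frontend
    rocks sync users              # propagate account info to all nodes
}
```

Called without an argument, it prints a usage message instead of touching the system.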
Adding a roll to a running cluster
To add a roll to an already running cluster, run the following commands on the frontend as root:
# rocks add roll /path/to/<roll_name>.iso
# rocks enable roll <roll_name>
# rocks create distro
# rocks run roll <roll_name> | sh
# reboot
After the frontend comes back up, populate the node list:
# rocks sync config
then kickstart all your nodes:
# tentakel /boot/kickstart/cluster-kickstart
After the nodes are reinstalled they should automatically pop up in the queueing system.
Adding packages to the cluster (compilers/software)
All packages have to be installed in /share/apps. This directory is the default NFS export of the frontend, mounted on all other nodes of the cluster.
1) First example: installation of PGI
Download the tar package from the PGI site and run the following commands:
#tar zxvf pgi-7.2.4.tgz
#cd pgi-7.2.4
#./configure --prefix=/share/apps/pgi-7.2.4
#make
#make install
This installs the PGI-MPICH compiler in the /share/apps directory of the frontend, so the compiler can be used on all the compute nodes
Copy the license file into /share/apps/pgi-7.2.4/
Adding packages to the cluster II (compilers/software)
Add the following lines to your bash startup file to set up the environment variables:
export PATH=/share/apps/pgi-7.2.4/linux86-64/7.2-4/bin:$PATH
export PGI=/share/apps/pgi-7.2.4
export LM_LICENSE_FILE=$PGI/license.dat
export LD_LIBRARY_PATH=/share/apps/pgi-7.2.4/linux86-64/7.2-4/lib:$LD_LIBRARY_PATH
2) Second example: configuring OpenMPI with PGI
Download the tar package from http://www.open-mpi.org/ and run the following commands:
#tar xvzf openmpi-1.3.tgz
#cd openmpi-1.3
#./configure CC=pgcc CXX=pgCC F77=pgf77 F90=pgf90 --prefix=/share/apps/openmpi/pgi/ --with-tm=/opt/torque
#make
#make install
Adding packages to the cluster III (compilers/software)
Add the following lines to your bash startup file:
export PATH=/share/apps/openmpi/pgi/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/openmpi/pgi/lib:$LD_LIBRARY_PATH
Commands to compile and execute MPI codes
For the MPICH-PGI shared-library compilers (Fortran, C, C++):
Compiling: $/share/apps/pgi-7.2.4/linux86-64/7.2/mpi/mpich/bin/mpif90 code.f -o code.exe
Executing: $/share/apps/pgi-7.2.4/linux86-64/7.2/mpi/mpich/bin/mpirun -np <number> code.exe
For the OpenMPI-PGI shared-library compilers (Fortran, C, C++):
Compiling: $/share/apps/openmpi/pgi-7.2.4/bin/mpif90 code.f -o code.exe
Executing: $/share/apps/openmpi/pgi-7.2.4/bin/mpirun -np <number> code.exe
Commands to submit a job in interactive or batch mode
Interactive mode:
$qsub -I -q <dual or quad> -l nodes=<no. of nodes>:ppn=<4 or 8>
Batch mode:
$qsub job.sh
Content of job.sh:
$ cat job.sh
#!/bin/sh
#PBS -q quad -l nodes=2:ppn=8,walltime=12:00:00
code=code.exe
#==================================
# Don't modify the lines below
#==================================
cd $PBS_O_WORKDIR
echo `cat $PBS_NODEFILE` > host
/usr/bin/killsh $code $PBS_O_WORKDIR
/share/apps/pgi-7.2.4/linux86-64/7.2/mpi/mpich/bin/mpirun -machinefile $PBS_NODEFILE -np `cat $PBS_NODEFILE | wc -l` ./$code
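In the job script above, mpirun takes its process count from the length of $PBS_NODEFILE, which Torque fills with one entry per allocated processor, so it should equal nodes × ppn from the #PBS line. A quick sanity check of that arithmetic:

```shell
#!/bin/sh
# For the #PBS request nodes=2:ppn=8, `cat $PBS_NODEFILE | wc -l`
# inside the job should match nodes * ppn.
nodes=2
ppn=8
np=$((nodes * ppn))
echo "$np"    # prints 16
```

If the printed count and the request disagree in a real job, the queue's ppn limit (4 for dual, 8 for quad) is the first thing to check.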
Building of Rocks cluster: Quota setup
Quotas have to be configured on the I/O node only.
#quotacheck -cvguf /home
#quotaon /home
#edquota <username>
Building of Rocks cluster: Queue setup on frontend
#qmgr
Qmgr: create queue dual
Qmgr: set queue dual queue_type = Execution
Qmgr: set queue dual acl_host_enable = False
Qmgr: set queue dual acl_hosts = compute-0-0
Qmgr: set queue dual acl_hosts += compute-0-1
Qmgr: set queue dual resources_default.walltime = 12:00:00
Qmgr: set queue dual enabled = True
Qmgr: set queue dual started = True
Qmgr: create queue route
Qmgr: set queue route queue_type = Route
Qmgr: set queue route route_destinations = quad
Qmgr: set queue route route_destinations += dual
Qmgr: set queue route enabled = False
Qmgr: set queue route started = True
Qmgr: exit
Building of Rocks cluster: Check current queue configuration
#qmgr -c 'p s'
#
# Create queues and set their attributes.
#
#
# Create and define queue route
#
create queue route
set queue route queue_type = Route
set queue route route_destinations = quad
set queue route route_destinations += dual
set queue route enabled = False
set queue route started = True
#
# Create and define queue quad
#
create queue quad
set queue quad queue_type = Execution
set queue quad acl_host_enable = False
set queue quad acl_hosts = compute-1-9
set queue quad acl_hosts += compute-1-8
set queue quad resources_default.walltime = 12:00:00
set queue quad enabled = True
set queue quad started = True
#
# Create and define queue dual
#
create queue dual
set queue dual queue_type = Execution
set queue dual acl_host_enable = False
set queue dual acl_hosts = compute-0-9
set queue dual acl_hosts += compute-0-18
set queue dual acl_hosts += compute-0-8
set queue dual resources_default.walltime = 12:00:00
set queue dual enabled = True
set queue dual started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_host_enable = False
set server acl_hosts = hyperx.aero.iitb.ac.in
set server managers = [email protected]
set server managers += [email protected]
set server default_queue = route
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server next_job_number = 3413
Building of Rocks cluster: Final reboot
Reboot Hyperx:
rocks run host "reboot"
Thank You