Let's Build a PC Cluster!! (PC クラスタを作ろう!!)


Transcript of "Let's Build a PC Cluster!!"

  • Let's Build a PC Cluster!! Tomoyuki Hiroyasu (廣安 知之), Department of Knowledge Engineering, Doshisha University. [email protected]

  • Cluster: clus·ter n. I. A bunch (of grapes, cherries, wisteria flowers, etc.): a cluster of grapes. II. A group (of the same kind of things or people): a cluster of spectators; a cluster of butterflies; a cluster of stars; in a cluster / in clusters.

    New College English-Japanese Dictionary, 6th edition (C) Kenkyusha Ltd. 1967, 1994, 1998

  • PC

  • PC clusters and GAs (Genetic Algorithms)

  • (2001.7, 8)

  • P2P

  • Embarrassingly parallel problems: independent tasks distributed across many processors.


  • Features

    High Performance Computing (HPC), High Availability (HA)

  • P2P: Napster, Gnutella. Ex.: SETI@home, Project RC5, Intel Philanthropic Peer-to-Peer Program

  • PC

  • Ranking of Parallel Computers (http://www.top500.org):

    Rank  Name                   Rmax (Gflops)  # Proc
    1     ASCI White, SP Power3  7226           8192
    2     SP Power3              2526           2528
    3     ASCI Red               2379           9632
    4     ASCI Blue-Pacific      2144           5808
    5     SR8000/MPP             1709           1152

  • Commodity hardware: CPUs (Pentium, Alpha, Power, etc.) + networking (Internet, LAN, WAN, Gigabit, cableless, etc.). PCs + networking = PC clusters.

  • Why PC clusters? Hardware: commodity, off-the-shelf. Software: open source, freeware. Peopleware.

  • Ranking of PC clusters in the Top500 (http://www.top500.org):

    Rank  Name            Rmax (Gflops)  # Proc
    42    SCore III       547.9          1024
    ?     CPlant Cluster  512.4          1000
    102   LosLobos        237            512
    156   CLIC PIII       143.3          528
    389   ABACUS Cluster  80.8           520
    439   Presto III      77.4           104

  • PC

  • Example PC cluster: 8 nodes + a gateway (file server), connected by Fast Ethernet through a switching hub.

  • What do you need? Hardware: CPU, memory, motherboard, hard disc, case, network card, cables, hub.

    Normal PCs.

  • What do you need? Software: OS, tools (editor, compiler), parallel library.

  • Communication between nodes uses TCP/IP.

  • MPI (Message Passing Interface) and PVM (Parallel Virtual Machine)

    PVM was developed at Oak Ridge National Laboratory and the University of Tennessee.
    http://www.epm.ornl.gov/pvm/pvm_home.html

    MPI is an API for message passing. 1992: MPI Forum; 1994: MPI-1; 1997: MPI-2.
    http://www-unix.mcs.anl.gov/mpi/index.html

  • MPI implementations. Free: MPICH, LAM, WMPI (Windows 95/NT), CHIMP/MPI, MPI Light. Vendor implementations (for parallel computers): MPI/PRO, etc.

  • Setting up the PC cluster: install the OS and tools on each node.

  • OS/tools: Linux, GNU compilers, GDB, rsh.
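    MPICH and LAM typically start remote processes with rsh, so each node must allow password-free rsh logins within the cluster. A minimal sketch, assuming hypothetical host names gateway/node1/node2 and user name gauser, is a ~/.rhosts file (mode 600) on every node:

    # ~/.rhosts: hosts and users trusted for rsh/rlogin
    gateway  gauser
    node1    gauser
    node2    gauser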

  • Installing MPICH/LAM.

    RPM (Red Hat):
    # rpm -ivh lam-6.3.3b28-1.i386.rpm
    # rpm -ivh mpich-1.2.0-5.i386.rpm

    Debian:
    # dpkg -i lam2_6.3.2-3.deb
    # dpkg -i mpich_1.1.2-11.deb
    # apt-get install lam2
    # apt-get install mpich

  • Socket communication, step 1: both server and client create a socket (socket).

  • Step 2: the server binds an address and waits (bind/listen); the client looks up the server's address (gethostbyname).

  • Step 3: the client requests a connection (connect), the server accepts it (accept), and the connection is established. OK!

  • Step 4: the two sides exchange data (send/recv).

  • Call sequence: server: socket, bind, listen, accept, send/recv, close; client: socket, connect, send/recv, close.

  • Sample client in C (client.c):

    /* client.c: connect to the server and print the message it sends
       (error checking omitted for brevity) */
    #include <stdio.h>
    #include <strings.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netdb.h>

    #define PORT    (u_short)10000
    #define BUF_LEN 100

    char hostname[] = "localhost";   /* server to connect to */
    char buf[BUF_LEN];

    int main(void)
    {
        struct hostent *servhost;
        struct sockaddr_in server;
        int s, n;

        /* resolve the server's address */
        servhost = gethostbyname(hostname);
        bzero((char *)&server, sizeof(server));
        server.sin_family = AF_INET;
        server.sin_port   = htons(PORT);
        bcopy(servhost->h_addr, (char *)&server.sin_addr, servhost->h_length);

        /* create a socket and connect to the server */
        s = socket(AF_INET, SOCK_STREAM, 0);
        connect(s, (void *)&server, sizeof(server));

        /* receive the message and print it */
        n = read(s, buf, BUF_LEN - 1);
        buf[n] = '\0';
        printf("%s", buf);

        close(s);
        return 0;
    }

  • Sample server in C (server.c):

    /* server.c: accept one connection and send "Hello World!!"
       (error checking omitted for brevity) */
    #include <stdio.h>
    #include <string.h>
    #include <strings.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netdb.h>

    #define PORT (u_short)10000

    char hostname[] = "localhost";

    int main(void)
    {
        struct hostent *myhost;
        struct sockaddr_in me;
        int s_waiting, s;
        char msg[] = "Hello World!!\n";

        /* fill in our own address */
        myhost = gethostbyname(hostname);
        bzero((char *)&me, sizeof(me));
        me.sin_family = AF_INET;
        me.sin_port   = htons(PORT);
        bcopy(myhost->h_addr, (char *)&me.sin_addr, myhost->h_length);

        /* create the listening socket and wait for a client */
        s_waiting = socket(AF_INET, SOCK_STREAM, 0);
        bind(s_waiting, (void *)&me, sizeof(me));
        listen(s_waiting, 1);
        s = accept(s_waiting, NULL, NULL);
        close(s_waiting);

        /* send the message and close */
        write(s, msg, strlen(msg));
        close(s);
        return 0;
    }
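    A typical build-and-run session for the two programs (a sketch; the server is started in the background first):

    $ gcc -o server server.c
    $ gcc -o client client.c
    $ ./server &
    $ ./client
    Hello World!!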

  • Sample client in Java (Client.java):

    // Client.java: connect to the server and print the message it sends
    import java.io.*;
    import java.net.*;

    public class Client {
        public static void main(String[] args) {
            try {
                // connect to the server
                String host = "localhost";
                Socket socket = new Socket(host, 10000);

                // wrap the socket's input stream
                DataInputStream is = new DataInputStream(
                    new BufferedInputStream(socket.getInputStream()));

                // read the message and print it
                byte[] buff = new byte[1024];
                int a = is.read(buff);
                System.out.write(buff, 0, a);

                // clean up
                is.close();
                socket.close();
            } catch (Exception e) {
                System.out.println(e.getMessage());
                e.printStackTrace();
            }
        }
    }

  • Sample server in Java (Server.java):

    // Server.java: accept one connection and send "Hello World!!"
    import java.io.*;
    import java.net.*;

    public class Server {
        public static void main(String[] args) {
            try {
                // wait for a client on port 10000
                ServerSocket svSocket = new ServerSocket(10000);
                Socket cliSocket = svSocket.accept();

                // wrap the socket's output stream
                DataOutputStream os = new DataOutputStream(
                    new BufferedOutputStream(cliSocket.getOutputStream()));

                // send the message
                String s = new String("Hello World!!\n");
                byte[] b = s.getBytes();
                os.write(b, 0, s.length());

                // clean up
                os.close();
                cliSocket.close();
                svSocket.close();
            } catch (Exception e) {
                System.out.println(e.getMessage());
                e.printStackTrace();
            }
        }
    }
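    The Java pair runs the same way (a sketch):

    $ javac Server.java Client.java
    $ java Server &
    $ java Client
    Hello World!!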

  • Parallel programming (MPI) targets both massively parallel computers and PC clusters: the user logs in to the gateway and submits jobs/tasks to the nodes.

  • Skeleton of an MPI program:

    #include "mpi.h"

    int main(int argc, char **argv)
    {
        int procs, myid;

        MPI_Init(&argc, &argv);                    /* start MPI */
        MPI_Comm_size(MPI_COMM_WORLD, &procs);     /* number of processes */
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);      /* my rank */

        /* parallel procedure */

        MPI_Finalize();                            /* shut down MPI */
        return 0;
    }

  • Point-to-point communication: process A and process B each send and receive data.

  • [Sending] MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)

    void *buf: starting address of the sending buffer (IN)
    int count: number of data elements (IN)
    MPI_Datatype datatype: data type (IN)
    int dest: rank of the receiving point (IN)
    int tag: message tag (IN)
    MPI_Comm comm: communicator (IN)

  • [Receiving] MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)

    void *buf: starting address of the receiving buffer (OUT)
    int source: rank of the sending point (IN)
    int tag: message tag (IN)
    MPI_Status *status: status (OUT)

  • Example ~hello.c~:

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int myid, procs, src, dest, tag = 1000, count;
        char inmsg[10], outmsg[] = "hello";
        MPI_Status stat;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);
        count = sizeof(outmsg) / sizeof(char);

        if (myid == 0) {
            /* rank 0: send first, then receive */
            src = 1; dest = 1;
            MPI_Send(&outmsg, count, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
            MPI_Recv(&inmsg, count, MPI_CHAR, src, tag, MPI_COMM_WORLD, &stat);
            printf("%s from rank %d\n", inmsg, src);
        } else {
            /* rank 1: receive first, then send back */
            src = 0; dest = 0;
            MPI_Recv(&inmsg, count, MPI_CHAR, src, tag, MPI_COMM_WORLD, &stat);
            MPI_Send(&outmsg, count, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
            printf("%s from rank %d\n", inmsg, src);
        }

        MPI_Finalize();
        return 0;
    }
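    With mpich or LAM, the example is compiled with the mpicc wrapper and launched with mpirun (a sketch; under LAM the lamboot daemon must be started first, and option spellings vary by version):

    $ mpicc -o hello hello.c
    $ mpirun -np 2 hello
    hello from rank 1
    hello from rank 0

    (the two output lines may appear in either order)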

  • Parallelization (parallel conversion)

  • Broadcast: MPI_Bcast(void *buf, int count, MPI_Datatype datatype, int root, MPI_Comm comm)

    void *buf: data; int root: rank of the sending point.

  • Communication and operation (reduce): MPI_Reduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm)

    MPI_Op op: operation handle (MPI_SUM, MPI_MAX, MPI_MIN, MPI_PROD); int root: rank of the receiving point.
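    A minimal sketch combining the two collectives: rank 0 broadcasts a parameter n to every process, each process computes a partial value, and MPI_Reduce sums the partials back at rank 0 (the variable names are illustrative):

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char **argv)
    {
        int myid, procs;
        int n = 0, partial, total = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &procs);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);

        if (myid == 0)
            n = 100;                     /* parameter known only at the root */

        /* after the broadcast every rank has n */
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

        /* each rank computes its own share */
        partial = n * myid;

        /* sum all partial values at rank 0 */
        MPI_Reduce(&partial, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (myid == 0)
            printf("total = %d\n", total);

        MPI_Finalize();
        return 0;
    }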

  • PC

  • Hardware: CPU: Intel Pentium III, IV; AMD Athlon; Transmeta Crusoe.

    http://www.intel.com/  http://www.amd.com/  http://www.transmeta.com/

  • Hardware: Network: Ethernet, Gigabit Ethernet, Myrinet, QsNet, Giganet, SCI, Atoll, VIA, InfiniBand; Gigabit speeds and Wake-on-LAN support.

  • Myrinet (Myricom): a high-speed interconnect for PC clusters, alongside Ethernet and ATM.

  • [Diagram: NICs (network interface cards) connecting each node to the network]

  • Hardware: Disk: SCSI, IDE, RAID; a diskless cluster is also possible.

    http://www.linuxdoc.org/HOWTO/Diskless-HOWTO.html

  • Hardware: Mounting: rack or box; the trade-offs are cost (inexpensive), size (compact), and ease of maintenance.

  • Software

  • OS: Linux kernels: open source, strong networking, free software.

    Features: the /proc file system, loadable kernel modules, virtual consoles, package management.
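    For example, the /proc file system can be queried on each node to verify what the kernel sees (commands only; the output differs per machine):

    $ grep "model name" /proc/cpuinfo    # CPU type and clock
    $ grep MemTotal /proc/meminfo        # installed memory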

  • OS: Linux kernels (http://www.kernel.org/). Linux distributions:

    Red Hat           www.redhat.com
    Debian GNU/Linux  www.debian.org
    S.u.S.E.          www.suse.com
    Slackware         www.slackware.org

  • NFS (Network File System), NIS (Network Information System), NTP (Network Time Protocol)
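    As an illustration, sharing /home from the gateway over NFS could look like this (a sketch; the subnet, host name, and mount point are assumptions):

    # /etc/exports on the gateway (file server)
    /home 192.168.1.0/255.255.255.0(rw)

    # /etc/fstab entry on each node
    gateway:/home  /home  nfs  defaults  0  0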

  • Job schedulers / batch queuing systems:

    CONDOR               http://www.cs.wisc.edu/condor/
    DQS                  http://www.scri.fsu.edu/~pasko/dqs.html
    LSF                  http://www.platform.com/index.html
    The Sun Grid Engine  http://www.sun.com/software/gridware/

  • Editor: Emacs. Languages: C, C++, Fortran, Java. Compilers:

    GNU      http://www.gnu.org/
    NAG      http://www.nag.co.uk
    PGI      http://www.pgroup.com/
    VAST     http://www.psrv.com/
    Absoft   http://www.absoft.com/
    Fujitsu  http://www.fqs.co.jp/fort-c/
    Intel    http://developer.intel.com/software/products/compilers/index.htm

  • Build and version control: Make, CVS. Debuggers: GDB, TotalView (http://www.etnus.com)

  • MPI implementations: mpich (http://www-unix.mcs.anl.gov/mpi/index.html): easy to use, high portability (for UNIX, NT/Win, Globus). LAM (http://www.lam-mpi.org/): high availability.

  • MPICH vs. LAM (SMP): DGA benchmark, gcc 2.95.3, mpicc -O2 -funroll-loops.

    # nodes: 32 (2 processors each); processor: Pentium III 700 MHz; memory: 128 MB; OS: Linux 2.2.16; network: Fast Ethernet (TCP/IP), switching hub.

  • MPICH vs. LAM (# processes): DGA benchmark, gcc 2.95.3, mpicc -O2 -funroll-loops.

    # nodes: 8; processor: Pentium 850 MHz; memory: 256 MB; OS: Linux 2.2.17; network: Fast Ethernet (TCP/IP), switching hub.

  • Profiling and tracing tools: MPE (MPICH), Paradyn (http://www.cs.wisc.edu/paradyn/), Vampir (http://www.pallas.de/pages/vampir.htm)

  • Windows clusters: PVM: PVM 3.4, WPVM. MPI: mpich, WMPI (Critical Software), MPICH/NT (Mississippi State Univ.), MPI/Pro (MPI Software Technology).

  • Cluster distributions and tools:

    FAI          http://www.informatik.uni-koeln.de/fai/
    Alinka       http://www.alinka.com/
    Mosix        http://www.mosix.cs.huji.ac.il/
    Bproc        http://www.beowulf.org/software/bproc.html
    Scyld        http://www.scyld.com/
    Score        http://pdswww.rwcp.or.jp/dist/score/html/index.html
    Kondara HPC  http://www.digitalfactory.co.jp/

  • Math libraries: PhiPac (Berkeley); FFTW (MIT, www.fftw.org); ATLAS (Automatically Tuned Linear Algebra Software, www.netlib.org/atlas/).

    ATLAS is an adaptive software architecture; it is faster than all other portable BLAS implementations and comparable with the machine-specific libraries provided by the vendor.

  • Math libraries: PETSc: a large suite of data structures and routines for both uniprocessor and parallel scientific computing.

    http://www-fp.mcs.anl.gov/petsc/

  • Parallel GAs: master-slave (micro-grained), cellular (fine-grained), distributed GAs (island, coarse-grained).

  • Amdahl's law: speedup = 1 / ((1 - r) + r/n), where r is the ratio of parallelization and n is the number of processors.
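    For example, with r = 0.95 and n = 16 processors, speedup = 1 / (0.05 + 0.95/16) ≈ 9.1; even with infinitely many processors the speedup can never exceed 1/0.05 = 20.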

  • Master-slave model: the master node performs selection, crossover, and mutation, then (a) delivers each individual to a slave (client); (b) each client returns the fitness value as soon as it finishes the evaluation; (c) the master sends a non-evaluated individual to the idle client. A sketch follows below.
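    A minimal MPI sketch of this master-slave loop. The evaluate() function is a stand-in (sum of genes), and IND_LEN, POP, and the tags are illustrative; it assumes POP is at least the number of slaves:

    #include <stdio.h>
    #include "mpi.h"

    #define IND_LEN  8      /* genes per individual (illustrative) */
    #define POP      16     /* population size (illustrative)      */
    #define TAG_WORK 1
    #define TAG_STOP 2

    /* stand-in fitness function: sum of the genes */
    double evaluate(const double *ind)
    {
        double f = 0.0;
        int i;
        for (i = 0; i < IND_LEN; i++)
            f += ind[i];
        return f;
    }

    int main(int argc, char **argv)
    {
        int myid, procs;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);
        MPI_Comm_size(MPI_COMM_WORLD, &procs);

        if (myid == 0) {
            /* master: holds the population; selection, crossover and
               mutation are omitted here */
            static double pool[POP][IND_LEN];
            double fit;
            MPI_Status st;
            int s, sent = 0, done = 0;

            /* a) deliver one individual to each slave */
            for (s = 1; s < procs && sent < POP; s++, sent++)
                MPI_Send(pool[sent], IND_LEN, MPI_DOUBLE, s,
                         TAG_WORK, MPI_COMM_WORLD);

            while (done < POP) {
                /* b) a slave returns the value as soon as it finishes */
                MPI_Recv(&fit, 1, MPI_DOUBLE, MPI_ANY_SOURCE, TAG_WORK,
                         MPI_COMM_WORLD, &st);
                done++;

                /* c) send a non-evaluated individual to the idle slave */
                if (sent < POP)
                    MPI_Send(pool[sent++], IND_LEN, MPI_DOUBLE,
                             st.MPI_SOURCE, TAG_WORK, MPI_COMM_WORLD);
                else
                    MPI_Send(NULL, 0, MPI_DOUBLE, st.MPI_SOURCE,
                             TAG_STOP, MPI_COMM_WORLD);
            }
            printf("master: %d individuals evaluated\n", done);
        } else {
            /* slave: evaluate individuals until told to stop */
            double ind[IND_LEN], fit;
            MPI_Status st;

            for (;;) {
                MPI_Recv(ind, IND_LEN, MPI_DOUBLE, 0, MPI_ANY_TAG,
                         MPI_COMM_WORLD, &st);
                if (st.MPI_TAG == TAG_STOP)
                    break;
                fit = evaluate(ind);
                MPI_Send(&fit, 1, MPI_DOUBLE, 0, TAG_WORK, MPI_COMM_WORLD);
            }
        }

        MPI_Finalize();
        return 0;
    }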

  • Cellular GAs

  • Distributed Genetic Algorithms (island GAs): subpopulations with migration.

  • Distributed GAs: islands 0, 1, ..., n each evolve independently for a migration interval of generations, then exchange individuals (migration) at a given migration rate; see the sketch below.
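    A sketch of the migration step with islands arranged in a ring, one MPI process per island; the migration interval, rate, and buffer layout are illustrative assumptions, and the evolution steps are left as comments:

    #include <stdio.h>
    #include "mpi.h"

    #define IND_LEN  8      /* genes per individual (illustrative)      */
    #define MIGRANTS 2      /* migration rate: individuals per exchange */
    #define INTERVAL 10     /* migration interval in generations        */
    #define MAX_GEN  50

    int main(int argc, char **argv)
    {
        int myid, procs, right, left, gen;
        double emigrants[MIGRANTS * IND_LEN]  = {0};   /* outgoing individuals */
        double immigrants[MIGRANTS * IND_LEN] = {0};   /* incoming individuals */
        MPI_Status st;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);
        MPI_Comm_size(MPI_COMM_WORLD, &procs);

        right = (myid + 1) % procs;           /* neighbour we send to      */
        left  = (myid + procs - 1) % procs;   /* neighbour we receive from */

        for (gen = 1; gen <= MAX_GEN; gen++) {
            /* ... evolve this island's subpopulation for one generation ... */

            if (gen % INTERVAL == 0) {
                /* ... copy MIGRANTS selected individuals into emigrants ... */

                /* MPI_Sendrecv avoids deadlock: every island sends right
                   and receives from the left at the same time */
                MPI_Sendrecv(emigrants,  MIGRANTS * IND_LEN, MPI_DOUBLE, right, 0,
                             immigrants, MIGRANTS * IND_LEN, MPI_DOUBLE, left,  0,
                             MPI_COMM_WORLD, &st);

                /* ... replace the island's worst individuals with immigrants ... */
            }
        }

        printf("island %d: finished %d generations\n", myid, MAX_GEN);
        MPI_Finalize();
        return 0;
    }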

  • Cambria (Visual Technology): # nodes: 256; # CPUs: 256; CPU: Pentium III 0.8 GHz; memory: 32 GB (128 MB × 256); network: Fast Ethernet, hub.

  • DGA

  • DGAs

  • Web sites

  • Books: Building Linux Clusters; How to Build a Beowulf; High Performance Cluster Computing.

  • MPI

  • Web sites:

    IEEE Computer Society Task Force on Cluster Computing  http://www.ieeetfcc.org/

    White Paper  http://www.dcs.port.ac.uk/~mab/tfcc/WhitePaper/

    Cluster Top500  http://clusters.top500.org/

    Beowulf Project  http://www.beowulf.org/

    Beowulf Underground  http://www.beowulf-underground.org/

  • Mailing lists: IEEE TFCC, SOFTEK HPC, Debian Beowulf.

  • Summary: PC clusters and parallel GAs.