Distributed Data Mining System in Java

Group Members: 王春笙, 林俊甫, 王慧芬

Overview of Project

• Project participants
– 王春笙, 林俊甫, 王慧芬

Project Programming Tasks

• D92725002 林俊甫
– Polling and reply multicast between client and server
– Client/server socket programming
– Client dynamic join and leave mechanism
– Multi-thread programming
– Synchronization mechanism
– Data chunk maintenance and dispatching mechanism
– Client/server communication link control

Project Programming Tasks (cont’d)

– Client failure handling
  • Reassign the backup server if the failed client is the backup
  • Restore the failed client's work (with 王春笙)
– Server failure handling
  • Backup server designation mechanism and logic design
– RMI mechanism (with 王春笙)
– Basic GUI

System Infrastructure

• System diagram

[Diagram: a Server/Coordinator and several Clients connected over a LAN. Mining data chunks and mining results travel between the Server/Coordinator and the Clients.]

Basic Operation

Messages exchanged between server and client over time:
1. Client polls multicast group 230.0.0.1 on port 4444 with “@”: who is the server?
2. Server replies with its server name: I am the server.
3. Client connects to <servername, port 4445>.
4. Server: “Client do: filechunk#”
5. Client: “ok”
6. Server: “Client do: next filechunk#”
7., 8., …: the exchange continues in the same way.

Server timeline:
• Listen on the multicast group
• Fork a thread to handle each client connection
• Wait for the client's processed result, then order the client to get another file chunk

Client timeline:
• Group query and reply; server found
• Connect to the server
• Receive the server's instruction, invoke RMI to get the file chunk

Port Assignment

• Port 4444: for multicast

• Port 4445: for TCP/IP socket connection

• Port 4446: for RMI services

Finding A Server

• Once a client starts up, it queries every 3 seconds over the multicast group 230.0.0.1, port 4444, by sending the 1-byte string “@” to locate the server host.
• Once a server starts up, it forks a thread to deal with these queries.

Client flow:
1. Client query: who is the server now?
2. Listen for the server's response.
3. Connect to the server on port 4445.
4. Use RMI to get a file chunk from the server.
5. Process data mining and return the result to the server.
6. Server failure detected: if I am the backup, go to the backup server procedure; otherwise go to step 1.
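The query step of this flow can be sketched in Java as follows. This is a minimal illustration, not the project's code: the class name ServerFinder, its method names, the assumption that the server answers by unicast to the sender, and the assumption that the reply carries the server's host name as plain text are all hypothetical. Only the group address, the port, the “@” payload, and the 3-second period come from the slides.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; class and method names are not from the original project.
class ServerFinder {
    static final String GROUP = "230.0.0.1"; // multicast group from the slides
    static final int QUERY_PORT = 4444;      // multicast port from the slides
    static final int RETRY_MILLIS = 3000;    // "every 3 sec."

    /** Builds the 1-byte "@" query addressed to the multicast group. */
    static DatagramPacket buildQuery() {
        byte[] payload = "@".getBytes(StandardCharsets.US_ASCII);
        try {
            return new DatagramPacket(payload, payload.length,
                    InetAddress.getByName(GROUP), QUERY_PORT);
        } catch (UnknownHostException e) {
            throw new IllegalStateException("invalid group address", e);
        }
    }

    /** Assumed reply format: the server's host name as plain text. */
    static String parseReply(DatagramPacket reply) {
        return new String(reply.getData(), reply.getOffset(), reply.getLength(),
                StandardCharsets.US_ASCII).trim();
    }

    /** Sends the query every 3 seconds until some server answers. */
    static String findServer() throws IOException {
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setSoTimeout(RETRY_MILLIS);
            byte[] buf = new byte[256];
            while (true) {
                socket.send(buildQuery());
                DatagramPacket reply = new DatagramPacket(buf, buf.length);
                try {
                    socket.receive(reply); // assumption: the server replies by unicast
                    return parseReply(reply);
                } catch (SocketTimeoutException e) {
                    // no answer within 3 seconds: ask again
                }
            }
        }
    }
}
```

A client would call ServerFinder.findServer() and then open a TCP socket to the returned host on port 4445.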

File Dispatching

• The server maintains a file chunk pool.
• The server finds an available file chunk for a client, sets its state to 1, and orders the client to get this file chunk by RMI. The state is updated to 2 when the client returns its result.
• Recovery: when the server detects that a client's link is broken, it restores the file chunk allocated to that client to 0.
• The file chunk class is declared Serializable so that it can be passed to the backup server by RMI.
• The file chunk class uses synchronization for concurrency control.

FileChunks states: -1: empty, 0: available, 1: in use, 2: used
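The state transitions described above could be sketched as a small synchronized pool. The class name FileChunkPool, its method names, and the owner array used to remember which client holds a chunk are assumptions made for illustration; the four state values are the ones listed on the slide.

```java
import java.io.Serializable;
import java.util.Arrays;

// Hypothetical sketch; names are illustrative, not taken from the project.
class FileChunkPool implements Serializable {
    private static final long serialVersionUID = 1L;
    static final int EMPTY = -1, AVAILABLE = 0, IN_USE = 1, USED = 2;

    private final int[] state; // state[i] is the state of chunk i
    private final int[] owner; // owner[i] is the client id holding chunk i, or -1

    FileChunkPool(int chunkCount) {
        state = new int[chunkCount]; // all zeros, i.e. AVAILABLE
        owner = new int[chunkCount];
        Arrays.fill(owner, -1);
    }

    /** Hands the first available chunk to a client; returns -1 if none is left. */
    synchronized int assign(int clientId) {
        for (int i = 0; i < state.length; i++) {
            if (state[i] == AVAILABLE) {
                state[i] = IN_USE;
                owner[i] = clientId;
                return i;
            }
        }
        return -1;
    }

    /** Called when the client returns the mining result for a chunk. */
    synchronized void complete(int chunk) {
        if (state[chunk] == IN_USE) {
            state[chunk] = USED;
            owner[chunk] = -1;
        }
    }

    /** Recovery: chunks held by a disconnected client become available again. */
    synchronized void release(int clientId) {
        for (int i = 0; i < state.length; i++) {
            if (state[i] == IN_USE && owner[i] == clientId) {
                state[i] = AVAILABLE;
                owner[i] = -1;
            }
        }
    }

    synchronized int stateOf(int chunk) {
        return state[chunk];
    }
}
```

Marking every method synchronized keeps the per-client handler threads from assigning the same chunk twice, which is the concurrency concern the slide mentions.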

Backup Server Selection

• The server maintains and assigns a unique id to each client.
• The unique id is incremented as a serial number.
• The client with the smallest id is assigned as the backup server.
• When a client fails, the server checks whether that client was the backup server; if so, it restarts the selection process.
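A minimal sketch of this selection rule follows. The class BackupSelector, its methods, the starting id of 1, and the use of -1 for "no backup" are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the selection rule; names are illustrative.
class BackupSelector {
    private final List<Integer> clientIds = new ArrayList<>();
    private int nextId = 1;    // serial number; the starting value is an assumption
    private int backupId = -1; // -1 means no backup is assigned

    /** Registers a new client and returns its unique id. */
    synchronized int join() {
        int id = nextId++;
        clientIds.add(id);
        if (backupId == -1) {
            backupId = id; // the first client is trivially the smallest id
        }
        return id;
    }

    /** Removes a failed client; re-runs the selection only if it was the backup. */
    synchronized void leave(int id) {
        clientIds.remove(Integer.valueOf(id));
        if (id == backupId) {
            backupId = smallestId();
        }
    }

    synchronized int currentBackup() {
        return backupId;
    }

    private int smallestId() {
        int min = -1;
        for (int id : clientIds) {
            if (min == -1 || id < min) {
                min = id;
            }
        }
        return min;
    }
}
```

Because ids only grow, a newly joined client never displaces the current backup, so the selection only has to run again when the backup itself leaves.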

Nodes Maintenance

• The server maintains the records of connected clients in an ArrayList.
• The ArrayList is composed of objects of class Nodes, each of which records one client's detailed information.

ArrayList ht, one Nodes record per client (key/value layout):
Id | Address | Port | Work on | Status
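A record of this shape might look like the sketch below. The field names follow the column headings on the slide; the types, the NodeTable wrapper, and the defaults (-1 for "no chunk", 1 for "online") are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical record type; field types and defaults are assumptions.
class Nodes {
    final int id;
    final String address;
    final int port;
    int workOn = -1; // chunk number being processed, -1 when idle (assumed)
    int status = 1;  // connection status code, 1 = online (assumed default)

    Nodes(int id, String address, int port) {
        this.id = id;
        this.address = address;
        this.port = port;
    }
}

// Thin synchronized wrapper around the ArrayList the slide calls "ht".
class NodeTable {
    private final List<Nodes> ht = new ArrayList<>();

    synchronized void add(Nodes node) {
        ht.add(node);
    }

    /** Returns the record with the given id, or null if the client is unknown. */
    synchronized Nodes find(int id) {
        for (Nodes n : ht) {
            if (n.id == id) {
                return n;
            }
        }
        return null;
    }

    /** Removes a client's record; returns true if something was removed. */
    synchronized boolean remove(int id) {
        return ht.removeIf(n -> n.id == id);
    }

    synchronized int size() {
        return ht.size();
    }
}
```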

RMI Services

• The RMI services are written as an independent program because both the server and the client (when it acts as backup server) use them.
• The RMI services provide:
– Backup of server data to the backup server
– Getting a file chunk from the server
– Returning mining results to the server
– Receiving nodes information from the server
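The four services could be expressed as one remote interface, as in the sketch below. The interface name MiningService, every method signature, and the choice of payload types are assumptions; the slides do not show the real interface. The in-memory implementation is only there so the interface can be exercised without a registry.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical remote interface: one method per service listed on the slide.
interface MiningService extends Remote {
    HashMap<Integer, String> getBackupData() throws RemoteException; // server data for the backup
    byte[] getFileChunk(int chunkNo) throws RemoteException;
    void returnResult(int chunkNo, String result) throws RemoteException;
    ArrayList<String> getNodeInfo() throws RemoteException;
}

// In-memory implementation; exporting it over RMI is left out of this sketch.
class LocalMiningService implements MiningService {
    private final Map<Integer, byte[]> chunks = new HashMap<>();
    private final Map<Integer, String> results = new HashMap<>();
    private final List<String> nodes = new ArrayList<>();

    synchronized void putChunk(int chunkNo, byte[] data) {
        chunks.put(chunkNo, data);
    }

    synchronized void addNode(String description) {
        nodes.add(description);
    }

    @Override
    public synchronized HashMap<Integer, String> getBackupData() {
        return new HashMap<>(results); // copy, so callers cannot mutate server state
    }

    @Override
    public synchronized byte[] getFileChunk(int chunkNo) {
        return chunks.get(chunkNo);
    }

    @Override
    public synchronized void returnResult(int chunkNo, String result) {
        results.put(chunkNo, result);
    }

    @Override
    public synchronized ArrayList<String> getNodeInfo() {
        return new ArrayList<>(nodes);
    }
}
```

To serve such an object remotely, one would typically export it with UnicastRemoteObject.exportObject and bind it in a registry created with LocateRegistry.createRegistry(4446), matching the RMI port on the Port Assignment slide.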

Client Failure

• Server's actions:
– Recovery
– Reassignment
– Redo the backup server selection if the failed node is the backup
• Client's action:
– Do nothing, unless it is told by the server to act as backup

Server Failure

[Sequence diagram with three timelines: Server S, Client A (the backup), and Client B.]

Before the crash:
• Server S: runs the backup selection and chooses A as backup.
• Client A:
  1. A is told by S that it is the backup; A invokes RMI to get all server data (“A: Do backup”, RMI get file, RMI reply).
  2. A periodically gets server services and file chunk data (“Client do #”, “do reply”).
• Client B:
  1. B receives instructions as discussed before (“Client do #”, “do reply”).

After the server crashes:
• Client A:
  3. The broken communication link is detected; A starts the ServerAction class.
  4. A creates a server socket at 4445 and forks a thread to listen to queries and wait for connections.
• Client B:
  2. The broken communication link is detected; B multicasts the query: who is the server now? (B polls “@”; A replies: I am the server.)
  3. B knows A is the backup and reconnects to A (connect to A:4445).

Server/Client Life Cycle

[Diagram: life cycles of the Server and the Client. Each ends in normal or abnormal termination; an arrow labeled “evolve” links the two.]

Project Programming Tasks

• D91725001 王春笙
– Web log file preprocessing and separating
– Web page traversal sequence parsing
– Page item transferring and mapping
– Web page sequential pattern mining
– Mining results maintenance
– RMI mining results transfer
– Mining results lookup and display

Project Programming Tasks (cont’d)

– Backup mechanism
  • A separate thread backs up server files and in-memory data
  • Restore the failed client's work (with 林俊甫)
– RMI mechanism (with 林俊甫)
– GUI global state refreshing
– System integration
  • Testing and debugging

Web Log File Format

• User IP

• Date

• Time

• Web pages URL

Web File Preprocessing

• Select *.htm and *.html pages
• First sort by user ID
• Second sort by time
• Page sequences are separated by time gaps
– more than 30 seconds
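These preprocessing steps can be sketched as follows. The Hit record, the use of seconds as the time unit, and the rule that a gap strictly greater than 30 seconds starts a new sequence are assumptions made for this illustration; the real log parsing is not shown in the slides.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the preprocessing; names and types are illustrative.
class LogPreprocessor {
    static final class Hit {
        final String user;
        final long timeSec;
        final String url;

        Hit(String user, long timeSec, String url) {
            this.user = user;
            this.timeSec = timeSec;
            this.url = url;
        }
    }

    static final long GAP_SEC = 30; // a longer pause starts a new page sequence

    static List<List<String>> sessions(List<Hit> hits) {
        // Keep only *.htm and *.html pages.
        List<Hit> pages = new ArrayList<>();
        for (Hit h : hits) {
            String u = h.url.toLowerCase();
            if (u.endsWith(".htm") || u.endsWith(".html")) {
                pages.add(h);
            }
        }
        // Sort by user first, then by time.
        pages.sort(Comparator.comparing((Hit h) -> h.user)
                .thenComparingLong(h -> h.timeSec));
        // Cut a new sequence on a user change or a gap of more than 30 seconds.
        List<List<String>> out = new ArrayList<>();
        List<String> current = null;
        Hit prev = null;
        for (Hit h : pages) {
            boolean startNew = prev == null
                    || !prev.user.equals(h.user)
                    || h.timeSec - prev.timeSec > GAP_SEC;
            if (startNew) {
                current = new ArrayList<>();
                out.add(current);
            }
            current.add(h.url);
            prev = h;
        }
        return out;
    }
}
```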

Chunk Data Files

• Part*.ppp

6023 2 1 1 2 8
6024 1 1 206
6025 7 1 1 1 1 1 1 1 2 5 17 18 19 20 11
6026 3 1 1 1 144 145 338
6027 2 1 1 2 9
6028 3 1 1 1 2 8 3

• Items.ppp

/~visualdep/htm/p5b.htm 168
/~businessdep/student/picture.html 169
/~comedu/inde.htm 170
/~account/91tuition.htm 171
/~stuaffair/life/procedure-17.htm 172
/~stuaffair/life/procedure-25.htm 173

Apriori algorithm

• 1: find all L1
• 2: generate C2 from L1
• 3: count C2 and find all L2
• 4: k = 3
• 5: generate and prune Ck from Lk-1
• 6: count Ck and find all Lk
• 7: if Lk is not empty then k++, go to 5

Apriori algorithm (cont’d)

• Join phase: s1 joins s2 if s1 (drop first) = s2 (drop last)
– Example: s1 = {a, b}, s2 = {b, a}; s1 join s2 => {a, b, a}
• Prune phase: delete a k-candidate if any of its (k-1)-subsequences is not large
• C and L are stored in hash data structures
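The level-wise loop with its join and prune phases might be sketched as below. Several points are assumptions rather than facts from the slides: pages are represented by integer ids, a sequence supports a pattern when the pattern's items occur in order with gaps allowed, support is a count of supporting sequences, and the C2 step is folded into the general loop because the join rule also works for length-1 sequences. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of Apriori over page sequences; names are illustrative.
class SequenceApriori {
    /** Join phase: s1 joins s2 when s1 minus its first item equals s2 minus its last. */
    static List<Integer> join(List<Integer> s1, List<Integer> s2) {
        int k = s1.size();
        if (s2.size() != k || !s1.subList(1, k).equals(s2.subList(0, k - 1))) {
            return null;
        }
        List<Integer> joined = new ArrayList<>(s1);
        joined.add(s2.get(k - 1));
        return joined;
    }

    /** Prune phase: every subsequence made by dropping one item must be large. */
    static boolean survivesPrune(List<Integer> candidate, Set<List<Integer>> large) {
        for (int i = 0; i < candidate.size(); i++) {
            List<Integer> sub = new ArrayList<>(candidate);
            sub.remove(i); // removes the item at index i
            if (!large.contains(sub)) {
                return false;
            }
        }
        return true;
    }

    /** Assumed containment test: pattern items appear in order, gaps allowed. */
    static boolean contains(List<Integer> seq, List<Integer> pattern) {
        int j = 0;
        for (int item : seq) {
            if (j < pattern.size() && pattern.get(j) == item) {
                j++;
            }
        }
        return j == pattern.size();
    }

    /** Returns every large sequence with its support count. */
    static Map<List<Integer>, Integer> mine(List<List<Integer>> db, int minSupport) {
        Map<List<Integer>, Integer> result = new HashMap<>();
        Set<List<Integer>> candidates = new HashSet<>(); // C1: every distinct page
        for (List<Integer> seq : db) {
            for (Integer item : seq) {
                candidates.add(Collections.singletonList(item));
            }
        }
        while (!candidates.isEmpty()) {
            Set<List<Integer>> large = new HashSet<>(); // Lk
            for (List<Integer> c : candidates) {
                int support = 0;
                for (List<Integer> seq : db) {
                    if (contains(seq, c)) {
                        support++;
                    }
                }
                if (support >= minSupport) {
                    large.add(c);
                    result.put(c, support);
                }
            }
            candidates = new HashSet<>(); // Ck+1 from Lk
            for (List<Integer> s1 : large) {
                for (List<Integer> s2 : large) {
                    List<Integer> c = join(s1, s2);
                    if (c != null && survivesPrune(c, large)) {
                        candidates.add(c);
                    }
                }
            }
        }
        return result;
    }
}
```

Using lists as keys in HashSet and HashMap mirrors the slide's note that C and L are kept in hash data structures: membership tests in the prune phase are then constant time on average.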

Mining Result Display

• Client frequent patterns
– Web page ID
– Support
– Saved as *.pppl files
• Client frequent patterns
– Web page ID
– Support
– Web page name

Backup Mechanism

• When a client is selected as backup server, it starts a backup thread
• The backup thread loops every 0.5 second
• RMI data transfer
– Chunk data files (part*.ppp, items.ppp)
– Client information
– File chunk information
  • determine MaxID and set “in use” to “available”
– Frequent patterns information
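The bookkeeping in this slide could be sketched as below. The class BackupState, its method names, the reading of MaxID as the largest client id seen so far, and the use of a ScheduledExecutorService for the 0.5-second loop are assumptions; the Runnable stands in for the actual RMI transfer, which the slides do not show.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names and the scheduling approach are illustrative.
class BackupState {
    static final int AVAILABLE = 0, IN_USE = 1;

    /** Returns a copy of the chunk states with every "in use" chunk made "available". */
    static int[] resetInUse(int[] chunkStates) {
        int[] copy = chunkStates.clone();
        for (int i = 0; i < copy.length; i++) {
            if (copy[i] == IN_USE) {
                copy[i] = AVAILABLE;
            }
        }
        return copy;
    }

    /** Largest client id in the backed-up client list, or 0 if the list is empty. */
    static int maxId(List<Integer> clientIds) {
        int max = 0;
        for (int id : clientIds) {
            max = Math.max(max, id);
        }
        return max;
    }

    /** Runs one backup pass every 0.5 second on its own thread. */
    static ScheduledExecutorService startBackupLoop(Runnable fetchFromServer) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(fetchFromServer, 0, 500, TimeUnit.MILLISECONDS);
        return timer;
    }
}
```

One plausible reason for resetting “in use” chunks is that the backup cannot know whether the clients holding them will reconnect, so those chunks are simply handed out again.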

System Integration

• Java class integration
– Server component
– Client component
– Data mining component
– GUI component
• Testing
• Debugging

Project Programming Tasks

• D92725001 王慧芬
– Graphical User Interface
  • Since this system performs data mining tasks in a distributed way, its GUI provides four panels:
    – A system console
    – A result window
    – A connection table
    – A graphical network configuration

GUI

• The system console shows how the system proceeds

GUI (cont’d)

• The result window displays the progress and results of data mining

GUI (cont’d)

• A connection table lists the connection information of all on-line clients

GUI (cont’d)

• A connection table consists of 5 fields
– NO: client-server connection id
– IP address: client's IP address
– Port: client's port number
– Status: connection status, which can be
  • 0: offline
  • 1: online
  • 2: file transfer from server to client
  • 3: client is doing data mining
  • 4: client returns the value back to the server when data mining has finished
  • 5: client is doing the backup and data mining at the same time
– # chunk works on: if data mining and backup, it indicates the chunk number that the connection works on
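The status codes lend themselves to an enum, sketched below. The enum name and constant names are invented for illustration; only the numeric codes and their meanings come from the slide.

```java
// Hypothetical enum; constant names are illustrative, codes are from the slide.
enum ConnectionStatus {
    OFFLINE(0),
    ONLINE(1),
    FILE_TRANSFER(2),     // file transfer from server to client
    MINING(3),            // client is doing data mining
    RETURNING_RESULT(4),  // client returns the value back to the server
    BACKUP_AND_MINING(5); // client backs up and mines at the same time

    final int code;

    ConnectionStatus(int code) {
        this.code = code;
    }

    /** Maps a numeric code from the connection table to its status. */
    static ConnectionStatus fromCode(int code) {
        for (ConnectionStatus s : values()) {
            if (s.code == code) {
                return s;
            }
        }
        throw new IllegalArgumentException("unknown status code: " + code);
    }
}
```

Compared with bare integers, an enum lets the GUI switch on a named status when choosing which icon to draw, and rejects codes that the table does not define.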

GUI (cont’d)

• A graphical network configuration follows the connection table to depict the dynamic network configuration

GUI (cont’d)

• In the dynamic network configuration, we use different client GIFs to express the status:
– Offline
– On-line
– Data mining
– Backup and mining

GUI interface

• mw.showMsg()
– provided by the GUI for the server/client module to show console messages
• mw.showResultString()
– provided by the GUI for the server/client module to show the results of data mining
• Connection table
– modified by the server/client module with connection information
– read by the GUI every 0.01 second to depict the dynamic network configuration

GUI design

• Java Swing is used to generate labels, text, scrollbars, tables, etc.
• Java AWT 2D painting is used to form the animation of the connection lines in the dynamic configuration panel
• ‘Photo Impact’ and ‘GIF animator’ are used to generate the node icons
• EasyRGB is used to tune the color harmonies.

GUI design (cont’d)

• A new thread is forked from the GUI task to work on the animation of the connection lines in the dynamic configuration panel,
– to read the table every 0.03 second and to show the connection status with a moving ball.

[Diagram: the GUI task generates the connection table, the result panel, the system console, and the connection table animation.]

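Installation

• Example: running one server and two clients
– Create three folders: Ser (Server), Cli (Client1), and Cli2 (Client2)
– Extract the attached archive into the Ser folder; also download weblog10.zip into this folder and extract it
– Extract the attached archive into the empty Cli and Cli2 folders
– Open two DOS windows (windows 1 and 2) and change to the Ser folder
– Open three DOS windows (windows 3, 4, and 5); in windows 3 and 4 change to the Cli folder, and in window 5 change to the Cli2 folder
– In window 1, run the compile.bat batch file, then run rmi.bat
– In window 2, run the server.bat batch file
– In window 3, run the compile.bat batch file, then run rmi.bat
– In window 4, run the client.bat batch file
– In window 5, run the compile.bat batch file, then run the client.bat batch file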