Optimisation of Data Access in Grid Environment*
Darin Nikolow1, Renata Słota1, Łukasz Dutka1, Jacek Kitowski1,2, Piotr Nyczyk1, Mariusz Dziewierz1
1 Institute of Computer Science - AGH
2 Academic Computer Centre CYFRONET - AGH
University of Mining and Metallurgy, Cracow, Poland
Cracow Grid Workshop, Nov.5-6, 2001 *CrossGrid Project - Task 3.4
Outline
Background
Bottom-up approach
Media management software
– middleware for existing HSM
– dedicated VTSS
Local component-expert systems
Global policy for migration/replication
FOR MORE INFO...
http://www.icsr.agh.edu.pl/
Motivation
Large and growing volumes of data
Multimedia database systems (applications: medical, educational, virtual reality, virtual laboratories, digital libraries, advanced simulations, ...)
Solution: Tertiary Storage Systems (TSS) = Media Libraries + Management Software
Examples of existing TSS:
• HPSS, DataCutter, APRIL, Condor, OmniStore, UniTree, ...
Possible directions
– Data access time estimation system - efficient usage
– Data distribution and grid implementation - large scale experiments
– Expert system for data management
– Replication policies
PARMED Project (Uni. of Klagenfurt - Uni. of Mining & Metall. Cracow)
– to support physicians with telematic services for:
• long distance collaboration of medical centers
• medical teleeducation
• case archives
Background
[Figure: architecture diagram. Four sites (Site 1 to Site 4), each with a group of clients, connected over a WAN; components shown: Video Server, Storage Server, Disk Server, Meta-Database; arrows labelled r1, a1, d1, r2, a2, d2, r3, a3, d3.]
Media Management Software and its usage in X#
Darin Nikolow
Motivation
Main purpose of the developed TSS: efficient index-based retrieval of video fragments (instead of file fragments)
– specific requirements for frequent data reading
• startup latency
• transfer time
• minimal transfer rate > video bitrate
Two prototypes proposed and benchmarked
– middleware layer for existing HSM
– dedicated TSS
The developed systems are of general use -> possible grid implementations
Multimedia Storage and Retrieval System (MMSRS)
Requirements
– use existing software (UniTree HSM)
– reduce latency (start-up delay), i.e. reduce file granularity
– file fragmentation (subfiles)
Implementation
– splitting files into pieces of similar size
Middleware layer on HSM
Consists of:
– Automated Media Library
– UniTree HSM managing system
– MPEG extension for HSM (MEH)
MEH receives the name of the video file and the frame range (start/end frames)
Output stream via HTTP
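The mapping from a requested frame range to the subfiles that must be staged can be sketched as follows. This is only an illustration under loud assumptions: a constant bitrate (0.4 MB/s, the test file's figure), a frame rate of 25 fps, and equal 16 MB subfiles. The function name `subfiles_for_frames` and its parameters are hypothetical; the slides do not show how MEH actually locates frames, and a real implementation would presumably use a frame index.

```python
import math

def subfiles_for_frames(start_frame, end_frame, fps=25.0,
                        bitrate_mb_s=0.4, subfile_mb=16.0):
    """Return the indices of the subfiles covering frames start..end (inclusive).

    Assumes constant bitrate, so a frame number maps linearly to a byte offset.
    """
    if start_frame < 0 or end_frame < start_frame:
        raise ValueError("invalid frame range")
    start_mb = start_frame / fps * bitrate_mb_s        # offset of first frame
    end_mb = (end_frame + 1) / fps * bitrate_mb_s      # offset just past last frame
    first = int(start_mb // subfile_mb)
    last = int(math.ceil(end_mb / subfile_mb)) - 1
    return list(range(first, last + 1))

# Under these assumptions one 16 MB subfile holds 40 s of video.
short_clip = subfiles_for_frames(0, 99)         # first 4 seconds
longer_clip = subfiles_for_frames(1500, 2500)   # roughly seconds 60 to 100
```

Only the listed subfiles need to be fetched from tape before streaming starts, which is how smaller file granularity reduces the start-up delay.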
Video Tertiary Storage System (VTSS)
Dedicated TSS
Repository Daemon (REPD)
– keeps repository information
Tertiary File Manager Daemon (TFMD)
– manages:
• filedb - tape identifier and start position of each fragment
• tapedb - information about tape usage
Client requests to VTSS can be of the following kinds:
– write a new file to VTSS
– read a file fragment from VTSS
– delete a file from VTSS
The fragment range is defined in frame units
Two daemons implemented in C using Unix sockets
MMSRS and VTSS performance
Hardware (AML Quantum|ATL)
– ATL 4/52 (DLT 2000)
– ATL 7100 (DLT 7000)
– HP D-class server (with UniTree HSM)
Data
– 790 MB MPEG1 file with B = 0.4 MB/s bitrate (33 min.)
– subfile for MMSRS - 16 MB (8, 16, 32 MB tested)
• as short as possible to keep playback smooth (low latency)
• "optimal" subfile length depends on
– positioning time
– drive transfer rate
– bitrate of the video file

                            DLT2000   DLT7000
load time [s]                    60        37
maximal position time [s]       120       120
transfer rate [MB/s]           1.25         5
Benchmarks
Startup latency - time elapsed from issuing the request to receiving the first byte
Transfer time - time from receiving the first byte till the end of transmission
Minimal rate - minimal transfer rate experienced by a client with endless buffer (should be greater than the bitrate of the video stream to have smooth reproduction)
System performance for the whole video file transfer (DLT2000)

                            UniTree   MMSRS   VTSS
startup latency [s]             718      90     70
transfer time [s]               135     710    617
average rate [MB/s]            5.85    1.11   1.17
total transfer time [s]         853     800    747
total throughput [MB/s]        0.93    0.99   1.06
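The derived rows of the table appear to follow from the measured startup latency and transfer time together with the 790 MB file size. The sketch below assumes exactly that (total time = startup latency + transfer time, rates = file size divided by the respective time); the helper name `derived_metrics` is hypothetical. It is evaluated for the UniTree and MMSRS columns, which it reproduces to the precision shown.

```python
FILE_SIZE_MB = 790.0  # size of the MPEG1 test file from the benchmark setup

def derived_metrics(startup_latency_s, transfer_time_s, size_mb=FILE_SIZE_MB):
    """Compute the derived table rows from the two measured times (assumed relations)."""
    total_time_s = startup_latency_s + transfer_time_s
    return {
        "average_rate": size_mb / transfer_time_s,     # MB/s while data is flowing
        "total_transfer_time": total_time_s,           # s, request to last byte
        "total_throughput": size_mb / total_time_s,    # MB/s including the wait
    }

unitree = derived_metrics(718, 135)
mmsrs = derived_metrics(90, 710)

for name, m in (("UniTree", unitree), ("MMSRS", mmsrs)):
    print(name, {k: round(v, 2) for k, v in m.items()})
```

The comparison shows the trade-off the slide is making: UniTree delivers the file fastest once it starts, but the long startup latency dominates what the viewer experiences.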
Minimal transfer rate
[Figure: minimal transfer rate; panels MMSRS (DLT2000), VTSS (DLT2000), VTSS (DLT7000).]
For DLT2000:
– T = 10 GB
– N = 64
– Br = 0.4 MB/s
– Qdt = 400 s
For DLT7000:
– T = 35 GB
– N = 52
– Br = 0.4 MB/s
– Qdt = 1723 s
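The slide does not define T, N, Br or Qdt, but the quoted numbers are consistent with the relation Qdt = T / (N · Br) when 1 GB is taken as 1024 MB. The check below is an inference from the figures, not a definition given by the authors; the function name `qdt_seconds` is hypothetical.

```python
def qdt_seconds(t_gb, n, br_mb_per_s, mb_per_gb=1024):
    """Evaluate T / (N * Br) in seconds, with T converted from GB to MB.

    The relation itself is an assumption inferred from the quoted values.
    """
    return t_gb * mb_per_gb / (n * br_mb_per_s)

qdt_dlt2000 = qdt_seconds(10, 64, 0.4)   # parameters quoted for DLT2000
qdt_dlt7000 = qdt_seconds(35, 52, 0.4)   # parameters quoted for DLT7000

print(round(qdt_dlt2000), round(qdt_dlt7000))
```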
Access Time Estimation: Motivation for X#
Retrieving a file from a TSS can take from a few seconds to a few hours
User satisfaction increases when the data access time is known (e.g. a user waiting to watch a selected video; an administrator recovering from backup)
Efficient use of storage resources in Grid environment (data replication subsystem)
Access Time Estimation: Approaches
Open TSS approach
• source code changes
• will be used as experimental platform
Black Box TSS approach - for existing HSMs in X# sites
• retrieving the TSS's state info via its native tools and available internal files
Access Time Estimation - Black Box TSS Approach
[Figure: block diagram. Components: Client, Request Monitor & Proxy, TSS Simulator, TSS Monitor, and the TSS with its databases, conf. files, logs, monitoring tools and disk cache. Numbered message labels: [1] fileid ETA?, [2] fileid, [3] queue state, [4] events collecting update, [5] TSS state, [6] ETA, [7] ETA, [8] fileid, [9] fileid, [10] data, [11] data, [12] feedback.]
Info needed by the Simulator:
– number of drives
– tape labels
– media types
– position of file on media
– number of requests
– ...
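As a rough illustration of what a simulator could compute from such state information, the sketch below gives a pessimistic ETA for one request on a single drive. It is not the authors' simulator: the single-drive FIFO queue, worst-case positioning for every request, and the names `Drive`, `service_time` and `estimate_eta` are all assumptions. Only the drive figures (load time, maximal position time, transfer rate) come from the hardware table in this deck.

```python
from dataclasses import dataclass

@dataclass
class Drive:
    load_time_s: float           # time to load a tape
    max_position_time_s: float   # worst-case positioning time
    transfer_rate_mb_s: float    # sustained transfer rate

# Drive figures taken from the hardware table earlier in the slides.
DLT2000 = Drive(load_time_s=60, max_position_time_s=120, transfer_rate_mb_s=1.25)
DLT7000 = Drive(load_time_s=37, max_position_time_s=120, transfer_rate_mb_s=5.0)

def service_time(drive, size_mb, tape_mounted=False):
    """Pessimistic time to stage one file: load + worst-case seek + transfer."""
    load = 0.0 if tape_mounted else drive.load_time_s
    return load + drive.max_position_time_s + size_mb / drive.transfer_rate_mb_s

def estimate_eta(drive, queued_sizes_mb, size_mb):
    """Upper-bound ETA in seconds for a request queued behind earlier ones (FIFO)."""
    waiting = sum(service_time(drive, s) for s in queued_sizes_mb)
    return waiting + service_time(drive, size_mb)

eta_idle = estimate_eta(DLT2000, [], 16.0)        # one 16 MB subfile, empty queue
eta_busy = estimate_eta(DLT2000, [16.0], 16.0)    # same request behind one other
eta_file = estimate_eta(DLT7000, [], 790.0)       # whole test file on DLT7000
```

A real estimator would replace the worst-case seek with the actual file position on tape and account for several drives, which is why the simulator needs the items listed above.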
Conclusions
MMSRS and VTSS are more efficient than standard UniTree HSM
MMSRS is efficient enough to be used as middleware for existing HSMs of UniTree type (in X# sites)
Proposed measurements could be used for:
– building more sophisticated distributed storage systems (faster access to files stored in TSS)
– building an access time estimation subsystem
Access time estimation subsystem -> an information provider for X# replication and migration of data
Basics of Component-Expert Technology and its usage in X#
Łukasz Dutka
Classical component strategy
[Figure: a program using component technology asks the Component Management Subsystem for components by identifier (UID1, UID2, UID3); the components container holds Component UID1, Component UID2 and Component UID3.]
Component-expert strategy
[Figure: a program using component-expert technology issues Get (TID1, ENV1), Get (TID2, ENV2), Get (TID3, ENV1) and Get (TID2, ENV3) to the Expert Component Management Subsystem. The subsystem gets information about all components of type TIDx from the components container, gets additional information from the system knowledge database, and asks the rule-based expert system to find the best component of type TIDx for ENVx. The container holds Component (TID1, SPECa), Component (TID2, SPECb) and Component (TID3, SPECa), grouped into component types TID1, TID2 and TID3.]
Components of different types may have the same specialization
Component structure
A header which describes the type and the specialization
The code of the component
An input parameters stream
An output parameters stream
Data stream
Component header structure
Type: Type_of_Component (TIDx)
Component specialization (SPECx):
Attribute_1 = Value_1
Attribute_2 = Value_2
...
Attribute_n = Value_n
Code of the component
Structure of component code
service run(call-environment, control-parameters)
{
    service code
}
Data stream
An input parameters stream
An output parameters stream
Call-Environment
Describes the state of the call place
Describes the call place requirements
Carries information about user or programmer wishes
The expert system processes the Call-Environment and finds the best component for it
[Figure: the program using component-expert technology issues Get (TID1, ENV1), Get (TID2, ENV2), Get (TID3, ENV1) and Get (TID2, ENV3) to the Expert Component Management Subsystem, which consults the rule-based expert system.]
Env1 = {Attribute_1 = Value_1, Attribute_2 = Value_2, ..., Attribute_k = Value_k}
Env3 = {Attribute_2 = Value_2, Attribute_4 = Value_4, ..., Attribute_z = Value_z}
Expert Subsystem
Rule-based expert system
A typical rule looks like: If log-expr Then action1 Else action2
The rules describe what is meant by "the best component for a given Call-Environment"
The expert system logs calls and stores deduction results for further analysis
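The selection step can be illustrated with a small sketch. It stands in a simple attribute-matching score for the rule base (it is not the authors' expert system): a component is rejected if its specialization contradicts the Call-Environment, and otherwise the component of the requested type with the most confirmed attributes wins. The names `Component`, `match_score`, `get_component`, `call_log` and the attribute `medium` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Component:
    type_id: str   # component type, e.g. "TID1"
    spec: dict     # specialization from the component header: attribute -> value

def match_score(spec, env):
    """Count spec attributes confirmed by env; return None on any conflict."""
    score = 0
    for attr, value in spec.items():
        if attr in env:
            if env[attr] != value:
                return None
            score += 1
    return score

call_log = []  # every call and its outcome is kept for later analysis

def get_component(container, type_id, env):
    """Pick the best component of the given type for the call-environment."""
    best, best_score = None, -1
    for comp in container:
        if comp.type_id != type_id:
            continue
        score = match_score(comp.spec, env)
        if score is not None and score > best_score:
            best, best_score = comp, score
    call_log.append((type_id, dict(env), best))
    if best is None:
        raise LookupError(f"no component of type {type_id} fits the environment")
    return best

container = [
    Component("TID1", {"medium": "tape"}),
    Component("TID1", {"medium": "disk"}),
    Component("TID2", {"medium": "tape"}),
]
chosen = get_component(container, "TID1", {"medium": "disk", "user": "u1"})
```

The caller names only the component type and describes its environment; which concrete component runs is decided by the subsystem, which is what relieves the programmer of the component choice.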
Benefits of Component-Expert technology
Possibility of dynamic system expansion
Ease of solving new problems
Minimising programmer responsibility for component choice
Ease of programming in heterogeneous environments
Maximal reusability of components
Internal simplicity of component code
Increased efficiency of the programming process
Component-Expert Technology for X# Task 3.4
Basic analysis of data-access problems in X#
Different data set types
Huge data files
Distributed environment
Long distance connections
Mission critical applications
Heterogeneous data storing systems
Heterogeneous computing systems
Open system
Unpredictable file types
Basic connection diagram
[Figure: connection diagram showing a User who fires one process on a computing node, several Computing Centers and Storage Centers, several Component-Expert Systems and one Global Expert System.]
Sequence Diagram
Participants: Computation Node, Global Expert System, Local Component-Expert System, Access Server
Access Server
Get Access Strategy (ENV)
Ask for Capability
Test Capability
Use appropriate test component
Result(Access Strategy)
Get appropriate access component (ENV)
Fire chosen component
HandleHandle
Connect to data stream (Handle)
Data
Data
EOF
Example of Component-Expert technology usage for data access in X#
Sample Attributes
– User ID
– Computing Node ID
– Preferred replica localisation
– Required throughput
– Application purpose
– Data sharing
– Critical level
– Replica expiration
– ...
Example of local decisions
– Choice of devices (according to availability and type)
– Storing format (blocks, multimedia streams, ...)
– Available delivery performance (network, storage devices, ...)
– ... and much more ...
System Management for Migration/Replication Strategies (2/2)
In cooperation with other projects
High-level control system (e.g. cooperating with LDAP)
Two possible realizations
– heuristic reinforcement learning based on heuristic strategies for migration/replication and system state
– classical rule-based expert system
Conclusions
Some elements have been defined and implemented
Working on higher level structure and cooperation with other X# modules and services