High Performance Web Site. R88725030 劉朝銘, R88725037 程左一. Advisor: Professor 莊裕澤.


DIS Presentation

High Performance Web Site

R88725030 劉朝銘

R88725037 程左一

Advisor: Professor 莊裕澤


Outline

• Performance issues

• Typical Methods

• Case study


Performance Issues and Related Work

• The overall performance of the web is determined by the performance of the components which make up the web: the servers, the clients, the proxies, the networks, and the protocols used for communication.

~ 1997, Martin F. Arlitt


Performance Issues and Related Work

• Caching still appears to be a promising approach to improving Web performance because of the large number of requests for a small number of documents, the concentration of references within these documents, and the small average size of these documents. ~ 1997, Martin F. Arlitt


The clients

• Client connection bandwidth – The Web server has to keep a connection open with slow clients while a request is being satisfied.

• Efficient Web browsers can use file caching to reduce the loads that they put on web servers and network links.

• Client computing - Java applets.


The servers

• Identify performance bottlenecks and evaluate the performance impact of different Web server designs to reduce response times.

• Use file caching to reduce Web server load.

• CGI improvement, dispatcher, replication, caching, system calls


The proxies

• Researchers at several institutions are studying various cache replacement policies for Web proxies:

  – LRU: doubly linked queue

  – LFU: priority queue; prone to cache pollution

  – GDS (Greedy Dual-Size): takes object size and a cost function into account

• The most widely deployed proxy server, Squid, uses the LRU replacement policy. (http://squid.nlanr.net/)
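A minimal sketch of the LRU policy mentioned on this slide, written in Python. It is not Squid's implementation: the class name `LRUCache` and the capacity counted in objects rather than bytes are assumptions made for illustration. `collections.OrderedDict` plays the role of the doubly linked queue.

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the least recently used document when the cache is full."""

    def __init__(self, capacity):
        self.capacity = capacity       # assumed: capacity counted in objects
        self._items = OrderedDict()    # oldest entry first, newest last

    def get(self, url):
        if url not in self._items:
            return None                # cache miss
        self._items.move_to_end(url)   # mark as most recently used
        return self._items[url]

    def put(self, url, body):
        if url in self._items:
            self._items.move_to_end(url)
        self._items[url] = body
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used
```

A size-aware policy such as GDS would replace the simple recency order with a priority computed from object size and fetch cost.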


The networks

• Several studies have suggested the use of network file caches to reduce the volume of traffic on the Internet.

• Wide-area file systems used with the WWW, such as AFS, have mechanisms to address performance, reliability, and security.

• Researchers at NLANR have implemented a prototype hierarchy of caches, and are currently focusing on configuring and tuning caches within the global hierarchy (Squid).


The protocols

• The current protocol used for client-server interaction within the WWW (HTTP/1.0) is very inefficient.

  – Connect -> request -> response -> disconnect

  – Stateless

• A more efficient approach (HTTP-NG) would allow for multiple client requests to be sent over a single TCP connection.


Key Improvements in HTTP/1.1

• Bandwidth optimization

  – Allow a client to request portions of a resource.

• Persistent connection

  – Clients, servers, and proxies assume that a connection will be kept open.

• Pipelining

  – A client need not wait to receive the response for one request before sending another request on the same connection.
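To make the persistent-connection point concrete, here is a small Python sketch using only the standard library. It starts a throwaway local server and sends several requests over one `http.client.HTTPConnection`. The handler, the helper name `fetch_all`, and the loopback address are assumptions for the demo; pipelining is not shown because `http.client` waits for each response before sending the next request.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep the connection open

    def do_GET(self):
        body = f"hello {self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_all(paths):
    """Fetch every path over a single TCP connection to a local demo server."""
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    bodies, local_ports = [], set()
    try:
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            bodies.append(response.read().decode())
            # The client's local port only changes if a new connection is opened.
            local_ports.add(conn.sock.getsockname()[1])
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return bodies, len(local_ports)
```

With HTTP/1.0 semantics each of these requests would pay for its own connect and disconnect.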


Components Affecting Web Server Performance

• Server software: architecture, protocol, CGI programs, caching, mirroring, security, application-level I/O buffer, compiler performance options, processing on the client side.

• Operating system

• The network

~ 1996, Vittorio Trecordi and Alberto Verga


CGI programs

• CGI forking and executing overhead is very high.

• New mechanisms based on dynamic linking and other private APIs increase performance.

• This kind of approach provides the efficiency of native machine code execution.

• The Apache Web server uses GNU dld to add, remove, and replace object modules within the process address space.


CGI programs (cont.)

• Goal: allow clients to execute server programs without spawning a separate process each time.

• This can be accomplished by linking server programs directly with the Web server, or by preforking multiple processes or threads with which the Web server communicates to invoke server programs.

• For example, IBM ICAPI, Netscape NSAPI, Microsoft ISAPI.
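The idea of invoking server programs without a new process per request can be sketched in Python as follows. This is only an illustration of the in-process approach; it does not reproduce the ICAPI, NSAPI, or ISAPI interfaces, and the names `ModuleServer`, `register`, and `hello` are hypothetical.

```python
class ModuleServer:
    """Keeps handler functions loaded inside the server process."""

    def __init__(self):
        self._handlers = {}  # URL path -> handler function

    def register(self, path, handler):
        # Comparable to linking a module into the server once, at startup.
        self._handlers[path] = handler

    def handle(self, path, params):
        handler = self._handlers.get(path)
        if handler is None:
            return 404, "Not Found"
        # A plain function call: no fork() and no exec() per request.
        return 200, handler(params)


def hello(params):
    """Hypothetical server program."""
    return "Hello, " + params.get("name", "world")
```

A classic CGI server would instead start a fresh process for every call to `hello`, which is the overhead the slides describe.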


Dynamic Load Balancing

• Main load balancing techniques

  – Client-based approach

    • Web clients

    • Client-side proxies

  – DNS-based approach

    • Constant TTL algorithms

    • Adaptive TTL algorithms


Dynamic Load Balancing

  – Dispatcher-based approach

    • Packet rewriting

    • Packet forwarding

    • HTTP redirection

  – Server-based approach

    • HTTP redirection

    • Packet redirection


Client-based Approach

• Web clients

  – Require software modification on the client side

  – One example: wwwi.netscape.com

    • Limited practical applicability and not scalable

  – Web-client scheduling via smart clients

    • Increased network traffic due to message exchanges

    • Provides scalability and availability

    • Lacks client-side portability


Client-based Approach

• Client-side proxies

  – Limited applicability

  – Combine caching with server replication


DNS-based Approach

• Transparency at the URL level

• DNS servers

  – Authoritative DNS server

  – Intermediate name server

• TTL—a validity period for caching the result of the logical name resolution

• Factors that limit the DNS control on address caching


DNS-based Approach

  – Noncooperative intermediate name server

  – Browser caching

• DNS as a potential bottleneck


DNS-based Approach

• Constant TTL algorithms

  – System-stateless algorithms

    • Round-robin (by NCSA)

  – Server-state-based algorithms

    • Avoid system overload

    • Sun-SCALR

    • Choose the least-loaded server, set TTL to zero
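The two constant-TTL styles above can be sketched in a few lines of Python. This is a toy model of the name-resolution decision only, not a DNS server: the class names, the 300-second default TTL, and the load dictionary are assumptions for illustration.

```python
import itertools


class RoundRobinDNS:
    """System-stateless policy: rotate through the server addresses."""

    def __init__(self, addresses, ttl=300):  # TTL value is an assumption
        self._cycle = itertools.cycle(addresses)
        self.ttl = ttl

    def resolve(self):
        return next(self._cycle), self.ttl


class LeastLoadedDNS:
    """Server-state-based policy: pick the least-loaded server, TTL of zero."""

    def __init__(self, loads):
        self.loads = loads  # address -> load most recently reported

    def resolve(self):
        address = min(self.loads, key=self.loads.get)
        return address, 0  # TTL 0 asks resolvers not to cache the answer
```

A TTL of zero keeps every resolution under the DNS scheduler's control, at the price of more DNS traffic.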


DNS-based Approach

  – Client-state-based algorithms

    • Two kinds of information come from the client side

    • Multitier round-robin policy

      – Hidden load weight index

    • Cisco Systems’ DistributedDirector

      – Client-to-server topological proximity

      – Client-to-server link latency

    • Internet2 Distributed Storage Infrastructure Project (I2-DSI)

      – Based on network proximity (e.g., round-trip delay)

    • Problem with intermediate name server

  – Server- and client-state-based algorithms


DNS-based Approach

• Adaptive TTL algorithms

  – Heterogeneity of server capacities

  – Use some server- and client-based DNS policy to select the server, and dynamically adjust the TTL value

  – Popular domain with lower TTL value
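One way to read the adaptive-TTL idea is sketched below in Python: the TTL shrinks for popular client domains and grows with the capacity of the chosen server. The exact formula, the function name `adaptive_ttl`, and all parameters are assumptions for illustration; the slides do not give a formula.

```python
def adaptive_ttl(base_ttl, server_capacity, max_capacity, domain_rate, mean_rate):
    """Return a TTL in seconds (illustrative formula, not from the source).

    server_capacity / max_capacity: weaker servers get shorter TTLs.
    mean_rate / domain_rate: domains sending more requests get shorter TTLs.
    """
    if domain_rate <= 0 or max_capacity <= 0:
        raise ValueError("rates and capacities must be positive")
    capacity_factor = server_capacity / max_capacity
    popularity_factor = mean_rate / domain_rate
    return max(1, round(base_ttl * capacity_factor * popularity_factor))
```

For example, under these assumptions a domain that sends four times the mean request rate receives a quarter of the base TTL.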


Dispatcher-based Approach

• Centralizing request control and routing

• Single, virtual IP address (IP-SVA) of dispatcher

• Simple dispatching algorithms

• Routing schemes:

  – Packet rewriting

  – Packet forwarding

  – HTTP redirection


Dispatcher-based Approach

• Packet single-rewriting

  – TCP router as an IP address dispatcher

  – Can combine with DNS-based approach

• Packet double-rewriting

  – Rewriting both arrival and response packets

  – Underlies the Internet Engineering Task Force’s Network Address Translator
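A rough Python model of the address rewriting involved, with packets reduced to source, destination, and payload. The `Packet` type, the address `10.0.0.100` used as IP-SVA, and the function names are hypothetical. In double-rewriting the dispatcher would apply both functions; in single-rewriting it is assumed here that the dispatcher applies only the inbound one and the server restores the virtual address itself.

```python
from dataclasses import dataclass, replace

VIRTUAL_IP = "10.0.0.100"  # hypothetical IP-SVA published for the site


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    payload: str


def rewrite_inbound(packet, server_ip):
    """Replace the virtual destination address with the chosen server's address."""
    return replace(packet, dst=server_ip)


def rewrite_outbound(packet):
    """Replace the server's source address with the virtual address in a response."""
    return replace(packet, src=VIRTUAL_IP)
```

Either way, the client only ever sees the virtual address as its peer.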


Packet Single-rewriting


Packet Double-rewriting


Dispatcher-based Approach

  – Two examples:

    • Magicrouter

    • Cisco Systems’ LocalDirector

• Packet forwarding

  – Network Dispatcher (IBM)

    • LAN Network Dispatcher

    • WAN Network Dispatcher


WAN Network Dispatcher


Dispatcher-based Approach

  – ONE-IP address

    • Uses the “ifconfig alias” option

    • Publicizes the IP-SVA as the same secondary IP address of all nodes

      – Routing-based dispatching (static)

      – Broadcast-based dispatching (dynamic)

• The ONE-IP approach can be combined with a DNS-based solution


Dispatcher-based Approach

• HTTP redirection

  – Largely transparent, but increases response time

  – Can be implemented with:

    • Server-state-based dispatching

      – Uses the Distributed Server Groups architecture

      – Adds a new method to the HTTP protocol

    • Location-based dispatching

      – Used by Cisco Systems’ DistributedDirector
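As a small illustration of redirection-based dispatching, the Python function below builds the raw text of a redirect response that sends the client to the least-loaded server. The function name, the host names, and the load table are hypothetical, and using status code 302 is one common choice rather than something the slides specify.

```python
def redirect_response(path, server_loads):
    """Build an HTTP redirect pointing at the least-loaded server."""
    target = min(server_loads, key=server_loads.get)
    return (
        "HTTP/1.1 302 Found\r\n"
        f"Location: http://{target}{path}\r\n"
        "Content-Length: 0\r\n"
        "\r\n"
    )
```

The extra round trip for the second request is the source of the increased response time noted above.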


Server-based Approach

• HTTP redirection

  – A request initially assigned by DNS to a Web server can be reassigned to another server via HTTP redirection

  – Overhead in intra-cluster communication


Server HTTP Redirection


Server-based Approach

• Packet redirection

  – Distributed Packet Rewriting (DPR)

  – Round-robin by DNS, then packets can be rerouted by a packet-rewriting mechanism

  – Two load-balancing algorithms

    • Static (stateless) routing

    • Least-loaded (by periodic server communications)

  – DPR can be applied to both LAN and WAN


Comparing the approaches


Typical Methods

• File caching, web caching – proxy

• Web server tuning

  – Multithreading, memory-mapped I/O

• Load balancing multi-node Web server (dispatching)

• Replication

• Protocol improvement

• Resolving dynamic content problem


Case Study 1: NCSA’s World Wide Web Server (1995)

• Extant access patterns and responses at NCSA’s WWW server:

  – At peak times, the NCSA server receives 30-40 requests per second; no single-processor system could serve all requests.

  – HTTP is connectionless: each request is handled separately.

• Three key problems:

  – Information addressing

  – Information distribution

  – Load balancing

by Thomas T. Kwan and Daniel A. Reed


Case Study 1: Architecture

• A group of loosely coupled WWW servers: they operate independently, but collectively they provide the illusion of a single server.

• Three components:

  – A collection of independent servers.

  – A WWW document tree shared among the servers and stored by the Andrew distributed file system.

  – A round-robin domain name system that multiplexes the domain name www.ncsa.uiuc.edu among the constituent servers.


Case Study 1: the servers and the network

• The WWW servers are connected to the AFS file servers via a 100-Mb/s FDDI ring.

• Each HP735 has 96 MB RAM and uses its local disk as a moderate-size (130 MB) AFS cache. The local disk also stores HTTP server log files and is the backing store for the virtual memory system.


Case Study 1: AFS configuration

• The shared document tree structure.

• The most frequently accessed documents and document tree are cached locally (cache hit rate -> 90%; critical!).

• Allow “plug and play” addition/removal of component servers and use of heterogeneous systems.


Case Study 1: Round-robin DNS

• The BIND 4.9.2 code has a round-robin option, but the rotation conflicted with extant software at NCSA. The BIND software was modified to allow a domain name with more than one associated IP address.

• Adding a new server to the group is as simple as adding its IP address to the DNS entry for www.ncsa.uiuc.edu.


Case Study 2: Improvement of the Apache Web Server (1999)

• Implement 6 techniques that improve the throughput of Apache by 61%.

• Performance analysis (SPECweb96, WebStone):

  – Apache spends more than 75-80% of its CPU time in the OS kernel.

  – Small files are accessed more frequently.

• Experimental environments: both uniprocessor and 4-way SMP machines with enough RAM.

by Yiming Hu, Ashwini Nanda, and Qing Yang


Case Study 2: Detailed CPU Time Breakdown

• 23% - user code

• 29% - TCP/IP

• 26% - interrupt

• 17% - file operation

It is possible to greatly reduce the file system overhead and the user code overhead!


Case Study 2: mmap function

• When Apache needs to read a file, we let the program open the file, then use the mmap function to map the data of the file into user space.

• Use the mmap function to eliminate the data copying between the file system cache and the user space.

• As a result, the server can send the mapped data directly to the network, avoiding the read system call altogether.

• Once the entire file content has been sent, the file is unmapped and closed to limit the number of open files in the system.
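The open, map, send, unmap, close sequence described above can be sketched with Python's `mmap` module. This is an illustration of the technique, not Apache's code; the function name `send_file_mmap` is hypothetical, and the special case for empty files exists only because `mmap` refuses to map zero bytes.

```python
import mmap
import os


def send_file_mmap(path, sock):
    """Send a file over a connected socket from a memory mapping.

    Returns the number of bytes sent.
    """
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return 0  # mmap cannot map an empty file
        # Map the file read-only; no read() call copies it into a user buffer.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            sock.sendall(mapped)
    # Leaving both "with" blocks unmaps and closes the file.
    return size
```

Both context managers release their resources as soon as the data is sent, matching the unmap-and-close step on the slide.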


Case Study 2: Caching files in the User Space

• With file data cached in the Web server's user space, the server can ship cached files directly to the TCP/IP stack, avoiding all file system calls such as open, read, and close.

• The cache stores both the file data and the file states in a user memory region shared by all processes.


Case Study 2: Speeding up Logging

• On the uniprocessor system with 128MB RAM, the SPECweb96 number of Apache with logging is 158 ops/sec.

• Simply turning off the logging operation results in a SPECweb96 number of 194 ops/sec. ( 22% improvement )

• It is desirable to reduce the logging overhead of Apache.


Case Study 2: Caching DNS results

• Most of the logging overhead comes from looking up the host names of clients.

• Apache calls the gethostbyaddr function for every HTTP request being logged. (Very wasteful, especially when there is no local name server.)

• The DNS cache is a small array indexed by hashing the IP address. (14% improvement)

• Or disable the host name lookup process.
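A small array indexed by a hash of the IP address might look like the Python sketch below. The slot count, the class name `HostNameCache`, and the injected `resolver` callable (which in a real server could wrap a reverse DNS lookup) are assumptions for illustration; the hashing scheme is not taken from the paper.

```python
import ipaddress


class HostNameCache:
    """Direct-mapped cache: each IP address hashes to exactly one slot."""

    def __init__(self, resolver, slots=256):  # slot count is an assumption
        self._resolver = resolver             # callable: ip string -> host name
        self._slots = [None] * slots

    def lookup(self, ip):
        index = int(ipaddress.ip_address(ip)) % len(self._slots)
        entry = self._slots[index]
        if entry is not None and entry[0] == ip:
            return entry[1]                   # hit: no DNS query needed
        name = self._resolver(ip)             # miss: do the slow lookup once
        self._slots[index] = (ip, name)       # overwrite whatever was there
        return name
```

Collisions simply overwrite the older entry, which keeps the structure small and the lookup constant-time.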


Case Study 2: Caching Strings results

• Another logging overhead is caused by logging the request time and status code.

• Let the time-conversion routine store the resulting string in a static array.

• When Apache calls the conversion routine again, the routine returns the string already in the array if the time argument is the same as in the last call, thus avoiding many redundant string operations.

• Similar optimizations can be applied to the status code conversion and other places in Apache.
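The time-string cache can be sketched in Python as follows. Module-level variables stand in for the static array; the function name `format_log_time`, the one-second granularity, and the fixed `+0000` zone in the format string are assumptions for the example.

```python
import time

_last_second = None  # the time value seen in the previous call
_last_string = None  # the formatted string produced for it


def format_log_time(timestamp):
    """Return a log-style time string, reformatting only when the second changes."""
    global _last_second, _last_string
    second = int(timestamp)
    if second != _last_second:
        _last_string = time.strftime(
            "[%d/%b/%Y:%H:%M:%S +0000]", time.gmtime(second)
        )
        _last_second = second
    return _last_string
```

A busy server logs many requests within the same second, so most calls return the stored string without any formatting work.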


Case Study 2: Caching URI Processing results

• 60% of the user time is spent processing URIs (e.g., parsing, directory checking, security checking, translating the URI to a file name, etc.).

• When Apache finishes parsing and checking a URI and obtains a file name corresponding to the URI, the URI/file-name pair is put into the cache.

• Especially useful for dynamic pages such as directory lists.


Case Study 2: Results of the Enhancement Techniques

Configurations compared:

– apache

– +DNS_cache

– +DNS_cache +str_cache

– +DNS_cache +str_cache +mmap

– +DNS_cache +str_cache +mmap +state_cache

– +DNS_cache +str_cache +mmap +state_cache +data_cache

– +DNS_cache +str_cache +mmap +state_cache +data_cache +URI_cache


Case Study 2: Conclusions


Case Study 3: Caching Dynamic Content (1998)

• A distributed Web server called Swala

  – The nodes cooperatively cache the results of CGI requests, and the cache meta-data is stored in a replicated global cache directory.

• Every Swala node contains two modules:

  – The HTTP module

  – The cache module

by Vegard Holmedahl, Ben Smith, and Tao Yang


Case Study 3: Components of the Swala architecture

• The HTTP module (request threads)

  – Handles HTTP requests

  – Threads take turns listening on the main port for incoming connections and handling the requests.

• The cache module (three threads)

  – One receives cache information from the other nodes.

  – One listens for cache data requests from other nodes and returns cache content.

  – One wakes up every few seconds and deletes expired cache entries.
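The expiry duty of the cache module can be illustrated with the Python sketch below. It is not Swala's code: the class name `CacheDirectory`, the per-entry time-to-live, and passing the current time explicitly (instead of running a background thread that wakes every few seconds) are assumptions that keep the example deterministic.

```python
class CacheDirectory:
    """Stores CGI results together with the time at which they expire."""

    def __init__(self):
        self._entries = {}  # request key -> (result, expiry time)

    def put(self, key, result, ttl, now):
        self._entries[key] = (result, now + ttl)

    def get(self, key, now):
        entry = self._entries.get(key)
        if entry is None or entry[1] <= now:
            return None  # missing or expired: the CGI program must run again
        return entry[0]

    def purge_expired(self, now):
        """What the periodic thread would do on each wake-up."""
        expired = [k for k, (_, expiry) in self._entries.items() if expiry <= now]
        for key in expired:
            del self._entries[key]
        return len(expired)
```

In a multi-node setup, each insertion or deletion would also have to be announced to the other nodes so that their copies of the directory stay in step.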


Case Study 3: Cache Manager


Case Study 4: Admission-Control Web Server (2000)

• Current HTTP servers process requests using a first-come, first-served queuing policy.

• The more requests a client makes, the more replies the server will generate in response.

• A new algorithm is implemented on Apache; it has been shown to be effective in allocating a configurable fixed percentage of bandwidth across numerous simultaneous clients, independent of the aggressiveness of the clients' requests.

by Kelvin Li and Sugih Jamin


Case Study 4: Admission-Control Algorithms

• Primary daemon process

  – Original duties, plus configuring client bandwidth allocation and the total bandwidth available to all clients.

• Bandwidth manager process

  – Determines the maximum measured bandwidth of the server at each processing interval.

• Child process

  – Each child process is equipped with the algorithms necessary to determine whether to serve the request or not.
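The division of work among the three processes can be mimicked in one Python class, shown below as a loose sketch. It is not the algorithm from the paper: the per-interval byte budget, the class name `AdmissionController`, and the rule of refusing a request that would exceed the client's share are all assumptions chosen to show the serve-or-reject decision.

```python
class AdmissionController:
    """Gives each client a fixed fraction of the measured server bandwidth."""

    def __init__(self, shares):
        self.shares = shares          # client -> fraction, set by the primary daemon
        self.max_bandwidth = 0.0      # bytes per interval, set by the bandwidth manager
        self.used = {client: 0.0 for client in shares}

    def start_interval(self, measured_max_bandwidth):
        """Bandwidth manager role: record the measurement and reset the counters."""
        self.max_bandwidth = measured_max_bandwidth
        self.used = {client: 0.0 for client in self.shares}

    def admit(self, client, response_bytes):
        """Child process role: decide whether to serve this request."""
        budget = self.shares.get(client, 0.0) * self.max_bandwidth
        already_used = self.used.get(client, 0.0)
        if already_used + response_bytes > budget:
            return False              # client would exceed its share
        self.used[client] = already_used + response_bytes
        return True
```

Under this rule an aggressive client exhausts only its own budget, so it cannot crowd out the others.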


The End
