High Performance Web Site. R88725030 劉朝銘, R88725037 程左一. Advisor: Prof. 莊裕澤

DIS Presentation

High Performance Web Site

R88725030 劉朝銘

R88725037 程左一

Advisor: Prof. 莊裕澤

1/57

Outline

• Performance issues

• Typical Methods

• Case study

Performance Issues and Related Work

• The overall performance of the web is determined by the performance of the components which make up the web: the servers, the clients, the proxies, the networks, and the protocols used for communication.

1997, Martin F. Arlitt 2/57

Performance Issues and Related Work

• Caching still appears to be a promising approach to improving Web performance because of the large number of requests for a small number of documents, the concentration of references within these documents, and the small average size of these documents. ~ 1997, Martin F. Arlitt

3/57

The clients

• Client connection bandwidth – The Web server has to keep connections open with slow clients while their requests are being satisfied.

• Efficient Web browsers can use file caching to reduce the loads that they put on web servers and network links.

• Client computing - Java applets.

4/57

The servers

• Identify performance bottlenecks and evaluate the performance impact of different Web server designs to reduce response times.

• Use file caching to reduce Web server load.

• CGI improvement, dispatcher, replication, caching, system calls

5/57

The proxies

• Researchers at several institutions are studying various cache replacement policies for Web proxies:
  – LRU (least recently used): kept in a doubly linked queue
  – LFU (least frequently used): kept in a priority queue; subject to cache pollution
  – GDS (Greedy Dual-Size): takes document size and a cost function into account

• The most widely deployed proxy server, Squid, uses the LRU replacement policy. ( http://squid.nlanr.net/ )
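As a rough illustration of the LRU policy, the sketch below keeps documents in recency order and evicts from the old end when a byte budget is exceeded. The class name and the byte-based capacity are assumptions made for this example; it is not Squid's implementation.

```python
from collections import OrderedDict


class LRUCache:
    """Byte-bounded cache that evicts the least recently used document."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._docs = OrderedDict()  # URL -> body, least recently used first

    def get(self, url):
        body = self._docs.get(url)
        if body is not None:
            self._docs.move_to_end(url)  # mark as most recently used
        return body

    def put(self, url, body):
        if url in self._docs:
            self.used_bytes -= len(self._docs.pop(url))
        if len(body) > self.capacity_bytes:
            return  # larger than the whole cache, so never stored
        self._docs[url] = body
        self.used_bytes += len(body)
        while self.used_bytes > self.capacity_bytes:
            _, evicted = self._docs.popitem(last=False)  # drop the oldest entry
            self.used_bytes -= len(evicted)
```

A size-aware policy such as GDS would replace the recency order with a priority computed from size and cost.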

6/57

The networks

• Several studies have suggested the use of network file caches to reduce the volume of traffic on the Internet.

• Wide-area file systems used with the WWW, such as AFS, have mechanisms to address performance, reliability, and security.

• Researchers at NLANR have implemented a prototype hierarchy of caches (Squid), and are currently focusing on configuring and tuning caches within the global hierarchy.

7/57

The protocols

• The current protocol used for client-server interaction within the WWW (HTTP/1.0) is very inefficient.
  – Connect -> request -> response -> disconnect
  – Stateless

• A more efficient approach (HTTP-NG) would allow multiple client requests to be sent over a single TCP connection.

8/57

Key Improvements in HTTP/1.1

• Bandwidth Optimization
  – Allow a client to request portions of a resource.

• Persistent Connection
  – Clients, servers, and proxies assume that a connection will be kept open.

• Pipelining
  – A client need not wait to receive the response for one request before sending another request on the same connection.
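Persistent connections and pipelining can be tried locally with Python's standard library. The sketch below starts a throwaway HTTP/1.1 server on the loopback interface and sends two requests back to back on one TCP connection before reading any response. The handler, paths, and function names are made up for this example.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoPathHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep the connection open

    def do_GET(self):
        body = f"you asked for {self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def pipelined_get(host, port, paths):
    """Send every request first, then read all responses from the same socket."""
    requests = []
    for i, path in enumerate(paths):
        last = i == len(paths) - 1
        requests.append(
            f"GET {path} HTTP/1.1\r\nHost: {host}\r\n"
            f"Connection: {'close' if last else 'keep-alive'}\r\n\r\n"
        )
    with socket.create_connection((host, port)) as sock:
        sock.sendall("".join(requests).encode())
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed after the final response
                break
            chunks.append(data)
    return b"".join(chunks)


def demo():
    server = HTTPServer(("127.0.0.1", 0), EchoPathHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return pipelined_get("127.0.0.1", server.server_address[1], ["/a.html", "/b.gif"])
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` returns the raw bytes of both responses, in request order, received over a single connection.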

9/57

Components Affecting Web Server Performance

• Server software: architecture, protocol, CGI programs, caching, mirroring, security, application-level I/O buffer, compiler performance options, processing on the client side.

• Operating system

• The network

~ 1996, Vittorio Trecordi and Alberto Verga

10/57

CGI programs

• CGI forking and executing overhead is very high.

• New mechanisms based on dynamic linking and other private APIs increase performance.

• This kind of approach provides the efficiency of native machine code execution.

• The Apache Web server uses GNU dld to add, remove, and replace object modules within the process address space.

11/57

CGI programs (cont.)

• The goal is to allow clients to execute server programs without spawning a separate process each time.

• This can be accomplished by linking server programs directly with the Web server, or by preforking multiple processes or threads with which the Web server communicates to invoke server programs.

• For example, IBM ICAPI, Netscape NSAPI, Microsoft ISAPI.
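The difference between the two styles can be sketched in a few lines. Here `run_as_cgi` starts a fresh interpreter per request, while `run_in_process` calls a handler that was loaded once into the server process. All names are hypothetical; this does not use ICAPI, NSAPI, or ISAPI.

```python
import subprocess
import sys

# Stand-in for a CGI program: prints a greeting for its first argument.
CGI_SCRIPT = "import sys; sys.stdout.write('hello ' + sys.argv[1])"


def run_as_cgi(name):
    """CGI style: create and tear down a new process for every request."""
    result = subprocess.run(
        [sys.executable, "-c", CGI_SCRIPT, name],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def hello_handler(name):
    return "hello " + name


# Server-API style: handlers are linked into the server once and reused.
HANDLERS = {"/hello": hello_handler}


def run_in_process(path, name):
    """No fork/exec: the request becomes a plain function call."""
    return HANDLERS[path](name)
```

Both paths produce the same output; only the per-request process creation differs.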

12/57

Dynamic Load Balancing

• Main load balancing techniques
  – Client-based approach
    • Web clients
    • Client-side proxies
  – DNS-based approach
    • Constant TTL algorithms
    • Adaptive TTL algorithms

13/57

Dynamic Load Balancing

  – Dispatcher-based approach
    • Packet rewriting
    • Packet forwarding
    • HTTP redirection
  – Server-based approach
    • HTTP redirection
    • Packet redirection

14/57

Client-based Approach

• Web clients
  – Require software modification on the client side
  – One example: wwwi.netscape.com
    • Limited practical applicability and not scalable
  – Web-client scheduling via smart clients
    • Increased network traffic due to message exchanges
    • Provides scalability and availability
    • Lacks client-side portability

15/57

Client-based Approach

• Client-side proxies
  – Limited applicability
  – Combine caching with server replication

16/57

DNS-based Approach

• Transparency at the URL level

• DNS servers
  – Authoritative DNS server
  – Intermediate name server

• TTL: a validity period for caching the result of the logical name resolution

• Factors that limit the DNS control on address caching

17/57

DNS-based Approach

  – Noncooperative intermediate name server
  – Browser caching

• DNS as a potential bottleneck

18/57

DNS-based Approach

19/57

DNS-based Approach

• Constant TTL Algorithms
  – System-stateless algorithms
    • Round-robin (by NCSA)
  – Server-state-based algorithm
    • Avoid system overload
    • Sun-SCALR
    • Choose the least-loaded server, set TTL to zero
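A minimal sketch of the two constant-TTL selection rules, assuming the DNS already knows the server addresses and, for the second rule, a load figure per server. The addresses are from the documentation range, and the names are invented for this example.

```python
from itertools import cycle


class RoundRobinDNS:
    """System-stateless policy: hand out addresses in rotation with a fixed TTL."""

    def __init__(self, addresses, ttl_seconds):
        self._next = cycle(addresses)
        self.ttl_seconds = ttl_seconds

    def resolve(self):
        return next(self._next), self.ttl_seconds


def resolve_least_loaded(load_by_address):
    """Server-state-based policy: pick the least-loaded server, TTL of zero."""
    address = min(load_by_address, key=load_by_address.get)
    return address, 0  # TTL 0 asks resolvers not to cache the mapping
```

With a TTL of zero every lookup reaches the authoritative DNS, which gives finer control at the price of more DNS traffic.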

20/57

DNS-based Approach

  – Client-state-based algorithms
    • Two kinds of information come from the client side
    • Multitier round-robin policy
      – Hidden load weight index
    • Cisco Systems’ DistributedDirector
      – Client-to-server topological proximity
      – Client-to-server link latency
    • Internet2 Distributed Storage Infrastructure Project (I2-DSI)
      – Based on network proximity (e.g., round-trip delay)
    • Problem with intermediate name server
  – Server and client-state-based algorithms

21/57

DNS-based Approach

• Adaptive TTL algorithms
  – Heterogeneity of server capacities
  – Use some server- and client-based DNS policy to select the server, and dynamically adjust the TTL value
  – Popular domain with lower TTL value
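One way to picture the adjustment: scale a base TTL up with the chosen server's capacity and down with the requesting domain's popularity. The formula and every parameter name below are assumptions for illustration, not the published algorithms.

```python
def adaptive_ttl(base_ttl, server_capacity, max_capacity,
                 domain_rate, mean_rate, min_ttl=1):
    """Return a TTL in seconds for one (domain, server) mapping.

    Assumed rule: more capable servers keep a mapping longer,
    and busier client domains get a shorter one.
    """
    capacity_factor = server_capacity / max_capacity
    popularity_factor = domain_rate / mean_rate
    return max(min_ttl, round(base_ttl * capacity_factor / popularity_factor))
```

For example, with a 240-second base TTL, a domain sending four times the mean request rate to the most capable server would get 60 seconds under this rule.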

22/57

Dispatcher-based Approach

• Centralizing request control and routing

• Single, virtual IP address (IP-SVA) of dispatcher

• Simple dispatching algorithms

• Routing schemes:
  – Packet rewriting
  – Packet forwarding
  – HTTP redirection

23/57


Dispatcher-based Approach

• Packet single-rewriting
  – TCP router as an IP address dispatcher
  – Can combine with DNS-based approach

• Packet double-rewriting
  – Rewriting both arrival and response packets
  – Underlies the Internet Engineering Task Force’s Network Address Translator
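Double rewriting can be modeled with a toy packet type: the dispatcher rewrites the destination of arriving packets to a chosen server, and rewrites the source of response packets back to the virtual address. The classes and the round-robin choice are assumptions for this sketch, and the addresses are examples.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    payload: bytes


class DoubleRewritingDispatcher:
    def __init__(self, virtual_ip, server_ips):
        self.virtual_ip = virtual_ip
        self.server_ips = list(server_ips)
        self._assigned = {}  # client address -> server chosen for it
        self._turn = 0

    def inbound(self, packet):
        """Client -> cluster: replace the virtual destination with a real server."""
        server = self._assigned.get(packet.src)
        if server is None:
            server = self.server_ips[self._turn % len(self.server_ips)]
            self._turn += 1
            self._assigned[packet.src] = server
        return replace(packet, dst=server)

    def outbound(self, packet):
        """Server -> client: make the reply appear to come from the virtual IP."""
        return replace(packet, src=self.virtual_ip)
```

In single rewriting only `inbound` runs at the dispatcher; the server itself puts the virtual address into its replies.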

24/57

Packet Single-rewriting

25/57

Packet Double-rewriting

26/57


Dispatcher-based Approach

  – Two examples:
    • Magicrouter
    • Cisco Systems’ LocalDirector

• Packet Forwarding
  – Network Dispatcher (IBM)
    • LAN Network Dispatcher
    • WAN Network Dispatcher

27/57

WAN Network Dispatcher

28/57


Dispatcher-based Approach

  – ONE-IP address
    • Use the “ifconfig alias” option
    • Publicizes the IP-SVA as the same secondary IP address of all nodes
      – Routing-based dispatching (static)
      – Broadcast-based dispatching (dynamic)

• The ONE-IP approach can be combined with a DNS-based solution

29/57


Dispatcher-based Approach

• HTTP Redirection
  – Largely transparent, increased response time
  – Can be implemented with:
    • Server-state-based dispatching
      – Use Distributed Server Groups architecture
      – Add new method to HTTP protocol
    • Location-based dispatching
      – Used by Cisco Systems’ DistributedDirector

30/57

Server-based Approach

• HTTP Redirection
  – A request initially assigned by DNS to a Web server can be reassigned to another server via HTTP redirection
  – Overhead in intra-cluster communication
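A server-side redirection decision might look like the sketch below: serve locally while load is acceptable, otherwise answer with a 302 that points at a less loaded peer. The threshold, the load values, and the host names are hypothetical; the peer loads would come from the intra-cluster communication mentioned above.

```python
def handle_or_redirect(path, my_load, peer_loads, threshold=0.8):
    """Return (status, headers) for a request that DNS assigned to this server."""
    if my_load < threshold or not peer_loads:
        return 200, {}  # serve locally
    peer = min(peer_loads, key=peer_loads.get)
    if peer_loads[peer] >= my_load:
        return 200, {}  # no peer is better off, so serve locally anyway
    # Reassign the request: the client repeats it against the chosen peer.
    return 302, {"Location": f"http://{peer}{path}"}
```

The extra round trip caused by the 302 is the response-time cost of this scheme.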

31/57

Server HTTP Redirection

32/57

Server-based Approach

• Packet Redirection
  – Distributed Packet Rewriting (DPR)
  – Round-robin by DNS, then packets can be rerouted by a packet-rewriting mechanism
  – Two load balancing algorithms
    • Static (stateless) routing
    • Least-loaded (by periodic server communications)
  – DPR can be applied to both LAN and WAN
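The two routing rules can be contrasted in a few lines. Hashing the client address and port is an assumption chosen here to make the stateless rule concrete; the least-loaded rule assumes each server has a recent load figure for its peers.

```python
import zlib


def stateless_route(client_ip, client_port, servers):
    """Static routing: every server computes the same target with no shared state."""
    key = f"{client_ip}:{client_port}".encode()
    return servers[zlib.crc32(key) % len(servers)]


def least_loaded_route(load_by_server):
    """Stateful routing: relies on loads exchanged periodically between servers."""
    return min(load_by_server, key=load_by_server.get)
```

Because the hash is deterministic, all packets of one connection are rerouted to the same server regardless of which node DNS sent them to first.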

33/57

Comparing the approaches

34/57

Typical Methods

• File caching, web caching (proxy)

• Web server tuning
  – multithreading, memory-mapped I/O

• Load balancing multi-node Web server (dispatching)

• Replication

• Protocol improvement

• Resolving dynamic content problem

35/57

Case Study 1: NCSA’s World Wide Web Server (1995)

• Extant access patterns and responses at NCSA’s WWW server:
  – At peak times, the NCSA server receives 30-40 requests per second; no single-processor system could serve all requests.
  – HTTP is connectionless, and each request is separate.

• Three key problems:
  – Information addressing.
  – Information distribution.
  – Load balancing.

by Thomas T. Kwan and Daniel A. Reed 36/57

Case Study 1: Architecture

• A group of loosely coupled WWW servers: they operate independently, but collectively they provide the illusion of a single server.

• Three components:
  – A collection of independent servers.
  – A WWW document tree shared among the servers and stored by the Andrew distributed file system (AFS).
  – A round-robin domain name system that multiplexes the domain name www.ncsa.uiuc.edu among the constituent servers.

37/57

Case Study 1: the servers and the network

• The WWW servers are connected to the AFS file servers via a 100 Mb/s FDDI ring.

• Each HP735 has 96 MB RAM and uses its local disk as a moderate-size (130 MB) AFS cache. The local disk also stores HTTP server log files and is the backing store for the virtual memory system.

38/57

Case Study 1: AFS configuration

• The shared document tree structure.

• The most frequently accessed documents and the document tree are cached locally (cache hit rate -> 90%; critical!).

• Allow “plug and play” addition/removal of component servers and use of heterogeneous systems.

39/57

Case Study 1: Round-robin DNS

• The BIND 4.9.2 code has a round-robin option, but the rotation conflicted with extant software at NCSA. The BIND software was modified to allow a domain name with more than one associated IP address.

• Adding a new server to the group is as simple as adding its IP address to the DNS entry for www.ncsa.uiuc.edu .

40/57

Case Study 2: Improvement of the Apache Web Server (1999)

• Implement 6 techniques that improve the throughput of Apache by 61%.

• Performance analysis (SPECweb96, WebStone):
  – Apache spends 75-80% or more of its CPU time in the OS kernel.

– Small files are accessed more frequently.

• Experimental Environments: both uniprocessor and 4-way SMP machines with enough RAM.

by Yiming Hu, Ashwini Nanda, and Qing Yang 41/57

Case Study 2: Detailed CPU Time Breakdown

• 23% - user code

• 29% - TCP/IP

• 26% - interrupt

• 17% - file operation

It is possible to greatly reduce the file system overhead and the user code overhead!

42/57

Case Study 2: mmap function

• When Apache needs to read a file, we let the program open the file, then use the mmap function to map the data of the file into user space.

• Use the mmap function to eliminate the data copying between the file system cache and the user space.

• As a result, the server can send the mapped data directly to the network, avoiding the read system call altogether.

• When the entire file content is sent out, the file is unmapped and closed to limit the number of opened files in the system.
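The open, map, send, unmap sequence can be sketched with Python's `mmap` module. This only illustrates the call pattern; the function name is invented, and Python adds its own overhead that the C code in Apache would not have.

```python
import mmap
import socket


def send_file_mmap(path, sock):
    """Send a non-empty file over sock without calling read() on it."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; mapping an empty file raises ValueError.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            sock.sendall(mapped)  # pages are faulted in from the file cache as needed
            return len(mapped)
    # Leaving the with-blocks unmaps and closes the file, as the last bullet describes.
```

The mapping is released as soon as the data has been handed to the socket, which keeps the number of open files low.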

43/57

Case Study 2: Caching files in the User Space

• Caching the file data in the Web server user space can directly ship the cached files to the TCP/IP stack, avoiding all file system calls such as open, read and close.

• The cache stores both the file data and the file states in a user memory region shared by all processes.
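A single-process sketch of such a cache: the first request pays for open, read, and stat; later requests are answered from memory. Sharing the region between processes and invalidating stale entries are left out, and the class name is hypothetical.

```python
import os


class FileCache:
    """Keeps file data plus basic file state in user-space memory."""

    def __init__(self):
        self._entries = {}  # path -> (mtime_ns, size, data)
        self.hits = 0

    def get(self, path):
        entry = self._entries.get(path)
        if entry is not None:
            self.hits += 1
            return entry[2]  # no open/read/close on a hit
        with open(path, "rb") as f:
            data = f.read()
        st = os.stat(path)
        self._entries[path] = (st.st_mtime_ns, st.st_size, data)
        return data

    def state(self, path):
        """Return the cached (mtime_ns, size), or None if the file is not cached."""
        entry = self._entries.get(path)
        return None if entry is None else entry[:2]
```

A production cache would also bound its size and revalidate entries when files change.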

44/57

Case Study 2: Speeding up Logging

• On the uniprocessor system with 128MB RAM, the SPECweb96 number of Apache with logging is 158 ops/sec.

• Simply turning off the logging operation results in a SPECweb96 number of 194 ops/sec. ( 22% improvement )

• It is desirable to reduce the logging overhead of Apache.

45/57

Case Study 2: Caching DNS results

• The majority of the logging overhead comes from looking up the host names of clients.

• Apache calls the gethostbyaddr function for every HTTP request being logged (very wasteful, especially when there is no local name server).

• The DNS cache is a small array indexed by hashing the IP address. (14% improvement)

• Or disable the host name lookup process.
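A small array indexed by a hash of the IP address can be sketched as follows. The slot count, the modulo hash, and the injectable `lookup` parameter are assumptions for this example; the default lookup uses `socket.gethostbyaddr`.

```python
import ipaddress
import socket


class HostNameCache:
    """Direct-mapped cache: one (ip, host_name) entry per array slot."""

    def __init__(self, slots=256, lookup=None):
        self._slots = [None] * slots
        # The real resolver call is the default; tests can inject a fake one.
        self._lookup = lookup or (lambda ip: socket.gethostbyaddr(ip)[0])
        self.misses = 0

    def host_name(self, ip):
        index = int(ipaddress.ip_address(ip)) % len(self._slots)
        entry = self._slots[index]
        if entry is not None and entry[0] == ip:
            return entry[1]  # hit: no resolver call
        self.misses += 1
        name = self._lookup(ip)
        self._slots[index] = (ip, name)  # overwrite whatever shared the slot
        return name
```

Colliding addresses simply evict each other, which keeps the structure tiny and lookups constant-time.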

46/57

Case Study 2: Caching Strings results

• Another logging overhead is caused by logging the request time and status code.

• Let the time-conversion routine store the resulting string in a static array.

• When Apache calls the conversion routine again, the routine returns the result in the array if the current time argument is the same as in the last call, thus avoiding many redundant string operations.

• Similar optimizations can be applied to the status code conversion and other places in Apache.
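The same idea in miniature: remember the last second that was formatted and reuse its string. The log format string and the function name are assumptions for this sketch.

```python
import time

_last_second = None  # time argument of the previous call
_last_text = None    # string produced for that second


def format_log_time(epoch_seconds):
    """Format a timestamp for the log, reusing the previous result when possible."""
    global _last_second, _last_text
    second = int(epoch_seconds)
    if second != _last_second:
        # Only pay for the conversion when the second has changed.
        _last_text = time.strftime("%d/%b/%Y:%H:%M:%S +0000", time.gmtime(second))
        _last_second = second
    return _last_text
```

Under load, many requests arrive within the same second, so most calls skip the conversion entirely.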

47/57

Case Study 2: Caching URI Processing results

• 60% of the user time is spent processing URIs (e.g., parsing, directory checking, security checking, translating the URI to a file name, etc.).

• When Apache finishes parsing and checking a URI and obtains a file name corresponding to the URI, the URI/file-name pair is put into the cache.

• Especially useful for dynamic pages such as directory lists.
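A sketch of the URI/file-name cache, with a deliberately simplified translation step standing in for Apache's parsing and checks. The class, the document root, and the translation rule are all hypothetical.

```python
import posixpath
from urllib.parse import unquote, urlsplit


class URICache:
    def __init__(self, document_root):
        self.document_root = document_root.rstrip("/")
        self._file_for_uri = {}  # URI -> file name
        self.translations = 0

    def _translate(self, uri):
        """Simplified stand-in for the expensive parsing and checking work."""
        self.translations += 1
        path = posixpath.normpath(unquote(urlsplit(uri).path))
        if not path.startswith("/"):
            raise ValueError("URI does not name a path under the document root")
        return self.document_root + path

    def file_name(self, uri):
        cached = self._file_for_uri.get(uri)
        if cached is None:
            cached = self._translate(uri)
            self._file_for_uri[uri] = cached  # remember the URI/file-name pair
        return cached
```

Repeated requests for the same URI then cost one dictionary lookup instead of a full translation.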

48/57

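Case Study 2: Results of the Enhancement Techniques

[Figure: results for Apache as the enhancement techniques are added cumulatively. Configurations shown: apache; +DNS_cache; +DNS_cache+str_cache; +DNS_cache+str_cache+mmap; +DNS_cache+str_cache+mmap+state_cache; +DNS_cache+str_cache+mmap+state_cache+data_cache; +DNS_cache+str_cache+mmap+state_cache+data_cache+URI_cache.]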

49/57

Case Study 2: Conclusions

50/57

Case Study 3: Caching Dynamic Content (1998)

• A distributed Web server called Swala
  – The nodes cooperatively cache the results of CGI requests, and the cache meta-data is stored in a replicated global cache directory.

• Every Swala node contains two modules:
  – The HTTP module
  – The cache module

by Vegard Holmedahl, Ben Smith, Tao Yang 51/57

Case Study 3: Components of the Swala architecture

52/57

Case Study 3: Components of the Swala architecture

• The HTTP module (request threads)
  – Handles HTTP requests
  – Threads take turns listening on the main port for incoming connections and handling the requests.

• The cache module (three threads)
  – One receives information about the cache from the other nodes.
  – One listens for cache data requests from other nodes and returns cache content.
  – One wakes up every few seconds and deletes expired cache entries.
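The expiry duty of the cache module can be sketched as a cache of CGI outputs with per-entry deadlines and a purge step that a periodic thread would call. This is a single-node illustration with invented names, not Swala's code; the clock is passed in explicitly to keep the example deterministic.

```python
class ExpiringCache:
    """Caches CGI output per request key until an expiry time."""

    def __init__(self):
        self._entries = {}  # request key -> (expires_at, cached_output)

    def put(self, key, output, ttl_seconds, now):
        self._entries[key] = (now + ttl_seconds, output)

    def get(self, key, now):
        entry = self._entries.get(key)
        if entry is None or entry[0] <= now:
            return None  # missing or expired: the CGI program must run again
        return entry[1]

    def purge_expired(self, now):
        """What the periodic thread would do; returns the number of entries removed."""
        expired = [k for k, (expires_at, _) in self._entries.items() if expires_at <= now]
        for key in expired:
            del self._entries[key]
        return len(expired)
```

In a cooperative setup, each insertion and deletion would also be announced to the other nodes so their directories stay consistent.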

53/57

Case Study 3: Cache Manager

54/57

Case Study 4: Admission-Control Web Server (2000)

• Current HTTP servers process requests using a first-come, first-served queuing policy.

• The more requests a client makes, the more replies the server will generate in response.

• A new algorithm is implemented on Apache, which has been shown to be effective in allocating a configurable, fixed percentage of bandwidth across numerous simultaneous clients, independent of the aggressiveness of the clients’ requests.

by Kelvin Li and Sugih Jamin 55/57

Case Study 4: Admission-Control Algorithms

• Primary Daemon Process
  – Original duties, plus configuring client bandwidth allocation and the total bandwidth available to all clients.

• Bandwidth Manager Process
  – Determines the maximum measured bandwidth of the server at each processing interval.

• Child Process
  – Each child process is equipped with the algorithms necessary to determine whether to serve the request or not.
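The division of labor can be illustrated with a per-interval byte budget: the manager turns the measured bandwidth into a budget per client, and each serving process checks a request against what is left. The budget rule and all names are assumptions for this sketch, not the algorithm from the case study.

```python
class AdmissionController:
    def __init__(self, share_by_client):
        self.share_by_client = share_by_client  # client -> configured fraction
        self._budget = {}  # client -> bytes still allowed in this interval

    def start_interval(self, measured_bandwidth_bytes):
        """Bandwidth manager step: convert shares into byte budgets."""
        self._budget = {
            client: fraction * measured_bandwidth_bytes
            for client, fraction in self.share_by_client.items()
        }

    def admit(self, client, response_bytes):
        """Child process step: serve only if the client's budget covers the reply."""
        remaining = self._budget.get(client, 0)
        if response_bytes > remaining:
            return False
        self._budget[client] = remaining - response_bytes
        return True
```

An aggressive client exhausts only its own budget, so other clients keep their configured share for the interval.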

56/57

The End

57/57
