
2.6. Scalability

Scaling up a TP system to handle high load involves two activities. First, one can tune and grow each individual server system to handle the maximum possible load. And second, one can distribute the load across multiple interconnected server systems. The decision of which approach to take depends on cost-performance as well as other goals, such as availability, security, and manageability. In this section we focus on the mechanisms that enable scaling up system performance.

Scaling up a Server

The throughput of a server system is ultimately dependent on its hardware configuration; that is, on the speed of its processors, on the size and speed of main memory and secondary storage, and on the bandwidth and latency of its interconnects. Software too plays a major role. For a given hardware configuration, there are two techniques that are commonly used to get the most benefit from that configuration: caching and resource pooling.

Caching

A cache is an area of memory containing data whose permanent home is elsewhere, such as in secondary storage or on a remote server system. Ideally, the cache contains data that frequently is accessed by programs running on the system containing the cache and that is much cheaper to access than its permanent home. Thus, the expense of retrieving the data from its permanent home is amortized across a large number of accesses. This greatly improves performance over a system in which every access to the data requires accessing the data’s permanent home.

Since many components of a TP system need to access data that is not local to the component, caches are heavily used in TP. A web browser may cache pages of frequently accessed web sites, to avoid having to go to the web site every time those pages are requested. A web server may cache information that is needed to service popular requests, such as the information displayed in response to a client’s initial request to access the web site. A proxy server, which sits between the network and the web server, also offers this caching functionality. Some large data items, such as images, may be cached at a third party’s web site that is closer to the end user and can therefore offer higher speed access to the information. A server running a TP application may cache popular data whose permanent home is a remote database system. And the database system itself may cache data whose permanent home is secondary storage.

The official copy of a data item in its permanent home may be updated shortly after that copy was read and put into a cache. Therefore, once data has been put into a cache (somewhere other than its permanent home), it potentially is no longer up to date. Thus, a cache is most appropriate for data that is infrequently updated. For example, in a TP system, it is probably useful to cache catalog information, since its content changes slowly. But it may not be useful to cache the latest bid in an auction that will close shortly, since it may be updated very frequently.

The implementation of a cache requires a fast way to look up entries. This usually is done using a hash algorithm that maps the identifier of a data item to its memory location in the cache. It also requires a replacement algorithm, which selects an item to remove from the cache to make room for a new item that has to be inserted. A commonly used replacement algorithm is least-recently-used, which replaces the item whose last access was longest in the past among all items in the cache. There is a large repertoire of cache replacement algorithms used in practice. However, coverage of these algorithms is beyond the scope of this book.
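As an illustration (a minimal sketch, not from the products discussed later), the following Python fragment combines the two ingredients just described: a hash table for fast lookup and least-recently-used ordering for replacement.

```python
from collections import OrderedDict

class LRUCache:
    """A toy cache: hash-based lookup plus least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # maps data-item identifier -> cached value

    def get(self, key):
        if key not in self.items:
            return None                  # miss; caller fetches from the permanent home
        self.items.move_to_end(key)      # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least-recently-used item
```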

Sometimes, items in the cache are invalidated before they need to be replaced. Invalidation is done if it is known that the item is unlikely to be fresh enough to be used. When a server stores an item in its cache, it may include an invalidation time that the cache manager enforces. For example, a web server may add an invalidation time 10 minutes in the future when it caches a copy of a headline news banner, thereby ensuring it refreshes the headline from the news server at least that often.
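An invalidation time can be layered onto the same lookup structure. In the hypothetical sketch below, each entry carries an expiry timestamp, and an expired entry is treated as a miss:

```python
import time

class ExpiringCache:
    def __init__(self):
        self.items = {}  # maps key -> (value, expiry timestamp)

    def put(self, key, value, ttl_seconds):
        # The server supplies the invalidation time, e.g., 600 seconds
        # (10 minutes) for the headline news banner example.
        self.items[key] = (value, time.time() + ttl_seconds)

    def get(self, key):
        entry = self.items.get(key)
        if entry is None:
            return None
        value, expiry = entry
        if time.time() >= expiry:
            del self.items[key]  # invalidate: too stale to use
            return None
        return value
```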

Alternatively, the server that is the data’s permanent home may keep track of which caches have a copy of that data. After the server processes an update for that data, it can issue an invalidation message to every cache that has a copy, which tells the cache to invalidate its copy of that data. This helps to ensure that the caches are coherent; that is, that a data item has the same value in all caches that currently hold the data item. Clearly, there are limits to cache coherence due to variance in the time it takes for each cache to receive an invalidation message and process it.

A cache may be updatable. Each update to a data item in the cache must be propagated back to the data item’s permanent home. Sometimes, this must be done explicitly by the client of the cache. That is, it stores the updated data item in both the cache and the data item’s permanent home. If the cache manager knows how to map each cached data item to its permanent home, then the client may only need to update the cache and the cache manager propagates the update to the data item’s permanent home. If the cache manager propagates the update immediately as part of the operation to update the cache, then the cache is called write-through. If it propagates the update lazily, potentially long after the cache was updated, then the cache is called write-back.
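The difference between the two policies comes down to when the propagation happens. Here is a minimal sketch, assuming a hypothetical `home` object with `write` and `write_batch` operations that stand for the data item’s permanent home:

```python
class WriteThroughCache:
    def __init__(self, home):
        self.home = home
        self.items = {}

    def update(self, key, value):
        self.items[key] = value
        self.home.write(key, value)  # propagate immediately

class WriteBackCache:
    def __init__(self, home):
        self.home = home
        self.items = {}
        self.dirty = set()  # keys updated but not yet propagated

    def update(self, key, value):
        self.items[key] = value
        self.dirty.add(key)  # propagation is deferred to flush()

    def flush(self):
        # Multiple updates to the same item incur only one write-back,
        # and many items can be batched into one message to the home.
        batch = {key: self.items[key] for key in self.dirty}
        self.home.write_batch(batch)
        self.dirty.clear()
```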

Clearly, cache coherence is affected by the way that updates to the cache are propagated. For example, if the data item’s server uses invalidation messages to notify caches when the item has changed, then a write-through cache will yield better cache coherence than a write-back cache. But this better coherence has a cost. Usually, the cost of the write-back cache is lower, since multiple updates to the same cached data item within a short time period incur only one write-back, and a write-back can batch multiple updates in a single message to the data’s permanent home.

Sincecaching mechanisms are complex and important, they are built into manytypes of products, notably transactional middleware and databasesystems. There are main memory database systems that are intended to beused for cached data. Some operate as a conventional transactionalresource manager, such as Oracle’s TimesTen, McObject’s eXtremeDB, andRaima’s RDM. Others are designed specifically for caching, for example,by offering the application explicit control of when to invalidatecached data or write-back updated cached data to its home location.Examples include Danga Interative’s memcached, Oracle’s Coherence andMicrosoft’s project codenamed “Velocity.”

Resource Pooling

Another case where caching can improve performance is when a resource is costly to create and relatively inexpensive to access. Sessions are one such resource. Consider an application that requires the use of a database system. The server process that runs this application needs a session with a database system for each transaction currently running. However, each transaction typically needs the session only for a fraction of a second. Therefore, the server process can maintain a pool (i.e., cache) of sessions. Each transaction is given exclusive use of one of the sessions while it is running. After it commits or aborts, the session is returned to the pool. Thus, sessions are serially reused by different transactions.
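A minimal sketch of such a pool, assuming a hypothetical `create_session()` function that performs the expensive act of opening a database session:

```python
import queue

class SessionPool:
    def __init__(self, create_session, size):
        self.free = queue.Queue()
        for _ in range(size):
            self.free.put(create_session())  # pay the creation cost once, up front

    def acquire(self):
        # Blocks until a session is free; the transaction gets exclusive use.
        return self.free.get()

    def release(self, session):
        # Called after commit or abort; the session is serially reused.
        self.free.put(session)
```

A transaction would call `acquire()` before its first database operation and `release()` right after it commits or aborts.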

Process threads are another example of resources that can be pooled and serially reused. The process has a fixed number of threads. When it receives an RPC, the RPC is assigned to a thread in which to execute. After the RPC finishes executing and returns to its caller, the thread is returned to the thread pool.
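Thread pooling of this kind is directly supported by standard libraries. As a sketch, Python’s `concurrent.futures` module maintains a fixed set of threads and hands each submitted call to an idle one (`handle_rpc` here is a hypothetical application function):

```python
from concurrent.futures import ThreadPoolExecutor

def handle_rpc(request):
    # Stand-in for the application code that executes the RPC.
    return {"status": "ok", "echo": request}

# A fixed number of threads; each incoming RPC is assigned to a pooled
# thread, and the thread returns to the pool when the RPC finishes.
executor = ThreadPoolExecutor(max_workers=16)

def on_rpc_received(request):
    future = executor.submit(handle_rpc, request)
    return future.result()  # the reply to send back to the caller
```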

A third example is server classes, which avoid the overhead of frequent process creation. Like a thread in a thread pool, each process in a server class receives a call. After the call completes, the process is available for reuse by another call.

Scaling Out a System

One way to scale up a system is to add more machines. This is called scale-out. There are two approaches to scale-out, partitioning and replication, which offer different ways of distributing the workload across the machines.

Partitioning

One way to distribute the workload is to partition the application and its data into different types of work and assign each type to a different machine. For example, in a bank, one might assign the credit card application to one system, the loan application to a second system, and the checking account application to a third system. When a request arrives, it is directed to the system that supports the relevant application. This can be done by storing the mapping between applications and servers in a registry service and looking up the mapping for each request, as was described in Section 2.4.

Partitioning by application type is an effective technique. However, it is an incomplete solution if an application needs to scale up beyond the capabilities of a single machine. Then the application itself needs to be partitioned. A common way to do this is range partitioning, where different copies of the server handle different ranges of an input parameter. For example, a debit-credit application dealing with retail banking might be split into five servers, each of which handles a range of account numbers (see Figure 2.16). The database that supports each of these servers can be local to the system that supports those account numbers. So the first group of account numbers is stored on the same computer as the application program that supports those account numbers, and so on.

Figure 2.16. Parameter-based Routing. The router application forwards each request to the appropriate server based on the account number parameter in the request.


When the system is organized in this way, a routing function needs to forward each request to the correct server based not only on the identifier of the request type, but also on one or more of the input parameters. In the example, it would be the account number. This is called parameter-based routing.

Range partitioning can be implemented directly by the application, by having the application support the routing function. Many systems provide built-in support. For example, range partitioning and parameter-based routing are supported by many high-function database systems and some transaction middleware products.
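As a sketch of parameter-based routing over a range-partitioned application like that of Figure 2.16 (the server names and exact range boundaries here are illustrative assumptions, chosen to match the five-server example):

```python
# (low, high, server) triples for the five account-number partitions.
RANGES = [
    (0,     19999, "server1"),
    (20000, 39999, "server2"),
    (40000, 59999, "server3"),
    (60000, 79999, "server4"),
    (80000, 99999, "server5"),
]

def route(account_number):
    # Parameter-based routing: the server is chosen by an input
    # parameter (the account number), not just by the request type.
    for low, high, server in RANGES:
        if low <= account_number <= high:
            return server
    raise ValueError("account number outside all partitions")
```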

Partitioning schemes all suffer from the problem of load balancing, especially when servers are partitioned statically. Usually, the workload varies over time. For example, in the system shown in Figure 2.16 there may be a burst of activity for accounts in the 20,000 to 39,999 range, thereby overloading the second server. This problem may arise frequently if the load is correlated with value ranges. For example, if account numbers are correlated with geographical regions, then a peak period in one time zone will cause its partition’s servers to be more heavily loaded than those of other partitions. An overloaded partition will perform poorly. It doesn’t help that other servers may be less heavily loaded, because they don’t have the data required to service requests in the 20,000 to 39,999 range.

One way to reduce the frequency of such overload situations is to use hash partitioning, where a hash function is used to map each parameter value to a server partition. A well-designed hash function will, with very high probability, spread the load evenly across servers. It therefore is less likely to exhibit load-balancing problems than range partitioning. Hash partitioning commonly is used not only for partitioning a database but also for partitioning a large-scale cache that is spread across many servers.
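A sketch of the mapping, assuming partitions numbered 0 through n-1:

```python
import hashlib

def hash_partition(parameter_value, n_partitions):
    # A stable digest spreads parameter values evenly across partitions
    # with high probability. (Python's built-in hash() is randomized per
    # process, so a deterministic digest is used instead.)
    digest = hashlib.sha256(str(parameter_value).encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_partitions
```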

One solution to load balancing is automatic reconfiguration. That is, when a partition becomes overloaded, it automatically is split into two partitions and the routing function is updated accordingly. The decision to split a partition should be based on workload trends, not a short-term spike in load. If a partition is split based on a temporary load spike, the split partitions will be underutilized after the spike dissipates.

Another solution to load balancing is to use table-lookup partitioning, where a mapping table explicitly maps each input parameter value to a particular server. There is a significant cost to maintaining all this mapping information when a lot of parameter values are present, though this can be mitigated with the use of some network switches, such as Layer 7 switches. This partitioning approach offers some significant benefits over range and hash partitioning. One benefit is fine-grained control over reconfiguration. When a server overflows, a new server can be allocated and newly added parameter values can be assigned to the new server. Another benefit is that different parameter values can be explicitly assigned levels of service. For example, a bank may offer two levels of service to checking accounts, depending on their minimum monthly balance. This account-type information can be stored in the mapping table and the account stored at a server that supports the appropriate level of service. A third benefit is that users can be upgraded to a new release of an application one by one. By contrast, with range or hash partitioning, the application would not know whether to access a user’s data in the partition using the old or new format. Thus, all the parameter values (e.g., accounts) in a partition would be inaccessible while the upgrade was in progress.
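A sketch of table-lookup partitioning; the per-account entries also carry the service level and application version mentioned above (all field names are hypothetical):

```python
# One routing entry per parameter value (here, per account number).
MAPPING_TABLE = {
    10001: {"server": "server1", "service_level": "premium",  "version": 2},
    10002: {"server": "server4", "service_level": "standard", "version": 1},
    # ... one entry per account
}

def route(account_number):
    return MAPPING_TABLE[account_number]["server"]

def upgrade_account(account_number, new_server):
    # Fine-grained reconfiguration: move one account (e.g., to a server
    # running the new release) without touching any other account.
    entry = MAPPING_TABLE[account_number]
    entry["server"] = new_server
    entry["version"] = 2
```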

Whatever partitioning scheme is used, configuring a system with the right amount of server capacity is important. Servers need to be configured for peak load, not average load, to ensure that they can offer good response time even in periods of high load. The more extra capacity (or headroom) that each system offers relative to its expected peak load, the less likely it will become overloaded.

Partitioning Sessions

Partitioning also helps scale-out when communication sessions are required. In a two-tier architecture, if there are many clients and each client requires a session with every server, the result is a polynomial explosion in the number of sessions. For example, if there are 100,000 clients and each one has to connect to all 500 servers, then each server would have 100,000 sessions, resulting in 50,000,000 sessions overall (see Figure 2.17). Each session consumes some main memory and requires some setup time. When there are too many sessions, this session overhead can be troublesome. It can limit the ability of the server system to scale out by adding more servers.

Figure 2.17. Polynomial Explosion in Two-Tier Model. If there are f front-end programs and t transaction servers, then there are f × t sessions.


The total number of sessions can be greatly reduced by inserting a routing layer between the clients and servers that partitions the set of clients. Each router process connects to a subset of the clients and to all the servers. Thus, each client can still send messages to all servers, at the cost of an extra message hop through a router. See Figure 2.18.

Figure 2.18. Multilevel Routing. By introducing routers in between clients and servers, the overall number of sessions is greatly reduced, compared to the two-tier model of Figure 2.17.


Now say you have 10 routers between the clients and servers, and each client is connected to one router. Each of the 10 routers would have 10,000 sessions to its clients and 500 sessions to all the servers, resulting in 10,500 sessions per router, or 105,000 sessions overall: a huge reduction from the 50,000,000 sessions required without the routing layer.

Grouping clients by routers can be based on geographical considerations. For example, all the clients on a given local area network might be serviced by the same router. More complex groupings may be needed for fault tolerance reasons. For example, the ATMs at a bank branch may be split across two routers over two separate communication lines, so the failure of one router still leaves half of the ATMs operating.

Replication

Another way to distribute workload is to replicate server processes and assign the replicas to different systems. The replicas are identical, in the sense that they can all process the same kinds of requests. This works especially well if the processes are stateless. In that case, each request can be assigned to any of the replicas, even if a previous request from the same client was processed by a different replica.

As in the partitioning approach, it is desirable to balance the load across the replicated servers. This can be done by having each request randomly choose a server to process the request, sometimes called spraying the requests across the servers. It can be done by the client that issues the request, or it can be done in a server system. For example, a network router that connects a server system to the Internet might have built-in load balancing functionality to forward messages based on round robin, least number of active connections, or fastest response time.
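As a sketch, the first two forwarding policies just mentioned might look like this (the server objects and their `active_connections` counters are hypothetical; a fastest-response-time policy would additionally track observed latencies):

```python
import itertools
import random

class LoadBalancer:
    def __init__(self, servers):
        self.servers = servers
        self._cycle = itertools.cycle(servers)

    def spray(self):
        return random.choice(self.servers)   # random spraying

    def round_robin(self):
        return next(self._cycle)             # rotate through the servers in order

    def least_connections(self):
        # Prefer the server currently handling the fewest requests.
        return min(self.servers, key=lambda s: s.active_connections)
```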

Even if each client sends the same number of requests to each server, the load may not be distributed evenly, because one server may receive requests that require more time to service than those received by another server. To avoid this unbalanced load, each client can put requests into a queue that is shared by all servers, and each server dequeues a request whenever it is idle. Thus, each server obtains new work if and only if it has additional capacity to process it. The main disadvantage of this approach is the overhead of managing the queue. It needs to be accessible to all the servers, and clients and servers need to synchronize their accesses to the shared queue. We will have a lot more to say about queuing mechanisms in Chapter 4.
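A minimal single-process sketch of the shared-queue approach, using one thread per server; a real TP system would use a queue manager reachable over the network, as discussed in Chapter 4 (`process` here is a stand-in for the application’s request handler):

```python
import queue
import threading

def process(request, server_id):
    print(f"server {server_id} handled {request}")  # stand-in for real work

work = queue.Queue()  # the shared queue; access is synchronized internally

def server_loop(server_id):
    while True:
        request = work.get()          # blocks while the server is idle
        process(request, server_id)
        work.task_done()              # a server takes new work only when free

for i in range(5):
    threading.Thread(target=server_loop, args=(i,), daemon=True).start()

# Clients simply enqueue requests; no per-request balancing decision needed.
work.put({"request_type": "debit", "account": 23456, "amount": 100})
work.join()  # wait until all queued work has been processed
```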

Replication interacts with caching. Suppose a server is replicated and a client process C issues a request r that accesses one of those server replicas, S. To process r, S may access remote data, which S saves in a cache. For example, C may be a web browser running on a desktop machine and S may be a web server at a site that has a large number of web servers running. The request may access a web page, which is cached by S. A given client often issues many similar requests. If C issues a request r′ that is similar to r and hence accesses the same data as r, then it would be cheaper to process r′ at S rather than at a different server replica that has not cached the data required by r′. In this case, we say that C has cache affinity for S. Although C can still access any of the server replicas, it performs better when accessing S than any of the other server replicas.

A more extreme example of affinity occurs when a server replica S is maintaining shared state with respect to a client C. In this case, it is essential that all requests from C be serviced by S, so the request has access to the shared state. Notice that this problem does not arise if C maintains the shared state. That is, if C includes the shared state with every request, then any server replica can process a request, because the server replicas are stateless with respect to C.

When replicas contain updatable data, updates must be propagated to all replicas to keep them identical. A common configuration is to require that all updates be applied to a primary replica, which forwards those updates to the other read-only replicas. This offers simpler synchronization than immediately broadcasting updates to all replicas, but introduces delay by passing all updates through the primary replica. Synchronization algorithms for replicated data are covered in Chapter 9.
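A sketch of the primary-copy arrangement, deliberately ignoring failures and message loss (which the algorithms of Chapter 9 address):

```python
class Replica:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

class PrimaryReplica(Replica):
    def __init__(self, read_only_replicas):
        super().__init__()
        self.replicas = read_only_replicas

    def update(self, key, value):
        # All updates go through the primary, which forwards them to the
        # read-only replicas. Ordering is simple because there is a single
        # point of serialization, at the cost of an extra hop of delay.
        self.apply(key, value)
        for replica in self.replicas:
            replica.apply(key, value)
```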

Replication is a common feature of database systems. It can also be used to implement cache coherence. If a replicated cache is updatable, then a replication mechanism can be used to propagate updates from one cache to all the others.

Replication also is used to improve availability. If one replica is unavailable, then its workload can be handled by other replicas. Techniques for using replicas in this way are also covered in Chapter 9.

