The Architecture of Open Source Applications

Chapter 24. ZeroMQ

ØMQ is a messaging system, or "message-oriented middleware", if you will. It's used in environments as diverse as financial services, game development, embedded systems, academic research and aerospace.

Messaging systems work basically as instant messaging for applications. An application decides to communicate an event to another application (or multiple applications), it assembles the data to be sent, hits the "send" button and there we go—the messaging system takes care of the rest.

Unlike instant messaging, though, messaging systems have no GUI and assume no human beings at the endpoints capable of intelligent intervention when something goes wrong. Messaging systems thus have to be both fault-tolerant and much faster than common instant messaging.

ØMQ was originally conceived as an ultra-fast messaging system for stock trading and so the focus was on extreme optimization. The first year of the project was spent devising benchmarking methodology and trying to define an architecture that was as efficient as possible.

Later on, approximately in the second year of development, the focus shifted to providing a generic system for building distributed applications and supporting arbitrary messaging patterns, various transport mechanisms, arbitrary language bindings, etc.

During the third year the focus was mainly on improving usability and flattening the learning curve. We've adopted the BSD Sockets API, tried to clean up the semantics of individual messaging patterns, and so on.

Hopefully, this chapter will give an insight into how the three goals above translated into the internal architecture of ØMQ, and provide some tips for those who are struggling with the same problems.

Since its third year ØMQ has outgrown its codebase; there is an initiative to standardise the wire protocols it uses, an experimental implementation of a ØMQ-like messaging system inside the Linux kernel, etc. These topics are not covered in this book. However, you can check online resources for further details: http://www.250bpm.com/concepts, http://groups.google.com/group/sp-discuss-group, and http://www.250bpm.com/hits.

24.1. Application vs. Library

ØMQ is a library, not a messaging server. It took us several years of working on the AMQP protocol, a financial industry attempt to standardise the wire protocol for business messaging—writing a reference implementation for it and participating in several large-scale projects heavily based on messaging technology—to realise that there's something wrong with the classic client/server model of a smart messaging server (broker) and dumb messaging clients.

Our primary concern at the time was with performance: if there's a server in the middle, each message has to pass the network twice (from the sender to the broker and from the broker to the receiver), inducing a penalty in terms of both latency and throughput. Moreover, if all the messages are passed through the broker, at some point it's bound to become the bottleneck.

A secondary concern was related to large-scale deployments: when the deployment crosses organisational boundaries the concept of a central authority managing the whole message flow doesn't apply any more. No company is willing to cede control to a server in a different company; there are trade secrets and there's legal liability. The result in practice is that there's one messaging server per company, with hand-written bridges to connect it to messaging systems in other companies. The whole ecosystem is thus heavily fragmented, and maintaining a large number of bridges for every company involved doesn't make the situation better. To solve this problem, we need a fully distributed architecture, an architecture where every component can be possibly governed by a different business entity. Given that the unit of management in server-based architecture is the server, we can solve the problem by installing a separate server for each component. In such a case we can further optimize the design by making the server and the component share the same process. What we end up with is a messaging library.

ØMQ was started when we got an idea about how to make messaging work without a central server. It required turning the whole concept of messaging upside down and replacing the model of an autonomous centralised store of messages in the center of the network with a "smart endpoint, dumb network" architecture based on the end-to-end principle. The technical consequence of that decision was that ØMQ, from the very beginning, was a library, not an application.

In the meantime we've been able to prove that this architecture is both more efficient (lower latency, higher throughput) and more flexible (it's easy to build arbitrarily complex topologies instead of being tied to the classic hub-and-spoke model).

One of the unintended consequences, however, was that opting for the library model improved the usability of the product. Over and over again users express their happiness about the fact that they don't have to install and manage a stand-alone messaging server. It turns out that not having a server is a preferred option as it cuts operational cost (no need to have a messaging server admin) and improves time-to-market (no need to negotiate the need to run the server with the client, the management or the operations team).

The lesson learned is that when starting a new project, you should opt for the library design if at all possible. It's pretty easy to create an application from a library by invoking it from a trivial program; however, it's almost impossible to create a library from an existing executable. A library offers much more flexibility to the users, at the same time sparing them non-trivial administrative effort.

24.2. Global State

Global variables don't play well with libraries. A library may be loaded several times in the process but even then there's only a single set of global variables. Figure 24.1 shows a ØMQ library being used from two different and independent libraries. The application then uses both of those libraries.

Figure 24.1: ØMQ being used by different libraries

When such a situation occurs, both instances of ØMQ access the same variables, resulting in race conditions, strange failures and undefined behaviour.

To prevent this problem, the ØMQ library has no global variables. Instead, a user of the library is responsible for creating the global state explicitly. The object containing the global state is called context. While from the user's perspective context looks more or less like a pool of worker threads, from ØMQ's perspective it's just an object to store any global state that we happen to need. In the picture above, libA would have its own context and libB would have its own as well. There would be no way for one of them to break or subvert the other one.
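As a concrete illustration, here is a minimal sketch of the per-library context idiom using libzmq's C API (zmq_ctx_new/zmq_ctx_term are the modern names; early versions spelled these zmq_init/zmq_term). The libA/libB names are just placeholders:

/*  Each library creates and terminates its own context, so the two
    never share any global state.  */
#include <zmq.h>

static void *liba_ctx;
void liba_init (void)    { liba_ctx = zmq_ctx_new (); }
void liba_cleanup (void) { zmq_ctx_term (liba_ctx); }

static void *libb_ctx;
void libb_init (void)    { libb_ctx = zmq_ctx_new (); }
void libb_cleanup (void) { zmq_ctx_term (libb_ctx); }

Any socket created through liba_ctx is fully isolated from sockets created through libb_ctx, even though both live in the same process.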

The lesson here is pretty obvious: Don't use global state in libraries. If you do, the library is likely to break when it happens to be instantiated twice in the same process.

24.3. Performance

When ØMQ was started, its primary goal was to optimize performance. Performance of messaging systems is expressed using two metrics: throughput—how many messages can be passed during a given amount of time; and latency—how long it takes for a message to get from one endpoint to the other.

Which metric should we focus on? What's the relationship between the two? Isn't it obvious? Run the test, divide the overall time of the test by the number of messages passed and what you get is latency. Divide the number of messages by time and what you get is throughput. In other words, latency is the inverse value of throughput. Trivial, right?

Instead of starting coding straight away we spent some weeks investigating the performance metrics in detail and we found out that the relationship between throughput and latency is much more subtle than that, and often the metrics are quite counter-intuitive.

Imagine A sending messages to B. (See Figure 24.2.) The overall time of the test is 6 seconds. There are 5 messages passed. Therefore the throughput is 0.83 msgs/sec (5/6) and the latency is 1.2 sec (6/5), right?

Figure 24.2: Sending messages from A to B

Have a look at the diagram again. It takes a different time for each message to get from A to B: 2 sec, 2.5 sec, 3 sec, 3.5 sec, 4 sec. The average is 3 seconds, which is pretty far away from our original calculation of 1.2 seconds. This example shows the misconceptions people are intuitively inclined to make about performance metrics.

Now have a look at the throughput. The overall time of the test is 6 seconds. However, at A it takes just 2 seconds to send all the messages. From A's perspective the throughput is 2.5 msgs/sec (5/2). At B it takes 4 seconds to receive all messages. So from B's perspective the throughput is 1.25 msgs/sec (5/4). Neither of these numbers matches our original calculation of 0.83 msgs/sec.

To make a long story short, latency and throughput are two different metrics; that much is obvious. The important thing is to understand the difference between the two and their mutual relationship. Latency can be measured only between two different points in the system; there's no such thing as latency at point A. Each message has its own latency. You can average the latencies of multiple messages; however, there's no such thing as latency of a stream of messages.

Throughput, on the other hand, can be measured only at a single point of the system. There's a throughput at the sender, there's a throughput at the receiver, there's a throughput at any intermediate point between the two, but there's no such thing as overall throughput of the whole system. And throughput makes sense only for a set of messages; there's no such thing as throughput of a single message.

As for the relationship between throughput and latency, it turns out there really is a relationship; however, the formula involves integrals and we won't discuss it here. For more information, read the literature on queueing theory.
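For a taste of that relationship, the simplest steady-state result from queueing theory is Little's law (a standard result, not derived in this chapter):

    L = λW

where λ is the throughput (message arrival rate), W is the average latency and L is the average number of messages queued in the system. It already shows why one metric cannot be computed from the other: the same throughput is compatible with wildly different latencies, depending on how much queueing is going on.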

There are many more pitfalls in benchmarking messaging systems that we won't go further into. The stress should rather be placed on the lesson learned: make sure you understand the problem you are solving. Even a problem as simple as "make it fast" can take a lot of work to understand properly. What's more, if you don't understand the problem, you are likely to build implicit assumptions and popular myths into your code, making the solution either flawed or at least much more complex or much less useful than it could possibly be.

24.4. Critical Path

We discovered during the optimization process that three factors have a crucial impact on performance:

  • Number of memory allocations
  • Number of system calls
  • Concurrency model

However, not every memory allocation or every system call has the same effect on performance. In messaging systems, the performance we are interested in is the number of messages we can transfer between two endpoints during a given amount of time. Alternatively, we may be interested in how long it takes for a message to get from one endpoint to another.

However, given that ØMQ is designed for scenarios with long-lived connections, the time it takes to establish a connection or the time needed to handle a connection error is basically irrelevant. These events happen very rarely and so their impact on overall performance is negligible.

The part of a codebase that gets used very frequently, over and over again, is called the critical path; optimization should focus on the critical path.

Let's have a look at an example: ØMQ is not extremely optimized with respect to memory allocations. For example, when manipulating strings, it often allocates a new string for each intermediate phase of the transformation. However, if we look strictly at the critical path—the actual message passing—we'll find out that it uses almost no memory allocations. If messages are small, it's just one memory allocation per 256 messages (these messages are held in a single large allocated memory chunk). If, in addition, the stream of messages is steady, without huge traffic peaks, the number of memory allocations on the critical path drops to zero (the allocated memory chunks are not returned to the system, but re-used over and over again).
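To make the trick concrete, here is a simplified sketch of chunked message allocation (the names are illustrative, not ØMQ's; in the real codebase this logic lives in the yqueue_t container):

/*  Messages are carved out of 256-slot chunks. One malloc serves 256
    messages, and the most recently emptied chunk is kept as a spare
    for reuse, so a steady message stream allocates nothing at all.  */
#include <stdlib.h>

#define CHUNK_SIZE 256

typedef struct { char body [64]; } msg_t;

typedef struct {
    msg_t msgs [CHUNK_SIZE];
    size_t used;                /*  slots handed out so far  */
} chunk_t;

static chunk_t *current;        /*  chunk we allocate from  */
static chunk_t *spare;          /*  last emptied chunk, kept for reuse  */

static msg_t *alloc_msg (void)
{
    if (!current || current->used == CHUNK_SIZE) {
        /*  Reuse the spare chunk if there is one; otherwise this is
            the one allocation per 256 messages mentioned above. The
            consumer side recycles exhausted chunks into 'spare'.  */
        chunk_t *c = spare ? spare : malloc (sizeof (chunk_t));
        spare = NULL;
        c->used = 0;
        current = c;
    }
    return &current->msgs [current->used++];
}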

Lesson learned: optimize where it makes a difference. Optimizing pieces of code that are not on the critical path is wasted effort.

24.5. Allocating Memory

Assuming that all the infrastructure was initialised and a connection between two endpoints has been established, there's only one thing to allocate when sending a message: the message itself. Thus, to optimize the critical path we had to look into how messages are allocated and passed up and down the stack.

It's common knowledge in the high-performance networking field that the best performance is achieved by carefully balancing the cost of message allocation and the cost of message copying (for example, http://hal.inria.fr/docs/00/29/28/31/PDF/Open-MX-IOAT.pdf: see the different handling of "small", "medium" and "large" messages). For small messages, copying is much cheaper than allocating memory. It makes sense to allocate no new memory chunks at all and instead to copy the message to preallocated memory whenever needed. For large messages, on the other hand, copying is much more expensive than memory allocation. It makes sense to allocate the message once and pass a pointer to the allocated block, instead of copying the data. This approach is called "zero-copy".

ØMQ handles both cases in a transparent manner. A ØMQ message is represented by an opaque handle. The content of very small messages is encoded directly in the handle. So making a copy of the handle actually copies the message data. When the message is larger, it's allocated in a separate buffer and the handle contains just a pointer to the buffer. Making a copy of the handle doesn't result in copying the message data, which makes sense when the message is megabytes long (Figure 24.3). It should be noted that in the latter case the buffer is reference-counted so that it can be referenced by multiple handles without the need to copy the data.

Figure 24.3: Message copying (or not)
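The handle described above can be sketched as follows (field names and the inline size limit are illustrative; the real structure is libzmq's zmq_msg_t):

/*  Small payloads live inline in the handle, so copying the handle
    copies the data. Large payloads live in a reference-counted
    buffer, so copying the handle merely bumps the count.  */
#include <stdatomic.h>
#include <string.h>

#define VSM_MAX 29                  /*  illustrative inline limit  */
#define TYPE_VSM 0
#define TYPE_PTR 1

typedef struct {
    atomic_int refs;                /*  shared by all handles  */
    unsigned char data [];          /*  the actual payload  */
} big_buf_t;

typedef struct {
    unsigned char type;
    union {
        struct { unsigned char size; unsigned char data [VSM_MAX]; } vsm;
        struct { big_buf_t *buf; size_t size; } big;
    } u;
} msg_handle_t;

static void msg_copy (msg_handle_t *dst, const msg_handle_t *src)
{
    memcpy (dst, src, sizeof (*dst));
    if (dst->type == TYPE_PTR)
        atomic_fetch_add (&dst->u.big.buf->refs, 1);
}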

Lesson learned: When thinking about performance, don't assume there's a single best solution. It may happen that there are several subclasses of the problem (e.g., small messages vs. large messages), each having its own optimal algorithm.

24.6. Batching

It has already been mentioned that the sheer number of system calls in a messaging system can result in a performance bottleneck. Actually, the problem is much more generic than that. There's a non-trivial performance penalty associated with traversing the call stack and thus, when creating high-performance applications, it's wise to avoid as much stack traversing as possible.

Consider Figure 24.4. To send four messages, you have to traverse the entire network stack four times (i.e., ØMQ, glibc, user/kernel space boundary, TCP implementation, IP implementation, Ethernet layer, the NIC itself and back up the stack again).

Figure 24.4: Sending four messages

However, if you decide to join those messages into a single batch, there would be only one traversal of the stack (Figure 24.5). The impact on message throughput can be overwhelming: up to two orders of magnitude, especially if the messages are small and hundreds of them can be packed into a single batch.

Figure 24.5: Batching messages

On the other hand, batching can have a negative impact on latency. Let's take, for example, the well-known Nagle's algorithm, as implemented in TCP. It delays the outbound messages for a certain amount of time and merges all the accumulated data into a single packet. Obviously, the end-to-end latency of the first message in the packet is much worse than the latency of the last one. Thus, it's common for applications that need consistently low latency to switch Nagle's algorithm off. It's even common to switch off batching on all levels of the stack (e.g., NIC's interrupt coalescing feature).
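Switching Nagle's algorithm off is a one-liner with the standard sockets API, as latency-sensitive applications commonly do:

/*  Disable Nagle's algorithm on a TCP socket so small writes go out
    immediately instead of being coalesced into larger packets.  */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

static int disable_nagle (int fd)
{
    int one = 1;
    return setsockopt (fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof (one));
}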

But again, no batching means extensive traversing of the stack and results in low message throughput. We seem to be caught in a throughput versus latency dilemma.

ØMQ tries to deliver consistently low latencies combined with high throughput using the following strategy: when message flow is sparse and doesn't exceed the network stack's bandwidth, ØMQ turns all the batching off to improve latency. The trade-off here is somewhat higher CPU usage—we still have to traverse the stack frequently. However, that isn't considered to be a problem in most cases.

When the message rate exceeds the bandwidth of the network stack, the messages have to be queued—stored in memory till the stack is ready to accept them. Queueing means the latency is going to grow. If the message spends one second in the queue, end-to-end latency will be at least one second. What's even worse, as the size of the queue grows, latencies will increase gradually. If the size of the queue is not bounded, the latency can exceed any limit.

It has been observed that even though the network stack is tuned for lowest possible latency (Nagle's algorithm switched off, NIC interrupt coalescing turned off, etc.) latencies can still be dismal because of the queueing effect, as described above.

In such situations it makes sense to start batching aggressively. There's nothing to lose as the latencies are already high anyway. On the other hand, aggressive batching improves throughput and can empty the queue of pending messages—which in turn means the latency will gradually drop as the queueing delay decreases. Once there are no outstanding messages in the queue, the batching can be turned off to improve the latency even further.

One additional observation is that the batching should only be done on the topmost level. If the messages are batched there, the lower layers have nothing to batch anyway, and so all the batching algorithms underneath do nothing except introduce additional latency.

Lesson learned: To get optimal throughput combined with optimal response time in an asynchronous system, turn off all the batching algorithms on the low layers of the stack and batch on the topmost level. Batch only when new data are arriving faster than they can be processed.
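A sketch of that strategy in code, with hypothetical queue_pop and net_send helpers standing in for the real queue and network layers:

/*  Send messages one by one while the network keeps up; once messages
    start to back up, drain the whole backlog in a single batch. When
    the flow is sparse the inner loop picks up just one message, i.e.
    batching is effectively off and latency stays minimal. (A real
    implementation would block on an empty queue instead of spinning.)  */
#include <stddef.h>

typedef struct msg msg_t;
msg_t *queue_pop (void);               /*  next pending message or NULL  */
void net_send (msg_t **batch, int n);  /*  one stack traversal for n  */

void sender_loop (void)
{
    msg_t *batch [256];
    for (;;) {
        int n = 0;
        msg_t *m;
        while (n < 256 && (m = queue_pop ()) != NULL)
            batch [n++] = m;
        if (n > 0)
            net_send (batch, n);
    }
}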

24.7. Architecture Overview

Up to this point we have focused on generic principles that make ØMQ fast. From now on we'll have a look at the actual architecture of the system (Figure 24.6).

Figure 24.6: ØMQ architecture

The user interacts with ØMQ using so-called "sockets". They are pretty similar to TCP sockets, the main difference being that each socket can handle communication with multiple peers, a bit like unbound UDP sockets do.

The socket object lives in the user's thread (see the discussion of threading models in the next section). Aside from that, ØMQ is running multiple worker threads that handle the asynchronous part of the communication: reading data from the network, enqueueing messages, accepting incoming connections, etc.

There are various objects living in the worker threads. Each of these objects is owned by exactly one parent object (ownership is denoted by a simple full line in the diagram). The parent can live in a different thread than the child. Most objects are owned directly by sockets; however, there are a couple of cases where an object is owned by an object which is owned by the socket. What we get is a tree of objects, with one such tree per socket. The tree is used during shutdown; no object can shut itself down until it closes all its children. This way we can ensure that the shutdown process works as expected; for example, that pending outbound messages are pushed to the network prior to terminating the sending process.
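A sketch of the ownership tree's shape (illustrative names; in libzmq the corresponding base class is own_t, and termination acknowledgements travel as asynchronous commands rather than direct calls):

/*  Each object knows its parent and its children. Termination starts
    at the root of the tree and an object finishes only after all its
    children have acknowledged their own shutdown.  */
#define MAX_CHILDREN 16

typedef struct object object_t;
struct object {
    object_t *parent;              /*  may live in another thread  */
    object_t *children [MAX_CHILDREN];
    int nchildren;
    int term_acks;                 /*  children yet to confirm  */
};

static void object_terminate (object_t *self)
{
    self->term_acks = self->nchildren;
    for (int i = 0; i != self->nchildren; i++)
        object_terminate (self->children [i]);
    /*  Real code: return here and free self only once term_acks
        drops to zero, i.e. after every child has shut down.  */
}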

Roughly speaking, there are two kinds of asynchronous objects: there are objects that are not involved in message passing and there are objects that are. The former have to do mainly with connection management. For example, a TCP listener object listens for incoming TCP connections and creates an engine/session object for each new connection. Similarly, a TCP connector object tries to connect to the TCP peer and when it succeeds it creates an engine/session object to manage the connection. When such a connection fails, the connector object tries to re-establish it.

The latter are objects that are handling data transfer itself. These objects are composed of two parts: the session object is responsible for interacting with the ØMQ socket, and the engine object is responsible for communication with the network. There's only one kind of session object, but there's a different engine type for each underlying protocol ØMQ supports. Thus, we have TCP engines, IPC (inter-process communication) engines, PGM engines (a reliable multicast protocol, see RFC 3208), etc. The set of engines is extensible—in the future we may choose to implement, say, a WebSocket engine or an SCTP engine.

The sessions are exchanging messages with the sockets. There are two directions to pass messages in and each direction is handled by a pipe object. Each pipe is basically a lock-free queue optimized for fast passing of messages between threads.

Finally, there's a context object (discussed in the previous sections but not shown on the diagram) that holds the global state and is accessible by all the sockets and all the asynchronous objects.

24.8. Concurrency Model

One of the requirements for ØMQ was to take advantage of multi-core boxes; in other words, to scale the throughput linearly with the number of available CPU cores.

Our previous experience with messaging systems showed that using multiple threads in a classic way (critical sections, semaphores, etc.) doesn't yield much performance improvement. In fact, a multi-threaded version of a messaging system can be slower than a single-threaded one, even if measured on a multi-core box. Individual threads are simply spending too much time waiting for each other while, at the same time, eliciting a lot of context switching that slows the system down.

Given these problems, we've decided to go for a different model. The goal was to avoid locking entirely and let each thread run at full speed. The communication between threads was to be provided via asynchronous messages (events) passed between the threads. This, as insiders know, is the classic actor model.

The idea was to launch one worker thread per CPU core—having two threads sharing the same core would only mean a lot of context switching for no particular advantage. Each internal ØMQ object, such as, say, a TCP engine, would be tightly bound to a particular worker thread. That, in turn, means that there's no need for critical sections, mutexes, semaphores and the like. Additionally, these ØMQ objects won't be migrated between CPU cores, thus avoiding the negative performance impact of cache pollution (Figure 24.7).

Figure 24.7: Multiple worker threads
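As an aside, the thread-per-core part of the design can be sketched with the Linux-specific pthread affinity call (ØMQ's actual thread handling is more involved; worker_loop here is a hypothetical event loop):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

void *worker_loop (void *arg);     /*  hypothetical event loop  */

/*  Launch one worker per core and pin thread i to core i, so the
    objects bound to it never migrate and its caches stay warm.  */
static void start_workers (int ncores)
{
    for (int i = 0; i != ncores; i++) {
        pthread_t t;
        pthread_create (&t, NULL, worker_loop, NULL);
        cpu_set_t set;
        CPU_ZERO (&set);
        CPU_SET (i, &set);
        pthread_setaffinity_np (t, sizeof (set), &set);
    }
}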

This design makes a lot of traditional multi-threading problems disappear. Nevertheless, there's a need to share the worker thread among many objects, which in turn means there has to be some kind of cooperative multitasking. This means we need a scheduler; objects need to be event-driven rather than being in control of the entire event loop; we have to take care of arbitrary sequences of events, even very rare ones; we have to make sure that no object holds the CPU for too long; etc.

In short, the whole system has to become fully asynchronous. No object can afford to do a blocking operation, because it would not only block itself but also all the other objects sharing the same worker thread. All objects have to become, whether explicitly or implicitly, state machines. With hundreds or thousands of state machines running in parallel you have to take care of all the possible interactions between them and—most importantly—of the shutdown process.

It turns out that shutting down a fully asynchronous system in a clean way is a dauntingly complex task. Trying to shut down a thousand moving parts, some of them working, some idle, some in the process of being initiated, some of them already shutting down by themselves, is prone to all kinds of race conditions, resource leaks and similar. The shutdown subsystem is definitely the most complex part of ØMQ. A quick check of the bug tracker indicates that some 30-50% of reported bugs are related to shutdown in one way or another.

Lesson learned: When striving for extreme performance and scalability, consider the actor model; it's almost the only game in town in such cases. However, if you are not using a specialised system like Erlang or ØMQ itself, you'll have to write and debug a lot of infrastructure by hand. Additionally, think, from the very beginning, about the procedure to shut down the system. It's going to be the most complex part of the codebase and if you have no clear idea how to implement it, you should probably reconsider using the actor model in the first place.

24.9. Lock-Free Algorithms

Lock-free algorithms have been in vogue lately. They are simple mechanisms for inter-thread communication that don't rely on the kernel-provided synchronisation primitives, such as mutexes or semaphores; rather, they do the synchronisation using atomic CPU operations, such as atomic compare-and-swap (CAS). It should be understood that they are not literally lock-free—instead, locking is done behind the scenes on the hardware level.

ØMQ uses a lock-free queue in pipe objects to pass messages between the user's threads and ØMQ's worker threads. There are two interesting aspects to how ØMQ uses the lock-free queue.

First, each queue has exactly one writer thread and exactly one reader thread. If there's a need for 1-to-N communication, multiple queues are created (Figure 24.8). Given that this way the queue doesn't have to take care of synchronising the writers (there's only one writer) or readers (there's only one reader), it can be implemented in an extra-efficient way.

Figure 24.8: Queues

Second, we realised that while lock-free algorithms were more efficient than classic mutex-based algorithms, atomic CPU operations are still rather expensive (especially when there's contention between CPU cores) and doing an atomic operation for each message written and/or each message read was slower than we were willing to accept.

The way to speed it up—once again—was batching. Imagine you had 10 messages to be written to the queue. It can happen, for example, when you received a network packet containing 10 small messages. Receiving a packet is an atomic event; you cannot get half of it. This atomic event results in the need to write 10 messages to the lock-free queue. There's not much point in doing an atomic operation for each message. Instead, you can accumulate the messages in a "pre-write" portion of the queue that's accessed solely by the writer thread, and then flush it using a single atomic operation.

The same applies to reading from the queue. Imagine the 10 messages above were already flushed to the queue. The reader thread can extract each message from the queue using an atomic operation. However, it's overkill; instead, it can move all the pending messages to a "pre-read" portion of the queue using a single atomic operation. Afterwards, it can retrieve the messages from the "pre-read" buffer one by one. "Pre-read" is owned and accessed solely by the reader thread and thus no synchronisation whatsoever is needed in that phase.

The arrow on the left of Figure 24.9 shows how the pre-write buffer can be flushed to the queue simply by modifying a single pointer. The arrow on the right shows how the whole content of the queue can be shifted to the pre-read by doing nothing but modifying another pointer.

Figure 24.9: Lock-free queue
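A minimal sketch of this batched single-producer/single-consumer design in C11, where one atomic store publishes a whole pre-written batch and one atomic load grabs everything pending into the pre-read view (ØMQ's real implementation is the ypipe_t/yqueue_t pair; this ring-buffer version omits overflow handling):

#include <stdatomic.h>
#include <stddef.h>

#define QSIZE 1024

typedef struct {
    void *items [QSIZE];
    size_t unflushed;       /*  writer-private: end of pre-write  */
    _Atomic size_t tail;    /*  shared: end of flushed messages  */
    size_t head;            /*  reader-private: next slot to read  */
    size_t pre_read;        /*  reader-private view of tail  */
} spsc_t;

/*  Writer: stage messages privately, then publish them all at once.  */
static void q_write (spsc_t *q, void *msg)
{
    q->items [q->unflushed++ % QSIZE] = msg;    /*  no atomics here  */
}
static void q_flush (spsc_t *q)
{
    atomic_store_explicit (&q->tail, q->unflushed, memory_order_release);
}

/*  Reader: reload the shared tail only when the private view is
    exhausted, so a burst of N messages costs one atomic load.  */
static void *q_read (spsc_t *q)
{
    if (q->head == q->pre_read)
        q->pre_read = atomic_load_explicit (&q->tail,
            memory_order_acquire);
    if (q->head == q->pre_read)
        return NULL;                            /*  queue empty  */
    return q->items [q->head++ % QSIZE];
}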

Lesson learned: Lock-free algorithms are hard to invent, troublesome to implement and almost impossible to debug. If at all possible, use an existing proven algorithm rather than inventing your own. When extreme performance is required, don't rely solely on lock-free algorithms. While they are fast, the performance can be significantly improved by doing smart batching on top of them.

24.10. API

The user interface is the most important part of any product. It's the only part of your program visible to the outside world and if you get it wrong the world will hate you. In end-user products it's either the GUI or the command line interface. In libraries it's the API.

In early versions of ØMQ the API was based on AMQP's model of exchanges and queues. (See the AMQP specification.) From a historical perspective it's interesting to have a look at the white paper from 2007 that tries to reconcile AMQP with a brokerless model of messaging. I spent the end of 2009 rewriting it almost from scratch to use the BSD Sockets API instead. That was the turning point; ØMQ adoption soared from that point on. While before it was a niche product used by a bunch of messaging experts, afterwards it became a handy commonplace tool for anybody. In a year or so the size of the community increased tenfold, some 20 bindings to different languages were implemented, etc.

The user interface defines the perception of a product. With basically no change to the functionality—just by changing the API—ØMQ changed from an "enterprise messaging" product to a "networking" product. In other words, the perception changed from "a complex piece of infrastructure for big banks" to "hey, this helps me to send my 10-byte-long message from application A to application B".

Lesson learned: Understand what you want your project to be and design the user interface accordingly. Having a user interface that doesn't align with the vision of the project is a 100% guaranteed way to fail.

One of the important aspects of the move to the BSD Sockets API was that it wasn't a revolutionary freshly invented API, but an existing and well-known one. Actually, the BSD Sockets API is one of the oldest APIs still in active use today; it dates back to 1983 and 4.2BSD Unix. It's been widely used and stable for literally decades.

The above fact brings a lot of advantages. Firstly, it's an API that everybody knows, so the learning curve is ludicrously flat. Even if you've never heard of ØMQ, you can build your first application in a couple of minutes thanks to the fact that you are able to reuse your BSD Sockets knowledge.

Secondly, using a widely implemented API enables integration of ØMQ with existing technologies. For example, exposing ØMQ objects as "sockets" or "file descriptors" allows for processing TCP, UDP, pipe, file and ØMQ events in the same event loop. Another example: the experimental project to bring ØMQ-like functionality to the Linux kernel turned out to be pretty simple to implement. By sharing the same conceptual framework it can re-use a lot of infrastructure already in place.
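For instance, libzmq exposes a file descriptor via the ZMQ_FD socket option, so a ØMQ socket can sit in an ordinary poll() loop next to a TCP socket; because that descriptor is edge-triggered, ZMQ_EVENTS must be checked afterwards to see whether messages are actually readable:

#include <poll.h>
#include <zmq.h>

void wait_for_zmq_or_tcp (void *zsock, int tcp_fd)
{
    int zfd;
    size_t len = sizeof (zfd);
    zmq_getsockopt (zsock, ZMQ_FD, &zfd, &len);

    struct pollfd fds [2] = {
        { .fd = zfd,    .events = POLLIN },
        { .fd = tcp_fd, .events = POLLIN },
    };
    poll (fds, 2, -1);

    int events;
    len = sizeof (events);
    zmq_getsockopt (zsock, ZMQ_EVENTS, &events, &len);
    if (events & ZMQ_POLLIN) {
        /*  drain all pending ØMQ messages here  */
    }
}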

Thirdly and probably most importantly, the fact that the BSD Sockets API survived almost three decades despite numerous attempts to replace it means that there is something inherently right in the design. BSD Sockets API designers have—whether deliberately or by chance—made the right design decisions. By adopting the API we can automatically share those design decisions without even knowing what they were and what problem they were solving.

Lesson learned: While code reuse has been promoted from time immemorial and pattern reuse joined in later on, it's important to think of reuse in an even more generic way. When designing a product, have a look at similar products. Check which have failed and which have succeeded; learn from the successful projects. Don't succumb to Not Invented Here syndrome. Reuse the ideas, the APIs, the conceptual frameworks, whatever you find appropriate. By doing so you are allowing users to reuse their existing knowledge. At the same time you may be avoiding technical pitfalls you are not even aware of at the moment.

24.11. Messaging Patterns

In any messaging system, the most important design problem is that of how to provide a way for the user to specify which messages are routed to which destinations. There are two main approaches, and I believe this dichotomy is quite generic and applicable to basically any problem encountered in the domain of software.

One approach is to adopt the Unix philosophy of "do one thing and do it well". What this means is that the problem domain should be artificially restricted to a small and well-understood area. The program should then solve this restricted problem in a correct and exhaustive way. An example of such an approach in the messaging area is MQTT. It's a protocol for distributing messages to a set of consumers. It can't be used for anything else (say for RPC) but it is easy to use and does message distribution well.

The other approach is to focus on generality and provide a powerful and highly configurable system. AMQP is an example of such a system. Its model of queues and exchanges provides the user with the means to programmatically define almost any routing algorithm they can think of. The trade-off, of course, is a lot of options to take care of.

ØMQ opts for the former model because it allows the resulting product to be used by basically anyone, while the generic model requires messaging experts to use it. To demonstrate the point, let's have a look at how the model affects the complexity of the API. What follows is an implementation of an RPC client on top of a generic system (AMQP):

connect ("192.168.0.111")exchange.declare (exchange="requests", type="direct", passive=false,    durable=true, no-wait=true, arguments={})exchange.declare (exchange="replies", type="direct", passive=false,    durable=true, no-wait=true, arguments={})reply-queue = queue.declare (queue="", passive=false, durable=false,    exclusive=true, auto-delete=true, no-wait=false, arguments={})queue.bind (queue=reply-queue, exchange="replies",    routing-key=reply-queue)queue.consume (queue=reply-queue, consumer-tag="", no-local=false,    no-ack=false, exclusive=true, no-wait=true, arguments={})request = new-message ("Hello World!")request.reply-to = reply-queuerequest.correlation-id = generate-unique-id ()basic.publish (exchange="requests", routing-key="my-service",    mandatory=true, immediate=false)reply = get-message ()

On the other hand, ØMQ splits the messaging landscape into so-called "messaging patterns". Examples of the patterns are "publish/subscribe", "request/reply" or "parallelised pipeline". Each messaging pattern is completely orthogonal to other patterns and can be thought of as a separate tool.

What follows is the re-implementation of the above application using ØMQ's request/reply pattern. Note how all the option tweaking is reduced to the single step of choosing the right messaging pattern ("REQ"):

s = socket (REQ)
s.connect ("tcp://192.168.0.111:5555")
s.send ("Hello World!")
reply = s.recv ()
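For comparison, the same four steps written against the real libzmq C API (modern 3.x-style calls; error handling omitted for brevity):

#include <string.h>
#include <zmq.h>

int main (void)
{
    void *ctx = zmq_ctx_new ();
    void *s = zmq_socket (ctx, ZMQ_REQ);
    zmq_connect (s, "tcp://192.168.0.111:5555");

    zmq_send (s, "Hello World!", strlen ("Hello World!"), 0);

    char reply [256];
    int n = zmq_recv (s, reply, sizeof (reply), 0);

    zmq_close (s);
    zmq_ctx_term (ctx);
    return n < 0;
}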

Up to this point we've argued that specific solutions are better than generic solutions. We want our solution to be as specific as possible. However, at the same time we want to provide our customers with as wide a range of functionality as possible. How can we solve this apparent contradiction?

The answer consists of two steps:

  1. Define a layer of the stack to deal with a particular problem area (e.g. transport, routing, presentation, etc.).
  2. Provide multiple implementations of the layer. There should be a separate non-intersecting implementation for each use case.

Let's have a look at the example of the transport layer in the Internet stack. It's meant to provide services such as transferring data streams, applying flow control, providing reliability, etc., on top of the network layer (IP). It does so by defining multiple non-intersecting solutions: TCP for connection-oriented reliable stream transfer, UDP for connectionless unreliable packet transfer, SCTP for transfer of multiple streams, DCCP for unreliable connections and so on.

Note that each implementation is completely orthogonal: a UDP endpoint cannot speak to a TCP endpoint. Neither can an SCTP endpoint speak to a DCCP endpoint. It means that new implementations can be added to the stack at any moment without affecting the existing portions of the stack. Conversely, failed implementations can be forgotten and discarded without compromising the viability of the transport layer as a whole.

The same principle applies to messaging patterns as defined by ØMQ. Messaging patterns form a layer (the so-called "scalability layer") on top of the transport layer (TCP and friends). Individual messaging patterns are implementations of this layer. They are strictly orthogonal—the publish/subscribe endpoint can't speak to the request/reply endpoint, etc. Strict separation between the patterns in turn means that new patterns can be added as needed and that failed experiments with new patterns won't hurt the existing patterns.

Lesson learned: When solving a complex and multi-faceted problem it may turn out that a monolithic general-purpose solution may not be the best way to go. Instead, we can think of the problem area as an abstract layer and provide multiple implementations of this layer, each focused on a specific well-defined use case. When doing so, delineate the use case carefully. Be sure about what is in the scope and what is not. By restricting the use case too aggressively the application of your software may be limited. If you define the problem too broadly, however, the product may become too complex, blurry and confusing for the users.

24.12. Conclusion

As our world becomes populated with lots of small computers connected via the Internet—mobile phones, RFID readers, tablets and laptops, GPS devices, etc.—the problem of distributed computing ceases to be the domain of academic science and becomes a common everyday problem for every developer to tackle. The solutions, unfortunately, are mostly domain-specific hacks. This chapter summarises our experience with building a large-scale distributed system in a systematic manner. It focuses on problems that are interesting from a software architecture point of view, and we hope that designers and programmers in the open source community will find it useful.
