Lew Pitcher's "Servers 101"


WFTL-Lug Linux Academy - Servers 101

Introduction

Welcome to the WFTL-Lug Linux Academy's "Servers 101" course.

In this course, we will explore the definition, theory, design and implementation of a software server in a Linux environment. This is the first course in an intended 3-course suite, consisting of

  • Servers 101 - The basics of software servers
  • Servers 201 - An examination of 3 Linux server applications
  • Servers 202 - Managing and Securing Linux server applications

Servers 101 ("The Basics of Software Servers") will present the basics of server design. This is not intended to be a comprehensive course, but more a background on what software servers are, how they work, and how they are (or can be) built. From this course, you should gain an understanding of the techniques used to build servers, and thus have the background to tackle the more difficult questions like "What servers do I have?", "How do I manage my servers?" and "How do I secure my servers?"

In Lesson 1, we will try to answer the questions "What is a server?" and "What is a client?" Here, we'll explore the basic definition of client and server and try to understand how they interact. We'll also touch on the alternatives to client and server, in order to understand how they are different.

Lesson 2 will address the mechanics of servers and clients, answering the questions "How do servers work?" and "How do clients work?". Here, we'll gain an understanding of the mechanisms that servers and clients use to manage themselves, and in general, how their design can influence their stability and security.

Lesson 3 will examine some common servers found in most Linux systems. We'll look at servers that live their lives "under the covers", not visible to the casual user. We'll learn how these servers are started, and how they are shut down, and we'll see what sort of security facilities are available to them.

Lesson 4 will continue where lesson 3 left off, and will examine more common servers. However, this time we'll look at servers that live their lives "in front of the spotlight", publicly exposed through a GUI.

Lesson 5 and Lesson 6 will show the steps in designing and building a simple server. Lesson 5 will cover the design of the server, while lesson 6 will address its implementation. Code will be discussed here, so be prepared to do some reading and thinking.

Lesson 7 and Lesson 8 will show the steps in designing and building a simple client for our simple server. Lesson 7 will cover the design of the client, while lesson 8 will address its implementation. Again, code will be discussed here. Don't be afraid, I'll keep it simple.

Finally, Lesson 9 will wrap up "Servers 101" with a review of the material previously presented, and a summary of any discussion that the course has generated.

This course is not intended to be an in-depth technical discussion of the subject matter. It will get into some technicalities (especially in Lessons 5 through 8), but you should need no more than a standard "bash" command line and a little patience in order to follow along. The coding in Lessons 6 and 8 will be done as bash shell scripts, and will not involve root access.

If you want to follow along, you will need

  • some space on your system to build a couple of shell scripts in,
  • a text editor (your choice)
  • the bash shell, and
  • some patience

If you want more material to read, I can suggest the following books and websites:

Books (caution, these are technical, but worth it)

"The UNIX Programming Environment"
by Brian W. Kernighan and Rob Pike
(Old but still good book on developing in Unix. Has a lot of good info on
building shell scripts and programs for Unix.)

"Operating Systems - Design and Implementation"
by Andrew Tanenbaum and Albert Woodhull
(How unix is built, sort of. Uses client/server architecture
to illustrate many design points. Good book for understanding
how unix works, and how to build client/server systems)

"The Art of Computer Programming - Volume 1: Fundamental Algorithms"
by Donald Knuth
(Quintessential programmer's textbook. Good discussion on subroutines and
coroutines, which make up the basis of client/server architectures)

"Advanced Programming in the UNIX Environment"
by W. Richard Stevens
("Must have" Unix programmer's book that covers the details of building
daemons (servers that live under the covers), managing inter-process
communications (how client and server talk to each other) and other Unix
development techniques. Includes a chapter on the construction of a
particular client and server.)

"UNIX Network Programming - Volume 1: Networking APIs" and
"UNIX Network Programming - Volume 2: Interprocess Communications"
by W. Richard Stevens
("Must have" Unix programmer's books that cover the details of the UNIX
communication APIs. Again, these are how client and server talk to each
other.)

Websites



Lesson 1: "What is a server?", "What is a client?"

Introduction

"My main server is a 2U[1] rackmount server and it hosts my email server and my web server. My backup server is a desktop, not a server, and it normally runs my database server."

Confusing? You bet. There are servers and servers and servers, and they are all different.

First off, there's software. These servers are programs that provide a service, like an email server or a web server. It doesn't matter where these servers run, laptop, desktop or rackmount; they provide a service and are known as servers.

Next, there's hardware. These servers are computers built from "industrial grade" parts to withstand constant and frequent use, like a rackmount 2U processor frame. They are reliable, durable, and fast[2] (as compared to hardware platforms like desktops or laptops, which can be temperamental, fragile, or slow), and they provide the computing power and data throughput that common server programs can demand.

Finally, there's the concept. These servers are systems that provide some value to external bodies. Almost any sort of hardware and software combination could be called a "server", so long as it provides some sort of service.

In this course, we'll explore the "software" variety of server, concentrating on their purpose and design. We'll learn a bit about their construction, and learn how to identify some of the common servers found on many Linux systems. This course will not cover the management, maintenance, or security of any one server; that's the topic of another course.


"What is a server?", "What is a client?"

Imagine that you have a hunger for a flame-broiled hamburger. You go to the counter of your friendly local burger joint, and a bright-eyed youngster asks
you:

"May I take your order?"

You reply:

"One hamburger, mustard, relish, extra lettuce and tomato, hold the onion."

and wait patiently for a minute or two. Then, after that brief interlude, the bright-eyed youngster says

"Here you are - one hamburger, just the way you like it."

and hands you your hot meat sandwich on a bun. You pay your bill, take your hamburger to a table, and have a juicy, cholesterol-filled lunch.

So, what does a fast food lunch have to do with "servers"? The answer is that your lunchtime activities and the activities performed by a server are alike in many ways.

At lunch, we had two immediate participants: the customer who ordered and received the hamburger, and the counter person who took the order and delivered the hamburger. On our computer, we have programs and processes that ask for and receive information, and programs and processes that receive requests and fulfill them. We call the processes that ask for things "clients" and the processes that answer the requests "servers".

The analogy goes deeper than this, though. Both "customer" and "client" initiate their respective interactions and wait for the other participant to act. Both "counterperson" and "server" wait for the other participant to initiate the interaction, and both only participate to the extent of answering the request. In both the computer and the hamburger joint, the counterperson/server handles, serially, many interactions with different customers/clients. Each interaction, computer or restaurant, will consist of at least one back-and-forth exchange, and may span several back-and-forth exchanges before the interaction is complete.

From this analogy, we get a pretty good picture of what a server and a client "is".

A server provides a service. It waits until someone (a client) asks for the service, and it then satisfies that request. The client, on the other hand, finds that it cannot perform some task on its own, and so it asks the server to perform the task for it. The client waits patiently while the server takes care of the request, and then continues with its own work when the server replies with the results.

As a visual aid, take a look at the following two diagrams. You will see the similarity between how the fast food restaurant example works and how a computer client and server work.

Example: Order a Hamburger at a Burger Joint

                               counter
            customer           person
 
                         [previous customer]
                                 V
 hunger ----->.                  :
              | "burger please"  :
              |----------------->:
              :                  |
              :                  |
           (wait)            make burger
              :                  |
              :  "here you go"   |
              :<-----------------|
              |                  :
  eat <-------'                  :
                                 V
                           [next customer]

Example: Request Data on a Computer


           client             server    
 
                         [previous client]
 need                           V
 data ------>                   :
             |  "data please"   :
             |----------------->:
             :                  |
             :                  |
          (wait)            compute data
             :                  |
             :  "here it is"    |
             :<-----------------|
             |                  :
 use<--------+                  :
 data                           V
                          [next client]

In both cases, there are two participants: one who asks for service, and one who supplies service. The service requester (the customer or the "client") interacts with the service supplier (the counterperson or the "server") once, to have their request (food or data) serviced. Once they are satisfied, the customer/client goes away, and continues their own actions without further assistance from the counterperson/server. The service supplier (the counterperson/server) services customers/clients sequentially, taking care of one client completely, then moving on to the next client in line. In computers, this arrangement of a single client and a single server is called a "two tier client/server architecture[3]". The "two tiers" refers to the two participants: client and server.


How many participants?

But, if we take a closer look at our fast food restaurant, we might notice something unusual about the customer and counterperson. The counterperson doesn't actually cook the hamburger that the customer ordered; instead, he passes the food order along to a cook, who actually prepares the hamburger.

The counterperson waits patiently for the cook to make the hamburger and once the cook is done, he packages the hamburger in its convenient foil wrapper and turns back to the customer.

If we didn't know better, we'd think that the counterperson is a client of the cook. He acts like a client, asking for a service, and waiting for the results. And the cook acts like a server, waiting for an order, satisfying it, and then waiting for the next order. Well, indeed, the counterperson and cook are in a client/server relationship.

So, if we take a step back, and look at the larger picture, we see the customer, the counterperson, and the cook all need to be there. Since there are three participants, we would call this a "three tier client/server architecture". You can see where this is going... the more people that are involved in satisfying the initial request (including the person who makes the request), the more "tiers" there are. There is one tier or level for each requester, and one tier for the ultimate server. The general term for this sort of thing is an "n-tiered client/server architecture", where the "n" refers to an unknown (or general) number of tiers.

Example: A closer look at the Burger Joint


                               counter
            customer           person                cook
            (tier 1)           (tier 2)            (tier 3) 
  
                         [previous customer]   [previous order]
                                 V                    V
 hunger ----->.                  :                    :
              | "burger please"  :                    :
              |----------------->:                    :
              :                  |    "cook burger"   :
              :                  |------------------->:  
              :                  :                    |
              :               (wait)              cook burger
              :                  :                    |
           (wait)                :<-------------------+
              :                  |     "order up"     :
              :                  |                    :
              :               wrap burger             :
              :                  |                    :
              :  "here you go"   |                    :
              :<-----------------:                    :
              |                  :                    :
  eat <-------'                  :                    :
                                 V                    V
                           [next customer]      [next order]

Example: A closer look at the computer



             client          public server      support server
            (tier 1)           (tier 2)            (tier 3) 
  
                          [previous client]  [previous client]
                                 V                    V
 need ------->.                  :                    :
 data         |  "data please"   :                    :
              |----------------->:                    :
              :                  |    "get data"      :
              :                  |------------------->:  
              :                  :                    |
              :               (wait)             retrieve data
              :                  :                    |
           (wait)                :<-------------------+
              :                  |     "data here"    :
              :                  |                    :
              :             modify data               :
              :                  |                    :
              :  "here you go"   |                    :
              :<-----------------:                    :
              |                  :                    :
  use <-------'                  :                    :
 data                            V                    V
                           [next client]       [next client]

You will also notice that some servers act as clients to other servers. This is not unusual in a multi-tiered environment. A digital example of this sort of thing would be how your email works. To send an email, your client (Thunderbird or KMail or Mutt) talks to an email server. That email server passes the email to the email server that the recipient uses, and that email server passes the email on to the program that the recipient uses to read his mail. In that scenario, email is at least a 4-tiered client/server environment:

your client -> your server -> his server -> his client.
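
As a rough illustration of a server that is also a client, here is a minimal bash sketch of three tiers. It is not taken from the course's later lessons; the function names (support_server, public_server) and the "weather" request are invented for this example, and each tier simply talks over standard input and output.

```bash
#!/usr/bin/env bash
# Sketch only: three tiers, each talking over standard input and output.
# The function names and the "weather" request are invented for illustration.

# Tier 3: the support server retrieves the data.
support_server() {
    local request
    read -r request
    case "$request" in
        weather) printf 'sunny\n' ;;
        *)       printf 'unknown\n' ;;
    esac
}

# Tier 2: the public server serves the client, and is itself
# a client of the support server.
public_server() {
    local request data
    read -r request
    data=$(printf '%s\n' "$request" | support_server)      # acts as a client
    printf 'here you go: %s is %s\n' "$request" "$data"    # modifies the data, then replies
}

# Tier 1: the client asks, then waits for the answer.
reply=$(printf 'weather\n' | public_server)
echo "$reply"
```

Run as written, this should print "here you go: weather is sunny". The client never sees the support server; it only knows about the tier it talks to.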

"You want it when?"

Let's try a slightly different scenario: this time, you aren't looking for fast food, but instead are interested in a sit-down lunch. You sit down at a table in your friendly neighbourhood sports bar, and the jersey-clad waitress asks you

"May I take your order?"

Again, you reply

"One hamburger, mustard, relish, extra lettuce and tomato, hold the onion."

and the waitress replies

"Sure thing, honey."

as she saunters off to the kitchen.

This time, instead of waiting patiently for your meal, you turn to the ever present TV, and watch the sports news. Perhaps you take out your newspaper, and work on the crossword puzzle a bit, or turn to some of the paperwork you brought from the office and complete that estimate you promised for 1PM.

After a while, the waitress returns to your table, says

"Here you are - one hamburger, just the way you like it."

and hands you your hot meat sandwich on a bun. You take your hamburger, and again have a juicy, cholesterol-filled lunch.

Asynchronous client/server

The difference between this meal and the previous one is that, instead of stopping and waiting patiently for the server to give you your hamburger, you continued to do your work, and let the server interrupt you when she was ready with your meal. Your activities were no longer synchronized with the waitress; you participated in an "asynchronous" relationship with her.

Similarly, computer client/server interactions can either be "synchronous" (where the client stops and waits for the server to complete the request) or "asynchronous" (where the client continues on with its work while the server satisfies the request). Simple client/server environments rarely use this "asynchronous" model; there's just no need for that complexity. However, there are occasions where "asynchronous" is more useful than "synchronous".

When you launch a program in your Linux GUI, often you notice that your mouse cursor flashes or sparkles while the program is loading up. This "busy cursor" is the result of an asynchronous client/server environment. Your GUI says "launch this program" to the subsystem (a server) that executes applications, and then makes the cursor go sparkly. After a while, the launcher subsystem says to the GUI "the program is launched", and the GUI changes the cursor back into the normal pointer. The GUI didn't sit and wait for the launcher to complete; you could still move the mouse around, open windows, and click on buttons and icons while the server worked to get the program launched. The GUI and the launcher acted asynchronously of each other.
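
One simple way to get a feel for an asynchronous exchange in bash is a background job. The sketch below is only an illustration under assumptions: kitchen and reply_file are invented names, and the client here chooses when to collect the reply (with "wait") rather than being interrupted by the server.

```bash
#!/usr/bin/env bash
# Sketch only: an asynchronous request made with a background job.
# kitchen and reply_file are invented names for illustration.

reply_file=$(mktemp)

kitchen() {                       # the "server": slow, and out of step with the client
    sleep 1                       # stands in for real work
    printf 'one hamburger\n' > "$reply_file"
}

kitchen &                         # place the order, but do not stop and wait
kitchen_pid=$!

clues_solved=0
for clue in 1 2 3; do             # the client carries on with its own work
    clues_solved=$((clues_solved + 1))
done

wait "$kitchen_pid"               # only now does the client collect the reply
reply=$(cat "$reply_file")
rm -f "$reply_file"
echo "solved $clues_solved clues, then got: $reply"
```

The crossword clues are solved while the kitchen is still busy; a synchronous client would have sat idle for that second.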

Servers that do not reply

This "asynchronous" behaviour can be useful in certain circumstances. What if the client didn't care to know if the server accomplished the request or not? If the delay between request and reply is too long (minutes, hours, days, or even forever), the client might not want to wait for the server to reply. In that case, the client and server can act asynchronously, and the client can just ignore the server's reply when it comes. Your email process does this; KMail hands the mail to your server, and does not wait for the server to tell it that the mail was received by the recipient. In fact, your email server passes the email message to the recipient's email server, and does not wait for the recipient to read the mail. Your email client and the email servers all act asynchronously.


Peer-to-Peer, or "I'll scratch your back; you scratch mine."

There are alternatives to "client/server" style architectures, the most common of which is known as "peer-to-peer". In peer-to-peer (or P2P, if you will), the participants are equals, acting both as client to the other's server and as server to the other's client. To explain how these work, we'll again look at a meal, but this one occurs at home.

You and your spouse arrive home from work. You tell your spouse,

"I'm hungry. Let's make dinner."

and your spouse replies

"OK. Let's have steak and baked potato."

You say

"Fine. I'll barbecue the steak. Will you make the baked potatoes?"

to which your spouse says

"Sure, but you set the table, and I'll make the salad."

You barbecue the steak and set the table, while your spouse bakes the potatoes and makes the salad. When you both are done, you both distribute the steaks, baked potatoes and salad, and both sit down to a well-deserved meal.

In this scenario, you and your peer each acted as client and server. You (as a client) asked for baked potatoes and a salad, which your spouse (as a server) prepared. Your spouse (as a client) asked for steak and a set table, which you (as a server) prepared. You both satisfied the other's requests and had your own requests satisfied in turn. Instead of the client and server roles, you took on the role of peers, performing different activities but cooperating to accomplish a mutually-agreed-upon goal.

Peer-to-Peer architectures do this as well; your "file-sharing" program not only pulls files in from other P2P systems for you to use, it supplies your files to the other P2P systems for their use. Your file-sharing tool not only acts as a client to those other systems, it acts as a server to them as well.

For what it's worth, peer-to-peer architectures are complex enough to be the subject of a separate tutorial. Other than mentioning them here, we will not explore P2P in this series.


[1] 2U - a size of rack mounted computer box, 2 "units" high. See this picture for an example.
[2] As Charles points out, hardware servers have features like hot-swappable power supplies and hard drives, so that they can be repaired even while they are still running. Additionally, these sorts of systems typically have more, faster, and more stable ("better") RAM to reduce the amount of disk access and keep programs and data moving at a good pace.
[3] OK, so I sprang that term on you. Think of an "architecture" as a general plan or method of building things. Igloos are all different in shape, size and placement, but they all follow the same "architecture" of a small one-room domed building with a low sheltered entrance. Similarly, parking garages, while various in shape, size, and form, all follow a similar "architecture" of a long, covered, winding aisle lined with car-sized spaces on both sides. Computer programs also have "architectures" or common ways in which they are put together. In this series of lessons, we are going to explore one of those ways: the "client/server" architecture.


Lesson 2: "How do servers work?", "How do clients work?"

Recap - The Story So Far

In Lesson 1, we saw how client and server work together to satisfy some activity that the client needs to complete. The client asks the server for a service, and either waits for the server to complete the request ("synchronous") or continues on with its processing, depending on the server to tell it when the request has been completed ("asynchronous").

The server, on the other hand, listens for client requests and processes them. The server may delegate part of the processing to other servers by acting as a client to those servers ("3 tier" and "n tier" servers). Finally, when the server has completed the request, it delivers the results back to the client, and begins the process of listening for a client request again.

But, while the restaurant meal example that I used in Lesson 1 gives us a good idea of what roles servers and clients fulfil, it doesn't really explain how two pieces of software can work together in a client/server role. You may think that it can't be as simple as it looks, and (if you do), then you'd be correct.

So, let's take a closer look at how servers and clients work.

Overall Design

Clients talk to servers, and servers service clients. This alone ties the design of the client to the design of the server so that when we talk of one, we must talk of the other.

The typical "synchronous" client (for any server) has a simple form. It

    ------------ Client ---------------   ---- Fast Food Customer ------
 a) establishes a temporary connection  | a) Stands in line to order,
    to the server,                      |
                                        |
 b) formats and writes its request to   | b) Orders food
    the server connection               |
 c) blocks, waiting for the server to   | c) Waits for order-taker
    reply,                              |    to return with food.
 d) reads the server's reply and        | d) Takes food from order-taker
    extracts the relevant details       |
                                        |
 e) disconnects the temporary           | e) Leaves the order line
    connection to the server

Of course, there are variations on this logic. If the client intends to reuse the server's services many times, it may decide to maintain the connection to the server between requests, rather than disconnecting at the end of each request and connecting at the beginning of the next one. If the server does not deliver replies, or the client does not want the reply, it can forego the wait and read steps.
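
The five client steps listed above could be sketched in bash roughly as follows. This is an illustration under assumptions, not the client built later in the course: a pair of named pipes stands in for the connection, a tiny stand-in server is included so the script runs on its own, and all of the names (channel, to_server, to_client) are invented.

```bash
#!/usr/bin/env bash
# Sketch only: the five steps of a synchronous client, with a pair of
# named pipes standing in for the connection. All names are invented.

channel=$(mktemp -d)
mkfifo "$channel/to_server" "$channel/to_client"

# A stand-in server, so that the client has something to talk to.
(
    read -r order < "$channel/to_server"
    printf 'one %s, coming up\n' "$order" > "$channel/to_client"
) &

exec 3> "$channel/to_server"      # a) connect to the server
printf 'hamburger\n' >&3          # b) format and write the request
exec 4< "$channel/to_client"      # c) block until the server starts to reply
read -r reply <&4                 # d) read the reply and extract the details
exec 3>&- 4<&-                    # e) disconnect

wait                              # let the stand-in server finish
rm -rf "$channel"
echo "$reply"
```

Run as written, this should print "one hamburger, coming up". A client that wanted to reuse the connection would simply skip step e) until its last request.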

The typical "asynchronous" client has a similar structure, in that it

    ------------ Client ---------------   ---- Restaurant Customer -----
 a) establishes a temporary connection  | a) Flags down waitress
    to the server,                      |
                                        |
 b) sets up an "interrupt handler"[1]   | b) prepares to be interrupted
    to process the server's reply       |
 c) formats and writes its request to   | c) orders food
    the server connection               |
                                        |
 d) performs useful work until the      | d) Watches "Foxy Boxing" on
    "interrupt handler" is invoked      |    TSN, completes crossword
                                        |
 e) the "interrupt handler" will        | e) when food arrives
    i)  read the server's reply and     |    i) pay for and eat food
        extract the relevant details    |
    ii) disconnect the temporary        |    ii) leave restaurant
        connection to the server

Again, the client can vary some of the details, like whether or not to disconnect from the server between requests, and ignoring the server's interrupt if the reply isn't required.
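
In bash, one plausible stand-in for an "interrupt handler" is a signal trap. The sketch below follows the asynchronous client steps above under several assumptions: the server announces its reply by sending the USR1 signal, the reply itself travels through a temporary file, and the names on_reply and stand_in_server are invented for the example.

```bash
#!/usr/bin/env bash
# Sketch only: an asynchronous client that uses a signal trap as its
# "interrupt handler". The USR1 signal, the reply file and the function
# names are assumptions made for this illustration.

reply_file=$(mktemp)
reply=""
work_done=0

# b) set up the handler that will process the server's reply
on_reply() {
    reply=$(cat "$reply_file")    # e-i)  read the server's reply
    rm -f "$reply_file"           # e-ii) drop the temporary "connection"
}
trap on_reply USR1

# A stand-in server: works for a while, stores its reply,
# then interrupts the client whose process ID it was given.
stand_in_server() {
    sleep 1
    printf 'one hamburger\n' > "$reply_file"
    kill -USR1 "$1"
}

# a) and c) contact the server and hand over the request
stand_in_server "$$" &            # $$ is this script's own process ID

# d) perform useful work until the handler has run
while [ -z "$reply" ]; do
    work_done=$((work_done + 1))
    sleep 0.1
done

wait                              # tidy up the finished stand-in server
echo "got '$reply' after $work_done rounds of other work"
```

The number of rounds will vary from run to run; the point is that the client never stops to wait, and only learns of the reply when the trap fires.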

Typically, servers service many clients, often "simultaneously". This behaviour imposes restrictions and decisions on how the server is built, how it operates, and how it is managed and secured. All servers have similar core logic, but their "wrappings" differ, depending on how they are intended to run.

The server's core logic just consists of:

    ---------- Server --------------   ----------- Waiter ------------
 a) block, waiting for a client      | a) wait for next customer to
    request                          |    come to counter
 b) read the client request and      | b) takes customer food order
    extract the relevant details     |
 c) process the request, performing  | c) prepares the food order
    the requested task or generating |
    an error                         |
 d) format and write a reply to the  | d) Gives customer the order
    client                           |
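
As a minimal sketch of that core logic, assuming requests arrive as one line on standard input and replies leave as one line on standard output, the four steps might look like this in bash. The ECHO and LENGTH request words are invented for the illustration.

```bash
#!/usr/bin/env bash
# Sketch only: the four core steps as one function that takes a request on
# standard input and writes its reply on standard output.
# The ECHO and LENGTH request words are invented for illustration.

serve_one_request() {
    local verb argument

    # a) block, waiting for a client request
    # b) read the request and extract the relevant details
    read -r verb argument || return 1

    # c) process the request, performing the task or generating an error
    # d) format and write a reply to the client
    case "$verb" in
        ECHO)   printf 'OK %s\n' "$argument" ;;
        LENGTH) printf 'OK %d\n' "${#argument}" ;;
        *)      printf 'ERROR unknown request %s\n' "$verb" ;;
    esac
}

printf 'ECHO hello\n'   | serve_one_request
printf 'LENGTH hello\n' | serve_one_request
printf 'DANCE\n'        | serve_one_request
```

The three calls at the end should print "OK hello", "OK 5" and "ERROR unknown request DANCE" in that order.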

This logic is usually "wrapped" in one of two wrappers, depending on how the server is intended to run. If the server is intended to handle one request and then quit, the server wrapper is very simple:

    ---------- Server --------------   ----------- Waiter ------------
 a) open channel for client          | a) open cash register
 b) perform server function as above | b) handle customer
 c) close channel                    | c) close cash register
 d) terminate                        | d) leave

Obviously, this only handles one client request, and the server will need some external assistance to determine if and when a client wants to use it.

However, if the server is intended to continue indefinitely, handling many client requests, the server wrapper becomes a little more complex:

    ---------- Server --------------   ----------- Waiter ------------
 a) establish method for external    | a) start shift
    environment to request server    |
    termination                      |
 b) open a channel for the client(s) | b) open cash register 
 c) loop until the external          | c) while still on shift,
    environment requests             |    - service customers
    termination,                     |      as above
     - perform server function as    |
       above,                        |
 d) close the channel to the clients | d) close cash register
 e) terminate                        | e) go home

This sort of server handles multiple clients one after another ("serially"), and (depending on how sophisticated the programmer got) can be made to handle several clients at the same time ("in parallel"). Serial processing is easier: everybody waits their turn, and the server just handles one client at a time. You can think of serial servers as being like the cashier in the fast food restaurant, who must finish with one customer before taking the next customer's order. The cashier processes the customers "serially", just like a serial server.
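
The long-running wrapper could be sketched in bash as shown below. Treat it as one possible arrangement under stated assumptions, not as the course's own server: termination is requested with the TERM signal, the channel is a pair of named pipes, the request pipe is opened read-write (which does not block on Linux), and the names looping_server and ask are invented.

```bash
#!/usr/bin/env bash
# Sketch only: a serial server that loops until it is asked to stop.
# Assumes Linux, where opening a named pipe read-write does not block.
# All names here are invented for illustration.

workdir=$(mktemp -d)
mkfifo "$workdir/requests" "$workdir/replies"

looping_server() {
    local request
    keep_running=yes
    trap 'keep_running=no' TERM              # a) a way to be told to stop
    exec 3<> "$workdir/requests"             # b) open the channel
    while [ "$keep_running" = yes ]; do      # c) loop until told to stop
        # Wait at most one second, so the stop flag is checked regularly.
        if read -r -t 1 request <&3; then
            printf 'served: %s\n' "$request" > "$workdir/replies"
        fi
    done
    exec 3<&-                                # d) close the channel
}                                            # e) terminate

# A synchronous client: write one request, then read one reply.
ask() {
    local reply
    printf '%s\n' "$1" > "$workdir/requests"
    read -r reply < "$workdir/replies"
    printf '%s\n' "$reply"
}

looping_server &
server_pid=$!

first=$(ask burger)                          # clients are served one at a time
second=$(ask fries)

kill -TERM "$server_pid"                     # the outside world says "stop"
wait "$server_pid"
rm -rf "$workdir"
printf '%s\n%s\n' "$first" "$second"
```

Run as written, this should print "served: burger" and then "served: fries". The one-second timeout on the read is a design choice of the sketch: it lets the loop notice the stop request without needing another client to arrive first.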

For the server to handle several clients at once, it in essence makes several copies of itself and arranges for each copy to handle one client. The copies can run as independent processes[2] or as threads[3] within the server. You can think of this sort of server as being like the order-taker at a fast food drive-through; every customer sees only one order-taker (the speaker/microphone at the head of the drive-through), but there really are several order-takers taking care of drive-through customers, alternating from car to car as the orders are completed.
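
Bash has no threads, but the "one copy per client" idea can be sketched with background processes. In this illustration the names handle_client and outdir are invented, and a one-second sleep stands in for slow work, so three clients served in parallel should take roughly one second rather than three.

```bash
#!/usr/bin/env bash
# Sketch only: handing each client to its own background process.
# handle_client and outdir are invented names for illustration.

outdir=$(mktemp -d)

handle_client() {                 # one "copy" of the server, serving one client
    sleep 1                       # stands in for slow work
    printf 'order for %s is ready\n' "$1" > "$outdir/$1"
}

start=$(date +%s)
for client in car1 car2 car3; do
    handle_client "$client" &     # a new process per client; do not wait here
done
wait                              # all three copies run at the same time
elapsed=$(( $(date +%s) - start ))

replies=$(cat "$outdir/car1" "$outdir/car2" "$outdir/car3")
rm -rf "$outdir"
printf '%s\n' "$replies"
echo "served 3 clients in about $elapsed second(s)"
```

Each copy writes to its own file here precisely so that the clients cannot interfere with one another; a real parallel server has to take the same kind of care with anything its copies share.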

So, why is all this important to us? Well, servers that use separate threads or processes to handle individual clients have trade-offs that we should be aware of. They trade an increase in the amount of work that they can do at any one time for an increased load on the system. Servers that work this way tend to have more complex security issues as well, because there are more opportunities for the clients to interact with each other through the server.

"Single threaded" servers (servers that process all their requests serially) are slower, but don't impose as big a load on the system. These servers are usually a bit more secure than servers that handle clients in parallel, because the server must finish with one client before it can go on to the next one, and clients have no chance to interact with each other through the server.

Transactions

An important part of servers and clients is that clients expect servers to do something. That usually means that servers and clients need a commonly agreed-upon method of determining when to do something, and what the results should be. So, we'll first look at the interaction between client and server, and between server and data, to see what's important in those interactions.

In Lesson 1, we saw in our restaurant examples that the client and server carry on a conversation that is typically initiated by the client and completed by the server. The client asks for something, and the server responds. We use the word "transaction" to signify a conversation consisting of a single, complete request and reply to that request. The conversation that goes

Client: "How are you?"
Server: "I am fine."

is considered to be a single transaction.

On the other hand, the conversation that goes:

Client: "How are you?"
Server: "I feel lousy. Do you want to hear what's wrong?"

is not considered to be a single transaction, because the server has not given a complete reply. The conversation will be considered to be a single transaction when it concludes with

Client: "No thanks."
Server: "OK then."

You see, a transaction can have more than one back-and-forth, client-to-server-to-client action before it is a complete transaction.

Similarly, a conversation that goes:

Client: "How are you?"
Server: "I am fine."

Client: "How is your wife?"
Server: "My wife is fine."

is a conversation with two transactions. Conversations don't necessarily end once the transaction completes; they can continue into the next transaction.

So, a "transaction" is a conversation that has a definitive conclusion, or at least points where conclusions can be found. A "transaction" represents one or more "units of work".

And, a "unit of work" is what we call the simplest task that a server can perform. A "unit of work" is "atomic"; it can't be broken up into smaller pieces for the server to do individually.

Why would a "unit of work" be important? Consider the following conversation:

Client: "Please tell me the temperature in Toronto and Regina"
Server: "Toronto is 15 degrees Celsius. Regina is 19 degrees Celsius"

That conversation looks like a transaction, but the server took two units of work to complete it. If we rephrase the conversation, we see the units of work:

Client: "Please tell me the temperature in Toronto."
Server: "Toronto is 15 degrees Celsius."

Client: "Please tell me the temperature in Regina."
Server: "Regina is 19 degrees Celsius"

Units of work help delimit how much work the server must do before it stops, saves its results and produces a reply. The server can defer the reply, and consolidate the results of many units of work, but each unit of work must be complete before the next one starts.

Enquiry servers (those sorts of servers that clients ask questions of) rarely update things, but update servers (those sorts of servers that clients ask to perform activities through) arrange things so that the updates all happen at the end of the unit of work. At that point, the server either says "It's done and recorded", or "I've discarded it all".

Of course, a server may have partial results that it wants to save during its processing. It can "commit" (save) those partial results, but there has to be a way to undo the update if the remaining processing has a problem. For instance, the request

Client: "Move $50.00 from my chequing account to my savings account"

might result in the server generating a withdrawal of $50 from one account, and a deposit of $50 to a second account. If the server "commits" the withdrawal, and then finds that the customer does not have a savings account to deposit to, it cannot just say "No account" and forget the request; it has to put the $50 back into the customer's chequing account.

If the server only commits at the end of the transaction, then the server could just "forget" the whole thing when it determines that the savings account was missing; since no results were saved (no "commit" taken), nothing would have changed.

But, the server that "commits" part way through the transaction must have a way to "revoke" or "undo" its work. So, our server just got a little bit more complicated. If it finds that it has to update part way through a unit of work, it needs a way to undo the update. So, most servers do not update part way through their units of work because their developers didn't want to (or couldn't) find a way to undo intermediate results.
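The chequing-to-savings request can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the accounts dictionary, the transfer function and the account names are all hypothetical, and a real server would keep its data in a proper data store. The sketch does all of its work on a private copy and commits only at the end of the unit of work, so a failure part way through leaves nothing to undo.

```python
import copy

class NoSuchAccount(Exception):
    """Raised when a request names an account that does not exist."""

def transfer(accounts, source, target, amount):
    """Move amount from source to target as one unit of work.

    All changes are made to a working copy; the caller's data is only
    replaced ("committed") once every step has succeeded.
    """
    working = copy.deepcopy(accounts)        # nothing is saved yet
    if source not in working:
        raise NoSuchAccount(source)
    if working[source] < amount:
        raise ValueError("insufficient funds")
    working[source] -= amount                # the withdrawal
    if target not in working:
        raise NoSuchAccount(target)          # the working copy is simply forgotten
    working[target] += amount                # the deposit
    accounts.clear()
    accounts.update(working)                 # the single commit point

accounts = {"chequing": 120_00}              # amounts in cents; no savings account
try:
    transfer(accounts, "chequing", "savings", 50_00)
    outcome = "It's done and recorded"
except NoSuchAccount:
    outcome = "I've discarded it all"
```

Because the withdrawal was never committed, the chequing balance is untouched after the failed request, and the server has no "undo" logic to get wrong.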

So, what makes conversations, transactions and commit points interesting to us? A server begins life at the start of a conversation and a transaction, and ends life at the end of a conversation, the end of a transaction, and at a commit point. If one of these is not complete when the server stops, then we have problems that will need to be fixed, either at the server or in the client. If our server fails, then we need to inspect whether or not there were any outstanding conversations, transactions or commit points, and take actions to get things back in order. This may include correcting data or restarting clients.



Messages between client and server

So far, we've talked a lot about the client and the server exchanging information but we really haven't described what this information looks like or how the client and server perform the exchange.

Typically, clients and servers exchange information in atomic chunks called "messages". The client understands enough of the server's requirements to format its information into a "request message", and the server understands enough of the client's requirements to format its information into a "response message". The format of these messages is, in theory, dictated by an adherence by both parties to a set of common standards, and (in practice) is usually dictated by the server, but sometimes is set by common agreement between client and server. In any case, both client and server must "speak the same language" as it were.

So, what does this language look like?

In some cases, it is a simple copy of internal program structures within the client or server programs. Such a message is usually constructed with fixed-length, fixed-position elements that often contain "machine language" binary data.

For example, a message that represents a customer order at a restaurant may be structured as a list of binary numbers, each one representing either a menu item or a modification to the menu item. Or, closer to home, the messages exchanged between DNS servers[4] to resolve Internet Domain Name issues consist of a series of fixed size, fixed position binary numbers (representing identifiers and flags of a DNS query or response), followed by a number of variable length text blocks (with lengths embedded in the first byte of each block) that carry the domain name information.

For such messages, both client and server must agree on more than just the positioning of elements and their meanings. They must also agree on how binary numbers are to be interpreted (which "end" of a multibyte binary number represents the low order digits, or what format will be used to represent floating point numbers) and what character set the textual elements will be expressed in. If client or server normally operate with different binary or floating point formats, or on different text character sets, the two programs must agree on how the "foreign" information is to be transformed and still retain its meaning.

As you might guess, dealing with the incompatibilities between expressions of these internal program structures ("binary records") often becomes more work than is saved by using program structures rather than interpreted messages. When you work so close to the internals of the programs, any slight deviation or corruption can cause catastrophic consequences. Often, designers look for ways to increase the redundancy in messages, just so they can ensure that corruption and message deviations won't cause problems.
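The byte-order question can be made concrete with Python's struct module. The message layout used here (a 16-bit order number followed by a 32-bit item code) is invented purely for illustration; only the struct calls themselves are real.

```python
import struct

order_number, item_code = 1322, 70000

# "!" selects network (big-endian) byte order with standard sizes;
# "<" selects little-endian. H is a 16-bit and I a 32-bit unsigned integer.
big = struct.pack("!HI", order_number, item_code)
little = struct.pack("<HI", order_number, item_code)

print(big.hex())     # 052a00011170
print(little.hex())  # 2a0570110100

# A receiver that assumes the wrong byte order gets no error, just
# different numbers from the ones that were sent.
misread = struct.unpack("<HI", big)
correct = struct.unpack("!HI", big)
```

The same six bytes decode to the original values only when both sides apply the same byte order, which is why the agreement has to be part of the message definition.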

The simplest way to introduce redundancy is to express the entire message as one or more lines of text. Text has the advantage of being fairly easy to process with normal program tools, while retaining the redundancy that solves data corruption problems. Of course, it still suffers from the character set issues that also haunt the "binary record" format, and does require more processing because data is rarely found at fixed locations in the text messages.

The simplest text message format is straight, unformatted text. A request like

"what time is it"

and a response like

"It is 2330 EST"

are easy to generate. They are also hard to interpret, so much logic goes into determining what exactly the message means. Programmers dislike writing extra logic, especially logic that has to manage unpredictable and highly variable things like free-form text, so this sort of message is not often used in standard servers.

A little more "regular" text is preferred. Something structured, but not necessarily too structured. Many client/server systems use simple message structures to reduce the effort of interpretation. Protocols (for that's the term for the agreements that client and server make) often define the number and meaning of the words in a message. The HTTP protocol[5], for instance, dictates that the client will send in a request message that consists of two or more words. The first word will be a "verb" ("GET", "POST", etc.) and the second and subsequent words will depend on the meaning of the "verb".

For example, an HTTP request might consist of

GET http://www.w3.org/pub/WWW/TheProject.html HTTP/1.1

or

DELETE http://www.w3.org/pub/WWW/junk.htm HTTP/1.1

The server response is also structured, with the first word consisting of a version code, the second a 3 digit status code, and the third and subsequent words being a verbal description of the result. An HTTP reply would look something like:

HTTP/1.1 200 OK

or

HTTP/1.1 400 Bad Request - are you sure you know what you are doing?
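To show how little logic this kind of "regular" text needs, here is a minimal Python sketch that splits the request and reply lines shown above into their words. The function names are hypothetical, and the sketch deliberately ignores everything else in HTTP (headers, bodies, error handling).

```python
def parse_request_line(line):
    """Split an HTTP request line into (verb, target, version)."""
    verb, target, version = line.split(" ", 2)
    return verb, target, version

def parse_status_line(line):
    """Split an HTTP status line into (version, status code, description)."""
    version, code, reason = line.split(" ", 2)   # the description may contain spaces
    return version, int(code), reason

request = parse_request_line(
    "GET http://www.w3.org/pub/WWW/TheProject.html HTTP/1.1")
reply = parse_status_line(
    "HTTP/1.1 400 Bad Request - are you sure you know what you are doing?")
```

Because the position of each word is fixed by the protocol, a program can act on the verb or the status code without trying to understand the free-form description that follows.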

Some client/server architectures find this semi-freeform structure too inconsistent, and impose even more rigid formatting on the structure, format, and content of their messages. A current favourite in the client/server world is the XML[7] format, which generalizes and codifies the structure and expression of textual information in such a manner that programs have a fairly easy time of reading the data. An XML request might look like

<?xml version="1.0"?>
<request>
<function>ORDER</function>
<dish special="hold onions">hamburger</dish>
<dish>diet cola</dish>
<dish>french fries</dish>
</request>

and the response might look like

<?xml version="1.0"?>
<reply>
<function>CONFIRM</function>
<bill>
<billnumber>1322</billnumber>
<amount>$12.95</amount>
<tax>$5.45</tax>
<amtdue>$18.40</amtdue>
</bill>
</reply>

XML has become ubiquitous in the "web services" world where it is not only used to transport requests and replies, it is also used to extend server programs by presenting APIs[8] to client programs through techniques like XMLRPC[9] or SOAP[10].
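As a rough sketch of what "a fairly easy time of reading the data" means, the restaurant order can be read with the ElementTree parser from Python's standard library. The variable names are assumptions made for this example; the element names come from the sample request.

```python
import xml.etree.ElementTree as ET

request_text = """<?xml version="1.0"?>
<request>
<function>ORDER</function>
<dish special="hold onions">hamburger</dish>
<dish>diet cola</dish>
<dish>french fries</dish>
</request>"""

root = ET.fromstring(request_text)
function = root.findtext("function")
# Each dish becomes (name, special instructions or None).
dishes = [(dish.text, dish.get("special")) for dish in root.findall("dish")]

# One misplaced slash in a closing tag makes the whole message
# unreadable, not just the one field that was damaged.
try:
    ET.fromstring("<request><dish>diet cola/<dish></request>")
    damaged_ok = True
except ET.ParseError:
    damaged_ok = False
```

The second half of the sketch shows the other side of rigid formatting: the parser rejects the entire message when a single tag is malformed.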

It is worth noting that both ends of the spectrum ("binary record" messages and XMLish messages) suffer from similar problems, in that their format is tightly controlled and easy to damage. The "structured text" formats of the Internet protocols tend to be somewhat more robust, and easier to debug. Debugging is important: when you are trying to solve a problem between your client and server, which is easier to enter or read

<?xml version="1.0"?><rqst><cmd>HALT</cmd></rqst>

or

HALT

?



Getting the message there

It is one thing to define the format of a message, and quite another thing to actually deliver the message. Like message format, clients and servers have to agree on exactly how messages will go from one partner to another. There's no communication if one partner is waiting for a phone call but the other partner sends a letter.

This "talking to each other" is generally known as "Inter Process Communication" or IPC. IPC is just the umbrella term for all the different communication channels that are available to clients and servers.

On Linux systems, there are many different ways for processes to talk to each other. However, many of them require that both client and server reside on the same machine, or have access to the same files. Only a couple of methods permit client and server to live apart, on separate systems.

First off, programs can communicate through files. The client and server agree ahead of time on which files they will use to transfer the request and response messages in, and then each process uses the files as required. They also have to agree beforehand on how the client will identify the response to its request; this is often done by having the client append a unique identifier to the request that the server will copy into the reply.

In this scenario, the client will open the "request" file, append its request to the end of the file, and close it again. It then opens the "response" file and starts reading through the responses to locate the one that matches the request. The server, on the other hand, opens and reads through the "request" file, processing requests and writing the results to the "response" file. When it reaches the end of the "request" file, it just waits until there are more requests in the file, and then continues reading and writing.
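The file-based exchange can be sketched as follows. Everything here is an assumption made for illustration: the file names, the one-line "identifier, space, text" message layout, and the upper-casing that stands in for real server work. To keep the sketch short, client and server run one after the other in a single program, and the server makes one pass over the request file instead of waiting for more requests.

```python
import os
import tempfile
import uuid

workdir = tempfile.mkdtemp()
REQUESTS = os.path.join(workdir, "request")      # hypothetical file names
RESPONSES = os.path.join(workdir, "response")

def client_send(text):
    """Append a request tagged with a unique identifier; return the identifier."""
    request_id = uuid.uuid4().hex
    with open(REQUESTS, "a") as f:
        f.write(f"{request_id} {text}\n")
    return request_id

def server_run_once():
    """Read every request and write a reply carrying the same identifier."""
    with open(REQUESTS) as requests, open(RESPONSES, "a") as responses:
        for line in requests:
            request_id, text = line.rstrip("\n").split(" ", 1)
            responses.write(f"{request_id} {text.upper()}\n")

def client_receive(request_id):
    """Scan the response file for the reply that matches our request."""
    with open(RESPONSES) as f:
        for line in f:
            reply_id, text = line.rstrip("\n").split(" ", 1)
            if reply_id == request_id:
                return text
    return None

first = client_send("what time is it")
second = client_send("how are you")
server_run_once()
```

A call such as client_receive(second) then picks out the reply to the second request, no matter how many other replies are in the file.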

This strategy also works for Unix "Named Pipes" (aka "fifos"). A named pipe is a type of file that doesn't take up disk space. Instead, the data is held in a smallish buffer in memory, and discarded after it has been read. The filenames for "Named Pipes" are recorded in the filesystem, so both client and server can locate the fifo and read or write it. Named pipes have a disadvantage not shared by true files: because data is discarded as soon as it is read, only one process can usefully read from a given named pipe. Thus, the agreement between client and server as to how responses will be identified may be modified a bit, so that instead of adding a unique identifier to the request, the client adds the name of the fifo it expects to read the response from (with each client using a different and unique fifo), and the server opens and writes the response to that fifo.

A variation on Named Pipes that both eliminates the need for an identifier, and permits the programmers to use network APIs is the "Unix domain socket". These are special files that work like named pipes, but the system (rather than the programs) keeps track of the source of each message and takes care of getting the reply to the right client. Unix domain sockets are again implemented as smallish buffers in kernel memory with filenames in filespace, but instead of being open()ed, etc., they are manipulated with the standard sockets API. In this case, one Unix domain socket file can be used by all clients to communicate with the server, and the server's replies will get back to the proper client without any devious manipulation on the server's part.

Clients and servers can also share parts of their memory, so that a block of server memory is also available to each client. With this "shared memory" approach, a client would move its request into the shared memory, and just wait. The server would recognize that its memory had been altered (by the client; remember, this part of memory is shared between both the client and server - they both have access to it) and would process the request. The server would move its reply into the shared memory region and go on; the client would recognize that its memory had been altered (by the server) and would process the reply it finds there. Of course, the server would have to make enough memory available so that each client has a place to write its request and read its reply, and both client and server have to use the Unix "shared memory" calls to get access to the messages.
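A small sketch of the Unix domain socket idea, using Python's standard socket module on a Linux system. The socket file name and the "hello" reply are assumptions made for the example. To stay short, it runs the server and two clients in one program, one step at a time, which works here because the messages are tiny.

```python
import os
import socket
import tempfile

path = os.path.join(tempfile.mkdtemp(), "server.sock")   # hypothetical name

server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(path)            # creates the socket file in the filesystem
server.listen(5)

# Two clients connect through the same socket file.
alice = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
bob = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
alice.connect(path)
bob.connect(path)
alice.sendall(b"alice")
bob.sendall(b"bob")

# The server gets one connection per client and simply answers on it;
# the system takes care of routing each reply to the right client.
for _ in range(2):
    conn, _addr = server.accept()
    name = conn.recv(1024)
    conn.sendall(b"hello " + name)
    conn.close()

alice_reply = alice.recv(1024)
bob_reply = bob.recv(1024)

for s in (alice, bob, server):
    s.close()
os.unlink(path)
```

Neither request carries an identifier or a reply address, yet each client reads only its own reply.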

Message queues are similar to Unix domain sockets. Again, these special APIs permit clients to establish unique connections to a server, while causing the system (and not the programs) to manage the proper delivery of messages back and forth.

Both "shared memory" and "Message queues" are part of the Linux System V interprocess communications mechanisms support, and the documentation for their APIs can be found in the ipc(5) manual page ("man 5 ipc").

All of the communications strategies discussed so far work only when both client and server are running on the same system. None of these work when client and server are on different machines. However, Linux supports a comprehensive network stack that, when used by clients and servers, permits programs to interact remotely or locally.

Like Unix domain sockets, Internet sockets permit client and server to interact without special "rendezvous" logic. Again, the system (rather than the programs) keeps track of the source of each message and takes care of getting the reply to the right client. Sockets come in two flavours: "record" oriented datagram sockets and "byte stream" oriented stream sockets. For Internet sockets, these are UDP and TCP respectively; Unix domain sockets offer the same two flavours. Most Internet servers use TCP, but some use UDP exclusively, and a very few use both TCP and UDP.

TCP sockets maintain a "session" between the two parties. A session is like a telephone call; it is between the two parties only, and lasts until one party disconnects. This makes TCP ideal for long-running conversations between client and server, as the TCP session maintains continuity for the conversation. TCP can transport a potentially unlimited amount of data back and forth without the limits of the network interfering with the delivery.

UDP sockets do not maintain sessions; they simply send and receive blocks of data. UDP can be likened to snail-mail letters; the thing that is delivered contains lots of information, but delivery takes time and is not 100% guaranteed. There is no connection between the two parties; delivery of a letter or a UDP datagram does not guarantee that the recipient will or even can send a reply. UDP is limited to, at most, 64K of information per datagram, and often less, depending on the limits of the network.
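The difference between the "record" and "byte stream" flavours can be seen without a network. This sketch is an illustration only: it uses connected pairs of Unix domain sockets as stand-ins for UDP and TCP (an assumption made so that it runs on a single Linux machine), and it shows message boundaries only, not UDP's unreliable delivery.

```python
import socket

messages = [b"How are you?", b"How is your wife?"]

def receive_exactly(sock, count):
    """Keep reading a stream socket until count bytes have arrived."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            break
        data += chunk
    return data

# Datagram ("record") sockets: each send arrives as its own message.
d_send, d_recv = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
for message in messages:
    d_send.send(message)
records = [d_recv.recv(1024) for _ in messages]

# Stream sockets: the receiver sees one run of bytes with no boundaries,
# so the protocol itself has to say where each message ends.
s_send, s_recv = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
for message in messages:
    s_send.sendall(message)
stream = receive_exactly(s_recv, sum(len(m) for m in messages))

for sock in (d_send, d_recv, s_send, s_recv):
    sock.close()
```

This is one reason text protocols carried over TCP usually end each message with a line break or state its length up front.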

Finally, clients and servers can piggyback their protocols on top of other protocols. For instance, web services servers and clients often use HTTP to move their own request and response messages back and forth. Similarly, it is not inconceivable that some client and server could interact using email, which is transported using the SMTP, POP, and IMAP protocols. Many protocols also make themselves secure by only talking through channels protected by SSH, using SSH to transport the otherwise insecure data back and forth.

The choice of transport protocol determines how the client and server can talk to each other. It puts bounds on where clients can be run, and how big (and how frequent) their messages can be. It also limits (and in some cases, simplifies) the choice of message language that client and server can use.



Session and State

Sometimes, the activities that a server must perform not only depend on what the client has requested now, but also on what the client has requested previously. A server that must accommodate such activity must remember what it is that the client has previously requested, and what the server reply was. Since a server would normally deal with many clients at the same time, it must have a method to associate any memories it has for a particular client with the interactions that it performs with that client.

The memory of prior activities that such a server keeps is known as the server "state", and the association of those memories to a particular client is known as a "session". Sessions are hard to maintain, as few clients maintain a continuous connection to the server. Without a continuous, uninterrupted connection, the server has to be inventive as to identifying which client is which, and which memories belong to which clients. Clients also keep memories of their (and the server's) prior activities, but since the client always initiates the "sessions" with the server, it doesn't have to be as inventive as the server in remembering its state.

To illustrate what we mean by "state", consider how food orders might be handled at a take-out pizza restaurant. A customer calls in his order to the pizzeria

Customer: I'd like a 12" Deluxe with Pineapple and Mushrooms
Pizzeria: OK. That'll be ready in 20 minutes.

and the order is taken.

Now, if the customer calls back to modify the order

Customer: I'd like to change my order - no Mushrooms
Pizzeria: What order?

the pizzeria has no way to connect the second call back to the original order.

The "state" in this case is the customer's order for food. Not only does the restaurant need to process the order, it needs to remember it. And, the restaurant needs some way to tie the order back to a particular customer so that, when the customer calls to check on or change the order, the status or changes can be determined properly.

If our first pizzeria conversation had gone like:

Customer: I'd like a 12" Deluxe with Pineapple and Mushrooms
Pizzeria: OK. That'll be ready in 20 minutes. Your invoice is number 1467.

then the subsequent conversation could have gone like:

Customer: I'd like to change my order on invoice 1467 - no mushrooms
Pizzeria: OK, one 12" Deluxe with Pineapple only.

and the two conversations would have been tied together into one session by the invoice number.

Of course, had the customer originally remained on the phone, the conversation might have sounded like:

Customer: I'd like a 12" Deluxe with Pineapple and Mushrooms
Pizzeria: OK. That'll be ready in 20 minutes. Your invoice is number 1467.
Customer: On second thought, I'd like to change my order - no Mushrooms
Pizzeria: OK, one 12" Deluxe with Pineapple only.

and the conversation would have still existed in one session, tied together by the continuity of the single telephone call.

With the pizzeria, tying a state to a session is fairly simple. For the person still on the phone, the state is the order that you are immediately working on. If the person is calling back, the state is the order referenced by the invoice number.

This brings us to another key part of state management, how the server manages the relationship between client and state. In the example, the server kept all the state information internally, and gave the client (the customer) just a 'key' value (the invoice number). The client gave this key value back to the server when it needed to resume the session, and the server looked up the state (order) associated with that key in order to restore its state. This technique is known as "server side" state management (because the state information is kept in the server), and the key (in this case, the invoice #) is known as a "token".
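"Server side" state management with a token can be sketched as a lookup table inside the server. The class and method names here are hypothetical, and the order is reduced to a size, a style and a set of toppings.

```python
import itertools

class Pizzeria:
    """Keeps every order internally and hands the customer only a token."""

    def __init__(self, first_invoice=1467):
        self._orders = {}                                  # token -> state
        self._invoice_numbers = itertools.count(first_invoice)

    def place_order(self, size, style, toppings):
        invoice = next(self._invoice_numbers)
        self._orders[invoice] = {"size": size, "style": style,
                                 "toppings": set(toppings)}
        return invoice                                     # the token

    def remove_topping(self, invoice, topping):
        order = self._orders.get(invoice)
        if order is None:
            return "What order?"                           # unknown token
        order["toppings"].discard(topping)
        return sorted(order["toppings"])

shop = Pizzeria()
token = shop.place_order('12"', "Deluxe", ["Pineapple", "Mushrooms"])
```

The customer only ever holds the invoice number; calling shop.remove_topping(token, "Mushrooms") lets the server find the rest of the order on its own.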

An alternate technique would have the server telling the client everything about the state (all the details of the order, who will cook it, where it will be stored once it is cooked, etc), and ask the client to give all those details back with each call. That sort of conversation might go like

Customer: I'd like a 12" Deluxe with Pineapple and Mushrooms
Pizzeria: OK. That'll be ready in 20 minutes. You ordered a 12" Deluxe Pizza with Pineapple and Mushrooms. Henry will prepare the pizza and cook it in oven #4. Jean will box the pizza in a #3 box, and place it on warming rack 7.

and the subsequent callback would sound like

Customer: I'd like to change my order on the 12" Deluxe Pizza with Pineapple and Mushrooms that Henry is cooking in oven #4 and Jean will box in a #3 box and store on warming rack 7: No mushrooms
Pizzeria: OK, you ordered one 12" Deluxe with Pineapple only. Henry will prepare the pizza and cook it in oven #3. Jean will box the pizza in a #3 box, and place it on warming rack 2.

This technique is an example of "client side" state management (because the state information is kept in the client), and the state information that the client keeps is known as "cookies"[11]. Yes, "cookies". Like in the cookies that your web server apps give to web browsers so that the browser can pick up from where it left off in the web server application.

You may have observed that a "token" is a special sort of "cookie": on the client side, a token is a single piece of information, where cookies may hold many.

State management techniques influence how client/server applications are built and how they work. "Client side" state management allows the server to be dumb, because the client gives it all the information it needs to work with. However, this forces the client to be a bit smart, because if any one piece of that information is wrong, then the server may not be able to properly handle the client request. "Server side" state management allows the client to be dumb (it only has to handle one piece of information, not a lot of information), but the server must now be smart, and have some way of storing and retrieving all the details of the interaction by using the token that the client returned.
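For contrast, here is a minimal sketch of "client side" state management, in which the server remembers nothing between calls. The function names are hypothetical, and JSON text is used as the "cookie" only because it is a convenient way to hand the whole state to the client.

```python
import json

def place_order(size, style, toppings):
    """Stateless server: returns the whole order for the client to keep."""
    state = {"size": size, "style": style, "toppings": sorted(toppings)}
    return json.dumps(state)                 # the "cookie"

def remove_topping(cookie, topping):
    """The server rebuilds the state from what the client sends back."""
    state = json.loads(cookie)
    state["toppings"] = [t for t in state["toppings"] if t != topping]
    return json.dumps(state)                 # an updated cookie

cookie = place_order('12"', "Deluxe", ["Pineapple", "Mushrooms"])
cookie = remove_topping(cookie, "Mushrooms")
```

The server functions share no data at all, but they trust whatever the client hands back; a cookie that is damaged or edited on the way produces a wrong order.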

Server side servers need permanent data stores. Client side servers need redundant data.


Security Issues

While most servers do not worry about security, some do. Those that do worry about security tend to be the sorts of servers that deal in sensitive information or activities. Servers that permit dynamic update of critical systems or information typically impose some amount of security around their conversations, as do those that provide sensitive information. Most server security issues fall into one of three categories: User security, Data Security, and Activity Audit.

"User Security" refers to the security issues around identifying who the client is, and determining if the server is permitted to perform the function that the client has requested. The term "Authentication" refers to the process of determining who the client is. The term "Authorization" refers to the process of determining if the server is permitted to act on the client's request.

Suppose, after ordering our pizza, we send the neighbour's son to pick it up from the pizzeria. The pizzeria first has to determine who this person is, and then determine if they are allowed to give this person our pizza.

There are many different ways that the server can authenticate the client. Frequently, the server and the client share a secret that only belongs to that specific client. The client hands this secret to the server at the beginning of its request, and the server checks to see who belongs to the secret. If the secret belongs to someone it knows, then the client is presumed to be the someone that the server associates the secret with. Obviously, the more identification that the client can supply, the easier it is for the server to authenticate it. The server will cross-check each piece of identification with its records and with the other pieces of identification, and decide whether the client is who he says he is or not.

Once the server authenticates the client, it then goes on to determine what sort of authorization the client has. This is normally performed within the internals of the server by checking the client's authentication and request against a list of permissions. Authorizations are usually expressed in terms of what the client is permitted to do, but really should be thought of in terms of what the server is permitted to do for the client.

For example, I can walk up to a teller in a bank, and the teller can easily determine if I have an account at the bank. He (the teller) has authenticated me (the client) as a particular customer. There is nothing stopping me, as the client, from asking him (as the server) to give me ten million dollars in small non-sequential unmarked bills. What permits or prevents him, the teller, from satisfying my request is the authorization. He is not authorized to overdraft my account by nine million plus dollars in order to satisfy my request. Had I been a different customer, he might have had that authority. Of course, the "usual" way to look at this situation is to say that I (the client) did not have the authority to ask for the service, rather than he (the server) did not have the authority to provide the service.
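The two steps can be kept apart in code as well. In this sketch the secrets, client names and permission lists are invented for illustration; a real server would not keep secrets in plain text in its source.

```python
SECRETS = {"s3cret-ann": "ann", "s3cret-bob": "bob"}      # secret -> client
PERMISSIONS = {"ann": {"balance", "withdraw"}, "bob": {"balance"}}

def authenticate(secret):
    """Who is the client? Returns a client name, or None if unknown."""
    return SECRETS.get(secret)

def authorize(client, action):
    """May the server perform this action for this client?"""
    return action in PERMISSIONS.get(client, set())

def handle(secret, action):
    """Authenticate first, then authorize, then (pretend to) do the work."""
    client = authenticate(secret)
    if client is None:
        return "unknown client"
    if not authorize(client, action):
        return f"not permitted to {action} for {client}"
    return f"{action} done for {client}"
```

Here handle("s3cret-bob", "withdraw") is refused even though bob is a known client, which is the teller's situation: the client is authenticated, but the server is not authorized to act.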

Authentication and authorization become important issues in server design and operation because, if they are included, they can stop an otherwise valid request from being satisfied. The challenge in managing such services is to configure the authentication and authorization rules tight enough to prevent bad things from happening (for suitable values of "bad things"), while still permitting good things to happen.

But, if we are dealing with sensitive data or actions to the point where the server must implement authentication and authorization, then we likely have concerns about the security of the data as it is transferred and used. We want to ensure that the information goes from client to server or server to client without being altered in any way. If they occurred, such alterations would be the results of a third party, and would interfere with the proper execution of the request. "Noise" could jumble our data, causing a perfectly good message to become nothing but digital babble. Or it could cause subtle changes to our message so that it no longer means what we wanted it to mean. Even worse, the alteration could be deliberate, coming from a "black hat", with the specific intention of causing events to happen that we didn't intend to happen.

In situations where client and server have concerns about the integrity of the exchanged data, they often include "checksum" or "message digest" data in the messages. These extra components provide a mathematical summary of the contents of the message, and any change to the message contents, deliberate or accidental, will cause a difference between the computed value and the value included in the message. The message recipient first reads through the message and computes its own version of the "checksum" or "message digest", and then compares that value to the value that accompanied the message. Any difference between those two values will be the result of an outside change to the data, and the receiving party can take the appropriate steps to ignore the message.
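A message digest check might be sketched like this, using SHA-256 from Python's hashlib module. The packet layout (digest, a space, then the message) and the function names are assumptions made for the example. Note that a plain digest like this one catches changes to the message alone; someone who can also replace the digest needs to be stopped by a digest that includes a shared secret, which this sketch does not attempt.

```python
import hashlib

def attach_digest(message: bytes) -> bytes:
    """Prefix the message with the hex SHA-256 digest of its contents."""
    return hashlib.sha256(message).hexdigest().encode("ascii") + b" " + message

def check_digest(packet: bytes):
    """Return the message if its digest matches, otherwise None."""
    digest, _, message = packet.partition(b" ")
    if hashlib.sha256(message).hexdigest().encode("ascii") == digest:
        return message
    return None

packet = attach_digest(b"Move $50.00 from chequing to savings")
tampered = packet.replace(b"$50.00", b"$90.00")   # message altered in transit
```

The recipient recomputes the digest from the message it actually received, so the altered packet fails the comparison and can be ignored.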

But, what if we don't want outsiders to read the message at all? Well, outside of appropriating Harry Potter's Cloak of Invisibility, the best we can do is encrypt the data so that only client and server can read the message. Everyone else (primarily eavesdroppers) should only see digital hash instead of a real message. Encryption secures the contents of the message from prying eyes, and from diagnostic tools, and can be implemented in a number of different ways. Many client/server systems use encryption tools like PGP or GPG[12] to encrypt and decrypt the message data, but more and more systems are moving to encryption of the entire communications channel through security protocols like SSH[13].

Finally, after all is said and done, security doesn't last unless you are eternally vigilant. So, many secure servers also institute some form of activity audit. These audit mechanisms permit the system maintainers to detect security breaches or attempted breaches and repair or remedy the failures. Often, the activity audit mechanism maintains a sort of "before and after" picture of the data to show what changed, and this data capture can often be used to restore the server to its prior state at any point in time.

Operational Issues

"Operational issues" is a catch-all phrase that refers to "all that other stuff that has to be dealt with" after all the theory is taken care of.

The server needs a mechanism to signal an asynchronous client when the reply is ready. Similarly, the operating environment needs a mechanism to signal a server when special operations (like a controlled shutdown) are requested. In Unix environments, one common way to perform these tasks is through the Unix signal mechanism[14]. To make this work, the signalled process publishes its process ID, and the signaller sends a Unix signal to the process when activity is requested. The client must implement a signal handler that will take care of receiving the incoming reply and making it available to the rest of the client logic. The server, on the other hand, must implement its signal handler so that it flags the termination of the server read/process/write loop.

A common way for the client to publish its PID is to include it in the contents of the request message that it sends to the server. Servers typically publish their own PIDs in files in the /var/run directory, and it is common to see in shutdown scripts the line

kill -SIGTERM `cat /var/run/$SERVERPID`

Many servers publish a list of signals and their associated activities; SIGHUP may cause one server to terminate, but cause a different server to refresh its control data.

Servers often log their activities to stderr and to the syslog facility. While stderr may be redirected to a file or even discarded (through a redirection to /dev/null), syslog is directed to plain-text files in the /var/log (or /var/adm) directory. These logs usually contain activity and diagnostic information that is invaluable in determining what a server did and where it went wrong.
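The server side of this arrangement can be sketched with Python's signal module. The names keep_running and request_shutdown are hypothetical, the loop body is a stand-in for real read/process/write work, and the server sends SIGTERM to itself only so that the sketch can run on its own; normally the signal would come from a shutdown script.

```python
import os
import signal

keep_running = True

def request_shutdown(signum, frame):
    """Signal handler: only flag that the main loop should stop."""
    global keep_running
    keep_running = False

signal.signal(signal.SIGTERM, request_shutdown)

requests_handled = 0
while keep_running:
    requests_handled += 1                  # stand-in for read/process/write
    if requests_handled == 3:
        # Stand-in for an operator running kill -SIGTERM on our PID.
        os.kill(os.getpid(), signal.SIGTERM)
```

The handler does no work of its own. It just sets a flag, and the loop finishes its current pass and ends at a clean point, which is what a controlled shutdown needs.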


[1] An "interrupt handler" is a special piece of code that allows the client to receive and handle an asynchronous event. In a later lesson, we will examine the "interrupt handler" approach in a bit more detail.
[2] A process is an independent computer activity, separate from all other computer activities. It may share files with other processes but it does not share memory. One process cannot interfere with another process.
[3] A thread is a computer activity that occurs within a process. Usually each process only has one thread (of execution), but with special programming, you can make it appear that there is more than one thread (of execution) in a process. Such threads share files and memory with other threads within the process. In some operating systems, threads have less overhead than processes do. In Linux, threads and processes have roughly the same amount of overhead.
[4] DNS messages are defined by IETF[6] RFC 1035 which can be found at ftp://ftp.rfc-editor.org/in-notes/rfc1035.txt
[5] The HTTP protocol (v1.1) and its messages are defined by IETF RFC 2616, which can be found at ftp://ftp.rfc-editor.org/in-notes/rfc2616.txt
[6] The "Internet Engineering Task Force" (or IETF) is the body that publishes and maintains the various standards to which most Internet services are built. These standards are called "RFC"s (for "Request For Comments", the original purpose of the standards documents issued by the IETF), and they can be found at the RFC Editor's webpage at http://www.rfc-editor.org/. When in doubt, these are the reference documents you should use in trying to understand the Internet protocols.
[7] XML or "eXtensible Markup Language" is a text format that wraps around text data in such a way as to logically separate out elements with regular predictable formatting. XML and its supporting standards are defined by the World Wide Web Consortium (http://www.w3.org/), and the XML standard specifications can be found at http://www.w3.org/TR/xml
[8] An "Application Programming Interface" (or API) is the way a program invokes its subprograms. It is considered to be a neat trick if you can make a program believe that it has everything it needs locally, through an API, rather than remotely through client/server interactions. So, much effort has gone into disguising client/server interactions as APIs so that the client application doesn't need to know that it /is/ a client of some (possibly remote) server.
[9] XMLRPC is one way of using XML to encapsulate the information presented to an API so that the API can make a "remote procedure call". Think of remote procedure calls as client/server messages that the client doesn't realize it is making. It thinks that it is doing a local API (or "local procedure") call, and the API transforms it. You can find out more about XMLRPC from the XML-RPC Home page at http://www.xmlrpc.com/
[10] Simple Object Access Protocol (or SOAP) is another way to use XML to encapsulate information for remote procedure calls. In this case, the XML is used in an "Object Oriented" way to support "Object Oriented" program code. It's still XML, and it's still a remote procedure call. The SOAP standard is maintained by the World Wide Web Consortium, and the standard itself can be found at http://www.w3.org/TR/soap/
[11] Cookies are a now-familiar technique used in web services. While the idea is common to many servers, Netscape pioneered the use of cookies in web applications. Their original specifications are no longer hosted at netscape.com, but can be found at http://curl.haxx.se/rfc/cookie_spec.html. The most current specifications for web cookies can be found at http://www.ietf.org/rfc/rfc2965.txt
[12] PGP (Phil Zimmermann's "Pretty Good Privacy") and GPG (the "GNU Privacy Guard") are both implementations of the IETF proposed standard RFC2440 (http://www.ietf.org/rfc/rfc2440.txt). The OpenPGP Alliance (http://www.openpgp.org/) is a collection of businesses and individuals who develop products or services based on the RFC2440 standard. PGP is a proprietary encryption product that can be obtained from http://www.pgp.com/. GPG is an open source product that can be obtained from http://www.gnupg.org/.
[13] The SSH protocol is documented in several RFCs. Of note, RFC4250 (http://www.ietf.org/rfc/rfc4250.txt) documents the "assigned numbers" and RFC4251 (http://www.ietf.org/rfc/rfc4251.txt) documents the protocol architecture. There are a few commercial SSH implementations available, but the most common implementation seems to be the OpenSSH open source implementation found at http://www.openssh.com/.
[14] Unix signals are documented in the standard manual pages. Look at the signal(7) man page, along with the kill(1), kill(2), and signal(2) man pages for the basic details on Unix signals and signal handlers.


Lesson 3: Some Common Servers - Daemons


This lesson is not ready yet. Please check back later

Lesson 4: More Common Servers - GUI services


This lesson is not ready yet. Please check back later

Lesson 5: Design a simple server


This lesson is not ready yet. Please check back later

Lesson 6: Implement a simple server


This lesson is not ready yet. Please check back later

Lesson 7: Design a simple client


This lesson is not ready yet. Please check back later

Lesson 8: Implement a simple client


This lesson is not ready yet. Please check back later

Lesson 9: Review and Discussion


This lesson is not ready yet. Please check back later
