Sunday, 13 December 2009

Apache configuration file layouts

A traditional Apache configuration consists of one file (httpd.conf) containing all the required configuration directives. However, a single file is a problem for packaging systems where different packages are responsible for different aspects of Apache's operation. For them it's much easier if they can contribute one or more files containing configuration fragments which are then incorporated into the Apache configuration using the 'Include' directive. While convenient for the packaging system, this is less convenient for the system administrator, who now finds his Apache configuration spread across multiple files in several directories. Here are two diagrams showing the configuration file layout in two common Linux distributions - Debian (and so Ubuntu) and SLES:
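
For example, the main file can pull in every fragment that packages drop into a well-known directory with a single directive (the path here is illustrative):

    # Incorporate configuration fragments contributed by individual packages
    Include /etc/apache2/conf.d/*.conf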

Debian


Note that the mods-enabled and sites-enabled directories contain symlinks to files actually stored in the parallel mods-available and sites-available directories, and that the commands a2enmod, a2dismod, a2ensite and a2dissite are provided to manipulate these symlinks. Within the mods-{available,enabled} directories, the *.load files contain the Apache configuration directive to load the module in question; the corresponding *.conf files contain any configuration directives the module needs. httpd.conf is included for backwards compatibility and to support installing third-party modules directly via apxs2. See the file /etc/apache2/README for more details.
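
For instance, for the userdir module the pair of files might contain something like this (contents abbreviated and illustrative):

    # mods-available/userdir.load -- the directive that loads the module
    LoadModule userdir_module /usr/lib/apache2/modules/mod_userdir.so

    # mods-available/userdir.conf -- configuration used by the module
    <IfModule mod_userdir.c>
        UserDir public_html
    </IfModule>

Running 'a2enmod userdir' then symlinks both files into mods-enabled, so they take effect on the next reload.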

SLES

The files shown in yellow boxes, all of which appear in the /etc/apache2/sysconfig.d/ directory, are regenerated automatically from information in /etc/sysconfig/apache2 on Apache startup and so shouldn't be edited by hand. See the comments at the top of /etc/apache2/httpd.conf for more information.
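
For example, to enable an extra module on SLES you would change the APACHE_MODULES variable and restart Apache, rather than editing the generated files directly (the module list here is illustrative):

    # /etc/sysconfig/apache2 -- edit this, not the generated files
    APACHE_MODULES="alias auth_basic authz_host dir mime status userdir"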

This diagram is for SLES 10 and Apache 2; similar arrangements were used with SLES 9 and Apache 1.3, with 'apache2' replaced by 'httpd' in filenames. In SLES 9 it was necessary to run SuSEconfig to regenerate the files from the sysconfig information.

2010-02-23: Debian diagram amended - the 'master' file was incorrectly labelled httpd.conf and should have been apache2.conf. Apart from anything else, you can't have httpd.conf including itself!

Wednesday, 9 December 2009

Paul Walk's 'Infrastructure service anti-pattern'

Following on from Service-to-service communication, I've just seen a blog posting on Paul Walk's weblog entitled 'An infrastructure service anti-pattern' which makes an excellent case for how machine APIs should be used. Well worth reading.


Monday, 7 December 2009

Service-to-service communication

The University of Cambridge's Raven service works well enough for interactive logins using a web browser, but doesn't (and was never intended to) support non-interactive authentication, or authentication between one service and another, rather than between people and services. Here's a set of suggestions for filling this gap and for supporting general service-to-service communication - I happen to like these today but I'm making no promises that I won't have changed my mind by tomorrow.

For 'proxied' or non-interactive authentication on behalf of individuals I'd recommend OAuth. This is essentially a standardised protocol for establishing a token that grants one service limited, delegated access in a user's name to another service. There's a good example of how it could work in the Beginner’s Guide to OAuth – Part II: Protocol Workflow. OAuth is gaining significant traction in social networking applications.
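
To make the delegation step concrete, here's a minimal Python sketch of how a client signs an OAuth 1.0 request with HMAC-SHA1 before presenting it to a service. The keys, token and URL are invented placeholders, not part of any real deployment:

    import base64, hashlib, hmac, time, urllib.parse, uuid

    def sign_request(method, url, params, consumer_secret, token_secret):
        # 1. Percent-encode and sort every parameter name/value pair.
        encoded = sorted((urllib.parse.quote(k, safe=''),
                          urllib.parse.quote(str(v), safe=''))
                         for k, v in params.items())
        param_str = '&'.join('%s=%s' % kv for kv in encoded)
        # 2. Build the signature base string: METHOD&URL&PARAMS.
        base = '&'.join(urllib.parse.quote(s, safe='')
                        for s in (method, url, param_str))
        # 3. The signing key is consumer_secret&token_secret.
        key = '%s&%s' % (urllib.parse.quote(consumer_secret, safe=''),
                         urllib.parse.quote(token_secret, safe=''))
        digest = hmac.new(key.encode(), base.encode(), hashlib.sha1).digest()
        return base64.b64encode(digest).decode()

    params = {
        'oauth_consumer_key': 'webapp-key',      # identifies the delegated service
        'oauth_token': 'user-granted-token',     # the token the user authorised
        'oauth_signature_method': 'HMAC-SHA1',
        'oauth_timestamp': str(int(time.time())),
        'oauth_nonce': uuid.uuid4().hex,
        'oauth_version': '1.0',
    }
    params['oauth_signature'] = sign_request(
        'GET', 'https://service.example.cam.ac.uk/resource', params,
        consumer_secret='webapp-secret', token_secret='token-secret')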

For service-to-service communication I'd recommend SSL/TLS using mutual authentication by certificate. Since we are assuming that authentication is required, we should also assume that confidentiality is necessary, so the protection offered by SSL/TLS seems appropriate.
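
As a sketch of what this looks like in practice, using Python's ssl module (the certificate files and hostname are hypothetical):

    import socket, ssl

    # Trust only our hypothetical in-house CA, and present our own certificate.
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.load_verify_locations('in-house-ca.pem')
    context.load_cert_chain('our-service-cert.pem', 'our-service-key.pem')

    with socket.create_connection(('ipreg.example.cam.ac.uk', 443)) as sock:
        with context.wrap_socket(sock,
                                 server_hostname='ipreg.example.cam.ac.uk') as tls:
            # Both ends have now verified each other's certificates.
            print(tls.getpeercert()['subject'])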

Certificate trust could just be established bilaterally between pairs of services, but the complexity of this grows with the square of the number of services involved. Better would be to establish an in-house Public Key Infrastructure with a central in-house Certification Authority (CA) that could issue certificates for this purpose. Some difficult policy decisions will be needed about who is allowed to apply for certificates in the name of which services, but once made it should be possible to largely automate the CA by providing a Raven-authenticated web interface for certificate management. Note that these certificates would need to identify 'services', rather than just computers, so the parties to a conversation could for example be the 'CS IP Register Database' and the 'Department of Important Studies Network Management system'. We'd need to sort out a naming convention. An important service provided by the CA would need to be the maintenance of a Certificate Revocation List.

Authorisation I'd leave to the services involved. Both OAuth and certificate authentication establish the 'identity' of a party in a conversation and it should be easy enough to use this identity within whatever authorisation system a service already uses. For example, Lookup could be adapted to allow certificate DNs to appear alongside user identifiers as members of groups used for access control. 
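
To sketch the idea (the group table, group name and DNs below are all invented):

    # A service's access-control group containing both people and services.
    GROUP_MEMBERS = {
        'ipreg-editors': {
            'spqr1',                                                  # a person
            'CN=Network Management System,O=Dept of Important Studies',  # a service
        },
    }

    def is_authorised(identity, group):
        # 'identity' is either a user identifier or a certificate subject DN.
        return identity in GROUP_MEMBERS.get(group, set())

    assert is_authorised('CN=Network Management System,O=Dept of Important Studies',
                         'ipreg-editors')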

Finally, we need to identify protocols by which services can communicate. I suggest something lightweight and vaguely 'REST'ish. Authorities differ on what exactly REST requires, but here I just mean a basic CRUD (create, read, update, delete) interface carried over HTTP and mapped onto the HTTP primitives PUT, GET, POST, DELETE, etc. Data should probably be serialised using simple XML, though other formats such as JSON are a possibility. Existing XML schemas can be used where appropriate: for example, the Atom Syndication Format can be used to represent lists (particularly search results), and the Atom Publishing Protocol is probably worth considering to support the creation and modification of resources (see Wikipedia for an introduction to Atom).
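
As an illustration, a client's view of such an interface might look like this minimal Python sketch (the URL scheme and payloads are invented):

    # Sketch of a CRUD-over-HTTP client; URL scheme and payloads are hypothetical.
    import json, urllib.request

    BASE = 'https://ipreg.example.cam.ac.uk/api/hosts'

    def call(method, url, body=None):
        data = json.dumps(body).encode() if body is not None else None
        req = urllib.request.Request(url, data=data, method=method,
                                     headers={'Content-Type': 'application/json'})
        with urllib.request.urlopen(req) as resp:
            return resp.read()

    call('POST', BASE, {'name': 'newhost.example.cam.ac.uk'})        # create
    call('GET', BASE + '/newhost.example.cam.ac.uk')                 # read
    call('PUT', BASE + '/newhost.example.cam.ac.uk', {'ttl': 3600})  # update
    call('DELETE', BASE + '/newhost.example.cam.ac.uk')              # delete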

The advantage of this approach is that it provides a lightweight and technology-neutral interface using tools (HTTP servers and clients) that are widely available and reasonably well understood. It even allows a certain amount of experimentation using nothing but a web browser. It also opens up the possibility of in-browser manipulation of data, especially if results are available in JSON. Against this, there's the need to design an API for each new service and the requirement for programming work at both the client and server ends. One way of supporting this is to distribute at least one example client library with each new API. An important selling point for this approach is the fact that it underpins almost all of the current 'cloud' offerings - see the Google Data Protocol, Amazon Web Services, the Yahoo Social API, etc.

There are other possibilities for filling the various slots mentioned above - obvious ones being SSH to provide confidentiality and strong mutual authentication, and SOAP to provide inter-service communication. I happen to think (today, as mentioned above) that the set I've listed here would currently provide the best solution. Why that is might be the subject of subsequent posts.