Indistinguishable from magic -- jw35<br />
<h2>All change: New job, new challenges (2017-07-18)</h2>
I've worked for the University of Cambridge Computing Service and subsequently for <a href="http://www.uis.cam.ac.uk/">University Information Services</a> for almost 17 years. At the end of this month I'm going to step back from my current responsibilities and go and do something almost entirely new.<br />
<br />
With the support of UIS's management and at my request I'm about to be seconded to work with Dr Ian Lewis, University Director of Infrastructure Investment, on the <a href="http://smartcambridge.org/">Intelligent City Platform</a> that he's developing as part of the University's collaboration with <a href="http://www.connectingcambridgeshire.co.uk/smartcamb/">Connecting Cambridgeshire's 'Smart Cambridge' project</a> and the <a href="https://www.greatercambridge.org.uk/">Greater Cambridge Partnership</a> (formerly The City Deal). This involves a whole group of technologies (remote sensing, mapping, GIS, data handling, etc.) that I've always been interested in but never had the time to work with. I'm looking forward to the new opportunities.<br />
<br />
Over the years I've done quite a few things for the Computing Service and UIS, many of which I'm still involved with managing and developing. Plans are afoot for rearranging and reassigning those responsibilities, but at least in the short term this change is going to throw additional work on my already overstretched colleagues. If you are a consumer of any of the things I've previously been responsible for then please cut my colleagues a little slack while these reorganisations complete. One advantage of the agreed secondment arrangement is that for the short term I'll still be available to provide answers to all those questions that I've somehow failed to answer in the documentation.<br />
<br />
Onward and upwards...<br />
<h2>x509 certificate chaining (2017-06-26)</h2>
<span style="color: #4c4c4c; font-family: Verdana, sans-serif;"><span style="background-color: white;">Our x509 certificate supplier recently changed the root and intermediate certificates needed to use them (without warning, which was unhelpful). Sorting this out forced me to re-learn how certificate chaining is supposed to work.</span></span><br />
<span style="color: #4c4c4c; font-family: Verdana, sans-serif;"><span style="background-color: white;"><br /></span></span>
<span style="color: #4c4c4c; font-family: Verdana, sans-serif;"><span style="background-color: white;">As far as I can see, the primary rule is that the 'Issuer' DN of one certificate must match the 'Subject' DN of a certificate corresponding to the key that signed the first certificate. As an optimisation or hint, certificates can contain an 'Authority Key Identifier' which should match the 'Subject Key Identifier' of a certificate corresponding to the key that signed the first certificate. Quite what a key identifier is isn't defined, though there are suggestions; in most cases it's some sort of digest of the corresponding public key.</span></span><br />
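These rules are easy to see with the openssl command-line tool. The following sketch creates a throwaway CA and a leaf certificate signed by it (all file names here are illustrative), then shows that the leaf's Issuer DN matches the CA's Subject DN and that openssl can build the chain:

```shell
# Create a throwaway CA and a leaf certificate signed by it
openssl req -x509 -newkey rsa:2048 -nodes -keyout ca.key -out ca.pem \
    -subj "/CN=Example CA" -days 1
openssl req -newkey rsa:2048 -nodes -keyout leaf.key -out leaf.csr \
    -subj "/CN=example.org"
openssl x509 -req -in leaf.csr -CA ca.pem -CAkey ca.key -CAcreateserial \
    -out leaf.pem -days 1

# The leaf's Issuer DN must match the CA's Subject DN
openssl x509 -in leaf.pem -noout -issuer
openssl x509 -in ca.pem -noout -subject

# verify(1) builds and checks the chain using exactly these rules
openssl verify -CAfile ca.pem leaf.pem
```

The same `openssl x509 -noout -text` output also shows the 'Authority Key Identifier' and 'Subject Key Identifier' extensions when they are present.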
<span style="color: #4c4c4c; font-family: Verdana, sans-serif;"><span style="background-color: white;"><br /></span></span>
<span style="color: #4c4c4c; font-family: Verdana, sans-serif;"><span style="background-color: white;">If you really want to know how certificate paths are built and validated then see RFC 4158 and RFC 5280.</span></span>
<h2>Updating SAML Service Provider keys for Shibboleth IdPs (2017-04-10)</h2>
<div style="background-color: white; border: 0px; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 13.28px; line-height: 1.5em; margin-bottom: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
SAML Service Providers (SPs) maintain one or more key pairs for use when interacting with Identity Providers (IdPs). The public parts of these key pairs are distributed to IdPs via SAML metadata or by other means.</div>
<div style="background-color: white; border: 0px; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 13.28px; line-height: 1.5em; margin-bottom: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
SP keys are used for two purposes:</div>
<ol style="background-color: white; border: 0px; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 13.28px; line-height: 1.5em; list-style-image: initial; list-style-position: initial; margin: 0.3em 0px 0px 3.2em; outline: 0px; padding: 0px; vertical-align: baseline;">
<li style="border: 0px; font-family: inherit; font-size: 13.28px; font-style: inherit; font-weight: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">The SP uses them to sign authentication requests</li>
<li style="border: 0px; font-family: inherit; font-size: 13.28px; font-style: inherit; font-weight: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">IdPs use them to encrypt attribute assertions in responses for decryption by the SP (encryption is optional, but commonly enabled)</li>
</ol>
<div>
<span style="font-family: "verdana" , "arial" , "helvetica" , sans-serif;"><span style="font-size: 13.28px;"><br />
</span></span></div>
<div style="background-color: white; border: 0px; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 13.28px; line-height: 1.5em; margin-bottom: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Keys need to be replaced occasionally. This can be difficult to achieve without service disruption because the public parts are distributed and it's impractical to update all copies simultaneously. For a Shibboleth IdP, coping with replacement of an SP key used for signing isn't problematic -- a new key can be added to the SP's metadata ahead of deployment and the IdP will be happy with signatures made with either. However once two keys appear in the metadata the IdP may use either to encrypt assertions and this will only work if the SP is happy to perform decryption with either key.</div>
<div style="background-color: white; border: 0px; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 13.28px; line-height: 1.5em; margin-bottom: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
For Shibboleth SPs this isn't a problem, because they can be configured to accept multiple keys for decrypting assertions while using a nominated key for creating signatures. There's a well-known procedure for rotating the key on such an SP; see for example documentation on <a class="external text" href="https://www.ukfederation.org.uk/content/Documents/GetCertificatesSh2SP#rollover" rel="nofollow" style="background: url("lock_icon.gif") right center no-repeat; border: 0px; color: #3366bb; font-family: inherit; font-size: 13.28px; font-style: inherit; font-weight: inherit; margin: 0px; outline: 0px; padding: 0px 16px 0px 0px; text-decoration-line: none; vertical-align: baseline;">the UK federation site</a> and <a class="external text" href="https://wiki.shibboleth.net/confluence/display/SHIB2/NativeSPMultipleCredentials#NativeSPMultipleCredentials-KeyRollover" rel="nofollow" style="background: url("lock_icon.gif") right center no-repeat; border: 0px; color: #3366bb; font-family: inherit; font-size: 13.28px; font-style: inherit; font-weight: inherit; margin: 0px; outline: 0px; padding: 0px 16px 0px 0px; text-decoration-line: none; vertical-align: baseline;">in the Shibboleth Consortium Wiki</a>. This amounts to:</div>
<ul style="background-color: white; border: 0px; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 13.28px; line-height: 1.5em; list-style-image: initial; list-style-position: initial; margin: 0.3em 0px 1em 1.6em; outline: 0px; padding: 0px; vertical-align: baseline;">
<li style="border: 0px; font-family: inherit; font-size: 13.28px; font-style: inherit; font-weight: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Create new key pair on the SP</li>
<li style="border: 0px; font-family: inherit; font-size: 13.28px; font-style: inherit; font-weight: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Configure the SP to accept either key for decryption but not use the new one for signing (by adding the new one with use="encryption")</li>
<li style="border: 0px; font-family: inherit; font-size: 13.28px; font-style: inherit; font-weight: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Add the new key to the SP's metadata for both use="signing" and use="encryption" (or just without a 'use' attribute)</li>
<li style="border: 0px; font-family: inherit; font-size: 13.28px; font-style: inherit; font-weight: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Wait for the metadata to propagate, or directly update all IdPs</li>
<li style="border: 0px; font-family: inherit; font-size: 13.28px; font-style: inherit; font-weight: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Switch the SP to use the new key for signatures (by removing use="encryption" from the new one and adding it to the old one)</li>
<li style="border: 0px; font-family: inherit; font-size: 13.28px; font-style: inherit; font-weight: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Completely remove the old key from the SP's metadata</li>
<li style="border: 0px; font-family: inherit; font-size: 13.28px; font-style: inherit; font-weight: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Wait for the metadata to propagate, or directly update all IdPs</li>
<li style="border: 0px; font-family: inherit; font-size: 13.28px; font-style: inherit; font-weight: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Remove the old key from the SP configuration</li>
</ul>
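For a Shibboleth SP, the intermediate state in the steps above (new key accepted for decryption only, old key still used for signing) looks something like the following in shibboleth2.xml. This is a sketch only: the file names are illustrative and the exact layout depends on your SP version.

```xml
<CredentialResolver type="Chaining">
  <!-- Old key: still used for signing, and for decrypting assertions
       encrypted by IdPs that only know about the old certificate -->
  <CredentialResolver type="File"
      key="sp-old-key.pem" certificate="sp-old-cert.pem"/>
  <!-- New key: decryption only (use="encryption") until the updated
       metadata has propagated to all IdPs -->
  <CredentialResolver type="File" use="encryption"
      key="sp-new-key.pem" certificate="sp-new-cert.pem"/>
</CredentialResolver>
```

At the switch-over step the use="encryption" attribute moves from the new credential to the old one, and the old credential is deleted once the old key has left the metadata everywhere.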
<div style="background-color: white; border: 0px; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 13.28px; line-height: 1.5em; margin-bottom: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Some SPs don't support this. However it is only encryption that causes a problem. The main reason for encrypting assertions is to prevent users seeing what the IdP is saying about them, but in many cases there's nothing in the assertions that the user can't see by other means. So temporarily disabling encryption while rolling an SP key is a possibility. This means that the key can be rolled without downtime with the following revised procedure:</div>
<ul style="background-color: white; border: 0px; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 13.28px; line-height: 1.5em; list-style-image: initial; list-style-position: initial; margin: 0.3em 0px 1em 1.6em; outline: 0px; padding: 0px; vertical-align: baseline;">
<li style="border: 0px; font-family: inherit; font-size: 13.28px; font-style: inherit; font-weight: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Set up a custom relying party on the IdP for the SP that suppresses encryption. Something like this:</li>
</ul>
<pre style="background: rgb(248, 244, 233); border: 1px solid rgb(221, 221, 221); font-family: Monaco, monospace; font-size: 13.28px; line-height: 1.1em; margin-bottom: 1em; max-width: 95%; min-height: 0em; outline: 0px; overflow: auto; padding: 1em; vertical-align: baseline;"><rp:RelyingParty id="<SP entityID>"
provider="<IdP entityID>"
defaultSigningCredentialRef="IdPCredential" >
<rp:ProfileConfiguration xsi:type="saml:SAML2SSOProfile"
encryptAssertions="never"
encryptNameIds="never" />
</rp:RelyingParty>
</pre>
<ul style="background-color: white; border: 0px; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 13.28px; line-height: 1.5em; list-style-image: initial; list-style-position: initial; margin: 0.3em 0px 1em 1.6em; outline: 0px; padding: 0px; vertical-align: baseline;">
<li style="border: 0px; font-family: inherit; font-size: 13.28px; font-style: inherit; font-weight: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Add the new key to the SP's metadata</li>
<li style="border: 0px; font-family: inherit; font-size: 13.28px; font-style: inherit; font-weight: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Wait for the metadata to propagate, or just update all IdPs.</li>
<li style="border: 0px; font-family: inherit; font-size: 13.28px; font-style: inherit; font-weight: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Switch the SP to use the new key</li>
<li style="border: 0px; font-family: inherit; font-size: 13.28px; font-style: inherit; font-weight: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Remove old key from metadata</li>
<li style="border: 0px; font-family: inherit; font-size: 13.28px; font-style: inherit; font-weight: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Wait for the metadata to propagate, or just update all IdPs.</li>
<li style="border: 0px; font-family: inherit; font-size: 13.28px; font-style: inherit; font-weight: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Remove the custom relying party configuration</li>
</ul>
<div>
<span style="font-family: "verdana" , "arial" , "helvetica" , sans-serif;"><span style="font-size: 13.28px;">Note that this requires coordinated work on every IdP with which the SP works so it's not practical on a large scale. It will however work where there's essentially a one-to-one SP -> IdP relationship, as in many SaaS scenarios.</span></span></div>
<div>
<span style="font-family: "verdana" , "arial" , "helvetica" , sans-serif;"><span style="font-size: 13.28px;"><br />
</span></span></div>
<div style="background-color: white; border: 0px; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 13.28px; line-height: 1.5em; margin-bottom: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Tools such as <a class="external text" href="https://addons.mozilla.org/en-GB/firefox/addon/saml-tracer/" rel="nofollow" style="background: url("lock_icon.gif") right center no-repeat; border: 0px; color: #3366bb; font-family: inherit; font-size: 13.28px; font-style: inherit; font-weight: inherit; margin: 0px; outline: 0px; padding: 0px 16px 0px 0px; text-decoration-line: none; vertical-align: baseline;">the Firefox SAML tracer</a>, the Chrome <a class="external text" href="https://chrome.google.com/webstore/detail/saml-devtools-extension/jndllhgbinhiiddokbeoeepbppdnhhio" rel="nofollow" style="background: url("lock_icon.gif") right center no-repeat; border: 0px; color: #3366bb; font-family: inherit; font-size: 13.28px; font-style: inherit; font-weight: inherit; margin: 0px; outline: 0px; padding: 0px 16px 0px 0px; text-decoration-line: none; vertical-align: baseline;">SAML Dev Tool extension</a> and the <a class="external text" href="https://chrome.google.com/webstore/detail/saml-chrome-panel/paijfdbeoenhembfhkhllainmocckace" rel="nofollow" style="background: url("lock_icon.gif") right center no-repeat; border: 0px; color: #3366bb; font-family: inherit; font-size: 13.28px; font-style: inherit; font-weight: inherit; margin: 0px; outline: 0px; padding: 0px 16px 0px 0px; text-decoration-line: none; vertical-align: baseline;">SAML Chrome panel</a> are really helpful for checking that each step of the process has completed before moving on to the next.</div>
<h2>Mitigating recent TLS vulnerabilities (2015-07-09)</h2>
Recently discovered vulnerabilities in TLS (also known as SSL -- the protocol that secures web browsing and other activities such as email) can and should be mitigated by appropriate server configuration. Existing defaults and previously-recommended configurations may need attention to address these issues. While these vulnerabilities have been addressed in recent versions of major clients, not everyone runs up-to-date versions and not all access is from major clients.<br />
<br />
Vulnerabilities addressed by this advice include '<a href="http://en.wikipedia.org/wiki/POODLE">POODLE</a>' (CVE-2014-3566), '<a href="https://freakattack.com/">Freak</a>' (CVE-2015-0204), and '<a href="https://weakdh.org/sysadmin.html">LogJam</a>' (CVE-2015-4000).<br />
<br />
What represents a 'best' configuration depends on the capabilities of the servers involved, and of their expected clients. The best security can only be obtained on up-to-date software and only with configurations that may exclude some older clients. The following advice will provide a reasonable level of security but should be reviewed in the light of specific requirements.<br />
<br />
<i>The following advice is intended to be generic; specific </i><i>configuration advice for some platforms appears below.</i><br />
<h2>
Suggestions (in order of importance):</h2>
1) Ensure that the SSLv2 and SSLv3 versions of the protocol are disabled.<br />
<br />
Note that IE6 on Windows XP will be unable to communicate with servers that don't support SSLv3, but given its age this should be acceptable -- many major services already disable SSLv2 and SSLv3.<br />
<br />
2) Adjust the cryptographic suites supported to exclude the following:<br />
<br />
<ul>
<li>all 'export' suites</li>
<li>any using symmetric encryption with keys less than 128 bits</li>
<li>any using signatures based on the MD5 hash algorithm</li>
<li>any using symmetric encryption based on the RC4 algorithm</li>
</ul>
<br />
3) Configure Diffie-Hellman (DH) key exchange to use at least 2048-bit groups. Additionally generate a unique 2048-bit group for use in Diffie-Hellman key exchange on each server. As an alternative, it may be appropriate to disable all cryptographic suites that rely on Diffie-Hellman key exchange. Plan to upgrade systems that can't be appropriately configured.<br />
<br />
Note that this advice does not apply to Elliptic-Curve Diffie-Hellman key exchange (ECDH) which does not currently have any known vulnerabilities. Note also that Java 1.6 and 1.7 clients may be unable to communicate with servers offering Diffie-Hellman key exchange using groups over 1024-bits long.<br />
<br />
4) Support TLSv1.2 - plan to upgrade any systems that can't do so.<br />
<br />
One way to test your configuration is to use the <a href="https://www.ssllabs.com/ssltest/">SSL Labs server test page</a>. Aim to eliminate any issues flagged 'VULNERABLE' or shown in red, and to reduce or eliminate any marked 'WEAK' or shown in orange. It should be possible to achieve an overall rating of at least 'B' and preferably 'A', but don't be guided entirely by the overall rating shown. The 'Handshake Simulation' section of the report can be helpful when evaluating the impact of any configuration change on clients.<br />
<h2>
Implementations:</h2>
<h3>
Apache:</h3>
Add the following directives to httpd.conf or equivalent and ensure that they are not being overridden elsewhere:<br />
<br />
<span style="font-family: Courier New, Courier, monospace;"> SSLProtocol all -SSLv2 -SSLv3</span><br />
<span style="font-family: Courier New, Courier, monospace;"> SSLCipherSuite ALL:!aNULL:!eNULL:!LOW:!EXP:!MD5:!RC4</span><br />
<br />
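Before restarting Apache you can preview exactly which suites a cipher string enables by feeding the same string to the openssl ciphers command:

```shell
# List the suites selected by the cipher string used in SSLCipherSuite above;
# anything in the excluded classes (export, low, MD5, RC4) should be absent
openssl ciphers -v 'ALL:!aNULL:!eNULL:!LOW:!EXP:!MD5:!RC4'
```

The -v output also shows the key-exchange, authentication and MAC algorithm for each suite, which makes it easy to spot anything unexpected.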
TLSv1.2 is automatically available on systems running OpenSSL 1.0.1 or above, but not otherwise - plan to upgrade systems that include only lower versions.<br />
<br />
By default Apache 2.2 only supports a fixed 1024-bit Diffie-Hellman group - plan to upgrade it. Apache from 2.4, and patched versions of 2.2 in some Linux distributions, support longer fixed groups. Unique groups can be created with the command<br />
<br />
<span style="font-family: Courier New, Courier, monospace;"> openssl dhparam -out dhparams.pem 2048</span><br />
<br />
and loaded into patched versions of Apache 2.2, and into Apache 2.4 with the directive<br />
<br />
<span style="font-family: Courier New, Courier, monospace;"> SSLOpenSSLConfCmd DHParameters dhparams.pem</span><br />
<br />
Restart Apache after making these changes.<br />
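A quick sanity check on the generated Diffie-Hellman parameters before deploying them (the generation command is repeated here so the sketch stands alone):

```shell
# Generate the parameters, then confirm the group really is 2048-bit;
# the first line of the inspection output should report "(2048 bit)"
openssl dhparam -out dhparams.pem 2048
openssl dhparam -in dhparams.pem -noout -text | head -n 1
```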
<h3>
IIS:</h3>
See <a href="http://www.ucs.cam.ac.uk/security/ssl3/server-config">elsewhere</a> for instructions on disabling SSLv2 and SSLv3.<br />
<br />
To set cryptographic suites:<br />
<ul>
<li>Open the Group Policy Object Editor (run gpedit.msc from a command prompt).</li>
<li>Expand Computer Configuration --> Administrative Templates --> Network --> SSL Configuration Settings.</li>
<li>In the right pane, open the SSL Cipher Suite Order setting.</li>
<li>A reasonable cipher suite list (from <a href="https://www.feistyduck.com/books/bulletproof-ssl-and-tls/">Bulletproof SSL and TLS</a>, Ch 15) would be:</li>
</ul>
<br />
<span style="font-family: Courier New, Courier, monospace;"> TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256_P256</span><br />
<span style="font-family: Courier New, Courier, monospace;"> TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384_P384</span><br />
<span style="font-family: Courier New, Courier, monospace;"> TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA_P256</span><br />
<span style="font-family: Courier New, Courier, monospace;"> TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA_P256</span><br />
<span style="font-family: Courier New, Courier, monospace;"> TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256_P256</span><br />
<span style="font-family: Courier New, Courier, monospace;"> TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384_P384</span><br />
<span style="font-family: Courier New, Courier, monospace;"> TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA_P256</span><br />
<span style="font-family: Courier New, Courier, monospace;"> TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA_P256</span><br />
<span style="font-family: Courier New, Courier, monospace;"> TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256</span><br />
<span style="font-family: Courier New, Courier, monospace;"> TLS_DHE_RSA_WITH_AES_128_GCM_SHA256 </span><br />
<span style="font-family: Courier New, Courier, monospace;"> TLS_DHE_RSA_WITH_AES_256_GCM_SHA384</span><br />
<br />
If this excludes some old but necessary clients then consider adding:<br />
<br />
<span style="font-family: Courier New, Courier, monospace;"> TLS_RSA_WITH_AES_128_CBC_SHA</span><br />
<span style="font-family: Courier New, Courier, monospace;"> TLS_RSA_WITH_AES_256_CBC_SHA</span><br />
<span style="font-family: Courier New, Courier, monospace;"> TLS_RSA_WITH_3DES_EDE_CBC_SHA</span><br />
<span style="font-family: Courier New, Courier, monospace;"> TLS_RSA_WITH_RC4_128_SHA</span><br />
<br />
Reboot after making these changes.<br />
<h3>
nginx</h3>
The Nginx project have published <a href="http://nginx.com/blog/nginx-poodle-ssl/">instructions on how to disable SSLv3 on Nginx</a>.<br />
<br />
To configure cipher suites place the following in the website configuration server block in /etc/nginx/sites-enabled/default (see the <a href="https://weakdh.org/sysadmin.html">LogJam pages</a>):<br />
<br />
<span style="font-family: Courier New, Courier, monospace;"> ssl_ciphers 'ALL:!aNULL:!eNULL:!LOW:!EXP:!MD5:!RC4';</span><br />
<br />
Custom Diffie-Hellman groups can be created with the command:<br />
<br />
<span style="font-family: Courier New, Courier, monospace;"> openssl dhparam -out dhparams.pem 2048</span><br />
<br />
and loaded into nginx with the following configuration:<br />
<br />
<span style="font-family: Courier New, Courier, monospace;"> ssl_dhparam dhparams.pem;</span><br />
<h3>
Tomcat</h3>
See <a href="http://www.ucs.cam.ac.uk/security/ssl3/server-config">elsewhere</a> for instructions on disabling SSLv2 and SSLv3.<br />
<br />
Configuring cipher suites (see <a href="https://www.feistyduck.com/books/bulletproof-ssl-and-tls/">Bulletproof SSL and TLS</a>):<br />
<br />
* With the APR/Native connector<br />
<ul>
<li> Set the 'SSLCipherSuite' attribute of the 'Connector' XML element in your $TOMCAT_HOME/conf/server.xml file:<br /><span style="font-family: Courier New, Courier, monospace;"><br />SSLCipherSuite = "ALL:!aNULL:!eNULL:!LOW:!EXP:!MD5:!RC4</span></li>
</ul>
<div>
<span style="font-family: inherit;">* With the JSSE connector</span></div>
<ul>
<li>Set the 'ciphers' attribute of the 'Connector' XML element in your $TOMCAT_HOME/conf/server.xml file:<br /><br /><span style="font-family: Courier New, Courier, monospace;">ciphers = "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,<br />TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,<br />TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA, <br />TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA,<br />TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,<br />TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384, <br />TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,<br />TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA, <br />TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,<br />TLS_DHE_RSA_WITH_AES_128_GCM_SHA256, <br />TLS_DHE_RSA_WITH_AES_256_GCM_SHA384,<br />TLS_RSA_WITH_AES_128_CBC_SHA, <br />SSL_RSA_WITH_3DES_EDE_CBC_SHA"</span></li>
</ul>
<h2>Information Systems - what we do (2015-01-31)</h2>
Building on yesterday's <a href="http://jw35.blogspot.co.uk/2015/01/an-information-systems-colophon.html">list of technologies</a> used by the Information Systems team here at the University, here are some of the services that we run. Much of this will probably only make sense to people inside the University.<br />
<ul>
<li><a href="http://www.ucs.cam.ac.uk/forum/">Forum</a></li>
<li>The authentication wrapper for <a href="http://www.ucs.cam.ac.uk/googleapps">Google Apps @ Cambridge</a></li>
<li><a href="http://www.ucs.cam.ac.uk/managed-web-service/">Managed Web Server</a></li>
<li><a href="http://www.ucs.cam.ac.uk/managedwiki/">Managed Wiki Service</a></li>
<li><a href="http://www.ucs.cam.ac.uk/raven/">Raven</a> web authentication</li>
<li><a href="http://www.ucs.cam.ac.uk/tlscerts">TLS Certificate Scheme</a></li>
<li><a href="http://map.cam.ac.uk/">University Map</a></li>
<li><a href="http://www.ucs.cam.ac.uk/web-search/">Web Search</a> (spidering back-end, the <a href="http://search.cam.ac.uk/">search.cam</a> interface is run by others)</li>
<li>Webtools Web Server (hosting miscellaneous UIS infrastructure)</li>
</ul>
<div>
In addition, we contribute to a number of services run by other parts of UIS:</div>
<ul>
<li><a href="http://www.ucs.cam.ac.uk/security/cert">CamCERT</a></li>
<li><a href="http://www.ucs.cam.ac.uk/falcon">Falcon CMS</a></li>
<li><a href="https://git.csx.cam.ac.uk/x/">Git Repository Hosting Service</a></li>
<li>Network Traffic Analysis (for <a href="http://www.ucs.cam.ac.uk/network/connections/usagecharge">Traffic Charging</a> and intrusion detection by CERT)</li>
<li><a href="http://news.uis.cam.ac.uk/">UIS News Service</a></li>
</ul>
<div>
So now you know.</div>
<h2>An Information Systems colophon (2015-01-30)</h2>
I've followed the work of <a href="https://www.gov.uk/government/organisations/government-digital-service">Government Digital Service</a> (GDS - the people behind <a href="https://www.gov.uk/">GOV.UK</a>) for a while. It seems to me that an organisation dedicated to “<span style="-webkit-text-size-adjust: auto; background-color: rgba(255, 255, 255, 0);">leading the digital transformation of government” probably knows a thing or two that's relevant to the digital transformation of a university.</span><br />
<div>
<span style="-webkit-text-size-adjust: auto; background-color: rgba(255, 255, 255, 0);"><br /></span></div>
<div>
<span style="-webkit-text-size-adjust: auto;"><a href="https://gds.blog.gov.uk/">GDS have a blog</a>, and their <a href="https://gds.blog.gov.uk/govuk-launch-colophon/">list of technologies that they use</a> is interesting. Work I've been doing recently means that I've created a similar list of the technologies used within my Information Services team here at the University. For what it's worth, this is what it looks like:</span><br />
<span style="-webkit-text-size-adjust: auto;"><br /></span></div>
<span style="-webkit-text-size-adjust: auto;"><br /></span></div>
<div>
<b>Core servers</b><br />
<ul>
<li>Most services run on virtual machines provided by an internal <a href="http://www.vmware.com/">VMware</a> cluster</li>
<li>Most run under Linux - <a href="https://www.suse.com/">SuSE Enterprise</a>, <a href="http://www.ubuntu.com/">Ubuntu</a>, or <a href="https://www.debian.org/">Debian</a></li>
<li>High-availability clusters use <a href="http://clusterlabs.org/doc/">Corosync & Pacemaker</a></li>
<li>Servers are built using our internal BES build system</li>
<li>We have one legacy service using <a href="http://en.wikipedia.org/wiki/Solaris_%28operating_system%29">Solaris</a></li>
</ul>
<b>Base software products</b><br />
<ul>
<li>Some services are based on existing software, including <a href="https://www.phpbb.com/">phpBB</a> and <a href="https://www.mediawiki.org/wiki/MediaWiki">Mediawiki</a></li>
<li>The Web Search service is provided by <a href="http://www.funnelback.com/">Funnelback</a></li>
</ul>
<b>Technologies</b><br />
<br />
We rely on, and in some cases also support, a range of technologies:<br />
<ul>
<li><a href="http://raven.cam.ac.uk/project/">Ucam WebAuth</a>, <a href="https://shibboleth.net/">Shibboleth</a>, <a href="http://en.wikipedia.org/wiki/Security_Assertion_Markup_Language">SAML</a> for web authentication</li>
<li><a href="http://en.wikipedia.org/wiki/Lightweight_Directory_Access_Protocol">LDAP</a> and the University's <a href="http://www.ucs.cam.ac.uk/lookup/">Lookup service</a> (and its <a href="http://www.ucs.cam.ac.uk/lookup/ws">API</a>) for authorisation</li>
<li>TLS, PKI and X509 certificates for web security</li>
<li><a href="http://www.openstreetmap.org/">OpenStreetMap</a> and its API for geographic information</li>
<li><a href="https://tools.ietf.org/wg/abfab/">AbFab</a>, <a href="http://en.wikipedia.org/wiki/Extensible_Authentication_Protocol">EAP</a>, <a href="http://en.wikipedia.org/wiki/Linux_PAM">PAM</a>, and <a href="http://en.wikipedia.org/wiki/RADIUS">RADIUS</a> as part of investigations into <a href="https://www.ja.net/products-services/janet-futures/moonshot">Moonshot</a></li>
<li>Google APIs for <a href="https://www.google.com/work/apps/business/">Google Apps</a> integration</li>
<li><a href="http://en.wikipedia.org/wiki/NetFlow">Netflow</a> for network traffic analysis</li>
</ul>
<b>Applications (incl. Frameworks, etc)</b><br />
<ul>
<li>Our current favourite development environment is <a href="https://www.python.org/">Python</a> and <a href="https://www.djangoproject.com/">Django</a></li>
<li>Other systems use C, <a href="https://www.perl.org/">Perl</a> (with <a href="http://www.masonhq.com/">HTML::Mason</a>), <a href="https://www.java.com/en/">Java</a>, <a href="http://en.wikipedia.org/wiki/JavaScript">JavaScript</a>, <a href="http://php.net/">PHP</a>, <a href="https://www.ruby-lang.org/en/">Ruby</a> (with <a href="http://rubyonrails.org/">Rails</a>)</li>
<li>System administration relies heavily on shell (normally <a href="http://www.gnu.org/software/bash/">Bash</a>)</li>
<li>Our ‘Managed CMS’ service is based on <a href="https://plone.org/">Plone</a></li>
<li>Web Sites run under <a href="http://httpd.apache.org/">Apache</a> and <a href="http://tomcat.apache.org/">Tomcat</a></li>
</ul>
<b>Database and other storage</b><br />
<ul>
<li><a href="http://www.mysql.com/">MySQL</a></li>
<li><a href="http://www.postgresql.org/">PostgreSQL</a> with <a href="http://slony.info/">Slony</a> for replication</li>
</ul>
<b>Monitoring, managing and alerting</b><br />
<ul>
<li>Services are monitored with <a href="http://www.nagios.org/">Nagios</a></li>
<li><a href="http://collectd.org/">Collectd</a> collects statistics and <a href="http://graphite.readthedocs.org/en/latest/">Graphite</a> and <a href="http://grafana.org/">Grafana</a> visualise them</li>
<li>Servers are increasingly built and managed with <a href="http://www.ansible.com/home">Ansible</a></li>
<li>We are moving towards <a href="http://git-scm.com/">Git</a> (with an <a href="https://git.csx.cam.ac.uk/x/">internal Gitolite service</a>) for version control, but also use (at least) <a href="https://subversion.apache.org/">SVN</a>, <a href="http://www.nongnu.org/cvs/">CVS</a>, <a href="http://www.gnu.org/software/rcs/">RCS</a> and <a href="http://en.wikipedia.org/wiki/Source_Code_Control_System">SCCS</a>!</li>
</ul>
<b>Supporting Tools</b><br />
<ul>
<li>Central <a href="https://www.bestpractical.com/rt/">RT</a> system for support tickets</li>
<li><a href="http://trac.edgewall.org/">Trac</a> and <a href="https://www.mantisbt.org/">Mantis</a> for bug tracking</li>
<li>We've been experimenting with <a href="http://leankit.com/">LeanKit</a> and <a href="https://asana.com/">Asana</a> for programme and project management</li>
</ul>
</div>
jw35http://www.blogger.com/profile/01606359745703313801noreply@blogger.com0tag:blogger.com,1999:blog-2780703305680355567.post-90386792034423368482013-04-27T10:52:00.001+01:002013-04-27T11:34:11.142+01:00Why I think php is a bad idea<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjpJ1wS4WnW_B_I4nzQb2dyAI5mITbdkFq3X5VZUTDvgye69Pw4wlP_Fjv7lCsZ8QRkxekiqbaa4b_yfsqTW6wmIZeVdX4Tj8xsgW9HRru8IPQXGKUyC4PFgktuPCQU0rjEMLGSP-cvzQ/s1600/IMG_0026.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="275" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjpJ1wS4WnW_B_I4nzQb2dyAI5mITbdkFq3X5VZUTDvgye69Pw4wlP_Fjv7lCsZ8QRkxekiqbaa4b_yfsqTW6wmIZeVdX4Tj8xsgW9HRru8IPQXGKUyC4PFgktuPCQU0rjEMLGSP-cvzQ/s400/IMG_0026.JPG" width="400" /></a></div>
<i><b>Update</b></i>: a friend reminds me of <a href="http://me.veekun.com/blog/2012/04/09/php-a-fractal-of-bad-design/">http://me.veekun.com/blog/2012/04/09/php-a-fractal-of-bad-design/ </a>which covers the same topic from a different angle.<br />
jw35http://www.blogger.com/profile/01606359745703313801noreply@blogger.com1tag:blogger.com,1999:blog-2780703305680355567.post-61417080988801796082013-01-31T11:13:00.000+00:002013-02-01T09:23:44.677+00:00Getting your fonts from the cloudThe University of Cambridge's latest web style, due for deployment RSN, uses <a href="https://typekit.com/fonts/myriad-pro">Adobe Myriad Pro</a> for some of its headings. This is loaded as a web font from <a href="https://typekit.com/">Adobe's TypeKit</a> service. As I understand it this is the only legal way to use Adobe Myriad Pro since Adobe don't allow self-hosting.<br />
<br />
TypeKit is apparently implemented on a high-availability Content Delivery Network (though even that isn't perfect - see for example <a href="http://blog.typekit.com/2012/12/20/details-on-tuesdays-font-serving-outage/">here</a>), but the question remains of what the effect will be if it can't be reached. Obviously the font won't be available, but we have web-safe fall-backs available. The real question is what sort of delay we might see under these circumstances. Ironically, one group who are particularly exposed to this risk are University users since at the moment we only have one connection to JANET, and so to the Internet and all the TypeKit servers.<br />
<br />
TypeKit fonts are loaded by including a JavaScript library in the head of each document and then calling an initialisation function:<br />
<br />
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace;"><script type="text/javascript" src="//use.typekit.com/<i><licence token></i>.js"></script><br /><script type="text/javascript">try{Typekit.load();}catch(e){}</script></span></blockquote>
<div>
Web browsers block while loading JavaScript like this, so if use.typekit.com can't be reached then page loading will be delayed until the attempt times out. How long will this be?</div>
<div>
<br /></div>
<div>
Some experiments suggest the delay varies widely between operating systems, browsers, and types of network connection. At best, loss of access to TypeKit results in an additional 3 or 4 second delay in page loading <i>(this is actually too small, see correction below)</i>. At worst this delay can be a minute or two. iOS devices, for example, seem to consistently see an additional 75 second delay. These delays apply to every page load since browsers don't seem to cache the failure. </div>
<div>
<br /></div>
<div>
Users are going to interpret this as somewhere between the web site hosting the pages going slowly and the web site being down. It does mean that for many local users, loss of access to TypeKit will cause them to lose usable access to any local pages in the new style.</div>
<div>
<br /></div>
<div>
Of course similar considerations apply to any 'externally' hosted JavaScript. One common example is the code to implement Google Analytics. However in this case it's typically loaded at the bottom of each page and so shouldn't delay page rendering. This isn't an option for a font unless you can cope with the page initially rendering in the wrong font and then re-rendering subsequently.</div>
<div>
<br /></div>
<div>
I also have a minor concern about loading third-party JavaScript. Such JavaScript <a href="http://jw35.blogspot.co.uk/2010/01/promiscuous-javascript-considered.html">can in effect do whatever it wants with your page</a>. In particular it can monitor form entry and steal authentication tokens such as cookies.</div>
<div>
I'm not for one moment suggesting that Adobe would deliberately do such things, but we don't know much about how this JavaScript is managed and delivered to us, so it's hard to evaluate the risk we might be exposed to. In view of this it's likely that at least the login pages for our central authentication system (Raven) may not be able to use Myriad Pro.<br />
<br />
<i><b>Update</b></i>: colleagues have noticed a problem with my testing methodology which means that some of my tests will have been overly optimistic about the delays imposed. It now appears that at best, loss of access to TypeKit results in an additional 20-30 second delay in page loading. That's a long time waiting for a page.<br />
<br />
<b><i>Further update</i></b>: another colleague has pointed out that TypeKit's suggested solution to this problem is to <a href="http://help.typekit.com/customer/portal/articles/649336-embed-code">load the JavaScript asynchronously</a>. This has the advantage of allowing you to control the time-out process and decide when to give up and use fall-back fonts, but has the drawback that it requires <a href="http://blog.typekit.com/2010/10/29/font-events-controlling-the-fout/">custom CSS</a> to hide the flash of unstyled text that can occur while fonts are loading.</div>
<div>
<br /></div>
jw35http://www.blogger.com/profile/01606359745703313801noreply@blogger.com1tag:blogger.com,1999:blog-2780703305680355567.post-45532472743980621362013-01-27T23:18:00.000+00:002014-07-11T13:23:58.219+01:00Restricting web access based on physical locationOccasionally people want to restrict access to a web-based resource based not on who is accessing it but on where they are located when they do so. This is normally to comply with some sort of copyright licence. In UK education this is, more often than not, something to do with the educational recording licences offered by <a href="http://www.era.org.uk/">ERA</a> <i>(but see update below)</i>.<br />
<br />
Unfortunately this is difficult to do, and close to impossible to do reliably. This often puzzles people, given that the ERA licences expect it and that things like BBC iPlayer are well known to be already doing it. It's a long story...<br />
<br />
Because of the way the Internet works it's currently impossible to know, reliably, where the person making a request is physically located. It is however possible to guess, but you need to understand the limitations of this guessing process before relying on it. Whether this guessing process is good enough for any particular purpose is something only people using it can decide.<br />
<br />
A common approach is based on Internet Protocol (IP) addresses. When someone requests something from a web server, one of the bits of information that the server sees is the IP address of the computer from which the request came (much as your telephone can tell you the number of the person calling you). In many cases this will be the address assigned to the computer the person making the request is sitting at. IP addresses are generally assigned on a geographic basis and lists exist of what addresses are used where, so it is in principle possible to ask the question 'Did my server receive this request from a machine in the UK', or even '...in my institution'.<br />
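For instance, the 'did this request come from my institution' test can be sketched with Python's standard ipaddress module. The CIDR ranges below are documentation placeholders (RFC 5737 TEST-NET ranges), not anyone's real allocation:

```python
import ipaddress

# Placeholder ranges (RFC 5737 TEST-NETs) -- substitute the address
# blocks your institution actually publishes for its network.
INSTITUTION_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def from_institution(client_ip: str) -> bool:
    """True if the requesting IP falls inside one of our known ranges."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in INSTITUTION_NETWORKS)
```

Remember this only tests the address the server saw, which may not be anywhere near where the user really is.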
<br />
But there are catches:<br />
<ul>
<li>It's possible to route requests through multiple computers, in which case the server only sees the address of the last one. This often happens without the user knowing about it (for example most home broadband set-ups route all connections through the house's broadband router, mobile networks route requests through central proxies, etc.), but it can also be done deliberately. Like many organisations, the University provides a <a href="http://www.ucs.cam.ac.uk/accounts/away/vpdn">Virtual Private Network service</a> explicitly so that requests made from anywhere in the world can appear to be coming from a computer inside the University. </li>
</ul>
<ul>
<li>The lists saying which addresses are used where are inevitably inaccurate. For example a multi-national company might have a block of addresses allocated to its US headquarters but, unknown to anyone outside the company, actually use some of them for its UK offices. Connections from people in the UK office would then appear to be from the US.</li>
</ul>
So, the bottom line is that you can come close to knowing where connections are coming from, but it's nothing like 100% reliable. People will, by accident or design, be able to access content when they shouldn't, and some people won't be able to gain access when they should. Organisations (such as <a href="https://www.maxmind.com/">MaxMind</a>) provide or sell lists which can, for example, provide a best guess of which country an IP address is allocated to. Organisations will know what addresses their own networks use - the network addresses used on the University network (and so by the majority of computing devices in the University) are <a href="http://www.ucs.cam.ac.uk/network/other/ip/camnets.html">described here</a>. Beware, though, that increasingly people are using mobile devices connected by mobile data services such as 3G that may well appear to be 'outside' their institution even when they are physically inside it.<br />
<br />
Another tempting approach relies on the fact that modern web browsers, especially those on devices with GPS receivers such as mobile phones, can be asked to supply the user's location. This is used, for example, to put 'you are here' markers on maps. You might think that this information could be used to implement geographic restrictions. However the fundamental problem with this is that it's under the user's control, so in the end they can simply make their browser lie. Further it's often inaccurate or may not be available (for example in a desktop browser), so all in all this probably isn't a usable solution.<br />
<br />
If you can set up authentication such that you can identify all your users then it seems to me that one approach would simply be to impose terms and conditions that prohibit them from accessing content when not physically in the UK, or wherever. You could back this up by warning them if IP address recognition or geo-location suggests that they are outside the relevant area. It seems to me (but IANAL) that this might be sufficient to meet contractual obligations (or at least to provide a defence after failing), but obviously I can't advise on any particular case.<br />
<br />
<i>Update July 2014:</i> it appears that the ERA licence has changed recently in line with changes to UK copyright legislation to better support distance learning. This probably reduces the relevance of ERA to the whole geolocation question, but obviously doesn't affect the underlying technical issues.jw35http://www.blogger.com/profile/01606359745703313801noreply@blogger.com0tag:blogger.com,1999:blog-2780703305680355567.post-29946869139284445592012-11-05T08:24:00.000+00:002012-11-05T14:08:04.348+00:00Doing RSS right (3) - character encodingOK, I promise I'll shut up about RSS after this posting (and my <a href="http://jw35.blogspot.co.uk/2012/10/doing-rss-right.html">previous</a> <a href="http://jw35.blogspot.co.uk/2012/10/doing-rss-right-2.html">two</a>).<br />
<br />
This posting is about one final problem in including text from <a href="http://en.wikipedia.org/wiki/RSS">RSS</a> feeds, or <a href="http://en.wikipedia.org/wiki/Atom_(standard)">Atom</a> feeds, or almost anything else, into web pages. The problem is that text is made up of characters and that 'characters' are an abstraction that computers don't understand. What computers ship around (across the Internet, in files on disk, etc.) while we are thinking about characters are really numbers. To convert between a sequence of numbers and a sequence of characters you need some sort of encoding, and the problem is that there are lots of these and they are all different. In theory if you don't know the encoding you can't do anything with number-encoded text. However most of the common encodings use the numbers from the <a href="http://en.wikipedia.org/wiki/ASCII">ASCII encoding</a> for common letters and other symbols. So in practice a lot of English and European text will come out right-ish even if it's being decoded based on the wrong encoding.<br />
<br />
But once you move away from the characters in ASCII (A-Z, a-z, 0-9, and a selection of other common ones) to the slightly more 'esoteric' ones -- pound sign, curly open and close quotation marks, long typographic dashes, almost any common character with an accent, and any character from a non-European alphabet -- then all bets are off. We've all seen web pages with strange question mark characters (like this �) or boxes where quotation marks should be, or with funny sequences of characters (often starting Â) all over them. These are both classic symptoms of character encoding confusion. It turns out there's a word to describe this effect: '<a href="http://en.wikipedia.org/wiki/Mojibake">Mojibake</a>'.<br />
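The 'Â' effect is easy to reproduce. In this Python sketch a pound sign is encoded as UTF-8 (two bytes) and then decoded with the wrong encoding (Windows cp1252), so each byte comes out as a separate character:

```python
# "£" is a single character but two bytes in UTF-8 (0xC2 0xA3).
# Decoding those bytes as cp1252 treats each byte as its own
# character: classic mojibake.
text = "£5"
garbled = text.encode("utf-8").decode("cp1252")
print(garbled)  # Â£5
```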
<br />
Now I'm not going to go into detail here about what the various encodings look like, how you work with them, how you can convert from one to another, etc. That's a huge topic, and in any case the details will vary depending on which platform you are using. There's what I think is a good description of some of this at the <a href="http://getpython3.com/diveintopython3/strings.html">start of chapter 4 of 'Dive into Python3'</a> (and this applies even if you are not using Python). But if you don't like this there are lots of other similar resources out there. What I do want to get across is that if you take a sequence of numbers representing characters from one document and insert those numbers unchanged into another document then that's only going to work reliably if the encodings of the two documents are identical. There's a good chance that doing this wrong may appear to work as long as you restrict yourself to the ASCII characters, but sooner or later you will hit something that doesn't work.<br />
<br />
What you need to do to get this right is to convert the numbers from the source document into characters according to the encoding of your source document, and then convert those characters back into numbers based on the encoding of your target. Actually doing this is left as an exercise for the reader.<br />
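A minimal version of that round trip in Python, assuming (purely for illustration) a Latin-1 source and a UTF-8 target:

```python
# Bytes as received from a (hypothetically) Latin-1 source document.
source_bytes = b"caf\xe9"                    # 'café' in Latin-1
# Decode to abstract characters using the *source* encoding...
text = source_bytes.decode("latin-1")
# ...then re-encode using the *target* document's encoding.
target_bytes = text.encode("utf-8")
print(target_bytes)  # b'caf\xc3\xa9'
```

The real work, of course, is finding out what the two encodings actually are.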
<br />
If your target document is an HTML one then there's an alternative approach. In HTML (and XML come to that) you can represent almost any character using a <a href="http://en.wikipedia.org/wiki/Numeric_character_reference">numeric character entity</a> based on the <a href="http://en.wikipedia.org/wiki/Universal_Character_Set">Universal Character Set</a> from <a href="http://en.wikipedia.org/wiki/Unicode">Unicode</a>. If you always represent anything not in ASCII this way then the representation of your document will only contain ASCII characters, and these come out the same in most common encodings. So if someone ends up interpreting your text using the wrong encoding (and that someone could be you if, for example, you edit your document with an editor that gets character encoding wrong) there's a good chance it won't get corrupted. You should still clearly label such documents with a suitable character encoding. This is partly because (as explained above) it is, at least in theory, impossible to decode a text document without this information, but also because doing so helps to defend against some other problems that I might describe in a future posting.<br />
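Python will produce this ASCII-only form directly via the xmlcharrefreplace error handler:

```python
text = "£ and –"
# Anything that won't fit in ASCII is replaced by a decimal
# numeric character reference (&#...;).
entity_form = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(entity_form)  # &#163; and &#8211;
```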
<br />jw35http://www.blogger.com/profile/01606359745703313801noreply@blogger.com3tag:blogger.com,1999:blog-2780703305680355567.post-21282898329654439092012-10-26T15:13:00.000+01:002012-11-05T08:30:08.492+00:00Doing RSS right (2) - including contentIn addition to the issues I described in '<a href="http://jw35.blogspot.co.uk/2012/10/doing-rss-right.html">Doing RSS right</a>', there's another problem with <a href="http://en.wikipedia.org/wiki/RSS">RSS</a> feeds, though at least this one doesn't apply to <a href="http://en.wikipedia.org/wiki/Atom_(standard)">Atom</a>.<br />
<br />
The problem is that there's nothing in RSS to say if the various blocks of text are allowed to contain markup, and if so which. Apparently (see <a href="http://en.wikipedia.org/wiki/RSS">here</a>):<br />
<blockquote class="tr_bq">
"Userland's RSS reader—generally considered as the reference implementation—did not originally filter out HTML markup from feeds. As a result, publishers began placing HTML markup into the titles and descriptions of items in their RSS feeds. This behavior has become expected of readers, to the point of becoming a de facto standard"</blockquote>
This isn't just difficult, it's unresolvable. If you find<br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;"><strong>Boo!</strong></span><br />
<br />
in feed data you simply can't know if the author intended it as an example of HTML markup, in which case you should escape the brackets before including them in your page, or as '<b>Boo!</b>', in which case you are probably expected to include the data as it stands.<br />
<br />
And if you are expected to include the data as it stands you have the added problem that including HTML authored by third parties in your pages is dangerous. If they get their HTML wrong they could wreck the layout of your page (think missing close tag) and, worse, they could inject JavaScript into your pages or open you up to cross site scripting attacks by others. As I wrote <a href="http://jw35.blogspot.co.uk/2010/01/promiscuous-javascript-considered.html">here</a> and <a href="http://jw35.blogspot.co.uk/2011/03/other-peoples-content-shown-to-be.html">here</a>, if you let other people add any content to your pages then you are essentially giving them editing rights to the entire page, and perhaps the entire site.<br />
<br />
However, given how things are, unless you know from agreements or documentation that a feed will only ever contain text, you are going to have to assume that the content includes HTML. Stripping out all the tags would be fairly easy, but probably isn't going to be useful because it will turn the text into nonsense - think of a post that includes a list.<br />
<br />
The only safe way to deal with this is to parse the content and then only allow that subset of HTML tags and/or attributes that you believe to be safe. Don't fall for the trap of trying to filter out only what you consider to be dangerous because that's almost impossible to get right, and don't let all attributes through because they can be dangerous too - consider &lt;a href="javascript:..."&gt;.<br />
<br />
What should you let through? Well, that's hard to say. Most of the in-line elements, like &lt;b&gt;, &lt;strong&gt;, &lt;a&gt; (carefully), etc. will probably be needed. Also at least some block-level stuff - &lt;p&gt;, &lt;div&gt;, &lt;ul&gt;, &lt;ol&gt;, etc. And note that you will have to think carefully about the character encoding both of the RSS feed and the page you are substituting it into, otherwise you might not realise that +ADw-script+AD4- could be dangerous (hint: take a look at <a href="http://en.wikipedia.org/wiki/UTF-7">UTF-7</a>).<br />
<br />
If at all possible I'd try to avoid doing this yourself and use a reputable library for the purpose. Selecting such a library is left as an exercise for the reader.<br />
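To make the allow-list idea concrete, here's a deliberately minimal sketch built on Python's standard html.parser. The tag list is illustrative only, and it side-steps the attribute problem by dropping all attributes (including href, precisely because of things like javascript: URLs); a maintained sanitising library remains the better choice:

```python
from html import escape
from html.parser import HTMLParser

# Illustrative allow-list only -- tune to your own needs.
ALLOWED_TAGS = {"a", "b", "strong", "i", "em", "p", "ul", "ol", "li"}

class AllowListSanitiser(HTMLParser):
    """Keep only allow-listed tags (with no attributes); escape all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")   # attributes dropped for safety

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data))     # text content is always escaped

def sanitise(fragment: str) -> str:
    parser = AllowListSanitiser()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)
```

Note this sketch keeps the text inside disallowed tags (so &lt;script&gt;alert(1)&lt;/script&gt; becomes the harmless text alert(1)) and makes no attempt to balance tags; a real library handles both.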
<br />
See also <a href="http://jw35.blogspot.co.uk/2012/11/doing-rss-right-3-character-encoding.html">Doing RSS right (3) - character encoding</a>.<br />
<br />jw35http://www.blogger.com/profile/01606359745703313801noreply@blogger.com0tag:blogger.com,1999:blog-2780703305680355567.post-74950094850255704262012-10-26T11:59:00.000+01:002012-11-05T08:29:54.466+00:00Doing RSS right - retrieving contentFeeds, usually <a href="http://en.wikipedia.org/wiki/RSS">RSS</a> but sometimes <a href="http://en.wikipedia.org/wiki/Atom_(standard)">Atom</a> or other formats, are a convenient way of including syndicated content into web pages - indeed the last 'S' of 'RSS' stands for 'syndication' in one of the two possible ways of expanding the acronym.<br />
<br />
The obvious way to include the content of a feed in a dynamically-generated web page (such as the 'News' box on the <a href="http://www.cam.ac.uk/">University's current home page</a>) is to include in the code that generates the page something that retrieves the page's feed data, parses it, and then marks it up and includes it in the page.<br />
<br />
But this obvious approach comes with some drawbacks. Firstly, the process of retrieving and parsing the feed may be slow and resource-intensive. Doing this on every page load may slow down page rendering and will increase the load on the web server doing the work - it's easy to forget that multiple page renderings can easily run in parallel if several people look at the same page at about the same time.<br />
<br />
Secondly, fetching the feed on every page load could also throw an excessive load on the server providing the feed - this is at least impolite and could trigger some sort of throttling or blacklisting behaviour.<br />
<br />
And thirdly there's the problem of what happens if the source of the feed becomes unreachable. Unless it's very carefully written, the retrieval code will probably hang waiting for the feed to arrive, probably preventing the entire page from rendering and giving the impression that your site is down, or at least very slow. And even if the fetching code can quickly detect that the feed really isn't going to be available (and doing that is harder than it sounds), what do you then display in your news box (or equivalent)?<br />
<br />
A better solution is to separate the fetching part of the process from the page-rendering part. Get a background process (a cron job, say, or a long-running background thread) to periodically fetch the feed and cache it somewhere local, say in a file, in a database, or in memory for real speed. While it's doing this it might as well check the feed for validity and only replace the cached copy if it passes. This process can use standard HTTP mechanisms to check for changes in the feed and so only transfer it when actually needed - it's likely to need to remember the feed's last-modification timestamp from every fetch to make this work.<br />
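A minimal cron-able fetcher along these lines, using only the Python standard library (the URL and cache path are placeholders, and the cache file's own mtime stands in for the remembered timestamp):

```python
import os
import urllib.error
import urllib.request
from email.utils import formatdate

FEED_URL = "https://example.org/news.rss"    # placeholder
CACHE_FILE = "/var/cache/feeds/news.rss"     # placeholder

def refresh_feed():
    """Fetch the feed; an If-Modified-Since header means an unchanged
    feed costs only a 304 response, not a full transfer."""
    request = urllib.request.Request(FEED_URL)
    if os.path.exists(CACHE_FILE):
        cached_mtime = os.path.getmtime(CACHE_FILE)
        request.add_header("If-Modified-Since",
                           formatdate(cached_mtime, usegmt=True))
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            data = response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:        # not modified: keep the cached copy
            return
        raise                      # caller decides when to alert a human
    # Validate/parse the feed here, and only then replace the cache.
    with open(CACHE_FILE, "wb") as cache:
        cache.write(data)
```

The page-rendering code then reads only the cache file and never touches the network.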
<br />
That way, once you've retrieved it once you'll always have something to display even if the feed becomes unavailable or the content you retrieve is corrupt. It would be a good idea to alert someone if this situation persists, otherwise the failure might go unnoticed, but don't do so immediately or on every failure since it seems common for some feeds to be at least temporarily unavailable. Since the fetching job is parsing the feed it could store the parsed result in some easily digestible format to further reduce the cost of rendering the content into the relevant pages.<br />
<br />
Of course this, like most caching strategies, has the drawback that there will now be a delay between the feed updating and the change appearing on your pages - in some circumstances the originators of feeds seem very keen that any changes are visible immediately. In practice, as long as they know what's going on they seem happy to accept a short delay. There's also the danger that you will be fetching (or at least checking) a feed that is no longer used or is very rarely viewed. Automatically keeping statistics on how often a particular feed is actually included in a page would allow you to tune the fetching process (automatically or manually) to do the right thing.<br />
<br />
If you can't do this, perhaps because you are stuck with a content management system that insists on doing things its way, then one option might be to arrange to fetch all feeds via a local caching proxy. That way the network connections being made for each page view will be local and should succeed. Suitable configuration of the cache should let you avoid hitting the origin server too often, and you may even be able to get it to continue to serve stale content if the origin server becomes unavailable for a period of time.<br />
<br />
See also <a href="http://jw35.blogspot.co.uk/2012/10/doing-rss-right-2.html">Doing RSS right (2) - including content</a> and <a href="http://jw35.blogspot.co.uk/2012/11/doing-rss-right-3-character-encoding.html">Doing RSS right (3) - character encodings</a>.jw35http://www.blogger.com/profile/01606359745703313801noreply@blogger.com0tag:blogger.com,1999:blog-2780703305680355567.post-38293953082505057172012-07-12T17:46:00.000+01:002012-11-05T08:32:06.856+00:00Cookies and Google AnalyticsRecent changes to the law as it relates to the use of web site cookies has focused attention on <a href="http://www.google.com/analytics/">Google Analytics</a>. If by some freak chance you haven't met Analytics, it's a free tool provided by Google that lets web site managers analyse in depth how their site is being used. It can do lots that simple log file analysis can't, and many web site managers swear by it.<br />
<br />
Analytics uses <a href="https://developers.google.com/analytics/resources/concepts/gaConceptsCookies">lots of cookies</a>, and there's quite a lot of confusion about it. In the UK, the Information Commissioner has been quite clear that cookies used for this sort of purpose <a href="http://www.ico.gov.uk/for_organisations/privacy_and_electronic_communications/the_guide/~/media/documents/library/Privacy_and_electronic/Practical_application/cookies_guidance_v3.ashx">don't fall under any of the exemptions in the new rules</a> (see the final question in 'Your questions answered'):<br />
<blockquote class="tr_bq">
"The Regulations do not distinguish between cookies used for analytical activities and those used for other purposes. We do not consider analytical cookies fall within the ‘strictly necessary’ exception criteria. This means in theory websites need to tell people about analytical cookies and gain their consent."</blockquote>
However he goes on to say:<br />
<blockquote class="tr_bq">
"In practice we would expect you to provide clear information to users about analytical cookies and take what steps you can to seek their agreement. This is likely to involve making the argument to show users why these cookies are useful."</blockquote>
and then says:<br />
<blockquote class="tr_bq">
"Although the Information Commissioner cannot completely exclude the possibility of formal action in any area, it is highly unlikely that priority for any formal action would be given to focusing on uses of cookies where there is a low level of intrusiveness and risk of harm to individuals. Provided clear information is given about their activities we are highly unlikely to prioritise first party cookies used only for analytical purposes in any consideration of regulatory action." </blockquote>
which looks a bit like a 'Get out of jail free' card (or 'Stay out of jail' card) for the use of at least some analytics cookies. The recent Article 29 Data Protection Working Party <a href="http://ec.europa.eu/justice/data-protection/article-29/documentation/opinion-recommendation/files/2012/wp194_en.pdf">Opinion on Cookie Consent Exemption</a> seems to have come to much the same conclusion (see section 4.3). They even suggest:<br />
<blockquote class="tr_bq">
"...should article 5.3 of the Directive 2002/58/EC be re-visited in the future, the European legislator might appropriately add a third exemption criterion to consent for cookies that are strictly limited to first party anonymized and aggregated statistical purposes."</blockquote>
Which is fine, but there's that reference to 'first party cookies' in both sets of guidance, and the reference to "a low level of intrusiveness and risk of harm to individuals".<br />
<br />
Now that should be OK, because Google Analytics really does use first party cookies - they are set by JavaScript that you include in your own pages with a scope that means their data is only returned to your own web site (or perhaps sites, but still yours).<br />
<br />
But there's a catch. The information from those cookies still gets sent to Google - it rather has to be, because otherwise there's no way Google can create all the useful reports that web managers like so much. But if they are first party cookies, how does that happen?<br />
<br />
Well, if you watch carefully you'll notice that when you load a page that includes Google Analytics your browser requests a file called __utm.gif from a server at www.google-analytics.com. And attached to this request are a whole load of parameters that, as far as I can tell, largely include information out of those Google Analytics cookies. __utm.gif is a one pixel image, as typically used to implement <a href="http://en.wikipedia.org/wiki/Web_bug">web bugs</a>. And the ICO is clear that:<br />
<blockquote class="tr_bq">
"The Regulations apply to cookies and also to similar technologies for storing information. This could include, for example, Local Shared Objects (commonly referred to as “Flash Cookies”), web beacons or <b>bugs (including transparent or clear gifs)</b>." (emphases mine).</blockquote>
So while the cookies themselves may be first party, the system as a whole seems to me to be more like something that's third party. And third party using persistent cookies into the bargain (some of the Analytics ones have a 2 year lifetime), and one that gets my IP address on every request.<br />
<br />
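To make the mechanism concrete, here's a minimal sketch of how first-party cookie data ends up with a third party: the tracking script reads its own site's cookies and attaches them as query parameters on the one-pixel image request. (Illustrative Python only; the cookie names are modelled on the real __utma/__utmz cookies but the values, and the exact parameter set, are invented.)

```python
from urllib.parse import urlencode

# Hypothetical first-party cookie values, as the page's JavaScript
# might read them from document.cookie (values invented for illustration).
cookies = {
    "utma": "12345.67890.1327394832",   # visitor/session identifiers
    "utmz": "12345.utmcsr=google",      # traffic-source information
}

# The cookie data is bundled into the query string of a tiny image
# request to the third party's server - the 'web bug' pattern.
beacon_url = "http://www.google-analytics.com/__utm.gif?" + urlencode(cookies)

print(beacon_url)
```

The cookies never leave the first-party scope as cookies; their contents travel as an ordinary HTTP request to the third party instead.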
But it's not all bad. There's some suggestion that Google do understand this and are committing not to be all that evil. For example <a href="http://www.google.com/intl/en_uk/analytics/privacyoverview.html">here</a> they explain that they use IP addresses for geolocation, and that "Google Analytics does not report the actual IP address information to Google Analytics customers" (though I note they don't mention what they might do with it themselves). They also say that "Website owners who use Google Analytics have control over what data they allow Google to use. They can decide if they want Google to use this data or not by using the Google Analytics Data Sharing Options." (though the subsequent link seems to be broken - <a href="http://support.google.com/analytics/bin/answer.py?hl=en&answer=1011397">this</a> looks like a possible replacement).<br />
<br />
Further, the Google Analytics <a href="http://www.google.com/intl/en_uk/analytics/tos.html">Terms of Service</a> have a section on 'Privacy' that requires (section 8.1) anyone using Analytics to tell their visitors that:<br />
<blockquote class="tr_bq">
"Google will use [cookie and IP address] information for the purpose of evaluating your use of the website, compiling reports on website activity for website operators and providing other services relating to website activity and internet usage. Google may also transfer this information to third parties where required to do so by law, or where such third parties process the information on Google's behalf. Google will not associate your IP address with any other data held by Google."</blockquote>
which seems fairly clear (or as clear as anything you ever find in this area).<br />
<br />
So what do I think? My current, <i>entirely personal</i> view is that Google Analytics is probably OK at the moment, providing you are very clear that you are using it. It might also be a good idea to make sure you've disabled as much data sharing as possible. But I do wonder if the ICO's view might change in the future if he ever looks too closely at what's going on (or if someone foolishly describes it in a blog post...), so it might be an idea to have a plan 'B'. This might involve a locally-hosted analytics solution, or falling back to 'old-fashioned' log file analysis. Both of these could still probably be supplemented by cookies, but those cookies wouldn't be exempt, so you'd still need to get consent somehow. That should be easier if they were truly 'first party' cookies and the data in them wasn't being shipped off to someone else. Trouble is, most good solutions in this area cost significant money. There is, as they say, no free lunch.<br />
<br />jw35http://www.blogger.com/profile/01606359745703313801noreply@blogger.com2tag:blogger.com,1999:blog-2780703305680355567.post-36101645275007945962012-06-27T17:35:00.004+01:002012-07-04T16:51:18.143+01:00Cookies - what the EU actually didIn <a href="http://jw35.blogspot.co.uk/2011/05/further-thoughts-on-cookies-or-lack.html">an earlier posting</a> I managed to work out what had changed in the relevant UK law to implement the changes to how we all use cookies that we all know and love. At the time I didn't know how to track down the changes to the relevant EU directives that precipitated all this.<br />
<div>
<br /></div>
<div>
Well, now I think I do - thanks mainly to the references at the beginning of a recent Article 29 Data Protection Working Party <a href="http://ec.europa.eu/justice/data-protection/article-29/documentation/opinion-recommendation/files/2012/wp194_en.pdf">Opinion on Cookie Consent Exemption</a> which itself is well worth a read (here's <a href="http://webmedia.company.ja.net/edlabblogs/regulatory-developments/2012/06/12/art-29wp-on-cookies-specific-and-pragmatic-advice/">Andrew Cormack's summary</a>). For your delight and delectation, here's what I think the changes are - all in Article 5.3 of <a href="http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:32002L0058:en:HTML">Directive 2002/58/EC</a> as amended by <a href="http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2009:337:0011:0036:En:PDF">Directive 2009/136/EC</a>:</div>
<div>
<br /></div>
<div>
<blockquote class="tr_bq">
3. Member States shall ensure that the <strike>use of electronic </strike><strike>communications networks to store information or to gain access to </strike><strike>information stored</strike><u>storing of information, or the gaining </u><u>of access to information already stored</u> in the terminal equipment of a subscriber or user is only allowed on condition that the subscriber or user concerned <strike>is provided</strike><u>has </u><u>given his or her consent, having been provided</u> with clear and comprehensive information in accordance with Directive 95/46/EC, inter alia about the purposes of the processing<strike>, and is</strike><strike>offered the right to refuse such processing by the data </strike><strike>controller</strike>. This shall not prevent any technical storage or access for the sole purpose of carrying out <strike>or </strike><strike>facilitating</strike> the transmission of a communication over an electronic communications network, or as strictly necessary in order <strike>to provide</strike><u>for the provider of</u> an information society service explicitly requested by the subscriber or user <u>to provide the service</u>.</blockquote>
<span style="background-color: white;">So there you have it. </span><br />
<div>
<a href="http://www.blogger.com/"></a></div>
</div>jw35http://www.blogger.com/profile/01606359745703313801noreply@blogger.com0tag:blogger.com,1999:blog-2780703305680355567.post-60607253527445019462012-03-01T12:30:00.000+00:002012-03-01T12:30:14.637+00:00SSL/TLS Deployment Best PracticesWhat looks to me like a useful collection of SSL/TLS Deployment Best Practices from <a href="https://www.ssllabs.com/">Qualys SSL Labs</a>:<br />
<br />
<a href="https://www.ssllabs.com/projects/best-practices/">https://www.ssllabs.com/projects/best-practices/</a><br />
<br />jw35http://www.blogger.com/profile/01606359745703313801noreply@blogger.com0tag:blogger.com,1999:blog-2780703305680355567.post-12197124499439435552012-02-02T13:28:00.000+00:002012-02-02T13:28:57.417+00:00Two sorts of authenticationIt occurs to me that, from a user's perspective, every authentication sits somewhere between the following two extremes:<br />
<br />
<ol>
<li>Authentications where it's strongly in the user's interest not to disclose their authentication credentials, but doing so has little impact on the corresponding service provider. For example I'm probably going to be careful about my credentials for electronic banking (because I don't want you to get my money) and for Facebook (because I don't want you to start saying things to my friends that appear to come from me).</li>
<li>Authentications where it's mainly in the service provider's interest that the user doesn't disclose their authentication credentials but it's of little importance to the user. For example authentication to gain access to institution-subscribed electronic journals, or credentials giving access to personal subscription services such as Spotify. In neither case is giving away my credentials to third parties likely to have much immediate impact on me.</li>
</ol>
<div>
This is obviously a problem for service providers in the second case, because it significantly undermines any confidence they can have in any authentications, and may undermine their business model if it's based on the number of unique users. There's not much you can do technically to address this, other than using non-copyable, non-forgeable credentials (which are few and far between and typically expensive). It is of course traditional to address this with contracts and rules and regulations, but these don't work well when the chance of being found out is low and the consequences small.</div>
<div>
<br /></div>
<div>
More interesting is what happens when you use the same credentials (SSO or single password, for example) for a range of services that sit in different places in this continuum. I suspect that there is a strong possibility, human nature being what it is, that people will make credential-sharing decisions based on the properties of an individual service and without really considering that they are actually sharing access to everything. </div>
<div>
<br /></div>
<div>
<span style="font-family: inherit;">[I'd note in passing a New York Times article (<a href="http://www.nytimes.com/2012/01/18/us/teenagers-sharing-passwords-as-show-of-affection.html">Young, in Love and Sharing Everything, Including a Password</a>) suggesting that young people will sometimes share passwords as a way of demonstrating <span style="background-color: white; line-height: 22px; text-align: left;">devotion.</span><span style="background-color: white; line-height: 22px; text-align: left;"> I expect this is true too]</span></span></div>jw35http://www.blogger.com/profile/01606359745703313801noreply@blogger.com1tag:blogger.com,1999:blog-2780703305680355567.post-88195177832996090122012-01-25T15:26:00.002+00:002012-01-25T15:26:42.693+00:00O2 changing web page content on the fly?<br />
We recently noticed an oddity in the way some of our web pages were appearing when viewed via some 3G providers, including at least O2. The pages in question include something like this in the head section:<br />
<br />
<span style="font-family: 'Courier New', Courier, monospace;"> <!--[if IE 6]></span><br />
<span style="font-family: 'Courier New', Courier, monospace;"> <style type="text/css" media="screen">@import</span><br />
<span style="font-family: 'Courier New', Courier, monospace;"> url(http://www.example.com/ie6.css);</style></span><br />
<span style="font-family: 'Courier New', Courier, monospace;"> <![endif]--></span><br />
<br />
which should have the effect of including the ie6.css stylesheet when the document is viewed from IE6 but not otherwise.<br />
<br />
When accessed over 3G on some phones, <i>something</i> expands the @import by replacing it with the content of the ie6.css style sheet before the browser sees it:<br />
<br />
<span style="font-family: 'Courier New', Courier, monospace;"> <!--[if IE 6]></span><br />
<span style="font-family: 'Courier New', Courier, monospace;"> <style type="text/css" media="screen"></span><br />
<span style="font-family: 'Courier New', Courier, monospace;"> ...literal CSS Statements...</span><br />
<span style="font-family: 'Courier New', Courier, monospace;"> </style></span><br />
<span style="font-family: 'Courier New', Courier, monospace;"> <![endif]--></span><br />
<br />
which, while a bit braindead, would be OK if it were not for the fact that the CSS file happens to contain (within entirely legal CSS comments) an example of an HTML end-comment:<br />
<br />
<span style="font-family: 'Courier New', Courier, monospace;"> <!--[if IE 6]></span><br />
<span style="font-family: 'Courier New', Courier, monospace;"> <style type="text/css" media="screen"></span><br />
<span style="font-family: 'Courier New', Courier, monospace;"> /* Use like this</span><br />
<span style="font-family: 'Courier New', Courier, monospace;"> <!--[if IE 6]></span><br />
<span style="font-family: 'Courier New', Courier, monospace;"> ...</span><br />
<span style="font-family: 'Courier New', Courier, monospace;"> <![endif]--></span><br />
<span style="font-family: 'Courier New', Courier, monospace;"> */</span><br />
<span style="font-family: 'Courier New', Courier, monospace;"> ...more literal CSS Statements...</span><br />
<span style="font-family: 'Courier New', Courier, monospace;"> </style></span><br />
<span style="font-family: 'Courier New', Courier, monospace;"> <![endif]--></span><br />
<br />
When parsed, this causes chaos. The first <!-- starts a comment and so hides the <style> tag. But the --> inside the CSS then closes that comment, leaving WebKit's HTML parser in the middle of a stack of CSS definitions. It does the only thing it can do and renders the CSS as part of the document.<br />
<br />
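The comment-swallowing failure can be reproduced with any comment-aware HTML parser; here's a minimal sketch using Python's html.parser, with a deliberately simplified stand-in for the rewritten page:

```python
from html.parser import HTMLParser

# Simplified stand-in for the proxied page: an inlined stylesheet whose
# CSS comment happens to contain an HTML end-comment, as in the real case.
BROKEN = (
    "<!--[if IE 6]><style>"
    "/* usage example: <![endif]--> */"
    " body { background: red } "
    "</style><![endif]-->"
)

class Tracker(HTMLParser):
    """Record what a parser actually sees in the broken markup."""
    def __init__(self):
        super().__init__()
        self.comments, self.texts, self.tags = [], [], []
    def handle_comment(self, data):
        self.comments.append(data)
    def handle_data(self, data):
        if data.strip():
            self.texts.append(data.strip())
    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

t = Tracker()
t.feed(BROKEN)
t.close()

# The comment ends at the first '-->' - the one inside the CSS - so the
# <style> start tag is swallowed by the comment and the remaining CSS
# arrives as ordinary text, which a browser would render on the page.
```

Running this shows the `<style>` tag ending up inside the comment and the CSS statements delivered as plain page text, which is exactly the "chaos" described above.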
I don't know what is messing with the @import statements. I suspect some sort of proxy inside O2's network, perhaps trying to optimise things and save my phone from having to make an extra TCP connection to retrieve the @imported file. If so, it's failing spectacularly, since it's inlining a large pile of CSS that my phone would never actually retrieve.<br />
<br />
You can see this effect in action by browsing to <a href="http://mnementh.csi.cam.ac.uk/atimport/">http://mnementh.csi.cam.ac.uk/atimport/</a>. You should just see the word 'Test' on a red background, but for me over O2 I get some extra stuff that is the result of their messing around with my HTML.<br />
<br />
[There's also the issue that O2 seem to have recently been silently supplying the mobile phone number of each device in the HTTP headers of each request it makes, but that's a separate issue: <a href="http://conversation.which.co.uk/technology/o2-sharing-mobile-phone-number-network-provider/">http://conversation.which.co.uk/technology/o2-sharing-mobile-phone-number-network-provider/</a>]<br />
<br />jw35http://www.blogger.com/profile/01606359745703313801noreply@blogger.com2tag:blogger.com,1999:blog-2780703305680355567.post-6690666698886779342011-12-19T16:39:00.001+00:002011-12-19T16:52:07.939+00:00Computing Advent CalendarsI have always been a sucker for Advent Calendars on computing topics. I find them a convenient way to catch up on developments that I would have otherwise missed, and they come out at a time when, with luck, things are a bit quieter so there's time to actually read them.<br />
<br />
This year I've been enjoying <a href="http://sysadvent.blogspot.com/">SysAdvent</a>. In a previous life when I did a lot of Perl development I was a great fan of the <a href="http://www.perladvent.org/2011/">Perl Advent Calendar</a> but I haven't read it in several years. Earlier today I came across <a href="http://phpadvent.org/2011">PHP Advent</a> which I didn't previously know about. For hard core web geeks there's always <a href="http://24ways.org/">24Ways</a>. I'm sure there are lots more. For what it's worth I read these, and lots of other stuff, via RSS or equivalent feeds in my feed reader of choice (which at the moment happens to be <a href="http://www.google.com/reader/">Google Reader</a>).<br />
<br />
Most of these have been running for a while and the previous editions are also available (for some you may have to make the obvious edits to the URLs to see earlier versions). But beware - just as this year's postings are often topical and cutting edge, ones from the past can easily turn out to have been overtaken by events or simply no longer sufficiently fashionable.<br />
<br />
However you celebrate it, I'd like to wish you a happy Christmas and a productive new year.jw35http://www.blogger.com/profile/01606359745703313801noreply@blogger.com0tag:blogger.com,1999:blog-2780703305680355567.post-81686980733613393452011-12-19T12:14:00.001+00:002011-12-19T16:05:43.162+00:00Cookies - progress at last?It looks as if there may at last be some progress on the vexed issue of how you can continue to use cookies (and similar) in the light of the new(ish) <a href="http://www.legislation.gov.uk/uksi/2003/2426/contents/made">Privacy and Electronic Communications Regulations</a>. The ICO has published <a href="http://www.ico.gov.uk/news/latest_news/2011/~/media/documents/library/Privacy_and_electronic/Practical_application/guidance_on_the_new_cookies_regulations.ashx">new guidance</a> on all this, and various commentators seem to think it looks promising:<br />
<ul>
<li><a href="http://www.ico.gov.uk/news/blog/2011/half-term-report-on-cookies-compliance.aspx">ICO's own commentary</a></li>
<li><a href="https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1112&L=WEBSITE-INFO-MGT&%20D=0&P=1631">JISCMail discussion</a></li>
<li><a href="http://ukwebfocus.wordpress.com/2011/12/15/the-half-term-report-on-cookie-compliance/">Brian Kelly's UK Web Focus blog</a></li>
<li><a href="http://www.out-law.com/en/articles/2011/december/icos-concrete-examples-give-businesses-a-better-chance-of-meeting-cookie-law-demands/">Out-law</a></li>
</ul>
<div>
There is however some concern that the proposed approach to dealing with analytics cookies seems to be limited to a hint that the ICO's office just won't look too closely. There's worry that some public sector organisations will still feel that they need to ensure total compliance rather than just take a risk management approach:</div>
<div>
<ul>
<li><a href="http://digitalbydefault.com/2011/12/14/a-crack-in-the-cookie-craziness/">A crack in the cookie craziness?</a></li>
</ul>
</div>
<div>
<br /></div>jw35http://www.blogger.com/profile/01606359745703313801noreply@blogger.com1tag:blogger.com,1999:blog-2780703305680355567.post-64703175043831326772011-11-27T18:33:00.000+00:002011-11-27T18:33:59.188+00:00Federated (and so SAML, and so Shibboleth) account management<div style="font-family: inherit;">
<span style="font-size: small;"><b><br /></b></span><br />
<span style="font-size: small;"><b>The Problem</b></span><br />
<br />
In an earlier post (<a href="http://jw35.blogspot.com/2011/03/user-ids-in-shibboleth-world.html">User-ids in a Shibboleth world</a>) I described the problem of deriving a 'userid', as required by much third-party software, from typical attributes supplied in a SAML federated authentication (particularly in the UK federation, but probably equally elsewhere in Europe). This is actually part of the wider problem of account management in a federated environment (as touched on in <a href="http://jw35.blogspot.com/2009/09/federated-auth-is-less-than-half-battle.html">Federated AUTH is less than half the battle</a>), and this posting is an attempt to pull together some thoughts on the subject.</div>
<div style="font-family: inherit;">
<span style="font-size: small;"><br />
</span></div>
<span style="font-size: small;"><span style="font-family: inherit;">The context of this is a need to support web applications (for example MediaWiki, Plone, or Wordpress) which need to identify local users (members of the University of Cambridge) but which also need to support authenticated access from people outside the University (collaborators in other institutions, sponsors, representatives of funding bodies, etc.). Many of these have access to SAML identities in the UK federation (or elsewhere), and all three applications claim to have 'Shibboleth support', but in practice this support doesn't come anywhere near doing what we need. As mentioned in </span><a href="http://jw35.blogspot.com/2011/03/user-ids-in-shibboleth-world.html" style="font-family: inherit;">User-ids in a Shibboleth world</a><span style="font-family: inherit;">, the primary problem seems to be an assumption that </span></span><span style="font-family: inherit; font-size: small;">eduPersonPrincipalName (or something equivalent) will always be available. We can (and do) make this available from our own IdP to our own SPs, but we can't expect it from elsewhere, and negotiating its release on a case-by-case basis is impracticable and in any case we don't actually need it. </span><span style="font-family: inherit; font-size: small;"><br />
<br />
I asked about this issue </span><span style="font-size: small;"><span style="font-family: inherit;">on both the </span><a href="https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1103&L=JISC-SHIBBOLETH&F=&S=&P=1057" style="font-family: inherit;"> jisc-shibboleth@jiscmail.ac.uk</a><span style="font-family: inherit;"> and </span><a href="https://lists.internet2.edu/sympa/arc/shibboleth-users/2011-03/msg00254.html" style="font-family: inherit;">shibboleth-users@internet2.edu</a><span style="font-family: inherit;"> mailing lists. I had rather assumed this would be a well known problem with obvious solutions, but the only significant response was from Scott Cantor who said: </span></span><br />
<blockquote class="tr_bq">
<i>"I think they're written for "least effort" and to avoid fixing applications. That's not a strategy that will work long term."</i></blockquote>
<b>The Options</b><br />
<br />
So, what might work in the long term?<br />
<span style="font-size: small;"> <br style="font-family: inherit;" /><span style="font-family: inherit;"> I think you first have to accept that this sort of federated environment differs from a typical single-institution environment in at least two significant ways:</span></span><br />
<ol style="font-family: inherit;">
<li>You <i>will</i> get a unique identifier for each user, but you must assume that it's totally opaque and that it's not suitable for display. Further, you have to assume that users will not be able to tell you in advance what this identifier is or is going to be.</li>
<span style="font-size: small;">
<li><span style="font-size: small;">Lots of people you don't want to have anything to do with will be able to authenticate. </span><span style="font-size: small;"><span class="blue">So you can't rely on the 'if they can authenticate they are part of the organisation' approximation that works in other situations.</span></span></li>
</span></ol>
I see two approaches: <i>Account Linking</i> and <i>Account Creation</i>:<br />
<div>
<br />
<div>
<b>Account Linking</b></div>
<div>
<br />
Under this approach you start by provisioning ordinary, local accounts complete with locally-unique (and displayable!) user names and passwords and by transmitting the username/password information to the intended user in any appropriate fashion. But once the intended user has authenticated with this information you allow them to link this account with a federated identity which they can subsequently use to authenticate in place of the local password. You could even regard the initial password as just a bootstrapping device and expire it after a few uses, or after time, or after a federated authentication has completed.<br />
<br />
Anyone trying to authenticate using an unrecognised federated identity would get an error page referring them to whoever deals with account management. If accounts need non-default attributes (particular editing rights, administrator privileges, etc.) then these can either be set manually by administrators or they could be added following a successful federated authentication with particular attribute sets.<br />
<br />
Depending on the application you are protecting you might choose to restrict which federated identities you allow to be linked, based on who asserts them and/or on attributes supplied with the authentication (e.g. you might require that a 'staff' resource can only be linked to an identity that includes an eduPersonScopedAffiliation value including 'staff'). You might also allow more than one identity to be linked to a single account. It all rather depends on who has the strongest interest in the account being used only by the right person (you as application operator, or the user).<br />
<br /></div>
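The bookkeeping account linking needs is modest: a local account store plus a mapping from opaque federated identifiers to local usernames. A minimal sketch in Python follows - all the names are invented, and a real implementation would of course hash passwords and persist the mappings:

```python
class AccountStore:
    """Local accounts that can be linked to opaque federated identifiers."""

    def __init__(self):
        self.accounts = {}      # local username -> account record
        self.by_identity = {}   # opaque federated id -> local username

    def provision(self, username, bootstrap_password):
        # Created by an administrator; the password is sent to the
        # intended user in any appropriate fashion.
        self.accounts[username] = {"password": bootstrap_password,
                                   "linked": set()}

    def link(self, username, password, federated_id):
        # The user proves control of the local account once, then links
        # the (opaque, non-displayable) federated identifier to it.
        account = self.accounts.get(username)
        if account is None or account["password"] != password:
            raise PermissionError("bad bootstrap credentials")
        account["linked"].add(federated_id)
        self.by_identity[federated_id] = username
        account["password"] = None   # treat the password as bootstrap-only

    def login(self, federated_id):
        # Unrecognised identities get referred to account management.
        username = self.by_identity.get(federated_id)
        if username is None:
            raise LookupError("unrecognised identity - contact the administrators")
        return username
```

After provisioning 'spqr1' and linking it to a SAML persistent identifier, subsequent federated logins with that identifier resolve straight to the local account with no password involved.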
<div>
<b>Account Creation</b><br />
<br />
This is to some extent the process envisaged in the latter sections of <a href="http://jw35.blogspot.com/2009/09/federated-auth-is-less-than-half-battle.html">Federated AUTH is less than half the battle</a>. However software that needs a local 'user name' will have to create one that meets its own requirements when first creating a local account, and it can't assume that anything suitable will be available in the attribute values. This means that the 'Dynamic login' approach is going to be problematic since there probably won't be anywhere to store the association between this user name and the corresponding federated identity.<br />
<br />
As mentioned in the other post, some rules will be needed, presumably based on attributes supplied, to decide who can have an account at all. Anyone trying to authenticate using an identity that doesn't match these rules would again get an error page referring them to whoever deals with account management. Enhanced account privileges can be automatically granted based on attribute values or, once the user has authenticated for the first time (but probably not before), by manual intervention by administrators and by reference to the locally generated user name.<br />
<br />
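The sort of rule involved might look like the following sketch. It's purely illustrative: the attribute name is real (eduPersonScopedAffiliation), but the trusted scope and the accepted affiliations are invented, and a real deployment would match on whatever its federation actually releases:

```python
# Hypothetical account-creation rule for a first-time federated visitor.
TRUSTED_SCOPES = {"cam.ac.uk"}          # invented example scope
ACCEPTED_AFFILIATIONS = {"staff", "member"}

def may_create_account(attributes):
    """Decide whether a first-time federated visitor gets an account.

    `attributes` maps SAML attribute names to lists of values, e.g.
    {"eduPersonScopedAffiliation": ["staff@cam.ac.uk"]}.
    """
    for value in attributes.get("eduPersonScopedAffiliation", []):
        affiliation, _, scope = value.partition("@")
        if affiliation in ACCEPTED_AFFILIATIONS and scope in TRUSTED_SCOPES:
            return True
    return False
```

Anyone failing the rule gets the error page; anyone passing it gets an account created on the fly, with a locally generated user name.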
<b>Hybrid approach</b><br />
<br />
There are almost always exceptions to rules. To allow for this you might want to adopt a hybrid approach. This might use the <i>Account Creation</i> approach for the majority of potential users, because it totally avoids 'one more password' and needs little on-going user administration. But for the inevitable cases where you need to allow access by someone who doesn't match the standard account creation rules you could fall back to the <i>Account Linking</i> approach. Indeed, once you allow local accounts you have a tool that you can, if you wish, use to allow access to people who don't themselves have access to suitable federated identities.<br />
<br />
All this is quite a lot more work than just lifting a string out of something like a REMOTE_USER environment variable and convincing your application to treat this as evidence of authentication. But to get the best from federated authentication it's what we are probably going to have to do, though in time and with luck we may find that someone has already done it for us...</div>
</div>jw35http://www.blogger.com/profile/01606359745703313801noreply@blogger.com2tag:blogger.com,1999:blog-2780703305680355567.post-25116783530576217882011-11-24T10:00:00.001+00:002011-11-24T10:06:36.960+00:00So how are we getting on with cookies?Remember all that fuss earlier in the year about the new cookie rules? Well, it's six months since the ICO's office gave us all a year's grace during which they didn't expect to actually be enforcing the new regulations. And how are you all getting on with becoming compliant? Worked out how to legally continue to use Google Analytics yet? Scheduled the redevelopments needed for compliance?<br />
<br />
While things may be quiet now, I confidently expect another flurry of publicity, and difficult questions, in the spring, by which time it's going to be too late. Here's one recent blog posting on the subject:<br />
<br />
<a href="http://www.storm-consultancy.com/blog/misc-web-stuff/the-cookie-monster/">http://www.storm-consultancy.com/blog/misc-web-stuff/the-cookie-monster/</a>jw35http://www.blogger.com/profile/01606359745703313801noreply@blogger.com0tag:blogger.com,1999:blog-2780703305680355567.post-14169143027723456452011-08-10T10:10:00.001+01:002011-08-10T11:11:35.765+01:00301 Moved Permanently (to a new office)After 10 years and 11 months sitting at the same desk in the Computing Service I've finally ended up moving office (to P24 if you know your way around the UCS, otherwise I'm still as unfindable as I've always been). This is part of a plan to make space for a new member of the Information Systems Team who starts in a couple of weeks, and fulfils my ambition to get the entire team at least on the same floor, if not in the same office. But it's still a bit of a wrench and I'm worried that, if truth be told, all my best ideas may have been stolen from my former officemates...<br />
<br />
The space left by my departure:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhlaPfqz9CLMnr_191M9e2-BXau371yTMhU6CcJfxJx-CM82xouz3E_gQ9XaYACAO2gwCjjj2-O10YTgruqvJJsFhBGaMMzy4lkss5nNiTOWaGJsxzc6gKJmnm4yY0jwaGsZVHmjnRdg/s1600/photo1.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="256" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhlaPfqz9CLMnr_191M9e2-BXau371yTMhU6CcJfxJx-CM82xouz3E_gQ9XaYACAO2gwCjjj2-O10YTgruqvJJsFhBGaMMzy4lkss5nNiTOWaGJsxzc6gKJmnm4yY0jwaGsZVHmjnRdg/s320/photo1.jpg" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
...and my new home:</div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjSTXdjBQ6lIcm1LesqyPLamyR0nWaKiIM3KY-U25n-PhyoHJJC3Z6pwlz6rRxUtPYVQl1NNzyxm065XS__W4fd7oYX8GSYFzlpkT3nM5WQ1Q0F3ipefpnGd5GEmgGn9HUsX4mORx-nbA/s1600/photo.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjSTXdjBQ6lIcm1LesqyPLamyR0nWaKiIM3KY-U25n-PhyoHJJC3Z6pwlz6rRxUtPYVQl1NNzyxm065XS__W4fd7oYX8GSYFzlpkT3nM5WQ1Q0F3ipefpnGd5GEmgGn9HUsX4mORx-nbA/s320/photo.jpg" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<br />jw35http://www.blogger.com/profile/01606359745703313801noreply@blogger.com1tag:blogger.com,1999:blog-2780703305680355567.post-70381904849778086272011-07-06T08:25:00.002+01:002011-07-06T13:20:29.247+01:00IT Support from 300 ftAt the (Cambridge) <a href="http://www.citmg.group.cam.ac.uk/">College IT Managers'</a> <a href="http://www.citmg.group.cam.ac.uk/duxford.html">2009 Conference</a> at the Imperial War Museum, Duxford I was lucky enough to win a flight in a Tiger Moth in the prize draw. I've been a bit slow taking this up, but finally got around to it last weekend. I got to fly from Duxford up to Madingley and back, and even got to fly the plane a bit myself (no doubt with the pilot's hands not far from his set of controls). Initial impressions: Tiger Moths are <i>really</i> small and feel as if they are made out of Meccano, you can't see where you are going, and in the air they feel as if they are blowing around like a piece of paper! Flying the plane feels a bit like sailing, but in three dimensions rather than two (and, at least last Sunday, with less water).<br />
<br />
Here's some photographic proof of the event. Thank you CITMG!<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhx1yI_e3Zn5UwpnvhkyOzGWOgZPKxWUDTJVbUN9cB4STIRshRDUDrVBvA9lG0Qk6y2yEmQjwimRl0gRJnHDKRhY-ylkkmwksXrYqVobD1gDd2kJ_UrYqdjlnM0RaSLLz4GsGAbwTU4WQ/s1600/P1010146.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhx1yI_e3Zn5UwpnvhkyOzGWOgZPKxWUDTJVbUN9cB4STIRshRDUDrVBvA9lG0Qk6y2yEmQjwimRl0gRJnHDKRhY-ylkkmwksXrYqVobD1gDd2kJ_UrYqdjlnM0RaSLLz4GsGAbwTU4WQ/s320/P1010146.JPG" width="320" /></a></div>
<br />jw35http://www.blogger.com/profile/01606359745703313801noreply@blogger.com7tag:blogger.com,1999:blog-2780703305680355567.post-77609186940627831292011-06-18T08:42:00.000+01:002011-06-18T08:42:23.212+01:00IPv6 Day in Cambridge - Success and Non-event!IPv6 day (8th June) in the University was largely a non-event, and so can be declared a success! This seems to match experiences reported elsewhere.<br />
<br />
We are not aware of any significant problems experienced by University users in accessing any services on the day, including external ones known to be participating in IPv6 day such as Google and Facebook. The core of the University network is connected to the global v6 internet, but most distribution networks in departments and colleges only connect using v4 at the moment. Those networks that are connected to the global v6 internet (including the one connecting my desktop workstation) worked fine on the day as expected.<br />
<br />
In the run-up to the day we enabled v6 on a number of central services, including <a href="http://www.cam.ac.uk/">the main University web server</a>, the <a href="http://sms.cam.ac.uk/">Streaming Media Service</a> (both the web interface and the HTTP download service), the <a href="http://search.cam.ac.uk/">'new' interface to our search engine</a>, the <a href="http://training.cam.ac.uk/">University Training Booking System</a>, and the central mail service (SMTP, POP, IMAP). On the day we published AAAA records for these services alongside the normal A records from about 08:30 to 19:00 BST.<br />
<br />
With the exception of the web server, all these services were enabled more or less as they would be for a v6 production service, though a few features (such as automatic v6 address transition between cluster members, and adapting log analysis to recognise v6 addresses) were not completed in time. The web server used a separate Apache reverse proxy to provide v6 connectivity to avoid having to disturb its configuration. While doing this, and subsequently, we identified various issues and surprises that I've already mentioned (<a href="http://jw35.blogspot.com/2011/05/ipv6-gotchas-and-8th-june-2011.html">here</a>, <a href="http://jw35.blogspot.com/2011/06/ipv6-day-more-problems-than-expected.html">here</a>, and <a href="http://jw35.blogspot.com/2011/06/more-ipv6-gotchas.html">here</a>).<br />
<br />
The University web server received 8,981 requests from 280 distinct clients over v6. By comparison it received a total of 1,257,012 requests over both protocols for the entire 24 hour period; allowing for the fact that the AAAA records were only published for about ten and a half hours of that period, v6 requests probably represented about 1.5% of the total. The breakdown of the 8,351 <i>native</i> v6 requests from 230 clients by approximate country of origin appears in the table below.<br />
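That estimate involves a small extrapolation, since the AAAA records were only live for part of the day. A back-of-envelope sketch of the arithmetic (it assumes v6 traffic would have been uniform across the whole day, which it certainly wasn't exactly):

```python
v6_requests = 8981          # v6 requests seen while the AAAA records were live
total_requests = 1257012    # all requests, both protocols, whole 24 hours
aaaa_hours = 19.0 - 8.5     # AAAA records published roughly 08:30-19:00 BST

raw_share = 100 * v6_requests / total_requests    # share of the whole day
scaled_share = raw_share * 24 / aaaa_hours        # as if AAAA had been up all day

print(f"{raw_share:.2f}% raw, {scaled_share:.1f}% scaled up")
```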
<br />
What was interesting was the relatively high number of clients (50) making requests (630) over transitional <a href="http://en.wikipedia.org/wiki/6to4">6to4</a> connections. Most of these (36 clients making 476 requests) were from inside the University. Most or all of these clients will have had perfectly good native v4 connectivity to www.cam, and this confirms (if confirmation were needed) that rather a lot of systems prefer IPv6, even when provided by a transition technology such as 6to4, over IPv4. Interestingly, we didn't see any <a href="http://en.wikipedia.org/wiki/Teredo_tunneling">Teredo</a> traffic.<br />
<br />
6to4 caused the only significant incident of the day, when a department mail server switched to using IPv6 over a 6to4 route being advertised by a user workstation elsewhere on the department's subnet. This mail server sends all its outgoing mail via the University's central internal mail switch, but that won't accept messages from machines with 6to4 addresses because it doesn't see them as 'inside'. The problem was quickly fixed, but it seems clear that, ironically, problems caused by 6to4 and Teredo 'transitional' connectivity may represent a significant barrier to further IPv6 roll-out.<br />
<br />
<hr />
<br />
<b>Native IPv6 requests to <a href="http://www.cam.ac.uk/">http://www.cam.ac.uk/</a> on IPv6 Day, by approximate country of origin</b><br />
<br />
<span style="font-size: x-small;">[Here 'UCS STAFF' represents clients on the Computing Service staff network, 'UNIVERSITY' represents those elsewhere in the University, 'JANET' those elsewhere on <a href="http://en.wikipedia.org/wiki/JANET">JANET</a>, and 'United Kingdom' those elsewhere in the UK].</span><br />
<br />
<span style="font-family: "Courier New",Courier,monospace;"> 2619 UCS STAFF</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> 1373 China</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> 1290 Brazil</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> 835 JANET</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> 630 UNIVERSITY</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> 420 United Kingdom</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> 293 United States</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> 171 Greece</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> 123 France</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> 110 Czech Republic</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> 97 Russian Federation</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> 81 Germany</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> 66 Japan</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> 48 Portugal</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> 47 Netherlands</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> 36 Finland</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> 33 Canada</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> 33 Serbia</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> 17 Spain</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> 7 Switzerland</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> 6 Ireland</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> 5 Saudi Arabia</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> 3 Hong Kong</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> 2 Italy</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> 2 Korea, Republic of</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> 2 Norway</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> 1 Australia</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> 1 New Zealand</span><br />
<span style="font-size: x-small;">Geolocation provided by Maxmind's <a href="http://geolite.maxmind.com/download/geoip/database/GeoIPv6.dat.gz">free GeoLite IPv6 Country</a>
database. "This product includes GeoLite data created by MaxMind, available from
<a href="http://www.maxmind.com/">http://www.maxmind.com/</a>."
</span><br />
<br />
<span style="font-size: large;">More IPv6 gotchas</span> <span style="font-size: x-small;">(jw35, 2011-06-13)</span><br />
<br />
Our participation in IPv6 day (which I might get around to writing up one day) has led me to identify three more <a href="http://jw35.blogspot.com/2011/05/ipv6-gotchas-and-8th-june-2011.html">'gotchas' relating to IPv6 deployment</a>:<br />
<br />
<span style="font-size: large;">IPv6 tunnels come up outside the wire</span><br />
<br />
As <a href="http://jw35.blogspot.com/2011/06/ipv6-day-more-problems-than-expected.html">predicted in advance</a>, and borne out by our experience on the day, it's clear that lots of clients will use transitional IPv6 connectivity (<a href="http://en.wikipedia.org/wiki/6to4">6to4</a> or <a href="http://en.wikipedia.org/wiki/Teredo_tunneling">Teredo</a>) even when contacting services that are also available over native IPv4. Worse, some machines with 6to4 connectivity will advertise themselves as IPv6 routers, and other machines on the same subnet will use their connectivity in preference to native IPv4.<br />
<br />
In addition to the obvious problem that this transitional connectivity may be broken, blocked, or massively sub-optimal, there is the additional (to me unexpected) problem that machines doing this will be using 6to4 or Teredo IP addresses <span style="font-family: inherit;">(2002::/16 or 2001:0000::/32 respectively)</span> and so will appear to be outside your local network even if they are actually inside. This has serious implications for continued attempts to do access control by IP address.<br />
<br />
Both addressing schemes actually embed the local IPv4 address in the v6 addresses they use, so you could - perhaps - choose to recognise these. But if you do, you'll be in the interesting position of having 'internal' traffic coming into your network from the outside!<br />
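Python's standard <span style="font-family: "Courier New",Courier,monospace;">ipaddress</span> module can recognise both schemes and recover the embedded IPv4 address. A minimal sketch (the 6to4 client address is made up, and the 131.111.0.0/16 'internal' range stands in for whatever your site's is):

```python
import ipaddress

SIXTOFOUR = ipaddress.ip_network("2002::/16")
TEREDO = ipaddress.ip_network("2001::/32")
INTERNAL_V4 = ipaddress.ip_network("131.111.0.0/16")  # illustrative 'inside' range

def embedded_v4(addr):
    """Return the IPv4 address embedded in a 6to4 or Teredo address, else None."""
    a = ipaddress.ip_address(addr)
    if a in SIXTOFOUR:
        return a.sixtofour      # v4 address sits in bits 16-47 of the v6 address
    if a in TEREDO:
        return a.teredo[1]      # (server, client) pair; the client bits are
                                # un-inverted for you by the library
    return None

# A 6to4 client whose underlying v4 address (131.111.0.1) is 'internal',
# even though the v6 source address looks entirely external:
v4 = embedded_v4("2002:836f:1::1")
print(v4, v4 in INTERNAL_V4)    # 131.111.0.1 True
```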
<br />
<span style="font-size: large;">Fragmentation</span><br />
<br />
IPv6 doesn't support packet fragmentation by routers; instead it requires that a sender reduces its packet size and retransmits in response to an ICMPv6 type 2 'Packet too big' message. If this mechanism fails - perhaps because ICMPv6 packets are being blocked, though there are other possible causes - you may find, for example, that users can connect to a web site but never get any content back.<br />
<br />
This is because the initial connection establishment and HTTP GET request all use small packets but everything goes wrong the moment the web server starts sending full packets containing the data requested. Unhelpfully, web server access logs may look fine when this happens, with the only hint of problems being that too few bytes may have been transmitted (though given a big enough TCP window and a small enough document even this may not be obvious).<br />
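A rough sketch of the arithmetic behind this failure mode, using the standard header sizes and a typical 1500-byte Ethernet MTU (these are textbook figures, not measurements from our deployment):

```python
ETHERNET_MTU = 1500
IPV6_HEADER = 40    # fixed size, no fragmentation-related options
TCP_HEADER = 20     # minimum, without options
IPV4_HEADER = 20    # 6to4 encapsulation overhead on the tunnelled leg

# Largest TCP payload per packet on a native v6 path...
native_mss = ETHERNET_MTU - IPV6_HEADER - TCP_HEADER          # 1440
# ...and on a 6to4 path, where the tunnel eats 20 bytes of each frame
tunnel_mtu = ETHERNET_MTU - IPV4_HEADER                       # 1480
tunnel_mss = tunnel_mtu - IPV6_HEADER - TCP_HEADER            # 1420

# A server that never sees the 'Packet too big' message keeps sending
# 1440-byte segments into a 1420-byte hole: the handshake and a short
# GET (well under 1420 bytes) get through, the response does not.
print(native_mss, tunnel_mss)   # 1440 1420
```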
<br />
<span style="font-size: large;">Old software</span><br />
<br />
<span style="font-size: large;"><span style="font-size: small;">Even though IPv6 has been around for a while, support for it is still missing or broken in a lot of software (especially if you use 'stable' or 'Long Term Support' Linux distributions whose versions will inevitably be somewhat less that 'bleeding edge').</span></span><br />
<span style="font-size: large;"><span style="font-size: small;"> </span></span><br />
<span style="font-size: large;"><span style="font-size: small;">For example even though the SLAPD LDAP daemon supports IPv6, my colleagues failed to find a way to get the version included in SLES 10 to support both v4 and v6 at the same time, though it was happy to do one or the other. In addition, this version didn't seem to support IPv6 addresses in its access control list syntax.</span></span><br />
<br />
<span style="font-size: large;"><span style="font-size: small;">I also had a problem geolocating the IPv6 clients that accessed our web server. The geolocation database I normal use (the free <a href="http://www.maxmind.com/app/geolitecountry">GeoLite Country</a> and friends from <a href="http://maxmind/">Maxmind</a>) does support IPv6, and the version of their C API supplied with the current Ubuntu LTS (10.04 Lucid Lynx) is <i>just</i> new enough (1.4.6) to cope. But the versions of the Perl and Python bindings needed to process IPv6 both need 1.4.7 of the C API, and since the library is used by quite a lot of Ubuntu utilities upgrading it isn't trivial. In the end I had to build a private version of the C API and the Perl and Python bindings but that was one more bit of work I wasn't expecting.</span></span>jw35http://www.blogger.com/profile/01606359745703313801noreply@blogger.com1