Monday 19 December 2011

Computing Advent Calendars

I have always been a sucker for Advent Calendars on computing topics. I find them a convenient way to catch up on developments that I would have otherwise missed, and they come out at a time when, with luck, things are a bit quieter so there's time to actually read them.

This year I've been enjoying SysAdvent. In a previous life when I did a lot of Perl development I was a great fan of the Perl Advent Calendar but I haven't read it in several years. Earlier today I came across PHP Advent which I didn't previously know about. For hard core web geeks there's always 24Ways. I'm sure there are lots more. For what it's worth I read these, and lots of other stuff, via RSS or equivalent feeds in my feed reader of choice (which at the moment happens to be Google Reader).

Most of these have been running for a while and the previous editions are also available (for some you may have to make the obvious edits to the URLs to see earlier versions). But beware - just as this year's postings are often topical and cutting edge, ones from the past can easily turn out to have been overtaken by events or simply no longer sufficiently fashionable.

However you celebrate it, I'd like to wish you a happy Christmas and a productive new year.

Cookies - progress at last?

It looks as if there may at last be some progress on the vexed issue of how you can continue to use cookies (and similar) in the light of the new(ish) Privacy and Electronic Communications Regulations. The ICO has published new guidance on all this, and various commentators seem to think it looks promising:
There is however some concern that the proposed approach to dealing with analytics cookies seems to be limited to a hint that the ICO's office just won't look too closely. There's worry that some public sector organisations will still feel that they need to ensure total compliance rather than just take a risk management approach:

Sunday 27 November 2011

Federated (and so SAML, and so Shibboleth) account management



The Problem

In an earlier post (User-ids in a Shibboleth world) I described the problem of deriving a 'userid', as required by much third-party software, from typical attributes supplied in a SAML federated authentication (particularly in the UK federation, but probably equally elsewhere in Europe). This is actually part of the wider problem of account management in a federated environment (as touched on in Federated AUTH is less than half the battle), and this posting is an attempt to pull together some thoughts on the subject.

The context of this is a need to support web applications (for example MediaWiki, Plone, or Wordpress) which need to identify local users (members of the University of Cambridge) but which also need to support authenticated access from people outside the University (collaborators in other institutions, sponsors, representatives of funding bodies, etc.). Many of these have access to SAML identities in the UK federation (or elsewhere), and all three applications claim to have 'Shibboleth support', but in practice this support doesn't come anywhere near doing what we need. As mentioned in User-ids in a Shibboleth world, the primary problem seems to be an assumption that eduPersonPrincipalName (or something equivalent) will always be available. We can (and do) make this available from our own IdP to our own SPs, but we can't expect it from elsewhere, and negotiating its release on a case-by-case basis is impracticable and in any case we don't actually need it.

I asked about this issue on both the jisc-shibboleth@jiscmail.ac.uk and shibboleth-users@internet2.edu mailing lists. I had rather assumed this would be a well known problem with obvious solutions, but the only significant response was from Scott Cantor who said:
"I think they're written for "least effort" and to avoid fixing applications. That's not a strategy that will work long term."
The Options

So, what might work in the long term?

I think you first have to accept that this sort of federated environment differs from a typical single institution environment in at least two significant ways:

  1. You will get a unique identifier for each user, but you must assume that it's totally opaque and that it's not suitable for display. Further, you have to assume that users will not be able to tell you in advance what this identifier is or is going to be.
  2. Lots of people you don't want to have anything to do with will be able to authenticate. So you can't rely on the 'If they can authenticate they are part of the organisation' approximation that works in other situations.
I see two approaches: Account Linking and Account Creation.

Account Linking

Under this approach you start by provisioning ordinary, local accounts complete with locally-unique (and displayable!) user names and passwords, and by transmitting the username/password information to the intended user in any appropriate fashion. But once the intended user has authenticated with this information you allow them to link this account with a federated identity which they can subsequently use to authenticate in place of the local password. You could even regard the initial password as just a bootstrapping device and expire it after a few uses, after a period of time, or after a federated authentication has completed.

Anyone trying to authenticate using an unrecognised federated identity would get an error page referring them to whoever deals with account management. If accounts need non-default attributes (particular editing rights, administrator privilege, etc.) then these can either be set manually by administrators or they could be added following a successful federated authentication with particular attribute sets.

Depending on the application you are protecting you might choose to restrict which federated identities you allow to be linked, based either on who asserts them and/or on attributes supplied with the authentication (e.g. you might require that a 'staff' resource can only be linked to an identity that includes an eduPersonScopedAffiliation value containing 'staff'). You might also allow more than one identity to be linked to a single account. It all rather depends on who has the strongest interest in the account being used only by the right person (you as application operator, or the user).
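Since none of the applications mentioned support this out of the box, here is a minimal sketch, in Python, of what the linking logic might look like. The table layout and the helper functions (start_session, expire_bootstrap_password) are hypothetical, not taken from any real implementation:

# Minimal sketch of the Account Linking flow. The schema and the helpers
# are placeholders for whatever the real application provides.
import sqlite3

db = sqlite3.connect("accounts.db")
db.execute("""CREATE TABLE IF NOT EXISTS links (
                  federated_id TEXT PRIMARY KEY,  -- opaque SAML identifier
                  local_user   TEXT NOT NULL      -- pre-provisioned local account
              )""")

def start_session(local_user):
    # Placeholder: a real application would establish its web session here.
    return local_user

def expire_bootstrap_password(local_user):
    # Placeholder: mark the initial (bootstrap) password as no longer usable.
    pass

def federated_login(federated_id):
    """Called after a successful federated (SAML) authentication."""
    row = db.execute("SELECT local_user FROM links WHERE federated_id = ?",
                     (federated_id,)).fetchone()
    if row is None:
        raise PermissionError("Unrecognised identity - contact account management")
    return start_session(row[0])

def link_identity(local_user, federated_id):
    """Called while the user holds a session opened with the bootstrap password."""
    db.execute("INSERT OR REPLACE INTO links VALUES (?, ?)",
               (federated_id, local_user))
    db.commit()
    expire_bootstrap_password(local_user)  # treat the password as a bootstrap only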

Account Creation

This is to some extent the process envisaged in the latter sections of Federated AUTH is less than half the battle. However software that needs a local 'user name' will have to create one that meets its own requirements when first creating a local account, and it can't assume that anything suitable will be available in the attribute values. This means that the 'Dynamic login' approach is going to be problematic since there probably won't be anywhere to store the association between this user name and the corresponding federated identity.

As mentioned in the other post, some rules will be needed, presumably based on the attributes supplied, to decide who can have an account at all. Anyone trying to authenticate using an identity that doesn't match these rules would again get an error page referring them to whoever deals with account management. Enhanced account privileges can be granted automatically based on attribute values or, once the user has authenticated for the first time (but probably not before), by manual intervention by administrators, by reference to the locally generated user name.
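Purely as an illustration, here is a similar sketch of creating an account on first login. The attribute names follow eduPerson, but the eligibility rule and the username scheme are invented for the example:

# Sketch of automatic account creation from SAML attributes. The eligibility
# rule and username scheme are made up; attribute values are assumed to be
# lists, as most SAML software presents them.
import hashlib
import re

local_accounts = {}   # opaque federated identifier -> locally generated user name

def eligible(attributes):
    """Decide, from the released attributes, whether an account may be created."""
    affiliations = attributes.get("eduPersonScopedAffiliation", [])
    return any(a.startswith(("staff@", "member@")) for a in affiliations)

def make_local_username(attributes, federated_id):
    """Generate a displayable, locally-unique user name, since the federated
    identifier itself is opaque and unsuitable for display."""
    name = attributes.get("displayName", ["guest"])[0]
    stem = re.sub(r"[^a-z0-9]", "", name.lower())[:8] or "guest"
    suffix = hashlib.sha1(federated_id.encode()).hexdigest()[:6]
    return "%s-%s" % (stem, suffix)

def login(federated_id, attributes):
    if federated_id in local_accounts:
        return local_accounts[federated_id]
    if not eligible(attributes):
        raise PermissionError("No matching account rule - contact account management")
    username = make_local_username(attributes, federated_id)
    local_accounts[federated_id] = username  # the association must be stored somewhere
    return username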

Hybrid approach

There are almost always exceptions to rules. To allow for this you might want to adopt a hybrid approach. This might use the Account Creation approach for the majority of potential users, because it totally avoids 'one more password' and needs little ongoing user administration. But for the inevitable cases where you need to allow access by someone who doesn't match the standard account creation rules you could fall back to the Account Linking approach. Indeed, once you allow local accounts you have a tool that you can, if you wish, use to allow access to people who don't themselves have access to suitable federated identities.

All this is quite a lot more work than just lifting a string out of something like a REMOTE_USER environment variable and convincing your application to treat this as evidence of authentication. But to get the best from federated authentication it's what we are probably going to have to do, though in time, and with luck, we may find that someone has already done it for us...

Thursday 24 November 2011

So how are we getting on with cookies?

Remember all that fuss earlier in the year about the new cookie rules? Well, it's six months since the ICO's office gave us all a year's grace during which they didn't expect to actually be enforcing the new regulations. And how are you all getting on with becoming compliant? Worked out how to legally continue to use Google Analytics yet? Scheduled the redevelopments needed for compliance?

While things may be quiet now, I confidently expect another flurry of publicity, and difficult questions, in the spring, by which time it's going to be too late. Here's one recent blog posting on the subject:

  http://www.storm-consultancy.com/blog/misc-web-stuff/the-cookie-monster/

Wednesday 10 August 2011

301 Moved Permanently (to a new office)

After 10 years and 11 months sitting at the same desk in the Computing Service I've finally ended up moving office (to P24 if you know your way around the UCS, otherwise I'm still as unfindable as I've always been). This is part of a plan to make space for a new member of the Information Systems Team who starts in a couple of weeks, and fulfils my ambition to get the entire team at least on the same floor, if not in the same office. But it's still a bit of a wrench and I'm worried that, if truth be told, all my best ideas may have been stolen from my former officemates...

The space left by my departure:


...and my new home:



Wednesday 6 July 2011

IT Support from 300 ft

At the (Cambridge) College IT Managers' 2009 Conference at the Imperial War Museum, Duxford I was lucky enough to win a flight in a Tiger Moth in the prize draw. I've been a bit slow taking this up, but finally got around to it last weekend. I got to fly from Duxford up to Madingley and back, and even got to fly the plane a bit myself (no doubt with the pilot's hands not far from his set of controls). Initial impressions: Tiger Moths are really small and feel as if they are made out of Meccano, you can't see where you are going, and in the air they feel as if they are blowing around like a piece of paper! Flying the plane feels a bit like sailing, but in three dimensions rather than two (and, at least last Sunday, with less water).

Here's some photographic proof of the event. Thank you CITMG!


Saturday 18 June 2011

IPv6 Day in Cambridge - Success and Non-event!

IPv6 day (8th June) in the University was largely a non-event, and so can be declared a success! This seems to match experiences reported elsewhere.

We are not aware of any significant problems experienced by University users in accessing any services on the day, including external ones known to be participating in IPv6 day such as Google and Facebook. The core of the University network is connected to the global v6 internet, but most distribution networks in departments and colleges only connect using v4 at the moment. Those networks that are connected to the global v6 internet (including the one connecting my desktop workstation) worked fine on the day as expected.

In the run-up to the day we enabled v6 on a number of central services, including the main University web server, the Streaming Media Service (both the web interface and the HTTP download service), the 'new' interface to our search engine, the University Training Booking System, and the central mail service (SMTP, POP, IMAP). On the day we published AAAA records for these services alongside the normal A records from about 08:30 to 19:00 BST.

With the exception of the web server, all these services were enabled more or less as they would be for a v6 production service, though a few features (such as automatic v6 address transition between cluster members, and adapting log analysis to recognise v6 addresses) were not completed in time. The web server used a separate Apache reverse proxy to provide v6 connectivity, to avoid having to disturb its configuration. While doing this, and subsequently, we identified various issues and surprises that I've already mentioned (here, here, and here).

The University web server received 8,981 requests from 280 distinct clients over v6. By comparison it received a total of 1,257,012 requests over both protocols for the entire 24 hour period, meaning that v6 requests probably represented about 1.5% of the total during the period when the AAAA records were published. The breakdown of 8,351 native v6 requests from 230 clients by approximate country of origin appears in the table below.

What was interesting was the relatively high number of clients (50) making requests over transitional 6to4 connections (630 requests). Most of these (36 clients making 476 requests) were from inside the University. Most or all of these clients will have had perfectly good native v4 connectivity to www.cam, and this confirms (if confirmation were needed) that rather a lot of systems prefer IPv6, even if provided by a transition technology such as 6to4, over IPv4. Interestingly we didn't see any Teredo traffic.

6to4 caused the only significant incident of the day, when a department mail server switched to using IPv6 over a 6to4 route being advertised by a user workstation elsewhere on the department subnet. This mail server sends all its outgoing mail via the University's central internal mail switch, but that won't accept messages from machines with 6to4 addresses because it doesn't see them as 'inside'. The problem was quickly fixed, but it seems clear that, ironically, problems caused by 6to4 and Teredo 'transitional' connectivity may represent a significant barrier to further IPv6 roll-out.



Native IPv6 requests to http://www.cam.ac.uk/ on IPv6 Day, by approximate country of origin

[Here 'UCS STAFF' represents clients on the Computing Service staff network, 'UNIVERSITY' represents those elsewhere in the University, 'JANET' those elsewhere on JANET, and 'United Kingdom' those elsewhere in the UK].

  2619 UCS STAFF
  1373 China
  1290 Brazil
   835 JANET
   630 UNIVERSITY
   420 United Kingdom
   293 United States
   171 Greece
   123 France
   110 Czech Republic
    97 Russian Federation
    81 Germany
    66 Japan
    48 Portugal
    47 Netherlands
    36 Finland
    33 Canada
    33 Serbia
    17 Spain
     7 Switzerland
     6 Ireland
     5 Saudi Arabia
     3 Hong Kong
     2 Italy
     2 Korea, Republic of
     2 Norway
     1 Australia
     1 New Zealand
Geolocation provided by Maxmind's free GeoLite IPv6 Country database. "This product includes GeoLite data created by MaxMind, available from http://www.maxmind.com/."

Monday 13 June 2011

More IPv6 gotchas

Our participation in IPv6 day (which I might get around to writing up one day) has led me to identify three more 'gotchas' relating to IPv6 deployment:

IPv6 tunnels come up outside the wire

As predicted in advance, and borne out by our experience on the day, it's clear that lots of clients will use transitional IPv6 connectivity (6to4 or Teredo) even when contacting services also available over native IPv4. Worse, some machines with 6to4 connectivity will advertise themselves as IPv6 routers, and other machines on the same subnet will use their connectivity in preference to native IPv4.

In addition to the obvious problem that this transitional connectivity may be broken, or blocked, or massively sub-optimal, there is the additional, unexpected (to me), problem that machines doing this will be using 6to4 or Teredo IP addresses (2002::/16 or 2001:0000::/32 respectively) and so will appear to be outside your local network even if they are actually inside. This has serious implications for continued attempts to do access control by IP address.

Both addressing schemes actually embed local IPv4 addresses in the v6 addresses they use so you could - perhaps - choose to recognise these. But if you do you'll be in the interesting position of having 'internal' traffic coming into your network from the outside!
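If you do decide to recognise them, the standard ipaddress module in any reasonably recent Python 3 can identify both kinds of transitional address and recover the embedded IPv4 address. A small sketch (the addresses are just examples):

# Classify an IPv6 client address and recover any embedded IPv4 address
# (6to4 addresses are 2002::/16, Teredo addresses are 2001:0000::/32).
import ipaddress

def embedded_ipv4(address):
    addr = ipaddress.IPv6Address(address)
    if addr.sixtofour is not None:     # 6to4: v4 address in bits 16-47
        return "6to4", addr.sixtofour
    if addr.teredo is not None:        # Teredo: (server, client) pair
        return "teredo", addr.teredo[1]
    return "native", None

print(embedded_ipv4("2002:836b:a2e::1"))         # ('6to4', IPv4Address('131.107.10.46'))
print(embedded_ipv4("2001:630:200:8080::80:0"))  # ('native', None)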

Fragmentation

IPv6 doesn't support packet fragmentation by routers, but instead requires that a sender reduces its packet size and retransmits in response to an ICMPv6 type 2 'Packet too big' message. If this mechanism fails (perhaps because ICMPv6 packets are being blocked, or for any other reason), you may find, for example, that users can connect to a web site but never get any content back.

This is because the initial connection establishment and HTTP GET request all use small packets but everything goes wrong the moment the web server starts sending full packets containing the data requested. Unhelpfully, web server access logs may look fine when this happens, with the only hint of problems being that too few bytes may have been transmitted (though given a big enough TCP window and a small enough document even this may not be obvious).

Old software

Even though IPv6 has been around for a while, support for it is still missing or broken in a lot of software (especially if you use 'stable' or 'Long Term Support' Linux distributions, whose versions will inevitably be somewhat less than 'bleeding edge').

For example even though the SLAPD LDAP daemon supports IPv6, my colleagues failed to find a way to get the version included in SLES 10 to support both v4 and v6 at the same time, though it was happy to do one or the other. In addition, this version didn't seem to support IPv6 addresses in its access control list syntax.

I also had a problem geolocating the IPv6 clients that accessed our web server. The geolocation database I normally use (the free GeoLite Country and friends from Maxmind) does support IPv6, and the version of their C API supplied with the current Ubuntu LTS (10.04 Lucid Lynx) is just new enough (1.4.6) to cope. But the versions of the Perl and Python bindings needed to process IPv6 both need 1.4.7 of the C API, and since the library is used by quite a lot of Ubuntu utilities upgrading it isn't trivial. In the end I had to build a private version of the C API and the Perl and Python bindings, but that was one more bit of work I wasn't expecting.
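For the record, once the newer bindings were in place the lookup itself was only a few lines of Python. A rough sketch follows; the database path is whatever your installation uses, and the method name is from memory so may differ slightly between versions of the bindings:

# Rough sketch of an IPv6 country lookup with the legacy MaxMind GeoIP bindings.
# Needs the C library (1.4.7 or later for the bindings' v6 support) and the
# GeoLite IPv6 Country database; names and paths may vary between versions.
import GeoIP

gi = GeoIP.open("/usr/share/GeoIP/GeoIPv6.dat", GeoIP.GEOIP_STANDARD)

def country(ipv6_address):
    return gi.country_name_by_addr_v6(ipv6_address) or "Unknown"

print(country("2001:630:200:8080::80:0"))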

Saturday 4 June 2011

IPv6 day - more problems than expected?

A couple of posts on the JANET Development Eye blog, together with links from them to some useful pages on the ARIN wiki, suggest that rather more people may experience problems on IPv6 day than I had previously expected. The main problem, ironically, seems to be the widespread deployment by default in many OSs and networks of workarounds intended to provide access to IPv6-only resources from machines with only v4 connectivity. The problem is that these workarounds are often broken, or blocked, or massively sub-optimal, but applications may still try to use them in preference to v4 even when accessing dual-stack services.

What's really worrying is that measurements by Google suggest that many University networks, with their 'light-touch' approach to regulating network-connected devices, may be badly affected by all this. I suppose we will see on Wednesday!

Thursday 26 May 2011

More cooking with cookies

Sorry, this blog isn't intended to be all about cookies. I'm not even very interested in them but this whole EU regulation thing just keeps growing and this is as good a place to keep notes about it as anywhere.

The Department for Culture, Media and Sport has published an open letter on the subject:
This seems to be trying to make some interesting assertions, including the idea that consent doesn't always need to be obtained in advance (based on the subtle difference of some instances being preceded by "prior" and some not). It does seem to say that whatever you do they don't think that relying on current browser cookie controls is enough.

The other interesting development is the addition of a banner to the ICO's web site that tells you about its cookie use and allows you to opt into more cookies (and incidentally get rid of the banner):
(Hint: if you want to go back to the default state after you've accepted their cookies then delete the cookie called ICOCookiesAccepted, or all cookies from www.ico.gov.uk).

I think they are implementing this in their CMS and so serving different versions of their pages to different people, so an identical approach won't work on a purely static site. Note BTW that this means they have to disable caching which may have performance and load implications. I have an idea that a pure JavaScript approach might be possible for a static site, but my JavaScript isn't up to it! I'm rather hoping that Google might provide a canned solution along these lines for Analytics.

Tuesday 24 May 2011

Further thoughts on cookies (or lack thereof)


The amendments to The Privacy and Electronic Communications (EC Directive) Regulations 2003 as they affect the use of cookies have now been published in The Privacy and Electronic Communications (EC Directive) (Amendment) Regulations 2011. What's surprising (to me) is how little has actually changed, though obviously a small change to legislation can have a wide effect. What also fooled me is that if you don't read the ICO's guidance carefully (as I didn't) you might (as I did) think the change was bigger than it was. For example I thought paragraph (4) was new, and it isn't. In fact it hasn't changed at all.

One interpretation of all this is that many/most sites may already fail to comply with the 'old' regulations. Were you really giving subscribers "the opportunity to refuse the storage of or access to that information"? ... because if you were it should surely be trivial to turn the tests around and get consent instead. You could argue that all that's happened now is that, by requiring consent, the new regulations have made contraventions more obvious.

For your delight, here is paragraph 6 from the original regulations with (if I've got it right) the new amendments applied to it (removals marked '[deleted: ...]', new text marked '[new: ...]'):
Confidentiality of communications 
6.—(1) Subject to paragraph (4), a person shall not [deleted: use an electronic communications network to store information, or to] [new: store or] gain access to information stored, in the terminal equipment of a subscriber or user unless the requirements of paragraph (2) are met.
(2) The requirements are that the subscriber or user of that terminal equipment— 
(a) is provided with clear and comprehensive information about the purposes of the storage of, or access to, that information; and 
(b) [deleted: is given the opportunity to refuse the storage of or access to that information]
(b) [new: has given his or her consent]
(3) Where an electronic communications network is used by the same person to store or access information in the terminal equipment of a subscriber or user on more than one occasion, it is sufficient for the purposes of this regulation that the requirements of paragraph (2) are met in respect of the initial use. 
[new: (3A) For the purposes of paragraph (2), consent may be signified by a subscriber who amends or sets controls on the internet browser which the subscriber uses or by using another application or programme to signify consent.]
(4) Paragraph (1) shall not apply to the technical storage of, or access to, information— 
(a) for the sole purpose of carrying out or facilitating the transmission of a communication over an electronic communications network; or 
(b) where such storage or access is strictly necessary for the provision of an information society service requested by the subscriber or user.
Not all that different, is it?

Thursday 19 May 2011

IPv6 Gotchas, and 8th June 2011

IPv6 has been 'imminent' for a very long time. However it's shortly going to have a bit of a boost. 8th June, 2011 is 'World IPv6 day' when various big players (Google, Facebook, Yahoo!, Akamai, ...) will enable IPv6 access to their services alongside the existing v4 access as a world-wide test flight for a wider deployment. In Cambridge, some of the services run by the University Computing Service will be participating.

For almost everyone this will make no difference at all - if, like most people, you only have access to IPv4 then you will just go on using that as you always have. Likewise if you have working IPv6 you should see no difference, even though you'll be using it rather more than normal. But if you have broken IPv6 support you are likely to have problems on the day. If you want to check, the test pages at http://test-ipv6.com/ and http://omgipv6day.com/ seem to be useful, and the University's My IP page will at least tell you if you have IPv6 connectivity with the University.
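If you'd rather test from a script than from a web page, a quick Python check along these lines (pointed at any host that publishes an AAAA record; ipv6.google.com is used purely as an example) will tell you whether an IPv6 TCP connection can actually be made:

# Quick-and-dirty IPv6 connectivity test: can we open a TCP connection over
# IPv6 only? The host name is just an example of one with an AAAA record.
import socket

def has_ipv6_connectivity(host="ipv6.google.com", port=80, timeout=5):
    try:
        addresses = socket.getaddrinfo(host, port,
                                       socket.AF_INET6, socket.SOCK_STREAM)
    except socket.gaierror:
        return False   # no AAAA record, or name resolution failed
    for family, socktype, proto, _, sockaddr in addresses:
        try:
            with socket.socket(family, socktype, proto) as s:
                s.settimeout(timeout)
                s.connect(sockaddr)
                return True
        except OSError:
            continue   # try the next address, if any
    return False

print("IPv6 looks usable" if has_ipv6_connectivity() else "No working IPv6")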

It's possible to think of IPv6 as just IPv4 with longer addresses, but there are enough differences and gotchas to make life difficult for both clients and servers. Here are some that we've identified:
  • Auto-configuration. IPv6 natively supports auto-configuration, so if you connect most (recent) computers to an Ethernet that includes an IPv6 router the computer will acquire an IPv6 interface complete with address and default route without you having to do anything. A computer with an active IPv6 interface will normally try to use IPv6 when talking to a service that advertises an IPv6 address, so you may find yourself using IPv6 without knowing it. This has a couple of exciting consequences:
    • If the IPv6 router you contacted isn't actually connected to the v6 Internet then your traffic isn't going to go anywhere useful. Your computer will probably fall back to using IPv4, but probably after a long (multiple tens of seconds) delay. This is going to look to users as if the internet is going very s l o w l y. It's rumoured that some versions of Windows will under some circumstances spontaneously advertise themselves as IPv6 routers.
    • Even if you are successfully using v6 (knowingly or otherwise), unless you do something your address won't be in the DNS. So looking it up won't result in a .cam.ac.uk (or whatever) host name, and services that make access control decisions based on client host name aren't going to recognise you. [Even if the service makes decisions based on address rather than host name it will get things wrong if its access control lists haven't been extended to include the necessary IPv6 range(s)].
  • If a service advertises a v6 address but doesn't actually respond to requests on that address then again there could be a longish delay before your computer falls back to trying IPv4.
  • It's entirely possible (though probably not a good idea!) for a service to behave completely differently when accessed over IPv6 from how it does over IPv4. For a start, the process of resolving host names to addresses is entirely separate for IPv4 and IPv6, so there's no particular reason for the two addresses to end up on the same server. Further, web servers providing virtual hosts need a mapping between IP addresses and the corresponding virtual hosts, and it's all too easy (as the operators of www.ja.net found some time ago) to forget to extend this mapping to include v6 addresses, with the result that v4 and v6 users end up seeing the content for different virtual hosts when requesting the same URL.
  • The IPv4 address 127.0.0.1 always corresponds to the local computer and conventionally has the name localhost. It's quite common for components of a server to communicate using this address (e.g. a web application and its database), and equally common to actively restrict communication to this address or name. The corresponding v6 address is '::1' - if this doesn't correspond to the name 'localhost', or if access lists only recognise 127.0.0.1, then, if IPv6 is enabled on a server, it may find that it can't talk to itself!
  • There are a lot of programs out there, in particular log analysis programs, that implicitly expect IP addresses to look like 131.111.8.46. Such programs may be 'surprised' to come across addresses like 2001:630:200:8080::80:0. How they behave will depend on how well they have been written, but in some cases this may not be pretty. Further, note that library calls that look up v4 addresses to find the corresponding host name may not work with v6 addresses (see the sketch after this list).
  • Firewalls and packet filters will need IPv6 configurations to match their IPv4 ones, and they are unlikely to be able to automatically derive one from the other. So, by default, they are likely to either block all IPv6 traffic or allow it - neither is likely to be what's wanted.
  • A lot of networks (especially home ones) use RFC 1918 'private' addresses and network address translation (NAT) when talking to the wider Internet. While not intended as a security measure, this does somewhat shelter machines on such networks from active attack from the Internet. Use of private addresses and NAT is a response to the shortage of IPv4 addresses, which isn't a problem in the IPv6 world, and so they are not widely supported (if at all). So enabling v6 on a previously 'private' network may expose previously hidden security vulnerabilities to the world, which may be unfortunate.
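To illustrate the access control and address-parsing points above, Python's standard ipaddress module handles both families uniformly; the 'internal' ranges below are examples only and would need replacing with your own:

# Handle v4 and v6 client addresses uniformly in access control checks and in
# log analysis. The 'internal' network ranges here are examples only.
import ipaddress

INTERNAL_NETWORKS = [
    ipaddress.ip_network("131.111.0.0/16"),     # example IPv4 range
    ipaddress.ip_network("2001:630:200::/48"),  # example IPv6 range
]

def is_internal(client_address):
    addr = ipaddress.ip_address(client_address)  # accepts either family
    return any(addr in net for net in INTERNAL_NETWORKS)

for client in ("131.111.8.46", "2001:630:200:8080::80:0", "2002:836b:a2e::1"):
    print(client, "internal" if is_internal(client) else "external")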
IPv6 is probably finally coming. Some of us may get an early brush with it on 8th June. It really is time to start thinking about the consequences.

Thursday 14 April 2011

Why you shouldn't give away your password

So, you want to embed details of your Twitter, or Facebook, or <insert flavour of the month here> account on your blog. There's a widget that will do this (in fact the widget is the only way to do this) and 'all' you need to give it is your user name and password. You are using a blog provided by a big well known provider, so of course you can trust them not to misuse your credentials. What can possibly go wrong?

This:
http://techcrunch.com/2011/04/13/hacker-gains-access-to-wordpress-com-servers/

Tuesday 15 March 2011

User-ids in a Shibboleth world


[This is a copy of a question posted to the JISC-SHIBBOLETH@JISCMAIL.AC.UK mailing list which, to date, hasn't seen much in the way of responses to the questions at the end]

A lot of existing web apps have wired into them, often at a very low level, the idea that each 'user' has associated with them a unique, short-ish, fairly friendly identifier. Traditionally these were things like jw35 or 2006STUD42, though it's increasingly common to use an email address belonging to the user instead. Quite often these are (in fact or in practice) the primary key into the application's user database, and they tend to get displayed (as in "Hello jw35", or in things like usage logs), and these two usages are often not distinguished.

When converting existing software for a SAML environment there's the question of what to use to fill this slot. If you have ePPN available then that will probably work (because it looks like an email address) but in the UK federation that's unlikely. The 'old' form of ePTID (rUL8A3M667VfsiCImQVFffN9cNk=@cam.ac.uk) might just about work, though it's ugly when displayed. The new form is close to unusable for display purposes:

https://shib-test.raven.cam.ac.uk/shibboleth!https://mnementh.csi.cam.ac.uk/shibboleth/ucam!rUL8A3M667VfsiCImQVFffN9cNk=.

Of course if you are creating accounts in advance and only later binding them to SAML identities you can use whatever user-id scheme you want, but what if you are creating accounts 'on the fly'?

We've noticed that 'off the shelf' Shibboleth adaptations (most recently for Plone and MediaWiki) tend to simply use what turns up in REMOTE_USER which, by default, will be the first of ePPN and the old and new forms of ePTID that isn't blank. In a UK federation context this doesn't really work, and I suspect many of these adaptations were written for contexts where ePPN is more widely available.

What have other people working in the UK environment done to address this problem, assuming you are seeing it too? Is there a 'best practice'?

Monday 14 March 2011

Consent to be required for cookies? (updated 2011-03-15 and 2011-05-11 and 2011-05-17)

An amendment to the EU Privacy and Electronic Communications Directive comes into effect on 25 May 2011. This makes it necessary to obtain consent before storing and retrieving usage information on users’ computers. Details are unclear, mainly because the UK government has apparently yet to publicise any information on how this requirement will be implemented in the UK, but in the limit it could require every web site to obtain informed user consent before setting any cookies, which is going to be interesting...

Here are some external references to the topic:
Updated 2011-03-15: More links
Updated 2011-05-11:
Updated 2011-05-17:

Wednesday 2 March 2011

Other people's content shown to be dangerous

In Promiscuous JavaScript considered dangerous I said that including content from elsewhere on your pages was dangerous, not only because the people supplying the content might be malicious but also because they might fail to prevent third parties from injecting malicious content.

Judging by this BBC News article this is exactly what happened recently to the web sites of the London Stock Exchange, Autotrader, the Vue cinema chain and a number of other organisations as a result of displaying adverts provided by the advertising firm Unanimis. This will have caused problems for these various organisations' clients, and reputational damage and hassle for the organisations themselves.

Ideally you'd carefully filter other people's content before including it in your pages. But you may not be able to do this if, for example, the supplier requires you to let everything through untouched or if you are using the promiscuous JavaScript approach. In such cases you are entirely dependent on the competence of the supplier and, as demonstrated here, some are more competent than others.

Thursday 17 February 2011

Google Apps: SSO and IdM at #GUUG11

Here are some slides from a presentation I gave on Google Apps SSO and Identity Management at the inaugural 'Google Apps for Education UK User Group' meeting at Loughborough University on 15th February 2011:

Wednesday 9 February 2011

Loops in PKI certificate hierarchies and Firefox bugs

In a previous posting, I mentioned being surprised to discover that PKI certificate hierarchies were more complicated than the strict trees that I had always assumed them to be. At the time I rather assumed that they must be directed acyclic graphs.

I've subsequently realised that there is nothing to prevent them from being cyclic graphs, and have actually found a loop in an existing commercial hierarchy. Unfortunately it looks as if I wasn't the only person making false assumptions, since it seems that certificate loops trigger an old-but-not-yet-fixed bug in Firefox that prevents certificate chain verification.

Certificates contain within them the 'Distinguished Name' (DN) of a further certificate that contains the public half of the key used to sign the first certificate. Certificates are only identified by name, and there is nothing to stop multiple certificates sharing the same name (though all the certificates with the same name had better contain the same key, or new and exciting bad things will probably happen). All this is what I worked out last time.

What I've discovered that's new is illustrated in the following diagram:

This represents part of the hierarchy under which certificates are issued to UK HE institutions by JANET (and I think by similar organisations in other countries) under a contract with Comodo. In this diagram:
  1. The blocks with grey backgrounds represent key pairs. The numbers at the top of the box are the first half of the key's 'Key Identifier'.
  2. The smaller blocks represent certificates containing the corresponding public keys.
  3. The arrows link certificates to the keys that signed them (and so which can validate them).
  4. Certificates with red backgrounds represent self-signed certificates that are trusted roots (at least on my copy of Firefox). The certificate with a blue background represents an example server certificate.
  5. The number in each certificate is the first half of the certificate's SHA-1 hash.
  6. Certificates with a green border represent the recommended verification chain for JANET-issued certificates.
Copies of the certificates involved (and others) can be found here.

The problem here is the pair of certificates "31:93:..." and "9E:99:...", since each represents a potential verification route for the other. Neither is part of the 'official' verification chain for JANET-issued certificates. "9E:99:..." is distributed by Comodo in support of some of their other certificate products. I don't know where "31:93:..." comes from, but I assume it appears in someone else's 'official' certificate chain. Both these 'intermediate' certificates will presumably be included in certificate bundles by particular web servers and, once served, tend to be cached by web browsers.

The problem is that, once a web browser has a copy of both of these, there's a danger of it going into a spin since each is an apparently acceptable parent for the other. It turns out that Firefox has exactly this problem, as described in bug 479508. Unfortunately this bug last saw any action in March 2008, so it's not clear when, if ever, it's going to be fixed. There are some other reports of what I suspect is the same problem here and here.

So whose problem is it? Clearly Firefox could and should be more careful in how it constructs certificate chains. It's possible that other SSL software is vulnerable to similar problems, though I've only seen this manifest in Firefox (and only then occasionally). But I also wonder what the Certification Authorities thought they were doing when they issued these certificates. As far as I can see they were both issued as 'cross-certification' certificates, intended to allow server certificates issued under one root certificate to be validated by reference to another. Issuing one of these isn't a problem. Issuing a pair clearly is.
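Being more careful needn't be difficult: a chain builder just has to refuse to revisit a certificate it has already used. Here's a rough sketch in Python using the third-party 'cryptography' package (recent versions); the pool of candidate certificates, loaded from a directory of PEM files, is a stand-in for whatever store a browser actually keeps:

# Sketch: build a verification chain by following issuer names through a pool
# of candidate certificates, refusing to reuse a certificate we've already
# seen so that a cross-certification loop can't send us round in circles.
import glob
from cryptography import x509
from cryptography.hazmat.primitives import hashes

def load_pool(pattern="certs/*.pem"):
    # Hypothetical pool of known intermediate and root certificates.
    return [x509.load_pem_x509_certificate(open(path, "rb").read())
            for path in glob.glob(pattern)]

def build_chain(cert, pool):
    chain = [cert]
    seen = {cert.fingerprint(hashes.SHA1())}
    current = cert
    while current.issuer != current.subject:   # stop at a self-signed certificate
        parents = [c for c in pool
                   if c.subject == current.issuer
                   and c.fingerprint(hashes.SHA1()) not in seen]
        if not parents:
            break                              # no unused parent available
        current = parents[0]                   # real code would try each candidate
        seen.add(current.fingerprint(hashes.SHA1()))
        chain.append(current)
    return chain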

A workaround, should this problem bite you, is to delete "31:93:..." and "9E:99:..." from Firefox's certificate store. Neither is a root, and any server that needs them to get its certificate verified should be providing them, so deleting them should be entirely safe. The workaround will last until you next pick up copies of both of these, at which point you'll need to delete them again.


Tuesday 1 February 2011

Root certificates for MacOS OpenSSL

In an earlier post I mentioned that, while MacOS includes OpenSSL, it isn't preconfigured with any trusted root certificates. So before you can use it to do SSL properly you need to provide a set.

My previous post suggested extracting them from the bundle that comes with Firefox, but I've recently come across a useful article about Alpine on MacOS by Paul Heinlein in which he points out that the MacOS operating system already has a set of preconfigured roots and that these can be extracted using the Keychain Access utility for use by OpenSSL. See his posting for details, but to quote from it:
  1. Open the Keychain Access application and choose the System Roots keychain. Select the Certificates category and you should see 100 or more certificates listed in the main panel of the window.
  2. Click your mouse on any of those certificate entries and then select them all with Edit > Select All (Cmd+A).
  3. Once the certificates are all highlighted, export them to a file: File > Export Items…. Use cert as the filename and make sure Privacy Enhanced Mail (.pem) has been chosen as the file format.
  4. Copy the newly created cert.pem into the /System/Library/OpenSSL directory.
Now, I wonder why Apple didn't do this for us?

Saturday 29 January 2011

Thoughts on "Initial authentication"

"If it was easy we'd have already done it!"

I've recently contributed to some work on 'Password management'. Here are some of the thoughts that I've managed to condense into words. Don't be disappointed - they raise at least as many questions as they answer.

1) Looking say five years ahead I think we need to talk about 'authentication' and not just 'passwords'. Given their vulnerability to keyboard sniffing it seems to me that within five years it will be necessary at least to support (though perhaps not require) some sort(s) of non-password based authentication for some systems.

2) While there will be pressure to do so, it might be better not to try to solve everyone's problems. For example in some cases it might still be best to create a shared password system for use only on a limited set of systems and not allow anyone else to use it.

3) The understandable enthusiasm for SSO by some is at variance with an equal enthusiasm, in some cases promoted by the same people, for aggressive inactivity timeouts on individual systems. Meeting everyone's individual security requirements may result in an unusable service.

4) The group developing HTML5 have adopted a policy that says "In case of conflict, consider users over authors over implementers over specifiers over theoretical purity". It might be sensible to adopt something similar in this area (suitably adapted). Or perhaps not... Discuss.

5) A critical feature of any authentication system is who is able to reset its authentication credentials (i.e. reset passwords or equivalent), because they (all of them) can subvert the security of the systems (all of them) that use it. It looks to me to be difficult to simultaneously meet the expectations of people who would like easy local reset of passwords and the operators of some 'high risk' systems who want tight control of access.

6) Given the existence of a central password verification service overloading existing protocols (LDAP, RADIUS, ...), I don't see any technical way to restrict the clients that can use it, since clients don't identify themselves. Such use could be restricted by rules, but they would be hard to enforce, and by non-100%-perfect technical restrictions (e.g. client IP address filtering). So anyone providing such a service will have to accept that in practice it will be open to anyone to use. You could implement a central password verification service using something like SAML, where clients are strongly identified, but then there wouldn't be any clients to use it.

7) Accepting that we'll at least have to accept passwords for the foreseeable future (even if we accept other things in parallel), the not-unreasonable idea that people will only willingly accept using two passwords restricts us to a maximum of two authentication systems. So how about:
  • a) A 'low trust' service verifying a single password over one or more commonly used protocols (so LDAP, RADIUS, TACACS), intended for use in situations where we can't do better (a 3rd party service that can only do LDAP auth, a WebDAV service that has to work with common clients that can only do username/password, etc.). Document that this is a low trust service, that server operators can intercept passwords in transit, etc. Require good practice as a condition of using the service - don't ship passwords over unsecured networks, don't write them to disk, etc. Perhaps make token attempts to restrict client access (e.g. by IP address) but accept and document that this won't be perfect. This violates all my prerequisites for secure use of passwords, but perhaps on balance doing so is necessary to support what needs to be supported.
  • b) A 'higher trust' service where credential disclosure is limited to local workstations and a logically single central server. Web redirection based protocols (i.e. Ucam_WebAuth, Shibboleth) and Kerberos (and so Windows AD) meet this requirement and provide at least some single sign-on. Web redirect, and perhaps Kerberos, could both use things other than a password for initial authentication. SPNEGO holds the possibility of transparently transferring a pre-existing Kerberos session into a web redirection-based system, thus widening SSO for existing Kerberos users while leaving open password or other authentication for access to web systems in situations where Kerberos isn't available.
Item (b) violates my third prerequisite for secure use of passwords ("It must be possible for password holders to decide when it is safe to divulge their password"), but I'm coming to the conclusion that adhering to this principle is not worth the cost.

8) If you can't stomach the single shared password idea, an alternative might be a 'Managed Password Service' that extended the 'token' (in the current University Computing Service sense) idea by centrally managing multiple password sets (one for each 'system'). So the administrator of a new system somewhere could mint a new password set for their system and configure their system to do password verification against it using LDAP, RADIUS or anything else supported. Users could set and reset these passwords under the authority of their web redirect/Kerberos credentials. The end-system would have to do its own authorisation, since in principle anyone could create a password for any system. This doesn't give 'two passwords', but it does at least allow one password to manage all the others.
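To make (8) slightly more concrete, here is a sketch of how an end-system might verify a username and password against its own password set using LDAP. The server name, the DN layout and the use of python-ldap are all assumptions made up for the example, and note that a successful bind only verifies the password - authorisation remains the end-system's problem:

# Sketch: verify a username/password pair against a hypothetical per-system
# password set exposed over LDAP by a 'Managed Password Service'. The server
# name and DN layout are invented; uses the python-ldap module.
import ldap

LDAP_URI = "ldaps://passwords.example.cam.ac.uk"
BASE_DN = "ou=my-system,ou=password-sets,dc=example,dc=cam,dc=ac,dc=uk"

def verify(username, password):
    conn = ldap.initialize(LDAP_URI)
    try:
        # A 'simple bind' as the user's entry succeeds only if the password
        # is correct for that entry; it says nothing about authorisation.
        conn.simple_bind_s("uid=%s,%s" % (username, BASE_DN), password)
        return True
    except ldap.INVALID_CREDENTIALS:
        return False
    finally:
        conn.unbind_s()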

Wednesday 12 January 2011

Microformats (lots of Microformats)

I've wanted to play with microformats for some time. The need to rework my (still somewhat minimal) work home page provided an ideal opportunity. To see the effect you'll need a browser plug-in such as Operator for Firefox, or to use something like the Google Rich Snippets testing tool.

Essentially microformats (and their friends - see below) provide a way of marking up HTML content with additional semantics to allow automatic parsing that wouldn't otherwise be possible. For example a human would know that this supplies my telephone number:

<p>
Jon Warbrick<br />
Tel: +44 1223 337733
</p>

but if I mark it up like this

<p class="vcard">
<span class="fn">Jon Warbrick</span><br />
Tel: <span class="tel">+44 1223 337733</span>
</p>

then microformat-aware processors should be able to reliably extract the number and associate it with my name - and then perhaps use this to create a contact list entry or put a call through to me. Similar microformats exist for events, reviews, licences, etc.
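As a toy demonstration, a few lines of Python using the standard library's HTML parser are enough to pull the marked-up fields back out again (a real microformats parser does a great deal more than this):

# Toy extractor for the hCard snippet above: collect the text inside elements
# whose class includes 'fn' or 'tel'. Real microformats libraries do far more.
from html.parser import HTMLParser

class HCardExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.current = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "").split()
        for wanted in ("fn", "tel"):
            if wanted in classes:
                self.current = wanted

    def handle_data(self, data):
        if self.current:
            self.fields[self.current] = self.fields.get(self.current, "") + data

    def handle_endtag(self, tag):
        self.current = None

html = '''<p class="vcard">
<span class="fn">Jon Warbrick</span><br />
Tel: <span class="tel">+44 1223 337733</span>
</p>'''

parser = HCardExtractor()
parser.feed(html)
print(parser.fields)   # {'fn': 'Jon Warbrick', 'tel': '+44 1223 337733'}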

It turns out that there are (at least) three different, competing microformat-like systems out there:

Microformats
The original offering in this area. Aims to add semantic markup for various classes of 'thing' to standards-conforming HTML 4.01/XHTML 1.0. It largely does this using HTML structure and a range of pre-defined class names.

RDFa
("Resource Description Framework in attributes") defines a set of attribute-level extensions to XHTML which make it possible to add semantic markup using RDF syntax.

Microdata
This is a (proposed) feature of HTML5 that adds semantic markup in a similar way to microformats, but using new attributes itemscope, itemprop, itemtype and itemref rather than overloading class. As an experiment I've also tried marking up my contact details using microdata.