Saturday 14 November 2009

Re-using Raven's password database

[This posting was originally published on an internal wiki in early 2009 and is republished here to increase its exposure. What is says remains relevant today]

The University's Raven authentication system currently provides a usable authentication system for interactive, web-based applications but doesn't support non-interactive web-based applications, nor non-web ones. This excludes many potential uses: Windows/MacOS/Unix logon, IMAP, POP, SMTP, LDAP, WebDAV (and so CalDAV), etc., etc. Since Raven of necessity has a database containing username/password pairs for most people in the University, and most people know their Raven password, it is tempting to assume that extending it to support these other uses would be easy.

This post explores ways in which authentication based on 'Raven passwords' could be extended and tries to point out some of the advantages and drawbacks of each proposal. In reading this, you need to consider the security properties you might actually want from such a system (hint: it's not important that legitimate users have to provide their password before gaining access to something, what is important is that people can't realistically gain access using an identity that is not their own - these are not the same thing!). It is also important to consider who might be attempting to attack what, and how much we care about it: are we talking about a) a bored University student, b) a tabloid journalist, c) an organised criminal, or d) the American NSA, and are they trying to gain access to a) their mate's photo archive, b) the next heir to the throne's email account, c) the University financial system?

This paper only considers ways in which the existing, single Raven password might be usefully reused and ignores possibilities such as as using multiple passwords, one-time password lists, cryptographic smartcards and tokens, fingerprint readers, multi-tier authentication, etc. It also ignores two very real problems with 'static' passwords:
  • It is effectively impossible to prevent people giving their password away, and they do!
  • Users have to type their password into something - typically the workstation that they are sitting in front of. Depending on the workstation, it is entirely possible that it has been compromised, for example with a virus that installs a key logger or by a malicious or inept system administrator.
Option the first: forget passwords
Stepping back a bit, we could forget passwords and just ask people what their user-id is. This would even simplify Raven itself:

This would save people from having a password that they needed to remember, and would save the Computing Service from having to issue them.  Both of these would be significant advantages. Anyone could easily set up almost any system 'protected' in this way - the hard part might actually be ''stopping'' it from asking for a password.

The obvious downside is that we'd have to trust everyone who can get to a login prompt not to lie (and for many networked services this means everyone in the entire world), and so this is rather unrealistic.

Option the second: use a single fixed password
Rather than forgetting passwords completely, we could could use a single, fixed, 'well known' password to protect all accounts. This would be easy to remember (and easy to recover if you forgot it). It would also be fairly easy to set up a system protected in this way.

Now we only have to trust all the people who ever knew the password not to lie. It  is fair to assume that a 'secret' legitimately known by 55,000 (and rising) people (those who have ever had a Raven account) would not stay a secret for long so we'd still be trusting quite a lot of people not to lie, if not the whole world. It would also be next to impossible ever to change this password. Again, this is probably not a realistic option (though this doesn't actually stop people using it, though usually on a small scale - think of the codes used to arm and disarm intruder alarm systems).

Option the third: distribute a list of user name/plaintext password pairs
We could (in theory, though not currently in practice) extract a list of user names and corresponding plain text passwords from Raven, one for each user, and then distribute this list to every system that needs to authenticate users.

Users could then quote their user name and password, the system could look up the the user name and compare the offered password with the one on the list - if it matches they would get in, if not they wouldn't. Implementing authentication in this way would be fairly easy, and system administrators could post-process the list into whatever form their system actually needed.

This would avoid having to trust users not to lie - the vast majority of users would only be able to successfully authenticate as themselves.

The list itself, however, would now be a major problem. Anyone with access to it could authenticate as any user on any system relying on this authentication service. Worse, since many people use the same password in multiple places, anyone with access to the list could probably also forge authentication on entirely unrelated systems.

All of the managers of all of the systems using the authentication service would inherently have access to the list, so we would have to trust them. Even if we assume that none of these administrators would ever be actively malicious, there would still the danger that they might accidentally or recklessly leak the list (from a hacked server, on a memory stick, on a laptop left on a train, etc.). So the security of each system under this model would still depend on a whole group of people that each system's administrator has no particular reason to trust. In practice the only option would be to severely restrict the systems that could participate, to the point where such a service could never provide 'universal' authentication for the University.

Option the fourth: distribute a list of user name/'crypted' password pairs
Rather than distributing plaintext passwords, we could distribute them 'non-reversibly encrypted' (or 'hashed') - for example using the 'crypt' or 'md5' password format used in Unix password files. In principle this would make it possible to check that a proffered password is correct (by hashing it and checking that the hashed versions match), without exposing all the passwords on the list.

Unfortunately, hashed passwords can be recovered by a 'dictionary attack', in which an attacker generates a dictionary of words and their hashed equivalents and then searches the hashed passwords for matches. Since users have a bad habit of choosing common words (or trivial variations of them) as passwords, such an attack can be expected to recover a reasonable proportion of passwords. So even with hashed passwords the list would be vulnerable to compromise and would still have to be treated as confidential.

In addition, everyone logging-in would still have to offer their plaintext password for verification, at which point it would still be vulnerable given a malicious or negligent system administrator. Consider two systems 'A' and 'B', both using such a system and with some users in common. What grounds can system 'A's administrators have for believing that system 'B's administrators will not (accidentally or deliberately) capture, and disclose or use, user name/password pairs that would allow forged logins on system 'A'. What if system 'A' were on the Student Run Computer System web site and system 'B' was the University Financial System, or vice-versa? 

There is unfortunately a more insidious problem. If asked for their 'central authentication system password', how could a user know that it is safe to quote it? Passwords are only safe if you don't disclose them, but to use them that is exactly what you have to do. If you disclose them to a malicious system then you've lost. This is the basis of 'phishing' attacks aimed (with some success) at getting people to disclose the electronic banking or mail system passwords. Given all of the following, which are the bogus sites that is just trying to capture your password?

There is no obvious solution to any of this.

It's worth noting in addition that many authentication schemes (particularly those using some sort of challenge-response) need access to plaintext passwords, or at least particular hashes of the passwords, and so can't be supported by any scheme that distributes pre-hashed passwords.

Option the fifth: central password verification
Even if a list could be made to work,  distributing it would be difficult especially if it needs to be done in a timely manner. One solution would be to have a 'central password verification service'. In this model, a system using the authentication service solicits a user name and password for a user and forwards them to the central service which returns a match/no match response. Almost any network protocol could be used for this, but it is common to use LDAP and overload LDAP's 'user login' process to provide authentication. Alternatives include POP, IMAP, RADIUS, and home-grown protocols.

This approach successfully avoids exposing the list of everyone's plaintext or hashed passwords to system administrators (and potentially others) which significantly reduces the exposure. It unfortunately still requires that users's plaintext passwords be disclosed as they authenticate, and does nothing to help users decide when they should and should not quote their password. 

Protocols used for remote password verification need to be configured so that they don't expose plaintext passwords on the network, and so the authentication server can't be impersonated and used to collect passwords. Doing this correctly makes things considerably more complicated and is something which is often omitted.

Option the sixth: Kerberos (or similar)
Kerberos was designed precisely to overcome many of these problems. It allows a central verification service to assert that a user knows a password, and so has authenticated themselves, without the user having to disclose their password to anything other than their local workstation. It uses assorted cryptographic sleights of hand to do this and has numerous other important properties, such as preventing the recipient of an assertion from using that to impersonate the user on any other service.

In principle Raven could be extended to become a Kerberos central verification service (a 'KDC'). But using Kerberos comes at a cost. Firstly the user's local workstation needs Kerberos software installed and configured. Secondly the Kerberos protocol is significantly more complicated than a user name/password exchange, so unless the systems to be protected already supports it then there are likely to be difficulties.

One interesting development is that Microsoft have adopted Kerberos (or at least, a version of Kerberos) for authentication under Windows. For example every Windows Active Directory Domain is also a KDC. By establishing appropriate trust relationships between Windows KDCs throughout the University it might be possible to use a Raven password to authenticate to an Active Directory Domain and then to rely on Kerberos for further onward authentication as and when required. Similar Kerberos-based login arrangements are available for MacOS and Linux. It is possible that in this environment a malicious or inept AD administrator could compromise authentication - further research is required to establish if it is practical.

Kerberos reduces the number of times that passwords need to be entered and so reduces, but does not eliminate, the problem of educating users about when the should and should not provide their passwords when asked. Note that some software, for example this 'Kerberos' module for Apache, will misuse the initial Kerberos user authentication process by soliciting the user's user name and plaintext password and then seeing if it can authenticate as the user. In this they are just using the Kerberos system as a central password verification service with all the problems that this entails described earlier. Again, it's not obvious how to enable users to safely detect this.

Where does Raven currently fit into this?
The current Ucam WebAuth system used by Raven, and a whole range of similar web-redirect-based systems, depends on the client software in use being a web browsers and on browsers including as standard a fortuitous combination of features:
  • Support for HTTP redirects, allowing the client to be instructed to contact the authentication server direct, allowing communication that bypasses the server initiating authentication;
  • Support for the https: protocol, providing both security for the user's password on the wire and a way for users to positively confirm that they are communicating with the real authentication server and not an imposter (and so that it ''is'' safe to disclose their password); and
  • Provision of a user interface which can solicit a user and and password.
Ucam WebAuth has some features in common with Kerberos, but without the need to distribute or configure client software, though its use does require much more investment at the server end than a simple user name/password system. It does, crucially, only require the user to disclose their password to the central authentication server and provides a way for users to easily identify it. As a result it, just, manages to provide a reasonably reliable authentication system using passwords, but of course only in a web environment.

Friday 13 November 2009

Secure use of passwords

[This posting was originally published on an internal wiki in 2006 and is republished here to increase its exposure. I believe that what is says remains entirely relevant]

Passwords are about the best tool we have for identifying people on-line. There are alternatives, but they have financial and organisational costs that don't scale to the 30,000 people at the University of Cambridge. Unfortunately passwords are actually a very poor tool for this purpose and there are a number of prerequisites that must be met if they are to work at all. Three related ones interest me in particular - one that many people understand, two that they don't.
Prerequisite 1
Passwords must not travel in clear over insecure networks. Many networks are relatively easy to snoop. Most wireless networks are trivial to snoop and doing so doesn't even need physical access. Anyone who manages to capture a userid and password by snooping can impersonate the user for as long as the userid/password pair remains valid, and may be able to leverage this to gain further privilege. Most people understand this.

Prerequisite 2
Passwords must not be divulged in clear to untrusted systems. It's no good having a secure central password validation service if third-party systems collect userids and passwords and pass them on to the central service. Any one of these systems could maliciously capture userids and passwords, or accidentally or recklessly expose them. Even if access to the central validation system is itself secure, a common failing of such third-parties is for them to cause or encourage passwords to be sent in clear on insecure networks.

Of course it's tempting to say "My system's secure so it's safe for me to do this". The problem is that, given n such systems it's necessary for each to trust the other n - 1. Clearly this doesn't scale much beyond n = 1.
These two can actually be combined: Passwords must not travel in clear over insecure networks or through any system that anyone doesn't trust.
Prerequisite 3
It must be possible for password holders to decide when it is safe to divulge their password. They need to divulge their password to authenticate, but divulging it also makes it possible for others to impersonate them.

A system that only ever requires a password to be entered on one particular secure form on one particular web site goes some way towards meeting this requirement. Even if many people don't understand the issues it is likely that at least some will, and will identify and report other occasions on which they are asked for this password. Such other requests are likely to be dangerous or malicious. As the number of occasions on which a particular password is legitimatly requested rises, so does the difficulty of explaining and understanding what these occasions are. Beyond a small number, most people will reflexively enter their password whenever they are asked for it. This will result in an increase in likelihood of malicious or accident password interception and so a decrease in the reliability of the authentication system.
Remember, when considering password-based authentication systems, that the important thing isn't that legitimate users get challenged for a password before gaining access, even though this is the behaviour commonly checked by management. The important thing is that people can not realistically gain access using an identity that is not their own, and this isn't the same thing!

Monday 9 November 2009 terminally broken?

The email forwarding service seems to be terminally broken this morning, with all its registered mail hosts returning 'Not route to host' when accessed on port 25. Brief Googling suggests this has been going on for a while, and I've just had a message bounced that was originally sent last Wednesday which seems to confirm this.

While I have always used the basic service, and so haven't actually paid anything for it, this level of 'service' is unusable and suggests that Bigfoot don't care about mail forwarding any more. Time I think for my own domain name and the end of a long, and largely happy, relationship with Bigfoot.

Wednesday 4 November 2009

How come Google got it wrong?

We've noticed that various 'Points of Interest' on Google maps in Cambridge city centre are just plain wrong. For example Christ's College appears at the junction of St Andrew's Street and Downing Street when it's actually several hundred meters north on the east side of Hobson's Street. Emmanuel College and the Museum of Archaeology and Anthropology are also in the wrong place, though not so drastically. There are others.

I think Google have Christ's College in the wrong place because they think its post code is CB2 3AR when it should be CB2 3BU (even that's still not ideal for something as big as a college since it identifies the delivery entrance rather than anything suitable for visitors, but it's better than nothing).

For Christ's this seems to be a common error (try Googling for "Christ's College CB2 3AR" to see the extent of the propagation of this error) but I suspect the fact that have got it wrong may be at the root of the problem. It might be possible to get this fixed, though it's going to take a very long time for the fix to propagate.

Emmanuel has a similar problem - Google has their post code as just 'CB2' so they have been put where 'St. Andrews Street CB2' resolves to. It's harder to track down where this error originates, and I haven't tried.

The Museum of Arch and Anth is more interesting. Google have their correct address of "Downing St, Cambridge, Cambridgeshire, CB2 3DZ", and "CB2 3DZ" resolves to their correct location. However "Downing Street, CB2 3DZ" resolves to the incorrect location shown on the map so there is something odd about the way the Google geolocation service handles this particular address (perhaps it doesn't always find "CB2 3DZ" in its database and so falls back to "Downing Street, CB2"?).

These errors are not surprising, because Google isn't a traditional map company. They don't go out and survey anything but just purchase, collect and aggregate data from various sources (TeleAtlas for the streets, their own search index for address information, etc, etc.). Some of this information is wrong, which isn't surprising - what's really surprising is that the results are as good as they are. But it does cause us a very real problem, because some people feel that we can't use anything based on Google maps for as long as it shows well known University locations in the wrong place.

Addition 2009-11-05: A colleague has pointed out that part of the airport at Cambridge is labelled "London Luton Airport".