IEEE Internet Computing, March/April 2009:
One more take on identity

Jim Miller
Miramontes Interactive
March 2009

After having just written two columns about identity, I'd like to change course by writing about... identity. But there's a difference: the previous columns were mostly about how the kinds of information available about us on the Internet have evolved in recent years. Here, I want to think a little more deeply about the methods we use to identify ourselves on the Internet: how we do it now, what's wrong with the current approach, and where this notion of identification might go in the future.

User names and passwords: A system at the breaking point

I was an early user of Gmail, and, because I got in early, I was able to claim the user name jim.miller, with no extra numbers or letters or anything. Unfortunately, I now get more than a little e-mail (sometimes 50 messages a month) intended for people who have told other people (or websites) that their e-mail address is jim.miller@gmail.com. Evidently, the "real world" names of these people all seem to be Jim Miller, and, although they're really registered as something like jim.miller4567, they've given jim.miller@gmail.com as their address when they've registered for other websites or services. When I've followed up on some of these misdirected messages, the problem is usually that the people have a faulty mental model of e-mail addresses: they simply don't realize that the numbers in an identifier like jim.miller4567 are significant, and have to be included to properly identify themselves. They leave them out, and mail intended for them goes to the wrong person (i.e., me).

While it's probably true that many of these people have less Internet expertise than the typical reader of this magazine, I think it's better to think of them as canaries in the coal mine — their problems now indicate a more serious problem that will sooner or later affect all of us. That problem, I've become convinced, is that the entire system of user names and passwords is broken and is failing to scale to meet the demands of world-wide usage of the Internet, and that we need to look for a new and better approach.

Part of the problem lies in simple human memory limitations: the text strings that represent us on websites and e-mail systems are too similar and too easy to confuse. But beyond that, the whole approach of usernames and passwords simply doesn't scale. Websites are collecting millions of usernames, and very few of them get rid of the names that, for whatever reason, aren't being used anymore. From the user's perspective, there's little motivation to delete an account that isn't actively being used, and other accounts simply get forgotten. It's also not that easy for users to delete their accounts from most websites; go try to find a link on a major site where you can do this.

Companies also aren't particularly motivated to remove abandoned accounts. There's no reliable way to tell when an account has been abandoned, and the cost of doing nothing about them is in most cases just accepting a minor bump in the site's database and disk space requirements. Further, since all websites want to show that they have lots of registered users, they're unlikely to do something that would lower that number. In any case, even if an account were to be deleted, it's not clear that the username should be made available again — how would we distinguish the actions of the new owner of the account name from those of the old owner, given that references to both can be expected to show up in search engines and their cached copies of web pages?

But, ultimately, this simple approach to user identity fails the "100 year test": whatever the Internet will look like in 100 years, it's hard to believe that my great-great-great-grandson will be registering on these future websites as jim.miller4353785734895. Things will have to change someday, and it's not too early to think about what those changes might be like, and what it will take to make them real.

What might a next-generation user identity system look like?

As an alternative to textual usernames, let's consider the use of some sort of computational token that will identify me in a number of critical ways:

  • Uniquely — no more fighting over the same username.
  • Privately — perhaps most importantly, this token needs to give its users explicit control over where and when it's used for identity purposes.
  • Verifiably — there should be one or more reputable third parties who will guarantee the connection between a token and its owner.
  • Broadly — the token should work across multiple websites.
  • Permanently, or at least relatively so — it should outlast the lifespan of any website on which it might be used.
  • Securely against theft and hackery.
  • Repairably — there should be ways to recover from theft, hackery, and errors, should they occur.

A complete solution to this problem is well beyond the scope of this article. But we can at least talk about what a solution would look like, and, as we'll discover, there are a lot of things that would need to be done.

Technical standards would be needed for the tokens: format, transfer protocols, and the like. And standards don't come about without the creation of standards committees.

Token creation and registration. Governments might play a role here, but the idea of a government agency being in a position to track a large part of my Internet use would probably be way too scary for many people. It's also hard to imagine multiple governments getting together and agreeing on anything like this. Thus, private companies are more likely to be suppliers of these tokens, and probably multiple competing companies at that. The companies would presumably also serve as de facto registrars of the information, in effect guaranteeing that the information is valid and that the owner of a token really is who he or she claims to be. So, as a user interested in getting an identity token, I would choose one of these companies, provide whatever identifying information they require, and receive my token.

Of course, this system is only useful if the guarantees of identity can be believed. If an evil token company could issue tokens for anybody claiming to be anyone, the whole thing would fall apart out of lack of trust. Thus, there would need to be controls set on who can issue tokens (just as there are now controls on who can serve as domain name registrars), standards for the minimal level of identity verification to be done by registrars, and ways for tokens issued by non-compliant registrars to be detected and invalidated.
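To make those three controls a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the Registrar and Accreditor classes, the token fields, and the registrar name are all hypothetical, and the keyed hash (HMAC) is only a stand-in for the public-key signatures a real system would presumably use.

```python
import hashlib
import hmac
import secrets


class Registrar:
    """Hypothetical token issuer. An HMAC over the token's contents
    stands in for a real digital signature (an assumption)."""

    def __init__(self, name):
        self.name = name
        self._key = secrets.token_bytes(32)  # known only to this registrar

    def _seal(self, token_id, legal_name):
        message = f"{token_id}|{legal_name}".encode()
        return hmac.new(self._key, message, hashlib.sha256).hexdigest()

    def issue(self, legal_name):
        # A real registrar would verify the applicant's identity first.
        token_id = secrets.token_hex(16)
        return {
            "id": token_id,
            "name": legal_name,
            "registrar": self.name,
            "seal": self._seal(token_id, legal_name),
        }

    def vouches_for(self, token):
        expected = self._seal(token["id"], token["name"])
        return hmac.compare_digest(expected, token["seal"])


class Accreditor:
    """Hypothetical oversight body controlling who may issue tokens."""

    def __init__(self):
        self._registrars = {}

    def accredit(self, registrar):
        self._registrars[registrar.name] = registrar

    def revoke(self, registrar_name):
        # Dropping a registrar invalidates every token it has issued.
        self._registrars.pop(registrar_name, None)

    def is_valid(self, token):
        registrar = self._registrars.get(token["registrar"])
        return registrar is not None and registrar.vouches_for(token)


oversight = Accreditor()
acme = Registrar("acme-id")          # made-up registrar name
oversight.accredit(acme)

token = acme.issue("Jim Miller")
forged = dict(token, name="Someone Else")

print(oversight.is_valid(token))     # True: accredited registrar, intact seal
print(oversight.is_valid(forged))    # False: the name no longer matches the seal
oversight.revoke("acme-id")
print(oversight.is_valid(token))     # False: the registrar lost its accreditation
```

The point of the sketch is only that validity depends on two separate checks: that the registrar still stands behind the token, and that the registrar itself is still accredited.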

Finally, if we're relying on private industry for these tokens, there needs to be a viable profit model in place for these companies. Perhaps they charge the user a fee for issuing the token, or receive payments each time a token is used, or draw revenue from some other source.

User experience. The problem we're addressing here is fundamentally one of user experience, and, as we said before, privacy and security will be critical aspects of this experience. After all, if you could monitor my token usage, you could learn a lot about me. If you could steal my token, physically or otherwise, you could get a lot of access to my life. And if you could compromise a registrar's server, you could get control over a lot of stuff about a lot of people.

Assuming such a foundation (no small assumption!), we can consider the big user experience questions that got us here. First, how would these tokens get used in e-mail, so I don't get confused with another person with a name similar to my own? What's needed here is a way for my token to act as an intermediary between me and my e-mail account. That means that mail clients would need to send my tokens along with my messages, and mail servers would need to be ready to accept them. Similarly, incoming messages would contain tokens from their senders, and so mail clients would need to get a message's token and, by reference to the registrar who issued the token, use it to identify the sender.
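As a rough sketch of what that exchange might look like, the Python below uses the standard library's email package. The X-Identity-Token header, the token values, the addresses, and the in-memory registrar directory are all invented for illustration; no such header or lookup service exists today, and a real client would query the registrar over the network and verify the token rather than trust a bare string.

```python
from email import message_from_string
from email.message import EmailMessage

# Hypothetical header name; not part of any mail standard.
TOKEN_HEADER = "X-Identity-Token"

# Stand-in for a registrar's lookup service (made-up tokens and owners).
REGISTRAR_DIRECTORY = {
    "tok-7f3a9c": "Jim Miller (Miramontes Interactive)",
    "tok-b21e04": "Jim Miller (a different person)",
}


def compose(sender_addr, recipient_addr, sender_token, body):
    """Sending side: the mail client attaches the sender's token."""
    msg = EmailMessage()
    msg["From"] = sender_addr
    msg["To"] = recipient_addr
    msg["Subject"] = "Weekend barbecue"
    msg[TOKEN_HEADER] = sender_token
    msg.set_content(body)
    return msg.as_string()


def identify_sender(raw_message):
    """Receiving side: resolve the token through the registrar."""
    msg = message_from_string(raw_message)
    token = msg.get(TOKEN_HEADER)  # None if the sender attached no token
    return REGISTRAR_DIRECTORY.get(token, "unverified sender")


raw = compose(
    "jim.miller4567@example.com",
    "friend@example.com",
    "tok-b21e04",
    "See you Saturday!",
)
print(identify_sender(raw))  # Jim Miller (a different person)
```

Note that the two Jim Millers are told apart by their tokens, not by the characters in their addresses, which is the whole idea.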

Second, we also want to be able to use these tokens to register on websites, such that I can simply be "Jim Miller" on the site, as verified by my token, instead of something like jimmiller345745. Therefore, we would need websites to support a registration process that would use tokens as the user's registration information. This doesn't seem all that different from distributed authentication systems like OpenID and Facebook Connect, except for the abstraction of the user's personal name from the token.
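One way to picture that abstraction of the name from the token: the site keys each account by token, and the display name becomes just a label that doesn't have to be unique. The Python below is a sketch under that assumption; the Site and Account classes and the token values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    token_id: str       # unique, supplied by the user's token
    display_name: str   # human-readable, allowed to repeat


class Site:
    """Hypothetical website that registers users by token, not username."""

    def __init__(self):
        self._accounts = {}  # token id -> Account

    def register(self, token_id, display_name):
        if token_id in self._accounts:
            raise ValueError("this token is already registered")
        account = Account(token_id, display_name)
        self._accounts[token_id] = account
        return account

    def named(self, display_name):
        """All accounts sharing a display name."""
        return [a for a in self._accounts.values()
                if a.display_name == display_name]


site = Site()
site.register("tok-7f3a9c", "Jim Miller")
site.register("tok-b21e04", "Jim Miller")   # no clash: different token

print(len(site.named("Jim Miller")))  # 2
```

Both users get to be "Jim Miller"; what the site has to solve instead is how to show its other users which Jim Miller they're dealing with.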

Then, once I've got an account, I need to be able to log in transparently with the token. My browser therefore needs to have access to the token and be able to send it to the server as part of an identification and authentication process, perhaps in ways similar to those used by modern operating systems to retain server passwords and automatically connect users to those servers.
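Here is one possible shape for that exchange, sketched in Python as a simple challenge-response, so that the secret behind the token never travels over the wire. This is my own illustration, not a description of any existing system: the Server and Browser classes are hypothetical, and the sketch assumes the site and the browser came to share a secret at registration time, where a real design would more likely use public-key cryptography.

```python
import hashlib
import hmac
import secrets


class Server:
    """Hypothetical site that authenticates a token by challenge-response."""

    def __init__(self):
        self._secrets = {}     # token id -> secret shared at registration (assumption)
        self._challenges = {}  # token id -> outstanding one-time nonce

    def enroll(self, token_id, secret):
        self._secrets[token_id] = secret

    def challenge(self, token_id):
        nonce = secrets.token_bytes(16)
        self._challenges[token_id] = nonce
        return nonce

    def login(self, token_id, response):
        nonce = self._challenges.pop(token_id, None)  # each nonce works once
        secret = self._secrets.get(token_id)
        if nonce is None or secret is None:
            return False
        expected = hmac.new(secret, nonce, hashlib.sha256).digest()
        return hmac.compare_digest(expected, response)


class Browser:
    """Hypothetical browser that holds the token on the user's behalf."""

    def __init__(self, token_id, secret):
        self.token_id = token_id
        self._secret = secret

    def answer(self, nonce):
        return hmac.new(self._secret, nonce, hashlib.sha256).digest()


shared = secrets.token_bytes(32)
server = Server()
server.enroll("tok-7f3a9c", shared)
browser = Browser("tok-7f3a9c", shared)

nonce = server.challenge(browser.token_id)
reply = browser.answer(nonce)
print(server.login(browser.token_id, reply))  # True
print(server.login(browser.token_id, reply))  # False: a replayed answer is rejected
```

From the user's side, all of this would be invisible: the browser answers the challenge, and I'm simply logged in.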

Finally, note that I've been talking in terms of traditional web browsers, mail clients, and their respective servers. There would need to be similar discussions for other applications (instant messaging, for instance) and other devices, mobile phones in particular.

Could this happen? Or is this all a pipe dream?

So far so good, at least as an initial sketch. But we should now take some time to think about why it might not work, especially because there are some serious roadblocks that would stand in the way of a scheme like this. Very few of them are technical; rather, they lie in how the potential technologies fit into the current and future marketplaces and user communities. In particular:

Scary security and privacy issues. While it is probably possible to design and implement a system like this that would offer the kinds of security and privacy needed for it to be acceptable to its users, we also have to ask whether ordinary people, with little or no formal computer training, would believe that such a system would in fact be secure and private. Given the inevitable uncertainties and complexities of the system, there's a good chance that people would choose to stay away.

I'll worry about that tomorrow. For now, the pain inherent in the current system is manageable and perhaps not widely experienced. Thus, there are not large numbers of users demanding a solution, nor are they likely to be willing to pay either up-front or ongoing fees just to know that that invitation to their weekend barbecue went to exactly the right person. Similarly, it's not clear that the companies who would bear most of the cost and effort of adopting a system like this are desperate for a solution. Both users and companies would ultimately be asking themselves whether they're better off with the current flawed approach, or with a new approach that could be better if it works the way it's supposed to, but could also be much worse if it's broken or misused.

Questionable business models. There is, in principle, the opportunity for a token provider to act as something of a "toll booth" for secure Internet transactions. However, the necessary competition between token providers would likely keep the prices down to the point where participating companies would see profits, but not to the extent either they or their investors would like.

I still believe that the "100 year" version of this problem is real, but is the world configured to solve it any time soon? Perhaps not, and that, I guess, is the real lesson of this column: The Internet is more than just a collection of technologies, and a problem that might seem to require just a simple technical change can in fact demand changes in usage patterns, user psychology, business models, and even public policy. Problems like this can get resolved if there's a widely experienced pain point or a huge profit potential for someone. But, otherwise, it's very hard to get all the players properly aligned and motivated.

So, is there any hope?

After working through this exercise, I guess I'm feeling both hopeful and skeptical: Hopeful, because this remains a problem that's not going to go away. Skeptical, because it's a problem that requires such a wide-ranging solution. In the meantime, the best advice I can give is to get to websites early and stake your claim to those simple names. And, if you're lucky enough to get one, get ready to be confused with someone in Minnesota or Florida or Texas. They may be all the way across the country, but they're really only a couple of characters away.