IEEE Internet Computing, November/December 2008:
Who are you, Part 2: The tradeoff between information utility and privacy

Jim Miller
Miramontes Computing
November 2008

In the first installment of this column, I discussed how people handle personal information on the Internet and how that information can be used for commercial purposes. In particular, my point was that the Internet has been evolving from a place where people are simply asked for information to one where the personal information is a fundamental part of the application or service someone is using: When you tell a search engine what you're looking for, you're also telling it what kinds of ads you might be interested in seeing. As a result, privacy is becoming something of a negotiation between people and service providers, one where people might be willing to give up information about themselves if they can get something of value in exchange.

There have been several different versions of how people get and provide personal information on the Internet, which we talked about last time:

  • I am who I say I am
  • I am who others say I am
  • I am what I do

But there are still more versions to consider; these push personal information even farther into the fabric of the applications and services that people are using. That's what I'll be talking about here.

Version 4: I am who my friends are

Social networking sites such as Facebook and MySpace are amassing a huge amount of personal information, and it's pretty easy to see how that information could be used for commercial purposes. As with contextual search, social networking information is honest (at least as honest as we're willing to be) and frequently updated. Millions of people pour substantial information about themselves into these sites: basic demographics, education and work history, political and religious attitudes, and preferences about music, movies, TV shows, and other product categories. This information isn't unlike what people write about in their blogs, but, in social networking systems, it's nicely compartmentalized, and thus ready for the company and its affiliates to interpret and act upon. If you include Bruce Springsteen in your list of favorite musicians, don't be surprised to see an ad offering you tickets to one of his upcoming concerts.

Of course, the key things social networks implement are links to your friends, another gold mine of information about you (and, in return, about them). Do you like what they like? Very likely. Are you more likely to click on an ad that one of your friends has also clicked on? Probably. Are you more likely to join a social network because your friends have already joined? Of course: that's the whole point. Thus, well-implemented and well-marketed social networking sites can exploit a positive feedback loop: more people go to the popular sites, which means they get more popular, so more people go there, and so on.

The huge amounts of information that can be collected from a social network can very quickly become a significant barrier to entry for potential competitors. Users tire of entering their personal information into multiple social networking sites, so the temptation to stay with one is great. The number of people who can manage and analyze these data is limited, and it takes a lot of money to recruit these people, equip them, and let them do the work. That's good news for companies and investors active on the Internet, who, like all businesses, love to find ways to keep other companies out of a market in which they're doing business. It also increases the likelihood that there will be a small number of companies collecting and controlling this information, which is where the creepiness factor emerges again.

The technical counterstrategy to one company acquiring such a monopoly on personal information is for several companies to define a standard for collecting and distributing personal information, which any interested company can adopt. In this way, users only have to enter their information into one place, and any site that wants to offer social networking services can (with the user's permission) use it. Thus, we have recently seen such alternatives as Google Friend Connect, Facebook Connect, and MySpace Data Availability. This reduces the risks of a single company controlling all this information, but there's still considerable value to controlling the standard, and to being a dominant repository of this kind of information. Thus the presence of major players in the social networking world proposing their own standards: if a standard catches on, better it be one that you control than one you don't. There are technical and business issues that make creating a successful standard like this a challenging task; it's just now playing out, and it's too soon to tell how it will be resolved. But there's no doubt that we'll be hearing more about it.

In any case, social networking sites mark a new point of interest in our discussion of how user and business interests about personal information come together. There's no doubt that social networks are providing people with clearly understood value — people like to keep track of their friends and share information with them, and social networks are great at this. If they don't give up the information, they don't get the benefits, and that, so far, is a trade-off many people are willing to make. But from a privacy perspective, the real issue here is whether people have a clear understanding of what's being done with that information and the risks involved in disclosing it. Understanding the risks was easier when the main issue was whether registering on a site would get you onto a junk mail list — the cause and effect relationship of these events is easy to understand. But the risks inherent in large collections of personal information or Web activity are more subtle, and harder to understand, than that, and you can't be a rational user of these systems if you don't know what the issues are. Thus it's in everyone's interests to have an open discussion about them and work toward a rational plan for proceeding. Companies might hope that people will tire of trying to understand how these games are played and just blindly agree to whatever the companies want, but this must be compared to the risk that the public might reject the whole premise and walk away from the game.

Version 5: I am where I am

Finally, we can consider one more stage of access to personal information: Take everything we've just talked about and add in knowledge of your personal location, perhaps via a GPS addition to your cell phone. We've heard predictions about location-specific, ubiquitous computing for a long time — walk past a coffee shop, and your phone rings and offers you 50 cents off a latte. Or, as shown in the film Minority Report, walk into a store and be immediately greeted with a question about your last purchase. These capabilities are impressive, scary, and probably even doable; if not today, then maybe tomorrow.

The big question, once you get past the data's storage and processing requirements, is one of usability — whether these sorts of notifications can scale in a personal sense. Do they arrive exactly when you want them to, or would you be barraged with so many probes that they would become unbearable and, ultimately, rejected? Of course, there are more subtle uses of mobile information: a search for "Bora Bora climate" might yield different results if the search engine could determine whether I was currently in a travel agency or a government weather station. The question remains whether there's a way to use this information so that its users see it as understandable and valuable, rather than a frightening intrusion on their lives. My guess is that we're in for a lot of experiments and failures, and maybe a few successes.

Don't worry, be happy?

Ultimately, the ways that this information is acquired and used are up to us, and sometimes, that worries me. Let me tell you a story.

As part of my consulting work, I recently ran a usability study on a soon-to-launch Web site. As part of the study, I had test subjects register on the site. The registration process asked for the usual kinds of information — first and last name, birthday, gender, zip code, and state of residence. The site's designers had been careful to indicate that only some of these fields were required for registration, yet I was struck by the fact that all the subjects happily filled out all the parts of the registration form, even those that were marked as optional. No concerns, no avoidance of the non-required information, and no questions about why the required information was in fact required.

Once registered, the subjects were asked to link their newly created accounts to a test account at a popular social networking site. I then asked them what effect this link would have on their page on that site — would it report that they had joined the site, or present any of their registration information? They didn't know, and didn't seem particularly worried about it. They then worked with the site for about a half hour, during which the site collected some information that I (at least) considered to be pretty personal. I again asked about the presentation of this information to others, either on the site under development or the social networking site. Here, they thought it probably would be, but, again, it didn't particularly bother them. Displaying the information to others was, in fact, the site's point.

I'm still thinking about this — the participants' willingness to give up information for public display and consumption. This was a study for which participants were being paid, so perhaps we could attribute some of their willingness to enter and connect information to the study's demand characteristics; I was the authority figure in the study and was asking them to do it, and so they did. But that doesn't explain their lack of concern for what happens to the information, and their willingness to provide it.

I couldn't help but think of the old story about the best way to cook a frog. If you throw the frog into boiling water, it'll jump out instinctively. But if you put the frog into a pot of warm water and slowly raise it to the boiling point, the frog will never notice. Is that what's happening here? Surely, many of the early complaints about privacy and advertising on the Internet were because of the sharp contrasts between the new and old worlds; I'll confess to having had my own share of outrage at early banner ads. (Does anyone else remember the early Prodigy system, where ads took up a quarter of the screen?) So, do we have a new generation of people growing up who have no, or at least very different, expectations of privacy, and of how information about themselves fits into the world?

Perhaps this is nothing new — I wonder, from a history of technology perspective, what the reaction of the public was to the very early telephone books. Did people object to the fact that, for the first time, there was a book that told the entire world where they lived and how they could be contacted at any time of the day or night? What, they would ask, has become of privacy when anyone can have that information?

I'm not sure about those earlier times or about now. But what does seem clear is that a public dialogue is underway that addresses these issues. This dialogue is interesting in that it's perhaps based more in actions than in words, and it's taking place among parties that are sometimes invisible to each other. Moreover, some of the topics under discussion are equally invisible — it's not at all clear to the public who will ultimately use this information, and what they'd do with it. But that doesn't minimize its importance — if anything, it makes our awareness of it, and our participation in it, even more critical. We, as members of the computer and information industries, are playing a central role in it. We have much to say about these issues, and much responsibility to address and think carefully about how this dialogue progresses, and makes its way into products, services, and public policies.