IEEE Internet Computing, November/December 2006:
Usability testing: A journey, not a destination

Jim Miller
Miramontes Interactive
November 2006

In previous columns, I’ve had much to say about the importance of usability testing. This probably wasn’t a big surprise to you — any time you have a conversation with a user interface (UI) person, you’ll probably hear the same thing. In this installment of The Internet Experience, I’ll talk about the various ways in which usability testing can fit into a project — there are many! — and how good testing techniques can make your project run better and faster, and produce an improved result.

Testing at the start

The first time to talk about testing is at the project’s very beginning: You’ve identified the problem you’re concerned about, and are about to design and build a system to address it. Obviously, the first usability question on that day shouldn’t be what the menu bar looks like. Rather, it should focus on your problem from a human perspective:

  • How people think about it. What are the big concepts that make up the problem in the minds of your potential users, and what language do they use to talk about it?

  • How people currently work on it. What tools (both computational and physical) do they use, and how do they divide a big problem into smaller and more manageable tasks?

  • How human capabilities and limitations affect it. What parts of the problem can be traced back to specific things that people can and can’t do, and how do those human characteristics constrain potential solutions to the problem?

The goal in this phase is to identify the general parameters of a good solution to your users’ problems. If you like, think of this as user-centered requirements definition.

Not to oversimplify things, but the best way to answer these kinds of questions is to look. The questions you have regarding your project at this point are far too broad to permit any controlled usability tests. They’re also not amenable to focus-group-type studies looking at general user attitudes and preferences. Rather, they’re best addressed with studies focused on individual users’ activities, those that rely on observation, interviews, and careful study of the problem’s setting. You don’t need or even want a usability lab for these kinds of studies: you simply want to talk to people where they do this work, now and in the future.

In these kinds of studies, interviewers should ask broad, open-ended questions about the problem your system will address, and carefully observe what the participants say and do in response. “How do you deal with this problem now? Can you show me?” The interviewers should be talking much less than the study participants, and, when the interviewers do talk, it should be to keep the participants talking about their own activities. The most useful data from these studies are typically video recordings of the sessions (don’t worry — both you and your participants will forget about the camera very quickly), photographs or sketches of desktops, notebooks, and workspaces, and notes taken by a second person assisting the interviewer.

As a rule of thumb, interviews with five to 10 people should be plenty for such studies. You’re asking high-level questions here, so if five people tell you significantly different stories, something’s going on that you don’t understand, and it’s best to back off and re-think things. In any case, after approximately five interviews, it’s a good idea to step back, look at your results so far, and confirm that the questions you’re asking are getting you the information you want. Sessions with your participants shouldn’t last more than an hour: interviewers and participants will both start to wear out after that.

Early design: Prototype testing

Let’s assume that, after conducting these interviews, your team has developed a potential solution that fits in well with what people need and will be able to use. You’ll have a rough interface design or maybe even a few alternatives. You’ll also have some fairly specific questions about the design, but development has just begun and there isn’t anything complete or reliable enough to test. How should you proceed?

It’s a good idea to take the proposed solution back to some prospective users and conduct new interviews focusing on how the solution would fit with how they do their work. You might also get a usability expert to carry out a preliminary design review and check the design against common interface-design guidelines. These are fast, inexpensive, and effective ways to identify potential problem areas in your design almost before you’ve begun.

But even after you’ve done this, you’re probably still faced with a design whose quality you’re unsure of. The best thing to do at this point is to build and test a prototype. The first step in doing this should be to plan out the study itself: Identify some real-life tasks that will address those parts of the design about which you’re most concerned. A test of a Web site, for example, might ask a user to change the address on their account; a test of a word processor might ask users to create a new style that uses Helvetica 10 point bold. The task descriptions should state the goal clearly but not the steps for reaching it: you don’t want to give away the solution in the problem description.

The next step is to build a prototype of your system that will reasonably approximate using the system to carry out these tasks, even though there might not be any real code behind the prototype. The point is to test the experience, not the real system (which, again, might not even exist at this point). It’s always a good idea to define concrete performance goals for the tasks, such as “100 percent of test participants should be able to register on the site, and 95 percent of test participants should be able to do so within 2 minutes.” This way, problems aren’t just a matter of opinion — you have an objective definition of what a problem is.
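
To make the idea of a measurable goal concrete, here’s a minimal sketch of how you might check a set of test results against the registration goal above. The task data, field names, and the two-minute threshold are illustrative assumptions, not something the method prescribes.

```typescript
// Hypothetical sketch: checking prototype-test results against a concrete goal.
// The data, field names, and thresholds are illustrative, not from the column.

interface TaskResult {
  participant: string;
  completed: boolean;
  seconds: number;      // time spent on the task before finishing or giving up
}

// The goal stated above: 100 percent complete the task, 95 percent within 2 minutes.
function meetsGoal(results: TaskResult[]): boolean {
  const everyoneFinished = results.every(r => r.completed);
  const fastEnough = results.filter(r => r.completed && r.seconds <= 120).length;
  return everyoneFinished && fastEnough / results.length >= 0.95;
}

const registration: TaskResult[] = [
  { participant: "P1", completed: true, seconds: 85 },
  { participant: "P2", completed: true, seconds: 140 },
  { participant: "P3", completed: true, seconds: 52 },
];
console.log(meetsGoal(registration) ? "goal met" : "problem: goal missed");
```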

Paper prototyping

These prototypes don’t have to be running on computers: you can often carry out surprisingly effective design evaluations using paper prototypes. These are just what they sound like: you make them by printing out images of the various displays that the user would see when doing the task: main pages, dialog boxes, pull-down menus, and so on. Print the different components separately and cut them to size. Then, run the participants through your testing tasks, asking them to “click” or “type” just as they would if they were using a real system. As they do this, the person running the study puts in front of them images corresponding to how the system would respond to their actions. This may sound phony and implausible, but if you and your participants can adopt the right level of suspension of disbelief, issues in the design will quickly become clear, and you can start to get a qualitative sense of how good a solution you’re developing. It’s also really easy to make and test changes in the design — just draw and print out some new images.

Programmed prototypes

Another way of testing prototypes is to use standard interface development tools to create executable simulations of the applications or sites. You can use Visual Basic and similar environments for desktop applications, or Web tools that create HTML, JavaScript, and the like for Web sites. In both cases, you can build these prototypes without having any real application code behind them. Rather, when the user clicks on something, they’re taken to a hard-coded screen or page that shows what would have happened if they’d done that on a real system. This approach offers the speed of paper prototyping with the greater realism of working with real (albeit simulated) software. Building this kind of prototype has the added benefit of producing a demo of the ultimate system, which can be a great way to give your project’s stakeholders an early look at what you’re building. Plus, you might find yourself well on the way to building the actual code for your interface.
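
As a rough illustration rather than the output of any particular tool, here’s what the hard-coded core of such a prototype might look like as a small TypeScript script running in a browser: each button simply jumps to a canned screen, and there’s no application logic behind any of it. The screen names, markup, and the change-of-address task are invented for this sketch.

```typescript
// Hypothetical sketch of a "programmed prototype": every click jumps to a
// hard-coded screen; there is no real application code behind any of it.

const screens: Record<string, string> = {
  home: `<h1>My Account</h1>
         <button data-goto="editAddress">Change address</button>`,
  editAddress: `<h1>Change address</h1>
                <input placeholder="New address">
                <button data-goto="confirmation">Save</button>`,
  confirmation: `<h1>Address updated</h1>
                 <button data-goto="home">Done</button>`,
};

function show(name: string): void {
  document.body.innerHTML = screens[name];
  // Wire each button to the canned screen it pretends to lead to.
  document.querySelectorAll<HTMLElement>("[data-goto]").forEach(el =>
    el.addEventListener("click", () => show(el.dataset.goto!))
  );
}

show("home");
```

Because the screens are just strings in a table, changing the design stays nearly as cheap as with paper: edit the markup and reload.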

What kinds of data should you collect in studies like these? Similar to the early interview studies, video can be very useful in this phase, and fully-equipped usability labs generally have equipment that can record images of both the participants and the prototype. If you don’t have such a lab, you might be able to find facilities that rent by the hour. But you shouldn’t skip this stage of testing because you don’t have a fancy usability lab: putting a camera up in an office, collecting what data you can, and taking good notes during the session can yield invaluable information.

If you’re working with programmed prototypes, another alternative is to add event logging to the software to capture all the mouse clicks and keyboard actions that make up a user’s interaction with the prototype. This can produce a rich body of data, but be sure you know how you’re going to look at the data before you start down this path. It’s possible to go way overboard and collect huge amounts of data that you’ll never be able to analyze.
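
Here’s a hedged sketch of what that logging might look like in a browser-hosted prototype. The record format and the sample analysis function are assumptions made for illustration, not a standard.

```typescript
// Hypothetical sketch of event logging in a browser-hosted prototype: every
// click and keystroke is timestamped and kept for later analysis.

interface UiEvent {
  time: number;            // milliseconds since the session started
  kind: "click" | "key";
  target: string;          // element id that was clicked, or the key pressed
}

const sessionLog: UiEvent[] = [];
const sessionStart = Date.now();

document.addEventListener("click", e =>
  sessionLog.push({
    time: Date.now() - sessionStart,
    kind: "click",
    target: (e.target as HTMLElement).id || "(unnamed element)",
  })
);

document.addEventListener("keydown", e =>
  sessionLog.push({ time: Date.now() - sessionStart, kind: "key", target: e.key })
);

// Decide before the study how the log will be analyzed; for example, counting
// clicks on elements that have nothing to do with the assigned task.
function clicksOn(elementId: string): number {
  return sessionLog.filter(e => e.kind === "click" && e.target === elementId).length;
}
```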

The issue of how many people should be included in the study comes down to what questions you’re trying to answer. If the purpose of the study is to make hard choices between alternative designs, you’ll want enough data to know that your findings and the decisions that come from them are valid; five or 10 people will probably not be enough. Ultimately, such a study calls for a much more rigorous experimental methodology than that described earlier, and discussions of those techniques are beyond this column’s scope. The point for now is that, used well, prototype-based studies are powerful enough to be viable tools for making those kinds of decisions.

Nearing completion

As the project comes to an end, the stereotypical notion of usability testing comes to mind, with labs and two-way mirrors and dozens of subjects working on something close to the final product. But this is also when you face the conundrum of usability testing: The code is far enough along that there’s finally a real system that you can test, but making changes at this point in the project can be costly, difficult, and politically problematic. This is not the time to think about wholesale interface redesign, just as it’s not the time for the software architect to call for a wholesale reworking of the architecture. The good news is that if you’ve done the preliminary studies discussed earlier, you’ll have long since resolved the big questions that could lead to significant changes.

At this phase in the project, testing the near-final product on an appropriate set of tasks can offer a final evaluation of the previous design work, this time with enough people that you can make real statements about how their performance matches goals you established earlier. This might be the first time that you can test performance issues and other matters that depend on a real implementation — does some part of the system perform correctly, but take so long that people won’t use it? This is also the time for final confirmation of text messages, menu structures, images, colors, and the like — the “finish work” that can have a big impact on success. Of course, you always need to be prepared for the “uh oh...” discovery — it’s not impossible that you’ll discover some serious problem that becomes evident only when real people start using the near-final product. But, again, the way to reduce the chances of this happening is by doing the groundwork of the earlier studies.

Post-release: Looking to the future

Just because the project is done doesn’t mean the project’s usability work is finished. Rather, it’s the right time to think about how to prepare for future releases. Because you now have a fully-implemented product, this is a great time to take it back into the lab, assign people specific tasks, and look for problems that made it into this release but that need to be fixed in the next one. You might also build a study around a group of people working with a specially instrumented version of your software, one that will log very finely-grained records of its use in real-world situations. You again need to be prepared to analyze the potentially large amounts of data you can get from such a study, but this approach can get you insights into your product’s use that are hard to get any other way. This is also a good time to go back into the field to see if people are really using the product the way you thought they were going to. The challenge here is to think broadly and not just in terms of simple fixes — you can now at least ask whether redoing the whole design is the right thing.

Another evaluation technique popular with Web-based software is live testing of alternative designs, sometimes known as “A/B testing.” These techniques are often used by companies like Google and Amazon, whose services are physically divided among servers, but they’re also applicable to smaller-scale sites. The approach is to create alternative designs for some part of the service — perhaps the description of a product being offered for sale. Then, the experimenters install each of the alternative designs on a small subset of the production servers and track how the users who receive them behave. Does one design lead to more clicks on a search result, or to more product sales? This approach has several advantages over laboratory studies: The data are easy to collect and analyze; the data’s validity — real-world by definition — is hard to deny, as long as you have enough of it; and it’s easy to control the risks of real-world experimentation by being careful about how many servers display the alternatives. You can use similar techniques on smaller-scale services by conditionalizing the server code responsible for creating the pages; the point is to send the alternatives out to a limited set of real users and collect real data on their performance.
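
As one sketch of what “conditionalizing the server code” could look like, the following hypothetical TypeScript assigns each user deterministically to a variant, shows the alternative to a small share of users, and tallies outcomes per variant. The hashing scheme, the 10 percent share, and the in-memory counters are illustrative choices, not a prescription.

```typescript
// Hypothetical sketch of server-side A/B bucketing: each user is assigned
// deterministically to variant A or B, the alternative goes to a small share
// of traffic, and outcomes are tallied per variant.

import { createHash } from "crypto";

type Variant = "A" | "B";

const EXPERIMENT = "product-description-v2";  // invented experiment name
const B_SHARE = 0.10;                         // show the alternative to 10% of users

function variantFor(userId: string): Variant {
  const digest = createHash("sha256").update(EXPERIMENT + userId).digest();
  const bucket = digest.readUInt32BE(0) / 0xffffffff;   // stable value in [0, 1]
  return bucket < B_SHARE ? "B" : "A";
}

const outcomes = {
  A: { shown: 0, purchased: 0 },
  B: { shown: 0, purchased: 0 },
};

function renderProductDescription(userId: string): string {
  const v = variantFor(userId);
  outcomes[v].shown += 1;
  return v === "B"
    ? "<p>Alternative, shorter product description</p>"
    : "<p>Current product description</p>";
}

function recordPurchase(userId: string): void {
  outcomes[variantFor(userId)].purchased += 1;
}
```

Because the assignment is a pure function of the user’s ID, a returning user always sees the same variant, and the two groups stay cleanly separated for analysis.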

The primary point I hope I’ve made here is that there’s no incompatibility between usability testing and a rational product process. We can think of usability testing as simply another aspect of quality assurance, and can fold it into the product process in much the same way as software QA. By having usability people involved from the beginning, big problems get found earlier, and small problems get found in time. The fear of losing control of a project to a usability person who finds a last-minute interface design problem is therefore a red herring. You can indeed find usability problems that might affect project completion late in the development process, just like you can find system architecture problems. But the way to keep this from happening is to test your system and your ideas during each stage of development, gathering the kind of information you need at each of those stages. That’s why usability testing is a journey, not a destination.

Jim Miller is principal of Miramontes Interactive, an interaction design consultancy. His research interests include Web-based application design, Internet community development, consumer Internet appliances, intelligent interfaces, and usability evaluation methods. Miller received a PhD in psychology from UCLA. He is a member and past chair of SIGCHI, the ACM special interest group on human-computer interaction. Contact him at jmiller@miramontes.com.

For further reading

I’ve only been able to scratch the surface of usability evaluation here. The following books will take you further.

Hugh Beyer and Karen Holtzblatt, Contextual Design: Defining Customer-Centered Systems, Morgan Kaufmann, 1998.

Deborah Hix and H. Rex Hartson, Developing User Interfaces: Ensuring Usability Through Product & Process, John Wiley & Sons, 1993.

Deborah J. Mayhew, The Usability Engineering Lifecycle: A Practitioner's Handbook for User Interface Design, Morgan Kaufmann, 1999.

Jakob Nielsen and Robert L. Mack, Usability Inspection Methods, John Wiley & Sons, 1994.