Collaborative, programmable intelligent agents
We began our research on intelligent agents with the same romantic imagery that has fueled interest in agents from the beginning: Robby the Robot from Forbidden Planet, HAL from 2001: A Space Odyssey, the Star Trek computers (including Data). The images they conveyed of an intelligent machine, and of the way we would interact with it (by talking as you would to an old friend who knew you so well it could finish your sentences for you), were so attractive that a whole generation imprinted on them. They became, at one level or another, the models for the intelligent interfaces community that emerged out of artificial intelligence research in the 1980s, if not for much of the earlier work on AI itself.
But building anything approximating real intelligence into a computer has proven to be a painfully difficult task, and the powers of Robby and HAL have remained beyond our grasp. We need to step back a bit, think carefully about what people and computers are each good at, understand how they can complement each other, and identify where we as system designers might be able to do some good.
The goal of our research on intelligent agents was to create something useful for our customers, but something with that sprinkling of pixie dust that would make it seem "intelligent." Shneiderman observed that claims about intelligent software agents are vague, dreamy and unrealized. We started from a simple but focused approach to agents: they should have the ability to infer appropriate high-level goals from user actions and requests, and take action to achieve those goals. Further, based on a study of reference librarians as exemplary human agents, we wanted to build a system in which the user would not have to state goals explicitly and in detail; we learned from librarians that a large part of the value they provide to clients is in working with imprecise requests. Beyond this, our general design strategy was to keep the user's question in front of us at all times: Will this software do something useful for me, in an intelligent way that makes me more productive? The system we describe here, Apple Data Detectors, meets our criteria of being unobtrusive, having the ability to infer user needs, and doing useful work. Apple Data Detectors will ship as a product in 1997.
Past work on intelligent agents has been multi-faceted, to the point where it is difficult to find consensus on exactly what constitutes an agent. Researchers have used machine learning techniques to track user actions and construct models of user preferences [13, 21]; created agents that employ user models, consulting a set of parameters that describe the user; implemented planning systems to make the leap from a user's stated intention to the specific actions that are required to achieve that intention; and built agents that act as "eager assistants." The locality of agents also varies across agent-based systems: some act only within one's own machine, while others, for example, autonomously crawl the Web searching for interesting content. Apple Data Detectors works on the user's own machine and falls into the eager assistant category, enabling rapid user action with minimal input on the part of the user.
Apple Data Detectors
The target: Working with information inside user documents
Our first step in designing intelligent software agents was to find a user problem that needed solving, and one where intelligent agents would add value. In an investigation of how people file information on their computer-based "desktops," we discovered that a common complaint of users is that they cannot easily take action on the structured information found in everyday documents. By structured information we mean data recognizable by a grammar. Ordinary documents are full of such structured information: phone numbers, fax numbers, street addresses, email addresses, email signatures, abstracts, tables of contents, lists of references, tables, figures, captions, meeting announcements, Web addresses, and so forth. In addition, there are countless domain-specific structures such as ISBN numbers, stock symbols, chemical structures, and mathematical equations. These structures are not only relevant to users, but are also recognizable by present day parsing technologies. The type of a structure can be used to identify appropriate actions that might be carried out on the structure: place a meeting on a calendar, add an address to an address book, dial a phone number, open a URL, find the current price of a stock, file an ISBN number, compile a list of abstracts, and so forth. The system we developed to enable people to work more fluidly with structured information is called Apple Data Detectors.
Apple Data Detectors supports a wide range of uses. Think of all the structured information in the documents that you work with: in addition to the ones already mentioned, there are bibliography items, forms (such as travel expense forms, non-disclosure agreements), executive summaries — and most important, many domain-specific kinds of data such as legal boilerplate, customer orders, library search requests, and so forth, for which specific detectors can be created.
To use Apple Data Detectors, the user selects a region of a document that has some information of interest (as in the examples mentioned above). Pressing a modifier key and the mouse button instructs Apple Data Detectors to analyze the data within the selected region and find all structures for which it has grammars. It then offers the user appropriate actions for each structure (see Figure 1). For example, if the user is reading email and comes across a seminar announcement that he or she would like to put on a calendar, Apple Data Detectors parses the relevant information within the selected text, including the meeting's room, time, and date, and puts these data in the appropriate fields on the calendar.
The user can select a whole document or part of a document and does not have to make a careful selection: the grammars will find any embedded structures they know about within the selection and offer the appropriate actions.
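The interaction just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the shipped implementation: the three regular expressions stand in for real detector grammars, and the names `DETECTORS`, `ACTIONS`, and `analyze_selection` are hypothetical.

```python
import re

# Hypothetical detectors: one deliberately simple pattern per kind of structure.
DETECTORS = {
    "url": re.compile(r"https?://[^\s<>\"]+"),
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\(?\b\d{3}\)?[ .-]\d{3}[ .-]\d{4}\b"),
}

# Hypothetical actions offered for each kind of structure.
ACTIONS = {
    "url": ["Open URL", "Add URL to hotlist"],
    "email": ["Send mail to address", "Add to address book"],
    "phone": ["Dial number"],
}


def analyze_selection(selection):
    """Find every known structure in a (possibly sloppy) selection.

    Returns (kind, matched_text, actions) tuples in document order.
    """
    found = []
    for kind, pattern in DETECTORS.items():
        for match in pattern.finditer(selection):
            found.append((match.start(), kind, match.group()))
    found.sort()  # order by position in the selection
    return [(kind, text, ACTIONS[kind]) for _, kind, text in found]


if __name__ == "__main__":
    text = "Call Pat at (408) 555-0100 or write pat@example.com"
    for kind, value, actions in analyze_selection(text):
        print(kind, value, actions)
```

The selection is scanned in full by every detector, which is why the user does not need to select precisely: anything the patterns do not recognize is simply ignored.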
The use of anthropomorphism in an agent interface, which comes up in so many discussions of agents, was incongruent with our goal of unobtrusiveness. We designed Apple Data Detectors to be invisible until needed (the philosophy of the butler who is there only when you want him). Thus there is nothing like a "swiveling eyeball" "watching" the user, as in the Selection Recognition Agent, or a character, as in Microsoft's Bob. Apple Data Detectors acts behind the scenes, emerging when summoned.
Architecture and implementation
Apple Data Detectors is an open, extensible system that allows for the recognition and parsing of complex structures. Our recognition technology is a hybrid system that utilizes Earley's algorithm and deterministic finite automata. The algorithms permit the recognition not only of simple structures such as pre-defined strings and "atomic" patterns like URLs and e-mail addresses, but also of complex "composite structures" such as meeting announcements, which are composed of smaller, more atomic structures like date, time, and meeting room. The idea of detecting structures is not new, but our work is unique in providing an open system for creating complex new structures and actions. It is not difficult to hardcode a recognizer for an atomic structure such as a URL, but substantial work is required to craft an architecture that opens up the process of creating complex structures.
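To make the distinction between atomic and composite structures concrete, here is a minimal Python sketch. It is an assumption-laden stand-in: Python regular expressions replace the Earley parser and finite automata described above, the composite rule simply requires that every component be present somewhere in the text, and all names (`ATOMIC`, `COMPOSITE`, `detect_atomic`, `detect_composite`) are hypothetical.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Atomic structures: each is recognized by a single pattern.
ATOMIC = {
    "date": re.compile(r"\b(?:" + MONTHS + r") \d{1,2}, \d{4}\b"),
    "time": re.compile(r"\b\d{1,2}:\d{2} ?(?:AM|PM)\b", re.IGNORECASE),
    "room": re.compile(r"\b(?:Apollo|Gemini|Mercury) Room\b"),
}

# Composite structures: each names the atomic structures it is built from.
COMPOSITE = {"meeting": ("date", "time", "room")}


def detect_atomic(kind, text):
    """Return the first occurrence of an atomic structure, or None."""
    match = ATOMIC[kind].search(text)
    return match.group() if match else None


def detect_composite(kind, text):
    """Return a dict of components if all of them are present, else None."""
    parts = {}
    for component in COMPOSITE[kind]:
        value = detect_atomic(component, text)
        if value is None:
            return None  # a required component is missing
        parts[component] = value
    return parts
```

New composite structures are added here by editing the `COMPOSITE` table rather than by writing a new recognizer, which is the kind of openness the architecture aims for; a real grammar would also constrain the order and context of the components.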
To make Apple Data Detectors work, grammars (which we call "detectors") and action scripts must be written (Figure 2). At the current time, this is a task for programmers, though one made as easy as possible through the use of a special-purpose editor and AppleScript. We are committed to providing facilities for end users to create their own detectors and actions and have begun work in this area. While Apple and third party developers will provide many detectors and actions, it is clear that enabling end users to write their own will make Apple Data Detectors much more powerful and useful by providing domain-specific programming capability tailored to the needs of specific users.
Detectors parse the selected text according to the grammars associated with them (Figure 3). For each structure found by a detector, a data record is produced that describes the structure. This record can then be passed to an action script (see Appendix 1) for execution, in much the same way that a subroutine is invoked with a specific set of parameters. These parameters, of course, depend on the kind of structure that was found by a given detector. Detectors for strings and atomic patterns typically create a record containing just the structure that was found, such as the name of a conference room or an e-mail address. Detectors for complex patterns, such as a meeting announcement, produce records containing each of the components that played a part in the recognition of the pattern. Note that a detector can have an action associated with it and also play a role in a more complex detector. For instance, a conference room detector could have a "Show on map" action that shows where that room is located, and also play a part in the definition of the more complex meeting detector.
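The subroutine analogy can be illustrated with a short Python sketch. The records are written out by hand rather than produced by a detector, and the function and field names (`show_on_map`, `add_to_calendar`, `run_action`, `date`, `time`, `room`) are hypothetical; the real system passes records to AppleScript action scripts.

```python
def show_on_map(room):
    """Action attached to the atomic conference room detector."""
    return f"Showing {room} on the floor map"


def add_to_calendar(date, time, room):
    """Action attached to the composite meeting detector."""
    return f"Calendar entry: {date} {time} in {room}"


def run_action(action, record):
    """Invoke an action with the record's fields as its parameters."""
    return action(**record)


# An atomic detector yields a record holding just the structure it found.
room_record = {"room": "Apollo Room"}

# A composite detector yields one field per component, reusing the room record.
meeting_record = {"date": "March 3, 1997", "time": "2:30 PM", **room_record}

if __name__ == "__main__":
    print(run_action(show_on_map, room_record))
    print(run_action(add_to_calendar, meeting_record))
```

The same room record serves both as the complete input to its own action and as one component of the meeting record, mirroring the dual role of the conference room detector.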
End users can also specify simple detectors in the form of lists of strings such as lists of conference rooms, group members, customers, and so forth. These can then be used in other detectors, increasing the personalization of the system. For example, the meeting detector might refer to the conference room detector, which could be written as a list of strings by an end user or a local systems administrator. The meeting detector could then be used in a large number of organizations, without requiring that the meeting detector developer know the name of every conference room in every organization that might use it.
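As a rough sketch of this idea in Python: a detector is built from nothing more than a list of strings, and the meeting detector receives it as a parameter, so each organization can supply its own list. The helper names (`make_list_detector`, `make_meeting_detector`) and the simplified time pattern are assumptions made for illustration.

```python
import re


def make_list_detector(strings):
    """Build a detector from a non-empty list of literal strings."""
    # Try longer strings first so "Apollo Annex" wins over "Apollo".
    ordered = sorted(strings, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(s) for s in ordered))

    def detect(text):
        return [match.group() for match in pattern.finditer(text)]

    return detect


TIME = re.compile(r"\b\d{1,2}:\d{2}\b")  # simplified time-of-day pattern


def make_meeting_detector(room_detector):
    """The meeting detector refers to whichever room detector it is given."""

    def detect(text):
        rooms = room_detector(text)
        time = TIME.search(text)
        if not rooms or time is None:
            return None
        return {"room": rooms[0], "time": time.group()}

    return detect


if __name__ == "__main__":
    # Each site supplies its own room list; the meeting detector is unchanged.
    site_rooms = make_list_detector(["Apollo", "Apollo Annex", "Gemini"])
    meeting = make_meeting_detector(site_rooms)
    print(meeting("Staff meeting at 10:15 in Apollo Annex"))
```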
The Apple Data Detectors architecture separates detectors and actions so that more than one action can exist for any detector (without having to duplicate the detector for each action). Hence, a detector written by one person to support one task can be used by another detector for another task. Detectors can thus be easily reused and shared.
In addition to the general usefulness of being able to acquire new detectors through sharing, detectors must be shareable for compatibility. The "place meeting on electronic calendar" action might expect the fields "StartDate," "StartTime," and "EndTime" from the "Meeting" detector. If a new definition of "Meeting" is installed that does not export these three fields, the actions currently installed would not work properly.
The solution for shared, compatible detectors requires developers to register their detectors. Our registry is supported by Component Integration Labs, a company responsible for maintaining the Apple Event Suites (among other standards). Apple Data Detectors will make use of the registry which contains definitions for classes of data objects and for events that operate on the objects. Classes of detectors will be defined as data objects and developers must write detectors that detect required fields of the detector class. Detectors can detect more information than the class requires, but they must detect at least the data that the class requires.
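The compatibility rule (a detector may export more than its class requires, but never less) amounts to a subset check. The Python sketch below assumes a hypothetical in-memory registry; the three field names come from the "Meeting" example above, while `REGISTERED_CLASSES`, `missing_fields`, and `is_compatible` are illustrative names only.

```python
# Hypothetical registry: detector class -> fields every such detector must export.
REGISTERED_CLASSES = {
    "Meeting": {"StartDate", "StartTime", "EndTime"},
}


def missing_fields(class_name, exported_fields):
    """Return the required fields that a detector fails to export."""
    return REGISTERED_CLASSES[class_name] - set(exported_fields)


def is_compatible(class_name, exported_fields):
    """A detector is compatible if it exports at least the required fields."""
    return not missing_fields(class_name, exported_fields)


if __name__ == "__main__":
    # Extra fields are allowed; missing ones break installed actions.
    print(is_compatible("Meeting", ["StartDate", "StartTime", "EndTime", "Room"]))
    print(sorted(missing_fields("Meeting", ["StartDate", "Room"])))
```

Running such a check when a detector is installed would catch the failure described above before any installed action is invoked with an incomplete record.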
Another way to look at Apple Data Detectors is that it inserts a new kind of programming capability right into the middle of user interaction. Unlike conventional scripting, Apple Data Detectors works hand-in-glove with the data in the user's open applications, and interactively with the user as he or she works. The job of supplying data for the scripts' parameters is taken care of simply by the user making a selection that contains the data. The user does not have to leave the application, type anything in, understand the order of parameters, or deal with any of the other low-level programming details that conventional scripting requires.
Unlike systems based on predefined recognizer/action pairs, such as Pandit and Kalbag's Selection Recognition Agent, the Apple Data Detectors scripting capability allows any set of arbitrary actions to execute when the user selects a particular action (see Figure 3). Scripting is not limited to the parameters of a command line interface to the application; instead, it can do anything that can be expressed in the scripting language, including the manipulation of data structures inside the application, if the application's scripting model makes that possible.
Because Apple Data Detectors is a general-purpose programmable engine (not merely a collection of specific detectors and actions), it has the potential to transform the way users work. Without the need to modify data by changing them into objects, or database-readable data, or any other complex format, a powerful new capability is introduced into the system. Any Macintosh application can provide information to Apple Data Detectors, and any scriptable application can respond to Apple Data Detectors actions, without having to change in any way. The capability to work within existing documents provides immediate user value and leverages the data the user is already working with. Of course, structures in other formats such as objects (in the object-oriented sense) are amenable to the parsing and scripting capabilities in Apple Data Detectors; it is a flexible technology that can evolve and grow with changes in data formats.
Apple Data Detectors assumes a world of heterogeneous data in the user's machine — the data come from different applications, in different file formats. For example, it makes sense to put an address in an address book whether the address comes from a message sent in any one of several mail programs, or appears in a downloaded document, or is in another address book the user is maintaining. Apple Data Detectors is a pervasive technology, giving the user access to actions appropriate for the data in an entire set of documents.
Collaborative vs. autonomous agents
The current user interface of Apple Data Detectors, in which the user selects data and actions, results in a collaborative agent. The user participates by signaling that structures of interest occur in a document and by verifying that an action is appropriate by selecting it from the menu (Figure 1). Apple Data Detectors participates by recognizing structures, offering appropriate actions, sending data to the target application, opening the target application, and performing any other actions specified in the action scripts.
Apple Data Detectors therefore meets our criterion of agency: the ability to infer appropriate high-level goals from user actions and requests, and take appropriate action to achieve those goals. These capabilities are realized in a collaboration between the user and the agent. As we observed earlier, one of our guiding design principles was to let the computer do what it is good at and let people do what they are good at. When the user invokes Apple Data Detectors on a region of text in a document, he or she is saying, in effect, "Find the important stuff in here and help me do reasonable things to it." The user can be imprecise, throwing Data Detectors a broad hint that there's something of interest; Data Detectors then uses its knowledge of the structures and actions of interest to the user to "do the right thing." Users work on their tasks in terms of high-level goals (e.g., put this address in my address book), not at the level of opening folders, clicking on icons, cutting and pasting. Direct manipulation at this level is a wasteful, frustrating way for users to interact with machines that are capable of showing more intelligence.
The need to choose a particular action is actually an artifact of Apple Data Detectors' ability to find more than one structure in a selection and the need to offer more than one action for the same structure (e.g., open URL, or place URL in hotlist). But this is of benefit to users in terms of their being able to choose the structure that will be operated upon, and to choose the action that will be carried out — users remain in control of their work with the computer at all times.
Developer and user interest
The response of users and application developers to Apple Data Detectors has been extremely positive. The code required to implement detectors and actions is seen by developers as small enough that they are eager to link their applications into the technology. Users see it as a valuable way to rid themselves of annoying bits of detailed interaction. At the same time, our users and developers are surprising us with new uses for Apple Data Detectors. For instance, one customer considered how scientific analyses might be started on a Macintosh using Apple Data Detectors to find structures of interest, with subsequent stages of analysis carried out by opening a network connection and passing the detected information to a specialized application running on a Unix workstation. This example, and many others, emphasize the importance of the scripting layer in Data Detectors, and the need for actions to be able to do more than simply launch applications in response to the detection of a data type.
Making it real
Apple Data Detectors started out as a project in the Apple Research Laboratories (formerly the Advanced Technology Group), where the initial prototypes were designed and developed. As a research group, Apple Labs has no facilities for shipping products; hence, commercialization of the idea was only possible by working with the product development side of Apple. It is worth reflecting on how this happened, as some interesting perspectives on technology transfer follow.
First, Apple Data Detectors offered a very crisp statement of value to the product group. The capabilities of the system were easy to convey, and its value was easy to demonstrate, even with a limited demonstration system. The engineering required to implement the system was also easy to estimate, partly based on the properties of early demo systems and partly on the overall architecture of the system. There was nothing in Apple Data Detectors that would require unrealistically large amounts of memory or processing time, nor would the system impose unrealistic demands on Apple's application developer community. Hence, Apple Data Detectors was the right "grain size" for a successful technology transfer: large enough that significant value would be brought to Apple customers, but small enough that the implications on the operating system were limited and manageable. As a result, the question about Apple Data Detectors was not whether there was a viable product here, but rather what form that product would take.
We were fortunate that the technology transfer question took this form from the beginning, so that we were able to start working with the product groups on the product implications of the technology from a very early point in the technology's life. The transition from technology to product is a long one, and exploring possibilities that ultimately turn out to be blind alleys is inevitable. In our case, something of a dance between research and product groups resulted. We both explored, from our own perspectives, opportunities for the technology; each group revised its understanding and implementation of the technology as these explorations advanced. From the perspective of the research group, we began to see how the product group thought of Apple Data Detectors, and we could adapt our own development in response: incorporating good ideas coming from the other side, and pushing back when misinterpretations of the work occurred. Meanwhile, however, both groups were working from the assumption that some sort of product would emerge. The product group agreed that we should have free rein in defining what the technology would be, and that they could look for opportunities for this technology while not actively investing engineering resources in development, knowing that our work would ultimately become available to them. That is, we were not in intellectual competition with our own product group. Overall, the process that emerged here meant that we were able to avoid the "not invented here" problem that often plagues technology transfer activities.
The major challenges we encountered in our productization work lay in the changes to the system architecture and the user experience that Apple Data Detectors would impose on the Macintosh. As the long-term owner of both of these aspects of Apple's work, the product group needed to ensure a smooth integration of this new technology into the existing system, and it was clear from the beginning that our success in moving Apple Data Detectors into product was dependent on doing this well.
Again, our early contact with the product group served us well here. During the development of Apple Data Detectors, we experimented with a number of different user interfaces to the basic Apple Data Detectors technology. This paid off for us in a number of ways. The different interfaces made different assumptions about various aspects of the system: the interface techniques themselves, the demands the techniques would place on the underlying operating system, the demands imposed on developers who wanted to adapt Apple Data Detectors to their own uses, and so on. It was important for us to explore and come to understand this space, so we could advise the product group on the different possibilities that the technology offered.
Agents of alienation?
Jaron Lanier has written an interesting and thought-provoking diatribe against intelligent agents. He asserts that the very idea of intelligent agents is "both wrong and evil." He argues that while very few intelligent software agents have shipped (Bob and the Apple Newton being two of this rare breed), the idea of agents is still harmful, that it leads to alienation. Lanier finds the idea that the computer has its own intelligence threatening and undermining of human values.
Lanier argues that agents must necessarily force users to "change yourself to make the agent look smart" and that users will thereby "diminish" themselves. But we do not see that users in any way diminish themselves with agents such as Apple Data Detectors. They are simply interacting with a program that recognizes data that it has been instructed to recognize, taking actions that it has been instructed to take. There is less effort needed to make these everyday actions happen, just as there is when you ask a secretary to "Fax this to Sally right away." You don't have to say what you mean by "right away" or "this" or "Sally" or specify Sally's fax number or instruct the secretary to fill out the top sheet of the fax. That's all understood.
We can imagine users of Apple Data Detectors changing their behavior to take advantage of Data Detectors' capabilities, without any pursuant "evil." It is easier to write grammars when the data are regular and structured. Parsing is easier and faster when the data are regular. For example, it may be that in some environments groups of users will establish form-like templates for announcing meetings, so that Data Detectors can be used with great speed and accuracy. This doesn't compromise or diminish anyone's humanity; it is a natural co-evolution of tools and people. Users themselves will decide to change their own behavior if they perceive sufficient value in doing so. There is nothing coercive inherent in agent software. (Virtual reality, Lanier's specialty, could arguably be far more potentially alienating and coercive in seducing the user into an all-encompassing reality of someone else's design.)
Our prediction about intelligent software agents is that the best of them will be malleable, programmable tools that empower, rather than diminish, users, giving them control over tasks necessary for everyday life. Collaborative, programmable agents enable users to get more value out of the data they are already using, for tasks they are already doing, with less effort, at a higher level of goal specification. Such agents will also provide new functionality as people discover new uses for them. Apple Data Detectors is designed to support an evolutionary process in which users shape their own tools to the greatest extent possible. Lanier asked an important question, "How does [information technology] affect our definition of what a person is?" Technologies such as Apple Data Detectors assume that people are molders and shapers of their everyday tools, controlling their own environments through their tools.
The opportunities offered by Apple Data Detectors do not stop with the current implementation. Apple Data Detectors offers a first step towards extracting semantics from everyday documents without asking users to create documents in new ways. Intelligent agents such as Apple Data Detectors redefine "document" from a stream of characters to a data structure containing specific, known kinds of structures that can play specific, known roles in users' interactions. Such an approach can provide a foundation for more powerful analyses going beyond our current recognition and parsing technology. Our future work will explore the use of more sophisticated kinds of recognition and parsing, such as those that rely on advances in finite state technology and linguistically-informed context analysis, as well as integration with statistical techniques of data analysis such as relevance-based techniques.
An obvious future step for our research will be to build knowledge into the system about the structures we're recognizing and how those structures are related to user goals and tasks. Doing so will provide the basis for more flexible and powerful task support. For instance, if we can attribute some reasonable set of "e-mail address" semantics to the textual presentation of an e-mail address in a document, the system can use an e-mail address as a pointer to the person who possesses that e-mail address. We can then carry out system actions intended for people (e.g., "Place a phone call to this person") on an e-mail address, and let the system figure out the person implied by the e-mail address. (It is possible to do this now with Apple Data Detectors but requires writing a script, rather than relying on inferencing as we are suggesting here.) Such an interaction might require a different user interface than that in use at present, and this, of course, is a further area for future work: one can imagine many different kinds of user interfaces to the basic structure detection technology that underlies the current system.
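A minimal Python sketch of this proposed behavior follows. It is speculative by construction, since the paragraph describes future work: the address book, the `Person` type, and the function names are all hypothetical, and a simple lookup stands in for the inferencing suggested above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    email: str
    phone: str


# Hypothetical address book that lets an e-mail address point to a person.
ADDRESS_BOOK = [
    Person("Pat Lee", "pat@example.com", "408-555-0100"),
]


def person_for_email(address):
    """Resolve a detected e-mail address to the person who owns it."""
    for person in ADDRESS_BOOK:
        if person.email.lower() == address.lower():
            return person
    return None


def place_phone_call(person):
    """An action defined for people, not for e-mail addresses."""
    return f"Dialing {person.name} at {person.phone}"


def run_person_action(action, email_address):
    """Apply a person-level action to a detected e-mail address."""
    person = person_for_email(email_address)
    if person is None:
        return None  # the address does not identify a known person
    return action(person)


if __name__ == "__main__":
    print(run_person_action(place_phone_call, "pat@example.com"))
```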
Our next immediate goal is to complete a prototype of an end user programming facility that will enable end users to program detectors and actions, opening up the full Apple Data Detectors capability to all our users. Apple Data Detectors is, after all, a technology that is meant to empower end users, and it is our intent to enable people to play as active a role in the creation of their tools as in their use of them.
Many thanks to Bran Boguraev, Tom Bonura, Eric Hulteen and Alan Samuel for their comments on earlier drafts. Errors and omissions are our own.
1. Aho, A., Sethi, R., Ullman, J. Compilers — Principles, Techniques, and Tools. Reading, Mass., Addison-Wesley Publishing Company (1988).
2. Apple Computer Inc. Inside Macintosh. Addison-Wesley, Reading, Mass. (1993).
3. Barreau, D. and Nardi, B. Finding and reminding: File organization from the desktop. SIGCHI Bulletin (July 1995).
4. Bennett, J. Building relationships for technology transfer. Communications of the ACM, 39, 9 (Sept. 1996), 35-36.
5. Benyon, D., and Murray, D. Developing adaptive systems to fit individual aptitudes. Proceedings of the 1993 International Workshop on Intelligent User Interfaces. Orlando, Florida. (1993).
6. Boguraev, B., Garigliano, R. and Tait, J., eds. Natural Language Engineering 1, 1 (March, 1995).
7. Cohen, P., Cheyer, A., Wang, M. and Baeg, S. An open agent architecture. In O. Etzioni, editor, Proceedings of the AAAI Spring Symposium Series on Software Agents (Stanford, California, March 1994), American Association for Artificial Intelligence, pp. 1-8.
8. Cypher, A. EAGER: Programming repetitive tasks by example. Proceedings CHI 91 Conference. (1991). ACM Press, New York, pp. 33-39.
9. Earley, J. An efficient context-free parsing algorithm. Communications of the ACM 13, 2 (February 1970), 94-102.
10. Etzioni, O. and Weld, D. A Softbot-based interface. Communications of the ACM 37, 7 (July 1994), 72-76.
11. Kennedy, C. and Boguraev, B. Anaphora in a wider context: Tracking discourse referents. Proceedings 12th European Conference on Artificial Intelligence (Budapest, 11-16 August, 1996), pp. 582-586.
12. Lanier, J. Agents of alienation. ACM interactions 2, 3 (1995), 66-72.
13. Maes, P. Agents that reduce work and information overload. Communications of the ACM 37, 7 (July 1994), 31-40.
14. Nardi, B. A Small Matter of Programming: Perspectives on End User Computing. MIT Press, Cambridge, Mass., (1993).
15. Nardi, B. and O'Day, V. Intelligent agents: What we learned at the library. Libri 46, 3 (September 1996), 59-88.
16. Pandit, M. and Kalbag, S. The selection recognition agent: Instant access to relevant information and operations. Proceedings of Intelligent User Interfaces '97. (1997). ACM Press, New York.
17. Rose, D. and Stevens, C. V-Twin: A Lightweight Engine for Interactive Use. Proceedings of the Fifth Text REtrieval Conference (TREC-5). National Institute of Standards and Technology, (Nov. 20-22, 1996, Gaithersburg, MD).
18. Rus, D. and Subramanian, D. Multi-media RISSC Informatics. Proceedings of the 2nd International Conference on Information and Knowledge Management. (1993, Washington, DC.). ACM Press, New York, pp. 283-294.
19. Schneider, D. The Tao of AppleScript. Indianapolis, Indiana, Hayden Books (1994).
20. Shneiderman, B. Looking for the bright side of user interface agents. ACM interactions (January 1995), 13-15.
21. Schlimmer, J. and Hermens, L. Software agents: Completing patterns and constructing interfaces. J. of AI Research, 1, (1993) 61-89.
Appendix 1
The following script — addressLetterTo — demonstrates the generality of Data Detectors' use of a scripting language and external applications as both information repositories and end-user tools. This script can be activated when Data Detectors detects a telephone number. When activated, it generates a piece of word processor letterhead addressed to the person possessing that telephone number, with appropriate date and salutation information. Two applications are used by this script. First, a "personal information manager" (Now Contact 3.5) is opened and used as a database. The first part of the script in effect asks Now Contact to return the name and address information corresponding to the person with the provided phone number. The person's name and address are saved as strings (thePerson and theAddress) for future use. Then, the script opens an empty word processor document (via Corel WordPerfect) and writes the date, name, address, and salutation information into the document, leaving the user ready to write the letter. Several things should be noted about this script:
- The personal information manager is used by the script as a source of information; it is not intended to be used directly by the user.
- Both applications take advantage of AppleScript's ability to extend the language in ways that make sense for their needs: scripts written for Now Contact can refer to the work phone of a person, and scripts written for Corel WordPerfect can refer to the paragraph at the beginning of the document. This at least eases the process of writing these scripts, and, in many cases, allows the scripts to control the application in ways that would not be possible if such extensions did not exist.
- This use of the personal information manager is possible because it returns to the script a complex record of information about the person (in the result of the statement "set thePerson to the first person whose (work phone is phoneNumber)"). This is a much more powerful use of scripting and applications than launching an application in response to seeing a particular datatype, as in the Selection Recognition Agent (1996).
This combination of scripting, applications, and structure detection by Data Detectors results in a significant increase in the support we are able to provide to users in the accomplishment of their tasks. This is done while keeping users fully in control of their interaction with the computer, and without imposing a significant development burden on the creators of the applications or the scripts.
(Error-handling code, such as watching for the condition when no person has the specified phone number, has been omitted from this example script in the interest of brevity.)
--addressLetterTo: Given a phone number, find the person with that
--phone number and address a letter to him/her.
on addressLetterTo(phoneNumber)
    -- Open Now Contact and find a person with the indicated phone number
    tell application "Now Contact 3.5"
        --find the person with the supplied phone number
        set thePerson to the first person whose (work phone is phoneNumber)
        --get the address information for this person
        set firstAndLastName to (the first name of thePerson) & " " & (the last name of thePerson)
        set theAddress to firstAndLastName & return & (the company of thePerson) & return & (the work address of thePerson) & return & (the work city of thePerson) & ", " & (the work state of thePerson) & " " & (the work zip of thePerson) & return & return
    end tell
    --Open WordPerfect and write the address information into a new document
    tell application "Corel WordPerfect"
        --open an empty piece of WordPerfect stationery
        open (path to system folder as string) & "DD template"
        --get today's date
        set theDate to (the current date as string) & return & return
        --get the salutation: something like "Dear John,"
        set theHeader to "Dear " & the first word of firstAndLastName & ", " & return & return
        --write this as a new paragraph at the beginning of the document
        make paragraph at the beginning of the document with data theDate & theAddress & theHeader
    end tell
end addressLetterTo
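For readers unfamiliar with AppleScript, the same data flow can be sketched in Python, this time including the error handling that the script omits. This is a loose analogue under clear assumptions, not a translation: an in-memory list named `CONTACTS` stands in for the personal information manager, the function returns the letter text instead of driving a word processor, the sample contact is invented, and the date is passed in and written in ISO format for simplicity.

```python
from datetime import date

# Hypothetical stand-in for the personal information manager's database.
CONTACTS = [
    {
        "first name": "John",
        "last name": "Doe",
        "company": "Example Corp",
        "work phone": "408-555-0100",
        "work address": "1 Main Street",
        "work city": "Cupertino",
        "work state": "CA",
        "work zip": "95014",
    },
]


def address_letter_to(phone_number, today):
    """Return letter text (date, address, salutation) for a phone number."""
    matches = [c for c in CONTACTS if c["work phone"] == phone_number]
    if not matches:
        # The error case omitted from the script: nobody has this number.
        raise LookupError(f"no person with work phone {phone_number}")
    person = matches[0]

    full_name = f"{person['first name']} {person['last name']}"
    address = "\n".join([
        full_name,
        person["company"],
        person["work address"],
        f"{person['work city']}, {person['work state']} {person['work zip']}",
    ])
    salutation = f"Dear {person['first name']},"
    return f"{today.isoformat()}\n\n{address}\n\n{salutation}\n\n"


if __name__ == "__main__":
    print(address_letter_to("408-555-0100", date.today()))
```

As in the script, the contact record is used purely as a source of information: the only input the user supplies is the detected phone number.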