Building Expert Systems

This module explores the development of expert systems (ES). Much of the material contained in this module is summarized from:
Jones, D.D. and J.R. Barrett. 1989. Building expert systems. In J.R. Barrett and D.D. Jones (eds). Knowledge Engineering in Agriculture. ASAE Monograph No. 8, ASAE, St. Joseph, MI.

When to Use Expert Systems

ES are not suited to all types of problems. Early on, many developers actively sought out problems amenable to ES solutions, or tried to apply ES to every problem they encountered. As experience has been gained, attention has become more properly focused on the problems to be solved rather than on the solution technique. Note that in this course we are focusing on systems engineering techniques and tools, and thus have been, and will continue to be, quite concerned with the solution technique.

Some problems can be solved using existing algorithms or statistical evaluation methods. Other problems, which are ill-structured, not well defined and currently require the help of a human expert, may appropriately be solved using an ES. ES techniques are rapidly becoming, along with simulation and other conventional programming, important tools for solving a wide range of problems. Incomplete information is characteristic of problems suited to ES solutions.

The "telephone test" can often help determine whether a problem that cannot readily be solved using traditional methods is amenable to an ES solution. If the domain expert can solve the problem via a telephone exchange with the end-user, an ES program can probably be developed to solve the problem. On the other hand, if the user is unable to describe the problem verbally, or if the expert cannot consistently reach a reasonable solution from the telephone interview, then ES development will likely be unsatisfactory. The telephone test ensures that the expert is not gaining additional information about the problem through other senses, and that the user is able to describe the problem adequately in words (important, since the user of an ES will be required to do exactly that).

ES Development Stages

If an ES solution is appropriate, one should approach the development in a systematic fashion, much like the systems methodology steps and the model development steps examined earlier in the semester. The process is largely one of refinement and expansion of a prototype. With each iteration the knowledge base grows in depth and breadth, its organization and representation improve, and each version helps guide the next stage of development. The prototype becomes the basis for further development, whether it is refined or discarded and the process restarted. It helps identify the approaches that have the most merit and those that should be discarded. These decisions can be made early, minimizing the cost of development.

Rapid prototyping provides a glimpse of what the completed product will be like. It is important to communicate and follow progress in any project, not only for funding agencies and supervisors, but also for the domain expert who is interested in making best use of valuable time. Prototypes should be documentable indicators of progress. This is a primary strength of ES in comparison to conventional programming approaches.

Several general approaches for developing ES have been proposed. Waterman (1986) has provided the most widely accepted approach:

Identification
Conceptualization
Formalization
Implementation
Testing

These stages are highly interrelated and interdependent. An iterative process continues until the software consistently performs at an acceptable level. Note that the above steps are essentially those of model development and methodology of systems analysis.

Identification

Identification corresponds to the requirements analysis step of traditional software development. It involves a formal task analysis to determine the external requirements, the form of the input and output, the setting where the program will be used and, very importantly, the intended user. The participants, problems, objectives, resources, costs and time frame need to be clearly identified at this stage.

The participants are the group sponsoring the effort, the domain expert and the knowledge engineer. Choosing an appropriate domain expert is essential to the success of the project. The domain expert should be a legitimate authority in the subject matter area, since the software must contain high-quality knowledge, and must have the time and interest to commit to the project. Not only is a personal commitment by the expert required, but the administrative support of the employer is needed to relieve the person of some existing duties. Development is not trivial, especially when attempted on top of an already full-time job assignment.

Although human domain experts are typically used in development, several successful programs have been produced using only reference materials, or with minimal involvement of a human domain expert. This may seem to contradict the more or less accepted definition of an ES; however, these programs do make use of ES programming techniques such as backward chaining to find values for program parameters and explanation of program logic. Whether a first-hand expert per se must exist, or whether an interpreter of knowledge can suffice, is a subjective judgment. Programs that do not rely on an expert are commonly referred to as knowledge-based systems, knowledge systems or rule-based systems.

Most of these programs function as database queries, used to help find relevant information in a thick reference manual, such as one on weeds or chemicals, or to locate specific information in a large diagnostics manual. It can be argued that ample human expertise was involved, not only in preparing the original reference material, but also in converting it, by a programmer knowledgeable in the subject matter area, into the format and sequence needed to solve the problem or answer the question. This blurring of the definition of the traditional ES development process will continue as knowledge engineering techniques pioneered by AI researchers are incorporated, alongside conventional programming languages, into programs and database management software.

To justify the time and cost of development, the problem must be important to a funding organization and be clearly defined. Although a developer cannot ignore interactions between the problem and the rest of the subject matter domain, efforts should be made to limit the problem domain so that the recommendations of the program will be specific and valuable instead of generally educational. Choosing depth over breadth not only makes the program more powerful and useful, but also more efficient, by minimizing the amount of information that must be obtained from the user before a recommendation can be made. For example, a user with a soybean pest problem would be better served by a program dealing with pests in soybeans than by a program dealing with soybean production in general, which would force the user through a lengthy series of questions or menus before finally arriving at the part of the program that deals with pests.

Specific goals or purposes of the software must be accepted by all parties. Objectives need to be more than problem solving. It is essential to carefully consider the background and needs of the end-user.

As important and as obvious as a properly designed user interface may seem to be, it is often neglected. Often the struggle to complete the knowledge base is so difficult and time consuming that developers have little energy left for the user interface.

Funding and time are major resources to be considered. Additional resources to be identified include the knowledge sources, computer hardware and development software. As with all programming projects, these estimates are difficult to make, but they must be realistic. Budgeted costs should include the lost productivity of the expert and the programmer who will devote time to the effort, as well as the ongoing cost of maintaining the knowledge base. By the same token, the expected benefits should include an estimate of the valuable time saved in future years.

Some estimate of the useful life of the program should be made. Additional questions include how frequently the expertise will be needed, the cost and availability of alternate methods of solving the problem and the likely acceptability in the workplace. A realistic appraisal of the costs and benefits can help establish the level of program detail that can be justified.

The hardware available for delivery can greatly affect the choice of computer used for development, since the developer must determine the extent of help messages, graphics, the form of the questions asked, the extent and format of output and the need to interact with other programs and databases. Many troubleshooting and classification problems require input based on sensory examination (sight, smell, touch, etc.) of the environment.

High-resolution color graphics should be especially useful in agricultural troubleshooting or classification applications. High-quality, inexpensive PC graphics, as well as high-resolution color scanners and video capture devices, should be used where advantageous to reduce potential confusion on the user's part in answering questions posed by the program or in interpreting program output. The less experienced the end-user is with computer hardware and software, the more effort must go into the design of the user-machine interface. ES have the added advantage of being more transparent than conventional programs (program flow can be presented to the user on demand), an ability that should be exploited if the user is likely to be skeptical of "black box" computer output.

Conceptualization

The second stage of ES development, conceptualization, involves designing the proposed program to ensure that specific interactions and relationships in the problem domain are understood and defined. The key concepts, relationships between objects and processes, and control mechanisms are determined. This is the initial stage of knowledge acquisition. It involves the specific characterization of the situation and determines the expertise needed to solve the problem.

The following questions may be used by the knowledge engineer to help understand what the expert does:

Exactly what decisions does the expert make?
What are the decision outcomes?
Which outcomes require greater reflection, exploration or interaction?
What resources or inputs are required to reach a decision?
What conditions are present when a particular outcome is decided?
How consistently do these conditions predict a given outcome?
At what point after exposure to influential inputs is a decision made?
Given the particulars of a specific case, will the outcome predictions of the knowledge engineering team be consistent with those of the expert?

One knowledge acquisition method, or a combination of several, is used. Additional details are provided in the Knowledge Acquisition module.

A typical approach would be to characterize the questions the end-user might pose to the domain expert and the range of possible solutions. One method of getting started is to begin with a range of final recommendations, and then build pathways to these. For example, in ES development to troubleshoot environmental problems in animal production facilities (simplified for the example), the top level of programming might involve the following typical symptoms and recommendations:

animals too cold ==> add insulation and/or space heater
high humidity ==> add space heater and/or increase ventilation rate
animals too hot ==> increase ventilation and/or add insulation and/or decrease animal density
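The top level above can be held as data rather than hard-coded logic, which keeps the knowledge base separate from the code that uses it. The following is a minimal sketch, assuming each "and/or" simply lists candidate remedies; the names `TOP_LEVEL_RULES` and `candidate_remedies` are hypothetical.

```python
from typing import Dict, List

# The three top-level rules from the text, stored as data. "and/or" in the
# text means each listed remedy is a candidate, not a required action.
TOP_LEVEL_RULES: Dict[str, List[str]] = {
    "animals too cold": ["add insulation", "add space heater"],
    "high humidity": ["add space heater", "increase ventilation rate"],
    "animals too hot": ["increase ventilation", "add insulation",
                        "decrease animal density"],
}


def candidate_remedies(symptoms: List[str]) -> List[str]:
    """Collect candidate remedies for the observed symptoms, keeping the
    order in which they first appear and dropping duplicates."""
    remedies: List[str] = []
    for symptom in symptoms:
        for remedy in TOP_LEVEL_RULES.get(symptom, []):
            if remedy not in remedies:
                remedies.append(remedy)
    return remedies


print(candidate_remedies(["animals too cold", "high humidity"]))
# ['add insulation', 'add space heater', 'increase ventilation rate']
```

Refinement then consists of replacing each symptom string with a sub-problem that must itself be established, rather than editing inference code.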

Once this top level is in place, the development process is mainly one of refinement and addition of detail. For instance, for the first rule above, additional information would be added to help determine whether the hypothesis "animals too cold" is true. This is not as simple as it might seem on the surface, since the temperature of the building alone is not an accurate index of animal comfort. Other considerations include whether the floor is dry and well bedded, the flooring material in use, whether the building is drafty, where in the pen the animals tend to stay, whether all animals in the building have similar symptoms or the problem is an isolated occurrence, whether the animals are stretched out or huddled next to one another, whether their hair is laid back or on end, and whether they are noticeably shivering.
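A refined test of the "animals too cold" hypothesis can weigh several of these behavioral observations instead of air temperature. This is a hypothetical sketch only: the `PenObservations` fields are a subset of the considerations listed above, and the rule "at least two behavioral signs" (`MIN_BEHAVIOR_SIGNS`) is an invented placeholder that a real domain expert would have to supply.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PenObservations:
    huddled: bool         # huddled together rather than stretched out
    hair_on_end: bool     # hair on end rather than laid back
    shivering: bool
    floor_wet: bool       # floor wet or poorly bedded
    drafty: bool


# Hypothetical threshold: number of behavioral signs required before the
# hypothesis is accepted. In practice this value would come from the expert.
MIN_BEHAVIOR_SIGNS = 2


def assess_cold(obs: PenObservations) -> Tuple[bool, List[str]]:
    """Test the hypothesis 'animals too cold' from animal behavior rather
    than air temperature alone, and list conditions that may explain it."""
    signs = sum([obs.huddled, obs.hair_on_end, obs.shivering])
    if signs < MIN_BEHAVIOR_SIGNS:
        return False, []
    factors: List[str] = []
    if obs.floor_wet:
        factors.append("wet or poorly bedded floor")
    if obs.drafty:
        factors.append("drafts")
    return True, factors


print(assess_cold(PenObservations(huddled=True, hair_on_end=True,
                                  shivering=False, floor_wet=True,
                                  drafty=False)))
# (True, ['wet or poorly bedded floor'])
```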

Additionally, greater detail is needed to determine a specific remedy. The final recommendation for the first item depends on the answers to questions that prove or disprove the hypothesis that the animals are too cold and, if they are cold, establish the cause. For example, if it is established that the building has low insulation levels, the final recommendations will depend on the type and age of animal housed, the summer and winter climatic conditions at the building location, whether the animals will be in physical contact with the wall containing the insulation material, and state and local building regulations and fire codes. Similarly, the type of heater recommended depends on the type and age of animals housed, the type and condition of the building, local regulations, the type and cost of fuel available, climatic conditions, the type of ventilation system used, etc. As can be seen, the knowledge base evolves during this refining process until it can provide a recommendation as accurate as that made by the human expert.

The job of the knowledge engineer is to identify the knowledge sources the domain expert uses when making a specific recommendation, i.e., to determine which reference books are consulted, which calculations are made (or other computer programs executed) and which rules-of-thumb (heuristics) come into play. Information the user will likely not know should be determined and represented by additional rules or other knowledge structures. The additional information needed to apply these rules can then be obtained from the user, or further rules can be created. This structure is typically created through frequent and intensive interview sessions with the domain expert.
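The division described above, deriving what the user will not know from rules and asking only for what they can observe, is what backward chaining provides. Below is a minimal sketch, assuming simple all-premises-must-hold rules with no cycles; the rules, `prove` and the scripted `answers` are hypothetical illustrations, not expert knowledge.

```python
from typing import Callable, Dict, List, Tuple

# (conclusion, premises): the conclusion holds if all premises hold.
# These rules are hypothetical illustrations and are assumed to be acyclic.
RULES: List[Tuple[str, List[str]]] = [
    ("animals too cold", ["animals huddled", "animals shivering"]),
    ("animals too cold", ["animals huddled", "hair on end"]),
    ("recommend space heater", ["animals too cold", "no heater installed"]),
]


def prove(goal: str, ask: Callable[[str], bool], known: Dict[str, bool]) -> bool:
    """Backward chaining: derive `goal` from rules that conclude it; if no
    rule concludes it, it is a fact only the user can supply, so ask."""
    if goal in known:                      # already derived or answered
        return known[goal]
    bodies = [premises for conclusion, premises in RULES if conclusion == goal]
    if not bodies:
        known[goal] = ask(goal)
    else:
        # any/all short-circuit, so the user is asked only what is needed
        known[goal] = any(all(prove(p, ask, known) for p in premises)
                          for premises in bodies)
    return known[goal]


# Scripted user answers stand in for an interactive session.
answers = {"animals huddled": True, "animals shivering": False,
           "hair on end": True, "no heater installed": True}
asked: List[str] = []


def ask(question: str) -> bool:
    asked.append(question)
    return answers[question]


result = prove("recommend space heater", ask, {})
print(result)  # True
print(asked)   # each question asked once, in the order it was needed
```

Caching answers in `known` means "animals huddled" is asked once even though two rules use it.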

Opportunities to group, rank and order knowledge should be sought. In the ventilation problem, for example, once the expert knows that the housed animal is farrowing or nursing, he automatically discards large portions of the knowledge base dealing with larger animals, thus narrowing the search space. Often, the expert is presented with 3 to 5 potential problem scenarios at each interview session by the knowledge engineer, who poses as the end-user, perhaps an inquisitive one who continually asks the expert the purpose of each question and a detailed justification of each answer. This is somewhat like a persistent child asking "Why?"
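Grouping and ranking can be made explicit in the knowledge base so that one early answer prunes the search. This is a sketch under stated assumptions: the `Rule` fields, rule names and expert-assigned priorities are hypothetical placeholders, and a real rule would carry conditions and conclusions rather than just a name.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rule:
    name: str
    animal_class: str   # group: which housed animals the rule applies to
    priority: int       # rank: lower number = examined earlier (expert-assigned)


# Hypothetical rules; names, classes and priorities are illustrative only.
KNOWLEDGE_BASE = [
    Rule("creep area heat", "farrowing", 2),
    Rule("piglet chilling", "farrowing", 1),
    Rule("sow heat stress", "farrowing", 3),
    Rule("finishing stocking density", "finishing", 1),
    Rule("finishing ventilation", "finishing", 2),
    Rule("gestation cooling", "gestation", 1),
]


def rules_to_search(animal_class: str, kb: List[Rule]) -> List[Rule]:
    """Group: discard rules for other animal classes.
    Rank/order: examine the remaining rules in the expert's priority order."""
    return sorted((r for r in kb if r.animal_class == animal_class),
                  key=lambda r: r.priority)


selected = rules_to_search("farrowing", KNOWLEDGE_BASE)
print([r.name for r in selected])
# ['piglet chilling', 'creep area heat', 'sow heat stress']
```

Here a single answer removes half of the rules from consideration, mirroring how the expert sets aside knowledge about larger animals.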

The information that is collected and analyzed forms the basis of the scenarios to be presented in the next session with the expert. Correctly and completely describing the expert's problem-solving logic is difficult, because true experts usually do not know exactly how they reach a decision and are therefore often unable to verbalize their own problem-solving process effectively. The careful study of detailed cases often reveals consistent patterns in the solution process that would otherwise remain obscure. Needed refinements to the concepts and relationships will become apparent during in-depth analyses. In addition, tape recordings of interviews between the expert and clients can be useful, if all parties agree to the taping. Such recordings can identify points that might be overlooked in a controlled session between the expert and the knowledge engineer acting out the role of the user. They can also help prevent the process from becoming an academic exercise and ensure that the needs of the end-user are met.

It is easy to document everything that is known about a subject and in the process lose sight of the original intent of the problem. For example, when developing a system to make weed control recommendations, it is tempting to create a program that identifies the genus of every weed that could be found in a region. This would require extensive input from the user that may not be necessary. Perhaps the only relevant information is whether the weed is a broadleaf or a grass, after which one of two herbicide types approved for the specific crop would be recommended.

Several ES development tools have inductive features that allow the creation of rules based on examples created by the expert. Such approaches to development are often useful for classification problems. Neural networks also function in a somewhat similar manner and will be explored in future assignments.
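As an illustration of rule induction from examples, here is a minimal ID3-style decision-tree learner (ID3 is one common induction technique, choosing the attribute with the highest information gain; the text does not say which algorithm any particular tool uses). The weed examples, the `flower` attribute and the herbicide labels are hypothetical, chosen to echo the broadleaf/grass point made above.

```python
import math
from collections import Counter
from typing import Dict, List

Example = Dict[str, str]


def entropy(labels: List[str]) -> float:
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def induce(examples: List[Example], attributes: List[str], target: str):
    """Return a tree: a class label (leaf) or (attribute, {value: subtree})."""
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:
        return labels[0]
    if not attributes:
        return Counter(labels).most_common(1)[0][0]   # majority vote

    def gain(attr: str) -> float:
        remainder = 0.0
        for value in {e[attr] for e in examples}:
            subset = [e[target] for e in examples if e[attr] == value]
            remainder += len(subset) / len(examples) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attributes, key=gain)
    rest = [a for a in attributes if a != best]
    return (best, {value: induce([e for e in examples if e[best] == value],
                                 rest, target)
                   for value in sorted({e[best] for e in examples})})


# Hypothetical expert examples: flower color turns out to be irrelevant.
examples = [
    {"leaf": "broadleaf", "flower": "white", "herbicide": "herbicide A"},
    {"leaf": "broadleaf", "flower": "yellow", "herbicide": "herbicide A"},
    {"leaf": "grass", "flower": "white", "herbicide": "herbicide B"},
    {"leaf": "grass", "flower": "yellow", "herbicide": "herbicide B"},
]
tree = induce(examples, ["leaf", "flower"], "herbicide")
print(tree)
# ('leaf', {'broadleaf': 'herbicide A', 'grass': 'herbicide B'})
```

The induced tree never asks about flower color, which is the kind of simplification an expert's examples can reveal.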

Formalization

Formalization involves organizing the key concepts, subproblems and information flow into formal representations. In effect, the program logic is designed at this stage. It is often useful to group or modularize the knowledge collected, perhaps even attempting to display the problem solving steps graphically.

In effect, it is the job of the knowledge engineers to build a set of interrelated tree structures for representing the knowledge base. They must decide the attributes to be determined to solve the problem and then which of these attributes should be asked of the user or represented by an internal set of decision trees. While decision trees are appealing in their simplicity and are a good way to begin formalizing knowledge into a knowledge representation scheme that can be visualized, things are rarely this simple in practice and rigid adherence to a tree structure is seldom satisfactory.
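The idea of interrelated trees, where some attributes are asked of the user and others are resolved by an internal tree, can be sketched as follows. This is an illustration under stated assumptions: the trees, attribute names and remedies in `TREES` are hypothetical and simplified, and `evaluate` is an invented helper, not part of any development tool.

```python
from typing import Callable, Dict, Union

# A tree is either a leaf value (str) or a node: (attribute, {answer: subtree}).
Tree = Union[str, tuple]

# Hypothetical, simplified trees. "cold_stress" is decided by its own tree,
# so the user is never asked for it directly.
TREES: Dict[str, Tree] = {
    "cold_stress": ("animals_huddled", {
        "yes": ("floor_wet", {"yes": "likely", "no": "possible"}),
        "no": "unlikely",
    }),
    "remedy": ("cold_stress", {
        "likely": "improve bedding and check for drafts",
        "possible": "check heater and insulation",
        "unlikely": "look for another cause",
    }),
}


def evaluate(attribute: str, ask: Callable[[str], str]) -> str:
    """Resolve an attribute with its internal tree if one exists;
    otherwise ask the user for its value."""
    tree = TREES.get(attribute)
    if tree is None:
        return ask(attribute)
    while not isinstance(tree, str):
        node_attribute, branches = tree
        tree = branches[evaluate(node_attribute, ask)]
    return tree


user_answers = {"animals_huddled": "yes", "floor_wet": "no"}
print(evaluate("remedy", user_answers.__getitem__))
# check heater and insulation
```

The rigidity the text warns about shows up quickly: every branch must be enumerated, and any answer without a branch raises a `KeyError`.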

The representation of knowledge is important for credibility and acceptance by the user. The questions asked and the rules examined should follow the same sequence used by the human expert. The questions and their order are determined by presenting the expert with several detailed scenarios. The granularity and structure of the concepts, including how they fit into a logical flow and where uncertainties enter, must be coordinated in making recommendations.

The problem domain is analyzed to uncover obscure behavioral and mathematical models that may exist within the decision-making process, and the characteristics of the information needed are identified. As the uncertainties are defined and explained, the relationships involved become better understood and may ultimately be implemented more expediently using conventional programming techniques. In this sense, the program development process also serves as a knowledge gatherer that can be used to explore poorly understood relationships.

It is difficult to separate the conceptualization phase from the formalization phase and, in reality, knowledge-base design proceeds almost in parallel with knowledge acquisition. The two items that are the most important in the formalization stage are: (1) refinement of the knowledge pieces into their specific relationships and hierarchy and (2) more accurate determination of the expected user interaction with the system.

Implementation

During the next stage, implementation, the formalized knowledge is mapped or coded into the framework of the development tool to build a working prototype. The contents of the knowledge structures, inference rules and control strategies established in the previous stages are organized into a suitable format. Often, knowledge engineers will already have been using the development tool to build a working prototype that documents and organizes the information collected during the formalization stage, so that implementation is already complete at this point. If not, the notes from the earlier phases are coded at this time.

Consideration must be given to long-term maintenance. Modifications to the knowledge base over time must be anticipated. The knowledge base should be extensively documented as it is coded. The potential for later misunderstanding and confusion should be minimized wherever possible. Furthermore, extensive justifications and explanations should be included to assist the end-user in fully understanding questions posed to them by the program, so that the user can effectively use the program output, and to show the user, on demand, how the recommendation was logically derived.
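Showing the user how a recommendation was derived requires the program to record which rule produced each conclusion. The following is a minimal forward-chaining sketch, assuming rules whose premises must all hold; the two rules and the `forward_chain` and `explain` helpers are hypothetical illustrations.

```python
from typing import Dict, List, Set, Tuple

# (rule name, premises, conclusion). Contents are hypothetical.
RULES = [
    ("R1", ["animals huddled", "animals shivering"], "animals too cold"),
    ("R2", ["animals too cold", "insulation low"], "recommend adding insulation"),
]


def forward_chain(facts: Set[str], rules) -> Dict[str, Tuple[str, List[str]]]:
    """Fire rules until no new conclusions appear. Returns, for each derived
    fact, the rule and premises that produced it (the 'how' record)."""
    facts = set(facts)
    derivations: Dict[str, Tuple[str, List[str]]] = {}
    changed = True
    while changed:
        changed = False
        for name, premises, conclusion in rules:
            if conclusion not in facts and all(p in facts for p in premises):
                facts.add(conclusion)
                derivations[conclusion] = (name, premises)
                changed = True
    return derivations


def explain(fact: str, derivations, depth: int = 0) -> List[str]:
    """Answer a 'how' question by walking back through the derivations."""
    indent = "  " * depth
    if fact not in derivations:
        return [f"{indent}{fact}: supplied by the user"]
    name, premises = derivations[fact]
    lines = [f"{indent}{fact}: concluded by rule {name} from"]
    for p in premises:
        lines.extend(explain(p, derivations, depth + 1))
    return lines


derivations = forward_chain(
    {"animals huddled", "animals shivering", "insulation low"}, RULES)
print("\n".join(explain("recommend adding insulation", derivations)))
# recommend adding insulation: concluded by rule R2 from
#   animals too cold: concluded by rule R1 from
#     animals huddled: supplied by the user
#     animals shivering: supplied by the user
#   insulation low: supplied by the user
```

Documenting each rule with a readable name is what keeps such explanations useful to the end-user and to later maintainers.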

The amount of help to be incorporated will depend on the ability of the anticipated user. While a consultant may simply want a quick answer to a question, the needs are different for an ES intended for those who must carry out the recommendation. Typically, before believing a recommendation, such end-users need access to its underlying assumptions and want a credible justification for it.

This is also the point where the developer must decide how the program will interact with other computer programs and databases. The first generation of ES were stand-alone programs. Many had no facilities to communicate with the operating system or to read from or write to databases.

Testing

The last stage, testing, involves considerably more than finding and fixing syntax errors. It covers the verification of individual relationships, validation of program performance and evaluation of the utility of the software package. Testing guides reformulation of concepts, redesign of representations and other refinements. Verification and validation must occur throughout the development process. Verification confirms that the models within the program represent true relationships. It ensures that the expert's knowledge is accurately mimicked, by having the domain expert operate the program for all possible contingencies.
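For a small rule base with yes/no inputs, "all possible contingencies" can be enumerated mechanically and the results handed to the expert for review. This is a minimal sketch, assuming the three top-level symptom rules from the conceptualization example encoded as a hypothetical `recommend` function; `enumerate_cases` is likewise an invented helper.

```python
from itertools import product

QUESTIONS = ["animals too cold", "high humidity", "animals too hot"]


def recommend(answers):
    """Hypothetical encoding of the three top-level rules."""
    recs = set()
    if answers["animals too cold"]:
        recs |= {"add insulation", "add space heater"}
    if answers["high humidity"]:
        recs |= {"add space heater", "increase ventilation rate"}
    if answers["animals too hot"]:
        recs |= {"increase ventilation", "add insulation",
                 "decrease animal density"}
    return recs


def enumerate_cases(questions, program):
    """Run the program on every combination of yes/no answers."""
    cases = []
    for values in product([False, True], repeat=len(questions)):
        answers = dict(zip(questions, values))
        cases.append((answers, program(answers)))
    return cases


cases = enumerate_cases(QUESTIONS, recommend)
gaps = [answers for answers, recs in cases if not recs]
print(len(cases), len(gaps))  # 8 1
```

The number of cases doubles with every added yes/no question, so exhaustive runs are practical only for small rule bases or individual modules. The listing also exposes inputs an expert may consider impossible, such as animals reported as both too cold and too hot, which this rule base accepts without complaint.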

Perhaps the most difficult aspect of testing is accurately handling the uncertainty that is incorporated in most ES in one way or another. Certainty factors are one of the most common methods for handling uncertainty. Verification of the certainty factors assigned to the knowledge base is largely a process of trial and error, refining the initial estimates by the domain expert until the program consistently provides recommendations at a level of certainty that satisfies the expert. To ensure program accuracy, all possible solution paths must be painstakingly evaluated.
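The sketch below shows how certainty factors are typically propagated, using the MYCIN combination formulas (one common scheme; the text does not specify which a given tool uses). The rule strengths and premise values are hypothetical numbers of the kind an expert would adjust by trial and error.

```python
from typing import List


def combine_cf(cf1: float, cf2: float) -> float:
    """Combine two certainty factors (each in [-1, 1]) for the same
    conclusion, using the MYCIN parallel-combination formulas."""
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 * (1 - cf1)
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 * (1 + cf1)
    denominator = 1 - min(abs(cf1), abs(cf2))
    if denominator == 0:
        raise ValueError("combining +1 and -1 is undefined")
    return (cf1 + cf2) / denominator


def rule_cf(rule_strength: float, premise_cfs: List[float]) -> float:
    """CF contributed by a rule whose premises are joined by AND: the
    weakest premise limits the conclusion. A rule whose premises are not
    positively supported contributes nothing (CF 0)."""
    weakest = min(premise_cfs)
    return rule_strength * weakest if weakest > 0 else 0.0


# Two hypothetical rules both support "animals too cold".
cf_a = rule_cf(0.7, [0.9, 0.6])   # limited by the 0.6 premise: 0.42
cf_b = rule_cf(0.5, [1.0])        # 0.5
total = combine_cf(cf_a, cf_b)    # 0.42 + 0.5 * (1 - 0.42), about 0.71
print(round(total, 2))
```

Tuning then amounts to changing rule strengths such as 0.7 and 0.5 until combined values like `total` match the expert's own confidence across the test scenarios.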

An effective validation procedure is critical to the success and acceptance of the program. During validation the following areas are of concern: (1) correctness, consistency and completeness of the rules; (2) ability of the control strategy to consider information in the order that corresponds to the problem solving process; (3) appropriateness of information about how conclusions are reached and why certain information is required; and most critical, (4) agreement of the computer program output with the domain expert's corresponding solutions.
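Criterion (4) can be tracked quantitatively by running the program on cases the expert has already solved. Below is a minimal sketch, assuming single-label recommendations; `herbicide_program`, the `heavy_infestation` attribute and the expert cases are hypothetical, with one case built so that the toy program disagrees with the expert.

```python
from typing import Callable, Dict, List, Tuple

Case = Dict[str, bool]


def agreement(program: Callable[[Case], str],
              expert_cases: List[Tuple[Case, str]]):
    """Return the fraction of cases where the program matches the expert,
    plus (case, expert answer, program answer) for each disagreement."""
    if not expert_cases:
        raise ValueError("need at least one expert-solved case")
    disagreements = []
    for case, expert_answer in expert_cases:
        program_answer = program(case)
        if program_answer != expert_answer:
            disagreements.append((case, expert_answer, program_answer))
    return 1 - len(disagreements) / len(expert_cases), disagreements


def herbicide_program(case: Case) -> str:
    """Toy program: looks only at leaf type."""
    return "herbicide A" if case["broadleaf"] else "herbicide B"


expert_cases = [
    ({"broadleaf": True, "heavy_infestation": True}, "herbicide A"),
    ({"broadleaf": False, "heavy_infestation": True}, "herbicide B"),
    ({"broadleaf": True, "heavy_infestation": False}, "no treatment"),
    ({"broadleaf": False, "heavy_infestation": True}, "herbicide B"),
]
rate, disagreements = agreement(herbicide_program, expert_cases)
print(rate)  # 0.75
```

The list of disagreements matters more than the rate: each one is a scenario to take back to the expert, here suggesting that infestation level belongs in the knowledge base.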

How the sequence of questions and output are presented to the end-user may have as much to do with acceptance and use as does the accuracy of the recommendations. The lessons learned from human engineering cannot be ignored if the program is to be successful.

Validation is an ongoing process requiring that the output recommendations be accurate for a specific user's case. Validation is enhanced by allowing others to review the program critically and recommend improvements. A formal project evaluation is helpful in establishing whether the system meets the original goal. The evaluation process focuses on uncovering problems with the program's credibility, acceptability and utility, which can be judged from its accuracy in comparisons with the real-world environment. Evaluation criteria include the understandability and flexibility of the program, ease of use, adaptability of the design and the correctness of solutions.

Waterman, D.A. 1986. A guide to expert systems. Addison-Wesley Publishing Co., Inc., Reading, MA.