Generalization

1.6 Generalization

Generalization identifies commonalities among a set of entities. The commonality may be of attributes, behavior, or both. For example, a statement such as "All windows have a title" expresses a common attribute among all entities that are considered windows. Similarly, the statement, "All windows can be resized." expresses a common behavior that all windows provide. Generalizations are usually easy to recognize as they contain words like "all" and "every".

Generalization may be defined as:

Generalization: the identification, and possible organization, of common properties of abstractions.

This definition shows that generalization is not abstraction although the two are often confused. Abstraction aims at simplifying the description of an entity while generalization looks for common properties among these abstractions.

Generalizations are clearly important and prevalent in many disciplines of study. In science and mathematics, for example, the statements of "laws" and "theorems" are often generalizations - they state some property that holds over a group of things: the more powerful the generalization, the more things to which the generalization applies. The search for the basic forms of matter represents the physicists' quest for a generalization that applies to everything in the physical universe.

Generalizations are equally important to software. Much of the effort in building software systems is to allow parts of the system to operate in the most general way possible. In some cases this might mean designing the system so that it can handle any number of things of the same kind. For example, the system might be expected to process any number of lines of input. In other cases the major design problem is how to handle things of different kinds or types. For example, the system might be expect to process input that comes from files of different formats or from both local as well as remote locations. An approach to solving some of these problems is to be able to capture more completely in software the idea of generalization.

One of the four forms of generalization is hierarchy. In the case of hierarchy, the commonalities are organized into a tree structured form. At the root of any subtree are found all the attributes and behavior common to all of the descendents of that root. This particular kind of tree structure is referred to as a generalization/specialization hierarchy because the root provides more general properties shared by all its descendents while the descendents typically add specializing properties which make them distinct among their siblings and their siblings' descendents.

The second form of generalization is genericity. In the case of genericity, the commonality is expressed with the aid of a parameter. Various specializations are distinguished by what they provide for the parameter. For example, using genericity it is possible to represent the common properties of a "stack" through the generalization of a "stack of anything", where "anything" represents the parameter. Specialized forms of this generalization are "stack of integers" and "stack of characters".

The third form of generalization is polymorphism. Polymorphism captures commonality in algorithms. An algorithm may have a nested if-then-else (or case statement) logic which tests for the exact type of an object which it is manipulating. the algorithm performs some operations on the object based on the exact type of the object. However, in many algorithm the operations to be performed are the same, only the type of the object on which they are performed varies. Polymorphism allows this nested logic (or case statement) to be collapsed to a single case in which the different object types are treated in a uniform manner. Through a mechanism called dynamic binding, the algorithm allows the object to determine which of its operations to perform in response to the algorithms invocation. Thus, the algorithm need not know the exact type of the object. The algorithm only needs to know that the object can respond to the invocation in some manner.

The fourth form of generalization is patterns. A pattern expresses a general solution (the key components and relationships) to a commonly occurring design problem. The attributes and behavior of the individual components are only partially defined to allow the pattern to be interpreted and applied to a wide range of situations. For example, a "wheeled vehicle" pattern might be defined in terms of the components "wheel", "axle", "frame", "body" and "power source". The pattern would also show how these components would be arranged in relation to each other (e.g., the axle must connect two wheels). The pattern could be interpreted in many different ways to solve particular problems that differ in their requirements for speed, durability, payload, fuel source, available materials, and other factors. Example of the wheeled vehicle pattern are "automobile", "horse-drawn carriage", "ox cart", "moon buggy" and many others.

Hierarchy

A hierarchical organization of components based on a relationship of generalization/specialization is an important device in object-oriented programming. While the power of such an organization can only be fully appreciated after more study, it is useful to at least hint at the role it will play. The figure below shows how the components in a typical graphical user interface system might be organized. The most general component, a Window, has attributes of location (the coordinates denoting where the window appears on the screen) and shape (the height and width), and a behavior that allows the window to be repositioned and resized. Specialized kinds of windows are those that include a pull-down menu bar (Frame), support drawing graphical shapes (Canvas), are interactive (Items), maintain a prescribed layout of Items (Panel), and allow the scrollable display of simple text (TextWindow). Various kinds of interactive components are shown as specialized kinds of Items.

A generalization and its specializations are often said to be related by an "is-a" relationship. In ference to the figure below, the statement "An Item is a Window" suggests why the term "is-a" is used in this way. The "is-a" terminology reflects that the specialization has all of the attributes and behavior of the generalization.

The notion of hierarchy in object-oriented programming can be defined as follows:

Hierarchy: an organization of abstractions into a directed graph in which the arcs denote an "is-a" relation between a more generalized abstraction and the one or more specializations derived from it.

The most common case of hierarchy uses a tree structure to organize the abstractions, although more general organizations are possible.

A Generalization/Specialization Hierarchy

A generalization/specialization hierarchy serves at least four major purposes. First, it provides a form of knowledge representation. A higher (more generalized) level in the hierarchy encodes an understanding of the general attributes and behavior possessed by all of its specialized descendents. Thus, it is possible to make statements such as: "All windows can be resized." and "All windows can be repositioned.". Second, the names of the intermediate levels in the hierarchy provide a vocabulary that can be used among developers and between developers and domain experts (those knowledgeable about the application domain but not necessarily about computing). This vocabulary allows discussions to be less ambiguous because the terms in the vocabulary identify specific, clearly defined concepts. Third, the hierarchy can be extended by adding new specializations at any level. These additions are easier to make because they occur within an existing framework that defines some, perhaps, many of their attributes and behavior. For example, to add a new specialized kind of Item, it is not necessary to redefine all of the attributes and behavior of a Window and an Item, these are assumed to be part of the more generalized nature of the specialized Item being added. Fourth, new attributes and behavior can be added easily to the proper subset of specializations. For example, any new attribute or behavior that might be needed for all Items can be added to Item. These additional attributes or behavior are then "automatically" part of all specialized kinds of Items, but not of anything else.

Genericity

Genericity is a partial generalization that is variously referred to by the terms generic, template, parameterized class, or generic class. The generalization is "partial" because at least some of the properties captured by the generalization are expressed in terms of other properties that are not part of (are missing from) the generalization. For example, a generic BubbleSort class can capture very exactly the bubble sort sorting strategy. A basic step in the bubble sort strategy is the comparison such as

if ( A < B
)...

, where A and B are two elements being sorted. This strategy can be captured perfectly without identifying exactly what is meant by the "<" operator. In different uses the "<" operator may take on different interpretations. For example, in bubble sorting integers a numeric interpretation is applied; in bubble sorting character strings, a lexicographic interpretation is applied; in bubble sorting a user defined type, a user defined interpretation may be applied.

Genericity can be defined as follows:

Genericity: a named generalization expressed in terms of other unspecified abstractions that are denoted by parameters.

Pictorially a generic class can be drawn as a class with a "hole" in it. The hole, also referred to as the "parameter" of the generic class, represents the missing part of the generalization. Filling the hole completes the generalization. The completed generalization is then a fully-formed class from which objects can be instantiated. It is not possible to create objects directly from the generic class because the generic class is incomplete.

A Generic Class

Generic classes may have more than one "hole" or parameter. One common type of data organization where this occurs is a generic LookupTable that maintains an association between a "value" and a "key" by which the value can be value is identified. The fundamental operation of the LookupTable is to return a value given a key. Tables of this form could be used to maintain (key, value) associations such as (name, phone number), (account number, balance), or (name, account number). A generic class of this form is shown below.

A Generic Class

The two parameters of this generic class represent the specific kind of "key", in this case a persons name, and the specific kind of "value", in this case a phone number. This combination of parameters yields a PhoneBook. Other combinations of keys and values would yield other useful forms of LookupTables.

Polymorphism

Polymorphism is a means of generalizing algorithms. The generality is achieved by allowing the algorithm to manipulate uniformly objects of different classed provided that they algorithm uses only common properties of the different classes. Any object known to possess the required common properties may be manipulated by the algorithm. Some object-oriented languages require that the compiler be able to verify at compile-time that an object possesses the required common properties. Other languages allow this verification to be deferred until run-time, risking a possible run-time error if, during execution, the object is discovered to be lacking one ofo the required common properties.

Polymorphism can be defined as follows:

Polymorphism: the ability to manipulate objects of distinct classes using only knowledge of their common properties without regard for their exact class.

The meaning of polymorphism is reflected in the root phrases from which the term is derived. The phrase "poly" denotes "many" or "several" and "morph" refers to "shape", "form",or "appearance". Thus, polymorphism refers to things of many shapes or many forms. In the object-oriented programming sense, the "shape, form, or appearance" is taken to mean the interface or properties of an object. The "poly" aspect implies that the interfaces or properties are different or varied across the objects being considered. The challenge is how to manipulate these varied and different objects in a uniform manner.

Humans use polymorphism in many activities. One example is a store check-out clerk who determnes the total amount of a customer's purchases. Each item being purchased has a bar code label that the clerk scans by passing it over a bar code reader. The bar code reader reads the identifying information on the bar code label, consults a database of merchandise and reports the information to the cash register for totaling. The clerk follows a generalized algorithm which might be written as:



	while ( more storeItems ) {
		pick up next storeItem;
		scan storeItem;
		put scanned storeItem in bag;
	};

This algorithm does not refer to any of the many specific types of merchandise sold by the store. The algorithm only refers to a general "storeItem". The storeItem is only required to have a bar code label so that the item can be scanned. Any item of merchandise having such a label can be handled by the clerk. Conversely, any item not possessing a bar code label cannot be handled by the clerk. In this example, compile-time checking is equivalent to making sure that items hav a bar code label when they are put on the shelf (i.e., when the "store" is "programmmed"). Run-tme checking is equivalent to having the clerk check for the presence of a bar code label and, if the item does not have such a label, the clerk calls a manager to report the problem. There are many every-day examples similar to the store clerk in which people act in a polymorphic way: shelving books in a library, driving different kinds of cars, using different kinds of computers are a few of these.

Polymorphism enables open-ended software. New classes possessing the required common properties can be handled by a polymorphic algorithm, the algorithm need not be changed in any way to accomodate the additional classes. The same is true for the store clerk algorithm. The store clerk does not need any additional training (reprogramming) when new kinds of merchandise are added to the store's inventory.

A mechanism called dynamic binding is needed to implement polymorphism in object-oriented systems. In this context, the term "binding" refers to the association of an invocation made by the polymorphic algorithm with the code of a method implemented by the receiving object. Because the polymorphic algorithm is unaware of the exact class of the receiver, the binding must be done dynamically because the same invocationmay be bound to different methods on different executions depending on what object is presented to the algorithm. The figure below depicts the act of dynamic binding.

Dynamic Binding

In this example the polymorphic algorithm invokes a method F() that is one of the common properties shared among a set of classes. The algorithm is unaware of the exact type of the object to which the invocation is directed. In the example, it is not known whether the object is of class X or class Y. The dynamic binding mechanism determines the type of the object and maps the invocation to the correct method in class X, if the object is of class X, or to the correct method in class Y, if the object is of class Y.

Patterns

A pattern is a generalization of a solution for a commonly occurring kind problem. A pattern is a generalization because the pattern does not give an immediately usable solution to a particular problem but instead gives the general form of a solution for any problem of the particular kind. The user of the pattern must adapt the pattern to the particular case at hand and supply the missing details not given in the pattern. Experienced designers are believed to possess, perhaps even at an intuitive or subconscious level, a rich repertoire of patterns and the ability to recognize when a current problem can be solved by adapting a pattern used successfully as a solution for one or more previous problems. Lacking the collection of previous designs, novice designers are forced to solve each problem that is new to them as if it were a completely unsolved problem, often reinventing what previous generations of designers have already created.

Patterns are recognized at many level of scale and in many disciplines. In computer science, a large scale pattern is often presented as an architecture or a model. Examples of such large scale patterns are the client-server model, layered architecture, and micro-kernel architecture. Small scale patterns in computing are oten called plans or idioms because they represent a common arrangement of programming language constructs. An example of a pattern at this scale is the "counted loop with early exit" plan. This plan might be used to scan an array of fixed length and terminate when the end of the array is reached or earlier if a given search criterion is satisfied. The plan specifies the initial conditions, the arrangement of the elements of the loop construct, and the termination conditions. The plan may be specified in a graphical or pseudo-code form so that it can be mapped by the user to different programming languages. Patterns are also common in other disciplines. The authors of the book Design Patterns cite as part of their inspiration the role of patterns in the architecture of buildings and in literary forms.

A pattern can be defined as follows:

Pattern: a named generalization describing the elements and relationships of a solution for a commonly occurring design problem.

A pattern contains four essential parts: a name, a problem, a solution, and the consequences. The four parts of a pattern are illustrated by a simple client-server pattern. The name of the pattern is intended to convey briefly and succinctly the subject matter of the pattern. For this example the name client-server is appropriate. The problem portion of the pattern identifies the conditions under which the pattern is applicable (i.e., for what problem is this pattern a solution). The client-server problem is one of providing a service to possibly multiple clients in a loosely-coupled manner. The solution specifies the elements that comprise the solution, their individual responsibilities, and the manner in which they collaborate to achieve the solution. The elements of the client-server pattern could be given in pictorial form as shown in the figure below.

Client-Server Pattern

The client-server pattern identifies five elements in the pattern. The client would be given the responsibility of generating a request that is sent to the server which, in turn, performs its service and delivers a response in the form a reply. The responsibility for conveying the requests and replies between the client and server is assigned to an intermediary knows as the connection. The client and server each collaborate directly only with the connection (but only indirectly with each other). The collaboration between the elements is defined by the sequence of events beginning with the generation of a request by the client and its transmission to the server followed by the generation at the server of a reply and its transmission to the client.

Notice that the pattern does not specify the nature of the service provided, it could be a name service, a time service, a location service, a file service, a security service or any other. Also the pattern does not specify how the connection is to be implemented. The connection could be a memory buffer connecting two procedures within the same process, a memory buffer connecting two different processes on the same machine, or a network link between two processes on different machines. While these details vary, the pattern remains the same.

The pattern also specifies the positive and negative consequences of using the pattern. Some of the positive consequences are that the client and server may be implemented on different machines allowing each to take advantage of local specialized hardware or software resources, the client and server may be totally or largely unaware of and insensitive to the actual location of each other, and the server may be made available to many clients at the same time. Two negative consequences of the client-server pattern are that the client may be left hanging if its request or reply is lost or if the server crashes, and the client cannot demand or control the service from the server - it can only request such service.

A pattern for an object-oriented design is expressed in terms of classes and objects. The elements of the pattern are represented by classes. The relationships among the elements are defined by association, aggregation, and/or hierarchy. The responsibilities and collaborations are understood in terms of the behavior of each class and the interactions maong the classes as defined by their associations.

A pattern is a distinct form of generalization. A pattern is similar to genericity in that it is a partial generalization, details are surpressed or omitted. However, genericity leads to, at least partial, code while a pattern need not be expressed in code at all. A pattern may also use hierarchical relationships, but there is no aspect of a generalization/specialization relationship that is at the center of a pattern.

Last Updated: June 21, 1996 / kafura@cs.vt.edu